graphrag

mirror of https://github.com/microsoft/graphrag.git synced 2025-11-30 17:11:00 +00:00

Author	SHA1	Message	Date
Alonso Guevara	fb56b7aed0	Fix circular dependency on prompt tune api (#1054 )	2024-08-29 12:11:07 -06:00
guangxiangdebizi	1e8bb409f6	Update indexer_adapters.py (#895 ) Update the lines 71 and 72. Before: entity_df["community"] = entity_df["community"].fillna(-1) entity_df["community"] = entity_df["community"].astype(int) After: entity_df.loc[:, "community"] = entity_df["community"].fillna(-1) entity_df.loc[:, "community"] = entity_df["community"].astype(int) Co-authored-by: Alonso Guevara <alonsog@microsoft.com>	2024-08-28 17:53:33 -06:00
Ikko Eltociear Ashimine	26bcdf39ed	docs: update manual_prompt_tuning.md (#963 ) paramater -> parameter Co-authored-by: Alonso Guevara <alonsog@microsoft.com>	2024-08-28 17:49:35 -06:00
fantom845	a3048487a1	fix for issue 515 (#925 ) * fix for issue 515 * semver impact document --------- Co-authored-by: Kanishk Tyagi <kanishktyagi@Kanishks-MacBook-Pro.local> Co-authored-by: Alonso Guevara <alonsog@microsoft.com>	2024-08-28 17:47:48 -06:00
Alonso Guevara	480181769c	Fix/entity extraction strategy (#1046 ) * fix strategy config in entity_extraction * update init content --------- Co-authored-by: KylinMountain <kose2livs@gmail.com>	2024-08-28 17:33:05 -06:00
dependabot[bot]	ee734e6003	Bump textual from 0.76.0 to 0.78.0 (#1038 ) Bumps [textual](https://github.com/Textualize/textual) from 0.76.0 to 0.78.0. - [Release notes](https://github.com/Textualize/textual/releases) - [Changelog](https://github.com/Textualize/textual/blob/main/CHANGELOG.md) - [Commits](https://github.com/Textualize/textual/compare/v0.76.0...v0.78.0) --- updated-dependencies: - dependency-name: textual dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-08-28 16:38:40 -06:00
dependabot[bot]	2f59701836	Bump lancedb from 0.11.0 to 0.12.0 (#1024 ) Bumps [lancedb](https://github.com/lancedb/lancedb) from 0.11.0 to 0.12.0. - [Release notes](https://github.com/lancedb/lancedb/releases) - [Changelog](https://github.com/lancedb/lancedb/blob/main/release_process.md) - [Commits](https://github.com/lancedb/lancedb/compare/python-v0.11.0...python-v0.12.0) --- updated-dependencies: - dependency-name: lancedb dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-08-28 16:11:35 -06:00
dependabot[bot]	89d1f02551	Bump json-repair from 0.26.0 to 0.28.4 (#1044 ) Bumps [json-repair](https://github.com/mangiucugna/json_repair) from 0.26.0 to 0.28.4. - [Release notes](https://github.com/mangiucugna/json_repair/releases) - [Commits](https://github.com/mangiucugna/json_repair/compare/0.26.0...v0.28.4) --- updated-dependencies: - dependency-name: json-repair dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-08-28 15:34:51 -06:00
dependabot[bot]	da440f749b	Bump pytest-asyncio from 0.23.8 to 0.24.0 (#1022 ) Bumps [pytest-asyncio](https://github.com/pytest-dev/pytest-asyncio) from 0.23.8 to 0.24.0. - [Release notes](https://github.com/pytest-dev/pytest-asyncio/releases) - [Commits](https://github.com/pytest-dev/pytest-asyncio/compare/v0.23.8...v0.24.0) --- updated-dependencies: - dependency-name: pytest-asyncio dependency-type: direct:development update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Alonso Guevara <alonsog@microsoft.com>	2024-08-28 14:41:53 -06:00
TLP	1b51827c66	Fix INIT_YAML embeddings default settings (#1039 ) Co-authored-by: Thanh Long Phan <long.phan@dida.do> Co-authored-by: Alonso Guevara <alonsog@microsoft.com>	2024-08-28 14:18:59 -06:00
Alonso Guevara	22df2f80d0	Fix/text unit code cleanup (#1040 ) * Optimized _build_text_unit_context function for improved time and space complexity Refactored the _build_text_unit_context function to enhance its performance and efficiency. Key optimizations include: 1. Set for Text Unit IDs: Replaced list-based membership checks with a set (text_unit_ids_set) to achieve constant-time complexity for membership checks, reducing overall time complexity. 2. Direct Attribute Removal: Utilized pop with a default value (None) to directly remove attributes entity_order and num_relationships from text units, minimizing overhead and avoiding potential KeyError. 3. Default Dictionary for Entity Orders: Implemented defaultdict for managing entity orders, simplifying the ranking process and improving readability. These improvements result in a more efficient function with better performance, especially when handling large datasets or numerous selected entities. The refactoring ensures that the core functionality remains unchanged while enhancing both time and space complexity. * Format * Ruff fixes * semver --------- Co-authored-by: arjun-234 <arjun.darji@yudiz.com> Co-authored-by: Arjun D. <103405661+arjun-234@users.noreply.github.com>	2024-08-27 16:15:16 -06:00
Konstantin Gukov	5d8e60ceb7	Add source URL to the package (#927 ) Co-authored-by: Josh Bradley <joshbradley@microsoft.com> Co-authored-by: Alonso Guevara <alonsog@microsoft.com>	2024-08-27 14:41:21 -06:00
longyunfeigu	44fd35c84f	Update VectorStoreSearchResult score value range (#937 ) update VectorStoreSearchResult score comment Co-authored-by: wanhua.gu <wanhua.gu@wiz.ai> Co-authored-by: Alonso Guevara <alonsog@microsoft.com>	2024-08-27 14:40:47 -06:00
Alonso Guevara	75735bd103	Release v0.3.2 (#1034 ) v0.3.2	2024-08-26 17:57:16 -06:00
Alonso Guevara	32c0cdfcc0	Patch "past" dependency issues (#1033 ) * Patch "past" dependency issues * Semver	2024-08-26 17:03:51 -06:00
Josh Bradley	a90d210497	Improve search type hint (#1031 ) * update get_local_search_engine and get_global_search_engine return annotation * add semversioner file * reorder imports * fix pyright errors * revert change and ignore previous pyright error --------- Co-authored-by: wanhua.gu <wanhua.gu@wiz.ai> Co-authored-by: longyunfeigu <2514553187@qq.com> Co-authored-by: Alonso Guevara <alonsog@microsoft.com>	2024-08-26 15:31:46 -06:00
Alonso Guevara	4c2f5376a8	Add missing config parameter for prompt tuning docs (#1017 )	2024-08-26 14:38:59 -06:00
Josh Bradley	fd8e56ce6f	Update developer guide (#1029 )	2024-08-26 12:28:03 -04:00
Alonso Guevara	55e74a0c2e	Fix weight casting during graph extraction (#1016 ) * Fix weight casting during graph extraction * Format * Format	2024-08-23 20:51:59 -06:00
Alonso Guevara	e15df44f0d	Ensure entity types to be str in prompt tune (#1015 )	2024-08-23 18:35:24 -06:00
dependabot[bot]	13e17d2dac	Bump ruff from 0.5.7 to 0.6.2 (#1014 ) Bumps [ruff](https://github.com/astral-sh/ruff) from 0.5.7 to 0.6.2. - [Release notes](https://github.com/astral-sh/ruff/releases) - [Changelog](https://github.com/astral-sh/ruff/blob/main/CHANGELOG.md) - [Commits](https://github.com/astral-sh/ruff/compare/0.5.7...0.6.2) --- updated-dependencies: - dependency-name: ruff dependency-type: direct:development update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Alonso Guevara <alonsog@microsoft.com>	2024-08-23 18:00:11 -06:00
dependabot[bot]	b1d4ddd799	Bump micromatch from 4.0.5 to 4.0.8 in /docsite (#1013 ) Bumps [micromatch](https://github.com/micromatch/micromatch) from 4.0.5 to 4.0.8. - [Release notes](https://github.com/micromatch/micromatch/releases) - [Changelog](https://github.com/micromatch/micromatch/blob/4.0.8/CHANGELOG.md) - [Commits](https://github.com/micromatch/micromatch/compare/4.0.5...4.0.8) --- updated-dependencies: - dependency-name: micromatch dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-08-23 17:38:26 -06:00
Alonso Guevara	cb0aae7e6b	Add graphrag_import_neo4j_cypher Notebook (#593 ) * Added graphrag_import_neo4j_cypher Notebook * changed to procedure for setting embedding property to save disk space * Reformat and cleanup * semver * Poetry lock update * Update AAIS docs * Rename contrib folder * Merge from main * Revert "Merge from main" This reverts commit a399dde97b689a5b5c62dc2e9c2290cb2503b3a4. * Fix ruff check * Add readme and fix tests * Fix community reports --------- Co-authored-by: Michael Hunger <github@jexp.de>	2024-08-23 15:18:35 -06:00
KennyZhang1	dd71135995	Change lancedb placement (#996 ) * changed placement of lancedb dir to under /artifacts * ruff checks and semversioner * added support for static paths * added support for streaming * more ruff changes * ruff format changes * removed string concat for path formation * added more ruff checks * removed os.join usage * more ruff fixes and removed unneccesary path creations * replaced cast calls with str() --------- Co-authored-by: Kenny Zhang <zhangken@microsoft.com>	2024-08-22 11:39:55 -06:00
Josh Bradley	4b9fdc0dfe	Add context data to query responses (#1003 ) * add context data to query responses * add semversioner file * ignore typechecking ruff suggestion	2024-08-22 12:07:50 -04:00
Alonso Guevara	9c6f5e090a	Release v0.3.1 (#1001 ) v0.3.1	2024-08-21 17:03:55 -06:00
Nathan Evans	f5b4d2fea5	Ci streamline (#988 ) * Remove excess vars from gh-pages build * Delete redundant javascript ci * Pull apart testing CI * Clean up integration tests build * Move storage tests to integration CI * Take py 3.10 out of smoke tests matrix * Use minimum supported python version for most tests * Re-run main CI on any test change * Add Josh and Kenny to author list * Update auto-resolve perms	2024-08-21 15:16:15 -06:00
Nathan Evans	98cabba38b	Notebook tests (#978 ) * Fix notebook test runs * Delete old issue template * Add notebook CI action * Print temp directories * Print more env * Move printing up * Use runner_temp * Try using current directory * Try TMP env * Re-write TMP * Wrong yml * Fix echo * Only export if windows * More logging * Move export * Reformat env write * Fix braces * Switch to in-memory execution * Downgrade action perms * Unused import	2024-08-20 17:19:37 -06:00
dependabot[bot]	8a9a2f7574	Bump uvloop from 0.19.0 to 0.20.0 (#969 ) Bumps [uvloop](https://github.com/MagicStack/uvloop) from 0.19.0 to 0.20.0. - [Release notes](https://github.com/MagicStack/uvloop/releases) - [Commits](https://github.com/MagicStack/uvloop/compare/v0.19.0...v0.20.0) --- updated-dependencies: - dependency-name: uvloop dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Alonso Guevara <alonsog@microsoft.com>	2024-08-20 16:18:45 -06:00
Derek Worthen	6b4de3d841	Index API (#953 ) * Initial Index API - Implement main API entry point: build_index - Rely on GraphRagConfig instead of PipelineConfig - This unifies the API signature with the promt_tune and query API entry points - Derive cache settings, config, and resuming from the config and other arguments to simplify/reduce arguments to build_index - Add preflight config file validations - Add semver change * fix smoke tests * fix smoke tests * Use asyncio * Add e2e artifacts in GH actions * Remove unnecessary E2E test, and add skip_validations flag to cli * Nicer imports * Reorganize API functions. * Add license headers and module docstrings * Fix ignored ruff rule --------- Co-authored-by: Alonso Guevara <alonsog@microsoft.com>	2024-08-20 15:42:20 -06:00
dependabot[bot]	5a781dd234	Bump nltk from 3.8.1 to 3.9.1 (#966 ) * Bump nltk from 3.8.1 to 3.9.1 Bumps [nltk](https://github.com/nltk/nltk) from 3.8.1 to 3.9.1. - [Changelog](https://github.com/nltk/nltk/blob/develop/ChangeLog) - [Commits](https://github.com/nltk/nltk/compare/3.8.1...3.9.1) --- updated-dependencies: - dependency-name: nltk dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> * Download punk_tab * Semver * Add missing installs * Add missing installs --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Alonso Guevara <alonsog@microsoft.com>	2024-08-20 14:49:39 -06:00
Josh Bradley	62546a3c14	Add streaming support for local/global search (#944 ) * Added streaming output support for global search. Introduce `--streaming` flag to enable or disable streaming mode * ran ruff format --preview * update * cleanup code and streaming api * update cli argument * remove whitespace * checkpoint - add context data to streaming api * cleanup help menu * ruff format update * add context data to streaming response * add semversioner file * rename variable for better readability * rename variable for better readability * ruff fixes * fix abstract class type annotation * add documentation for --streaming CLI flag --------- Co-authored-by: 6GOD <55304045+6ixGODD@users.noreply.github.com> Co-authored-by: Alonso Guevara <alonsog@microsoft.com>	2024-08-20 13:44:48 -06:00
longyunfeigu	a6238c654a	Move embeddings target position (#938 ) move embeddings target position Co-authored-by: wanhua.gu <wanhua.gu@wiz.ai> Co-authored-by: Alonso Guevara <alonsog@microsoft.com>	2024-08-20 13:02:52 -06:00
Alonso Guevara	e4daf358b9	Fix gh-pages publishing (#976 ) * Remove indexer run from gh-pages, and use a local zip to avoid running * Semver	2024-08-19 16:30:55 -06:00
Nayeon Kim	84f9bae129	Update 0-architecture.md (#961 )	2024-08-19 12:21:40 -06:00
KennyZhang1	3c0a98c2d8	Add preflight config file validations (#952 ) Co-authored-by: Kenny Zhang <zhangken@microsoft.com> Co-authored-by: Josh Bradley <joshbradley@microsoft.com>	2024-08-16 17:53:32 -04:00
Nathan Evans	4040f02508	Update general_issue.yml (#956 ) Copy checklist from bug/feature to general	2024-08-16 13:26:24 -07:00
Nathan Evans	bd5be7bb1a	Update issues-autoresolve.yml (#955 ) Add write permissions for actions so it can update the cache	2024-08-16 13:17:23 -07:00
Alonso Guevara	0b7c5a6ae9	Add cast check on schema validation for community reports (#932 ) * Add support for both float and int on schema validation for community report generation * Cast instead of type check * Add mising file * Add prompt with ints to smoke tests * Fix unit tests * Fix unit tests	2024-08-14 16:40:47 -06:00
dependabot[bot]	36facbd000	Bump textual from 0.74.0 to 0.76.0 (#901 ) Bumps [textual](https://github.com/Textualize/textual) from 0.74.0 to 0.76.0. - [Release notes](https://github.com/Textualize/textual/releases) - [Changelog](https://github.com/Textualize/textual/blob/main/CHANGELOG.md) - [Commits](https://github.com/Textualize/textual/compare/v0.74.0...v0.76.0) --- updated-dependencies: - dependency-name: textual dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-08-14 13:06:55 -06:00
dependabot[bot]	1ec1d2f920	Bump azure-storage-blob from 12.21.0 to 12.22.0 (#900 ) Bumps [azure-storage-blob](https://github.com/Azure/azure-sdk-for-python) from 12.21.0 to 12.22.0. - [Release notes](https://github.com/Azure/azure-sdk-for-python/releases) - [Changelog](https://github.com/Azure/azure-sdk-for-python/blob/main/doc/esrp_release.md) - [Commits](https://github.com/Azure/azure-sdk-for-python/compare/azure-storage-blob_12.21.0...azure-storage-blob_12.22.0) --- updated-dependencies: - dependency-name: azure-storage-blob dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-08-13 22:48:07 -06:00
dependabot[bot]	ba63eda7a4	Bump pyyaml from 6.0.1 to 6.0.2 (#898 ) Bumps [pyyaml](https://github.com/yaml/pyyaml) from 6.0.1 to 6.0.2. - [Release notes](https://github.com/yaml/pyyaml/releases) - [Changelog](https://github.com/yaml/pyyaml/blob/main/CHANGES) - [Commits](https://github.com/yaml/pyyaml/compare/6.0.1...6.0.2) --- updated-dependencies: - dependency-name: pyyaml dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Alonso Guevara <alonsog@microsoft.com>	2024-08-13 18:48:51 -06:00
Nathan Evans	ac504e31a0	Add stricter filtering and tests for cli data directory discovery (#910 ) * Add stricter filtering and tests for cli data directory discovery * Semver * Ignore ruff on error type * Format * Fix for windows paths * Fix for windows paths * Uncomment blob tests * Sort by timestamp name instead of modified date * Format * Add additional folder name test	2024-08-13 17:34:14 -06:00
Alonso Guevara	d68e323193	Disable fail fast on tests (#911 )	2024-08-13 12:20:14 -06:00
Alonso Guevara	f9c1bdd748	Release v0.3.0 (#912 ) v0.3.0	2024-08-12 18:14:52 -06:00
Alonso Guevara	4b9f268604	Fix/query embedding (#909 ) * fix strategy config in entity_extraction * should not post token list to the embedding model * fix embedding in local query * add sembersioner * remove strategy --------- Co-authored-by: KylinMountain <kose2livs@gmail.com>	2024-08-12 17:12:51 -06:00
benx13	3f31af80d2	typo summarize prompt (#907 ) * typo in entity_summarization prompt * typo in summarize prompt --------- Co-authored-by: Alonso Guevara <alonsog@microsoft.com>	2024-08-12 16:03:08 -06:00
Andres Morales	5a7dbaa051	Fix sort_context max_tokens & max_tokens param in verb (#888 ) * Fix sort_context max_tokens & max_tokens param in verb * Fix sort_context for windows test * add semversioner file --------- Co-authored-by: Alonso Guevara <alonsog@microsoft.com>	2024-08-12 15:55:31 -06:00
Josh Bradley	238f1c2adc	Implement prompt tuning API (#855 ) * initial setup commit * cleanup API and CLI interfaces * move datatype definition to types.py * code cleanup * add semversioner file * remove unused import --------- Co-authored-by: Alonso Guevara <alonsog@microsoft.com>	2024-08-12 15:09:00 -06:00
Josh Bradley	4bcbfd10eb	Implement query api (#839 ) * initial API redesign * typo fix * update docstring * update docsring * remove artifacts caused by the merge from main * minor typo updates * add semversioner check * switch API to async function calls --------- Co-authored-by: Alonso Guevara <alonsog@microsoft.com>	2024-08-12 13:40:10 -06:00

342 Commits