graphrag

mirror of https://github.com/microsoft/graphrag.git synced 2025-12-04 10:59:54 +00:00

Author	SHA1	Message	Date
Alonso Guevara	ac234f47bd	Fix prompt tune output path on cli (#1157 )	2024-09-19 09:22:17 -06:00
Derek Worthen	3b09df6e07	Migrate towards using static output directories (#1113 ) * Migrate towards using static output directories - Fixes load_config eagerly resolving directories. Directories are only resolved when the output directories are local. - Add support for `--output` and `--reporting` flags for index CLI. To achieve previous output structure `index --output run1/artifacts --reports run1/reports`. - Use static output directories when initializing a new project. - Maintains backward compatibility for those using timestamp outputs locally. * fix smoke tests * update query cli to work with static directories * remove eager path resolution from load_config. Support CLI overrides that can be resolved. * add docs and output logs/artifacts to same directory * use match statement * switch back to if statement --------- Co-authored-by: Alonso Guevara <alonsog@microsoft.com>	2024-09-18 17:36:50 -06:00
Alonso Guevara	10910797d0	Fix seed init in clustering (#1156 )	2024-09-18 17:22:52 -06:00
Alonso Guevara	6cee670617	Merge branch 'main' into incremental_indexing/main	2024-09-18 16:17:26 -06:00
Josh Bradley	594084f156	Improve and cleanup logging output of indexing (#1144 )	2024-09-18 14:38:13 -04:00
Nathan Evans	aa5b426f1d	Collapse final communities workflow (#1150 ) * Collapse create_final_communities * Semver * Spellcheck * Clean up filtering * Add space in title * Format * Cleanup imports and format * Spruce up the tests * Update dictionary.txt * Spellcheck --------- Co-authored-by: Alonso Guevara <alonsog@microsoft.com>	2024-09-17 17:04:42 -07:00
Nathan Evans	a473265580	Collapse verbs: create_final_text_units (#1143 ) * Load default config in verb tests * Load proper workflow config * Collapse text unit pre-embedding steps * Format * Update smoke tests * Semver * Format * Merge join* subflows into create_final_text_units * Remove join_text_units_to_covariate_ids * Format * Remove join_text_units_to_entity_ids * Remove join_text_units_to_relationship_ids * Clean up merges and aggregations * Remove unnecessary cast	2024-09-17 10:32:25 -07:00
Josh Bradley	f7f96c31bb	Cleanup cli (#1127 )	2024-09-17 01:37:27 -04:00
Nathan Evans	d22c0e7836	Covariate collapse (#1142 ) * Setup basic verb test runner * Replace join_text_units_to_entity_ids with subflow * Update comments * Replace join_text_units_to_relationship_ids subflow * Roll in final select * Reuse assertion util * Small fix + format * Format/typing * Semver * Format/typing * Semver * Revert format changes * Fix smoke test subworkflow count * Edit subworkflows for another smoke test * Update test parquets for covariates * Collapse covariate join * Rework subtasks for per-flow customization * Format * Semver * Fix smoke test	2024-09-16 12:35:45 -07:00
Nathan Evans	2de302ff0d	Verb merge nre1 (#1140 ) * Setup basic verb test runner * Replace join_text_units_to_entity_ids with subflow * Update comments * Replace join_text_units_to_relationship_ids subflow * Roll in final select * Reuse assertion util * Small fix + format * Format/typing * Semver * Format/typing * Semver * Revert format changes * Fix smoke test subworkflow count * Edit subworkflows for another smoke test	2024-09-16 12:10:29 -07:00
Alonso Guevara	b440e836bd	Merge branch 'main' into incremental_indexing/main	2024-09-12 17:46:11 -06:00
Alonso Guevara	cb4f2b43a7	Fix seeded random gen on clustering step (#1132 )	2024-09-12 17:42:50 -06:00
Alonso Guevara	8c7f0dfc1b	Fix duplicates in community context builder (#1131 ) * fix: fix the bug that community context builder will cause a report to be repeated twice in local mode. * Fix duplicates in community context builder * Small tweaks on code --------- Co-authored-by: jarlor <zjl58960902@outlook.com>	2024-09-12 15:47:08 -06:00
Roberto Corno	fcfa7b1329	Update factories.py to allow the usage of the request timeout ChatOpe… (#1115 ) Update factories.py to allow the usage of the request timeout ChatOpenAI parameter allow the usage of the request timeout ChatOpenAI parameter Co-authored-by: Alonso Guevara <alonsog@microsoft.com>	2024-09-12 13:51:48 -06:00
JunHo Kim (김준호)	7b8f5ba51f	Correct links to datashaper verbs in comments (#1068 ) Correct links to verbs in comments Updated the links in comments to reflect new paths for 'derive' and 'aggregate' verbs. This improves documentation and ensures that references are up to date for future developers. Co-authored-by: Alonso Guevara <alonsog@microsoft.com>	2024-09-12 12:44:38 -06:00
Alonso Guevara	8a0bc0535f	Release v0.3.4 (#1125 )	2024-09-11 16:45:43 -06:00
Alonso Guevara	c0d535d0c2	Fix summarization including empty descriptions (#1124 ) * Fix summarization including empty descriptions * Update	2024-09-11 16:30:49 -06:00
Alonso Guevara	8f71a0224c	Incremental indexing/file delta (#1123 ) * Calculate new inputs and deleted inputs on update * Semver * Clear ruff checks * Fix pyright * Fix PyRight * Ruff again	2024-09-11 15:33:09 -06:00
Alonso Guevara	87fb93f562	Merge branch 'main' into incremental_indexing/main	2024-09-11 14:22:52 -06:00
Alonso Guevara	cdf5fc4d67	Deep copy txt units on local search to avoid race conditions (#1118 ) * Deep copy txt units on local search to avoid race conditions * Format	2024-09-11 14:12:03 -06:00
Alonso Guevara	67f4b02ecd	Merge branch 'main' into incremental_indexing/main	2024-09-10 16:04:01 -06:00
Derek Worthen	e7ee8cb8a5	release v0.3.3 (#1116 ) v0.3.3	2024-09-10 13:07:07 -07:00
Doug Orbaker	1b559726ac	Update create_pipeline_config.py (#1108 ) * Update create_pipeline_config.py Order switched to ensure that user settings at runtime take precedence. * Updated semversioner.	2024-09-10 11:35:47 -06:00
KennyZhang1	27c5468a8b	Load query from blob (#1095 ) * Moved query loading from file to helper function * added loading parquets from blob to function * resolved adlfs async error * debugging cleanup and small fixes * added connection string support * semversioner and ruff fixes * completed testing for merge with main * more ruff changes * fixed unbound vars warning * rewrote function to use storage utils * removed unused vars --------- Co-authored-by: Kenny Zhang <zhangken@microsoft.com>	2024-09-05 18:17:22 -04:00
Alonso Guevara	3295e2b861	Merge from main	2024-09-05 11:22:07 -06:00
Alonso Guevara	044516f538	Clean and organize run index code (#1090 ) * Create entypoint for cli and api (#1067) * Add cli and api entrypoints for update index * Semver * Update docs * Run tests on feature branch main * Better /main handling in tests * Clean and organize run index code * Ruff fix * Pyright fix * Format fixes * Pyright fix * Format * Fix integ tests * Fix ruff * Reorganize and clean up	2024-09-05 08:15:10 -06:00
Alonso Guevara	3399e6e3b8	Merge branch 'main' into incremental_indexing/main	2024-09-04 12:48:34 -06:00
Derek Worthen	2d45ece9b6	fix setting base_dir to full paths when not using file system. (#1096 ) * fix setting base_dir to full paths when not using file system. * add general resolve_path	2024-09-04 11:33:44 -07:00
Alonso Guevara	41ea554fda	Merge from main	2024-09-03 16:34:52 -06:00
Derek Worthen	ab29cc2a7e	Consistent config load_config (#1065 ) * Consistent config load_config - Provide a consistent way to load configuration - Resolve potential timestamp directories upfront upon config object creation - Add unit tests for resolving timestamp directories - Resolves #599 - Resolves #1049 * fix formatting issues * remove unnecessary path resolution * fix smoke tests * update prompts to use load_config * Update none checks * Update none checks * Update searching for config method signature * Update unit tests * fix formatting issues	2024-09-03 16:33:16 -06:00
Alonso Guevara	82d1c4a97b	Create entypoint for cli and api (#1067 ) * Add cli and api entrypoints for update index * Semver * Update docs * Run tests on feature branch main * Better /main handling in tests	2024-08-30 15:26:18 -06:00
Alonso Guevara	3f9800230f	Fix img width (#1061 )	2024-08-29 17:02:47 -06:00
Alonso Guevara	7ffce8d7ba	Fix img for autotune (#1060 ) * Fix img for autotune * Add line breaks to tune docs * More line breaks	2024-08-29 16:56:34 -06:00
Alonso Guevara	6fc452b954	Update bash example in docs for prompt tune (#1059 ) * Semver * Update bash command	2024-08-29 16:35:32 -06:00
Alonso Guevara	e023882033	Update Prompt Tuning docs (#1057 ) * Update Prompt Tuning docs * Semver	2024-08-29 16:00:07 -06:00
dependabot[bot]	d13aec5dca	Bump jupyterlab from 4.2.4 to 4.2.5 (#1056 ) Bumps [jupyterlab](https://github.com/jupyterlab/jupyterlab) from 4.2.4 to 4.2.5. - [Release notes](https://github.com/jupyterlab/jupyterlab/releases) - [Changelog](https://github.com/jupyterlab/jupyterlab/blob/@jupyterlab/lsp@4.2.5/CHANGELOG.md) - [Commits](https://github.com/jupyterlab/jupyterlab/compare/@jupyterlab/lsp@4.2.4...@jupyterlab/lsp@4.2.5) --- updated-dependencies: - dependency-name: jupyterlab dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-08-29 12:53:41 -06:00
dependabot[bot]	0b1f7db7d8	Bump notebook from 7.2.1 to 7.2.2 (#1055 ) Bumps [notebook](https://github.com/jupyter/notebook) from 7.2.1 to 7.2.2. - [Release notes](https://github.com/jupyter/notebook/releases) - [Changelog](https://github.com/jupyter/notebook/blob/@jupyter-notebook/tree@7.2.2/CHANGELOG.md) - [Commits](https://github.com/jupyter/notebook/compare/@jupyter-notebook/tree@7.2.1...@jupyter-notebook/tree@7.2.2) --- updated-dependencies: - dependency-name: notebook dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Alonso Guevara <alonsog@microsoft.com>	2024-08-29 12:37:12 -06:00
Alonso Guevara	fb56b7aed0	Fix circular dependency on prompt tune api (#1054 )	2024-08-29 12:11:07 -06:00
guangxiangdebizi	1e8bb409f6	Update indexer_adapters.py (#895 ) Update lines 71 and 72. Before: `entity_df["community"] = entity_df["community"].fillna(-1)` and `entity_df["community"] = entity_df["community"].astype(int)`. After: `entity_df.loc[:, "community"] = entity_df["community"].fillna(-1)` and `entity_df.loc[:, "community"] = entity_df["community"].astype(int)`. Co-authored-by: Alonso Guevara <alonsog@microsoft.com>	2024-08-28 17:53:33 -06:00
Ikko Eltociear Ashimine	26bcdf39ed	docs: update manual_prompt_tuning.md (#963 ) paramater -> parameter Co-authored-by: Alonso Guevara <alonsog@microsoft.com>	2024-08-28 17:49:35 -06:00
fantom845	a3048487a1	fix for issue 515 (#925 ) * fix for issue 515 * semver impact document --------- Co-authored-by: Kanishk Tyagi <kanishktyagi@Kanishks-MacBook-Pro.local> Co-authored-by: Alonso Guevara <alonsog@microsoft.com>	2024-08-28 17:47:48 -06:00
Alonso Guevara	480181769c	Fix/entity extraction strategy (#1046 ) * fix strategy config in entity_extraction * update init content --------- Co-authored-by: KylinMountain <kose2livs@gmail.com>	2024-08-28 17:33:05 -06:00
dependabot[bot]	ee734e6003	Bump textual from 0.76.0 to 0.78.0 (#1038 ) Bumps [textual](https://github.com/Textualize/textual) from 0.76.0 to 0.78.0. - [Release notes](https://github.com/Textualize/textual/releases) - [Changelog](https://github.com/Textualize/textual/blob/main/CHANGELOG.md) - [Commits](https://github.com/Textualize/textual/compare/v0.76.0...v0.78.0) --- updated-dependencies: - dependency-name: textual dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-08-28 16:38:40 -06:00
dependabot[bot]	2f59701836	Bump lancedb from 0.11.0 to 0.12.0 (#1024 ) Bumps [lancedb](https://github.com/lancedb/lancedb) from 0.11.0 to 0.12.0. - [Release notes](https://github.com/lancedb/lancedb/releases) - [Changelog](https://github.com/lancedb/lancedb/blob/main/release_process.md) - [Commits](https://github.com/lancedb/lancedb/compare/python-v0.11.0...python-v0.12.0) --- updated-dependencies: - dependency-name: lancedb dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-08-28 16:11:35 -06:00
dependabot[bot]	89d1f02551	Bump json-repair from 0.26.0 to 0.28.4 (#1044 ) Bumps [json-repair](https://github.com/mangiucugna/json_repair) from 0.26.0 to 0.28.4. - [Release notes](https://github.com/mangiucugna/json_repair/releases) - [Commits](https://github.com/mangiucugna/json_repair/compare/0.26.0...v0.28.4) --- updated-dependencies: - dependency-name: json-repair dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-08-28 15:34:51 -06:00
dependabot[bot]	da440f749b	Bump pytest-asyncio from 0.23.8 to 0.24.0 (#1022 ) Bumps [pytest-asyncio](https://github.com/pytest-dev/pytest-asyncio) from 0.23.8 to 0.24.0. - [Release notes](https://github.com/pytest-dev/pytest-asyncio/releases) - [Commits](https://github.com/pytest-dev/pytest-asyncio/compare/v0.23.8...v0.24.0) --- updated-dependencies: - dependency-name: pytest-asyncio dependency-type: direct:development update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Alonso Guevara <alonsog@microsoft.com>	2024-08-28 14:41:53 -06:00
TLP	1b51827c66	Fix INIT_YAML embeddings default settings (#1039 ) Co-authored-by: Thanh Long Phan <long.phan@dida.do> Co-authored-by: Alonso Guevara <alonsog@microsoft.com>	2024-08-28 14:18:59 -06:00
Alonso Guevara	22df2f80d0	Fix/text unit code cleanup (#1040 ) * Optimized _build_text_unit_context function for improved time and space complexity Refactored the _build_text_unit_context function to enhance its performance and efficiency. Key optimizations include: 1. Set for Text Unit IDs: Replaced list-based membership checks with a set (text_unit_ids_set) to achieve constant-time complexity for membership checks, reducing overall time complexity. 2. Direct Attribute Removal: Utilized pop with a default value (None) to directly remove attributes entity_order and num_relationships from text units, minimizing overhead and avoiding potential KeyError. 3. Default Dictionary for Entity Orders: Implemented defaultdict for managing entity orders, simplifying the ranking process and improving readability. These improvements result in a more efficient function with better performance, especially when handling large datasets or numerous selected entities. The refactoring ensures that the core functionality remains unchanged while enhancing both time and space complexity. * Format * Ruff fixes * semver --------- Co-authored-by: arjun-234 <arjun.darji@yudiz.com> Co-authored-by: Arjun D. <103405661+arjun-234@users.noreply.github.com>	2024-08-27 16:15:16 -06:00
Konstantin Gukov	5d8e60ceb7	Add source URL to the package (#927 ) Co-authored-by: Josh Bradley <joshbradley@microsoft.com> Co-authored-by: Alonso Guevara <alonsog@microsoft.com>	2024-08-27 14:41:21 -06:00
longyunfeigu	44fd35c84f	Update VectorStoreSearchResult score value range (#937 ) update VectorStoreSearchResult score comment Co-authored-by: wanhua.gu <wanhua.gu@wiz.ai> Co-authored-by: Alonso Guevara <alonsog@microsoft.com>	2024-08-27 14:40:47 -06:00

229 Commits