graphrag

mirror of https://github.com/microsoft/graphrag.git synced 2025-06-26 23:19:58 +00:00

Author	SHA1	Message	Date
dependabot[bot]	97949ff014	Bump cryptography from 43.0.0 to 44.0.1 in /unified-search-app Bumps [cryptography](https://github.com/pyca/cryptography) from 43.0.0 to 44.0.1. - [Changelog](https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst) - [Commits](https://github.com/pyca/cryptography/compare/43.0.0...44.0.1) --- updated-dependencies: - dependency-name: cryptography dependency-version: 44.0.1 dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com>	2025-04-07 19:04:37 +00:00
gaudyb	0e1a6e3770	Unified search added to graphrag (#1862) * unified search app added to graphrag repository * ignore print statements * update words for unified-search * fix lint errors * fix lint error * fix module name --------- Co-authored-by: Gaudy Blanco <gaudy-microsoft@MacBook-Pro-m4-Gaudy-For-Work.local>	2025-04-07 11:59:02 -06:00
KennyZhang1	61769dd47e	Vector Store Integration Tests (#1856) * Add vector store id reference to embeddings config. * generated initial vector store pytests * cleaned up cosmosdb vector store test * fixed class name typo and debugged cosmosdb vector store test * reset emulator connection string * remove unnecessary comments * removed extra comments from azure ai search test * ruff * semversioner * fix cicd issues * bypass diskANN policy for test env * handle floating point imprecisions --------- Co-authored-by: Derek Worthen <worthend.derek@gmail.com>	2025-04-01 11:05:04 -04:00
Gabriel Nieves-Ponce	ffd8db7104	Gnievesponce prompt tune embedd chunking (#1826) * Added support for embeddings chunking as defined by the config. * ran semvisor -t patch * Eliminated redundant code by using the embed_text strategy directly * Added fix to support brackets within the corpus text; for example, inline LaTeX within a markdown file --------- Co-authored-by: Gabriel Nieves <gnievesponce@microsoft.com>	2025-03-31 12:38:01 -04:00
Alonso Guevara	b7b2b562ce	fnllm version fix (#1835) * Fix fnllm version * Semver	2025-03-21 22:13:56 -07:00
Nathan Evans	3b1e70c06b	Update config docs (2.1.0) (#1818) * Align docs with config * Semver * Spelling * Format * Spelling	2025-03-18 12:39:30 -07:00
Nathan Evans	813b4de99f	Fix API key reference for gh-pages (#1821)	2025-03-18 11:10:11 -07:00
Nathan Evans	ddc6541ab6	Add docs page about input formats (#1784) * Add docs page about input formats * Add json example * Spelling	2025-03-11 17:37:46 -07:00
Nathan Evans	321d479ab6	Update notebooks for 2.0 (#1785) * Update API overview * Fix global search example * Fix local search example * Fix global dynamic example * Fix drift example * Update multi-index example * Semver	2025-03-11 17:23:49 -07:00
Alonso Guevara	0d363e6957	Release v2.1.0 (#1800) v2.1.0	2025-03-11 18:16:08 -06:00
Alonso Guevara	53950f8442	Fix/model provider key injection check (#1799) * Check available models for type validation * Semver * Fix ruff and pyright * Apply feedback	2025-03-11 17:48:30 -06:00
Gabriel Nieves-Ponce	e39d869bed	Added support for verbose logging and csv-metadata to the prompt tune… (#1789) * Added support for verbose logging and csv-metadata to the prompt tune client. * Updated community report summarization file name and prompt template * updated semversioner * ran ruff linter * Ran poe format * Fix Ruff complains * Fix a new ruff complain :P * Pyright * Fix tests --------- Co-authored-by: Gabriel Nieves <gnievesponce@microsoft.com> Co-authored-by: Alonso Guevara <alonsog@microsoft.com>	2025-03-11 14:55:02 -06:00
Nathan Evans	66c2cfb3ce	Support JSON input files (#1777) * Add csv loader tests * Add text loader tests * Add json input support * Remove temp path constraint * Reuse loader code * Semver * Set file pattern automatically based on type, if empty * Remove pattern from smoke test config * Spelling --------- Co-authored-by: Alonso Guevara <alonsog@microsoft.com>	2025-03-10 14:04:07 -07:00
Nathan Evans	bcb74789f1	Next release docs (#1627) * Wording updates * Update yaml config and add notes to "deprecated" env * Add basic search section * Update versioning docs * Minor edits for clarity * Update init command * Update init to add --force in docs * Add NLP extraction params * Move vector_store to root * Add workflows to config * Add FastGraphRAG docs * add metadata column changes * Added documentation for multi index search. * Minor fixes. * Add config and table renames * Update migration notebook and comments to specify v1 * Add frequency to entity table docs * add new chunking options for metadata * Update output docs * Minor edits and cleanup * Add model ids to search configs * Spruce up migration notebook * Lint/format multi-index notebook * SpaCy model note * Update SpaCy footnote * Updated multi_index_search.ipynb to remove ruff errors. * add spacy to dictionary --------- Co-authored-by: Alonso Guevara <alonsog@microsoft.com> Co-authored-by: Dayenne Souza <ddesouza@microsoft.com> Co-authored-by: dorbaker <dorbaker@microsoft.com>	2025-03-03 14:46:00 -08:00
Nathan Evans	bd06d8b4f0	Context property bag ("state") (#1774) * Add pipeline state property bag to run context * Move state creation out of context util * Move callbacks into PipelineRunContext * Semver * Rename state.json to context.json to avoid confusion with stats.json * Expand smoke test row count * Add util to create storage and cache	2025-02-28 09:31:48 -08:00
Nathan Evans	a15942629b	Add more verb tests (#1773) * Add NLP verb test * Add finalize_graph tests * Add more thorough final column assertions	2025-02-27 09:31:46 -08:00
Alonso Guevara	b4b8b81c0a	Remove spacy model from toml (#1771) * Remove spacy model from toml * Semver	2025-02-26 10:58:02 -06:00
Alonso Guevara	716f93dd8b	Release v2.0.0 (#1769) * Release v2.0.0 * snapshots... v2.0.0	2025-02-25 17:52:30 -06:00
Alonso Guevara	facf68148a	Fix summarization and relationship grouping on Inc Indexing (#1768) * Fix summarization for large descriptions on incremental indexing * Semver * Ruff	2025-02-25 17:29:55 -06:00
Nathan Evans	ede6a74546	Pipeline callbacks (#1729) * Add pipeline_start and pipeline_end callbacks * Collapse redundant callback/logger logic * Remove redundant reporting config classes * Remove a few out-of-date type ignores * Semver --------- Co-authored-by: Alonso Guevara <alonsog@microsoft.com>	2025-02-25 15:07:51 -08:00
Nathan Evans	e40476153d	Speed up smoke tests (#1736) * Move verb tests to regular CI * Clean up env vars * Update smoke runtime expectations * Rework artifact assertions * Fix plural in name * remove redundant artifact len check * Remove redundant artifact len check * Adjust graph output expectations * Update community expectations * Include all workflow output * Adjust text unit expectations * Adjust assertions per dataset * Fix test config param name * Update nan allowed for optional model fields --------- Co-authored-by: Alonso Guevara <alonsog@microsoft.com>	2025-02-25 13:24:35 -08:00
Nathan Evans	61a309b182	Incremental model alignment (#1766) * Used shared schema lists for all final columns * Semver	2025-02-25 13:14:42 -06:00
Alonso Guevara	0144b3fd88	Update FNLLM (#1738) * Add ModelProvider to Query package. * Spellcheck + others * Semver * Fix tests * Format * Fix Pyright * Fix tests * Fix for smoke tests * Update fnllm version * Semver * Ruff	2025-02-24 20:30:45 -06:00
Nathan Evans	5dd9fc53cd	Move embeddings snapshots (#1737) * Move embedding snapshots to the workflow runner * Semver * Rename input tables	2025-02-24 17:38:01 -08:00
Alonso Guevara	e0d233fe10	Feat/llm provider query (#1735) * Add ModelProvider to Query package. * Spellcheck + others * Semver * Fix tests * Format * Fix Pyright * Fix tests * Fix for smoke tests	2025-02-24 18:35:51 -06:00
Nathan Evans	faa05b691f	Fix text unit incremental ID updates (#1734) * Increment text_unit ids during incremental * Semver	2025-02-24 14:58:00 -08:00
Nathan Evans	a932b2d342	Fix StopAsyncIteration catch (#1730)	2025-02-21 11:46:44 -08:00
Derek Worthen	54885b8ab1	Refactor config defaults (#1723) * Refactor config defaults - Implement type-safe, hierarchical dataclass for config defaults instead of namespaced constants. - Allow for instantiating config directly from defaults data structure. * fix vector_store db_uri default --------- Co-authored-by: Alonso Guevara <alonsog@microsoft.com>	2025-02-20 13:01:29 -06:00
Alonso Guevara	7bdeaee94a	Create Language Model Providers and Registry methods. Remove fnllm coupling (#1724) * Base structure * Add fnllm providers and Mock LLM * Remove fnllm coupling, introduce llm providers * Ruff + Tests fix * Spellcheck * Semver * Format * Default MockChat params * Fix more tests * Fix embedding smoke test * Fix embeddings smoke test * Fix MockEmbeddingLLM * Rename LLM to model. Package organization * Fix prompt tuning * Oops * Oops II	2025-02-20 08:56:20 -06:00
Nathan Evans	a42772d368	Query callbacks (#1721) * Add callbacks to global search * Add callbacks to local search * Add streaming callbacks in local search CLI * Add callbacks to basic search * Add callbacks to DRIFT search * Semver * Return generators directly in API * Guard callbacks	2025-02-19 13:00:07 -08:00
Nathan Evans	efcaf9636d	Tuck flow functions under their workflows (#1720) * Move flow functions to workflow * Remove redundant workflow_name variable * Semver	2025-02-18 15:33:36 -06:00
Alonso Guevara	7f020826be	Fix/json mode community reports (#1713) * Patch json mode on Community Reports * Semversioner * Wording oopsie	2025-02-14 16:51:42 -06:00
Nathan Evans	96219a2182	Register workflows (#1691) * Add workflow registration * Add ability to mutate config by workflows * Separate graph finalization * Separate graph pruning * Semver * Update tests * Update smoke tests * Fix iterrows on create_graph * Remove prune_graph from llm construction * Update test data * Remove prune_graph from smoke tests	2025-02-14 13:21:31 -08:00
Nathan Evans	981fd31963	Community children (#1704) * Add children to the community tables * Replace NaN children with empty list * Replace subcommunity logic with built-in parent/child fields * Remove restore_community_hierarchy * Add children and frequency to migration notebook * Format * Semver * Add children to reports * Update tests --------- Co-authored-by: Alonso Guevara <alonsog@microsoft.com>	2025-02-13 17:03:51 -08:00
Nathan Evans	35b639399b	Incremental flow rework (#1696) * Rework update output structure * Semver * Fix unit test * Update frequency in incremental --------- Co-authored-by: Alonso Guevara <alonsog@microsoft.com>	2025-02-13 18:22:32 -06:00
Alonso Guevara	5ef2399a6f	Chore/remove iterrows (#1708) * Remove most iterrow usages * Semver * Ruff * Pyright * Format	2025-02-13 17:32:54 -06:00
Josh Bradley	f14cda2b6d	Improve default llm retry logic to be more optimized (#1701)	2025-02-13 16:56:37 -05:00
Josh Bradley	b8b949f3bb	Cleanup query api - remove code duplication (#1690) * consolidate query api functions and remove code duplication * refactor and remove more code duplication * Add semversioner file * fix basic search * fix drift search and update base class function names * update example notebooks	2025-02-13 16:31:08 -05:00
Nathan Evans	fe461417b5	Export NLP community reports prompt (#1697) * Properly export the NLP community reports prompt * Semver * Fix verb tests	2025-02-12 10:41:39 -08:00
Dayenne Souza	b94290ec2b	add option to add metadata into text chunks (#1681) * add new options * add metadata json into input document * remove doc change * add metadata column into text loader * prepend_metadata * run fix * fix tests and patch * fix test * add warning for metadata tokens > config size * fix typo and run fix * fix test_integration * fix test * run check * rename and fix chunking * fix * fix * fix test verbs * fix * fix tests * fix chunking * fix index * fix cosmos test * fix vars * fix after PR * fix	2025-02-12 09:38:03 -08:00
KennyZhang1	b9dc7b90d5	Fix/streamline workflow miq bugs (#1694) * Add vector store id reference to embeddings config. * added communities to links and maxvals * Consistent naming * Update entity_ids to include index_name * added consistent logging messages to miq cli * semversioner --------- Co-authored-by: Derek Worthen <worthend.derek@gmail.com> Co-authored-by: Nathan Evans <github@talkswithnumbers.com>	2025-02-11 16:13:28 -05:00
Nathan Evans	a6a78d5897	Nlp cache (#1689) * Add cache to build_noun_graph * Semver	2025-02-10 11:00:51 -08:00
Nathan Evans	c02ab0984a	Streamline workflows (#1674) * Remove create_final_nodes * Rename final entity output to "entities" * Remove duplicate code from graph extraction * Rename create_final_relationships output to "relationships" * Rename create_final_communities output to "communities" * Combine compute_communities and create_final_communities * Rename create_final_covariates output to "covariates" * Rename create_final_community_reports output to "community_reports" * Rename create_final_text_units output to "text_units" * Rename create_final_documents output to "documents" * Remove transient snapshots config * Move create_final_entities to finalize_entities operation * Move create_final_relationships flow to finalize_relationships operation * Reuse some community report functions * Collapse most of graph and text unit-based report generation * Unify schemas files * Move community reports extractor * Move NLP report prompt to prompts folder * Fix a few pandas warnings * Rename embeddings config to embed_text * Rename claim_extraction config to extract_claims * Remove nltk from standard graph extraction * Fix verb tests * Fix extract graph config naming * Fix moved file reference * Create v1-to-v2 migration notebook * Semver * Fix smoke test artifact count * Raise tpm/rpm on smoke tests * Update drift settings for smoke tests * Reuse project directory var in api notebook * Format * Format	2025-02-07 11:11:03 -08:00
KennyZhang1	83cc2daf91	Multi-index query CLI support (#1675) * Add vector store id reference to embeddings config. * changed structure of output config section * added cli integration for multi index global * added cli integration for multi index local * added cli integration for multi index drift and basic * finished local testing of multi-index cli * ruff fixes * partially refactored test code to align with new output section * more test changes for new output structure * semversioner * refactored to align with new multi index config proposal * locally tested new multi-index output proposal * cleaned up tests to align with new structure --------- Co-authored-by: Derek Worthen <worthend.derek@gmail.com>	2025-02-07 12:56:48 -05:00
Alonso Guevara	0805924a35	Fix/drift n depth (#1676) * Fix n_depth param * Semver * Change smoke tests params for drift * Reduce log printing for expected exceptions	2025-02-05 17:22:34 -06:00
JunHo Kim (김준호)	a4d35bc66f	Fix typo in DEVELOPING.md instructions (#1631) Corrected "this values" to "these values" for improved clarity. This ensures the documentation is more accurate and professional. Co-authored-by: Nathan Evans <github@talkswithnumbers.com>	2025-02-04 13:16:57 -08:00
JunHo Kim (김준호)	30f36316af	Fix typo in table formatting in env_vars documentation (#1632) Corrected a missing backtick in a note within the `GRAPHRAG_API_KEY` description. This ensures proper code formatting and improves readability in the documentation. No content was altered aside from formatting adjustments. Co-authored-by: Nathan Evans <github@talkswithnumbers.com>	2025-02-04 13:14:58 -08:00
Dayenne Souza	ad5b5120ec	remove unused columns and rename document_attribute_columns (#1672) * remove unused columns and change property document_attribute_columns to metadata * format file * fix 'metadata' column on output * run check * fix test on nltk * remove docs changes	2025-02-03 14:37:06 -03:00
Nathan Evans	907d271f4e	Fix recursive report generation (#1669)	2025-01-30 11:03:25 -08:00
Nathan Evans	53b06aa2ac	Add generate_text_embeddings to FGR (#1667)	2025-01-29 14:31:48 -08:00

365 Commits