graphrag

mirror of https://github.com/microsoft/graphrag.git synced 2025-07-04 15:41:17 +00:00

Author	SHA1	Message	Date
Nathan Evans	bd06d8b4f0	Context property bag ("state") (#1774 ) * Add pipeline state property bag to run context * Move state creation out of context util * Move callbacks into PipelineRunContext * Semver * Rename state.json to context.json to avoid confusion with stats.json * Expand smoke test row count * Add util to create storage and cache	2025-02-28 09:31:48 -08:00
Nathan Evans	981fd31963	Community children (#1704 ) * Add children to the community tables * Replace NaN children with empty list * Replace subcommunity logic with built-in parent/child fields * Remove restore_community_hierarchy * Add children and frequency to migration notebook * Format * Semver * Add children to reports * Update tests --------- Co-authored-by: Alonso Guevara <alonsog@microsoft.com>	2025-02-13 17:03:51 -08:00
Dayenne Souza	b94290ec2b	add option to add metadata into text chunks (#1681 ) * add new options * add metadata json into input document * remove doc change * add metadata column into text loader * prepend_metadata * run fix * fix tests and patch * fix test * add warning for metadata tokens > config size * fix typo and run fix * fix test_integration * fix test * run check * rename and fix chunking * fix * fix * fix test verbs * fix * fix tests * fix chunking * fix index * fix cosmos test * fix vars * fix after PR * fix	2025-02-12 09:38:03 -08:00
Nathan Evans	c02ab0984a	Streamline workflows (#1674 ) * Remove create_final_nodes * Rename final entity output to "entities" * Remove duplicate code from graph extraction * Rename create_final_relationships output to "relationships" * Rename create_final_communities output to "communities" * Combine compute_communities and create_final_communities * Rename create_final_covariates output to "covariates" * Rename create_final_community_reports output to "community_reports" * Rename create_final_text_units output to "text_units" * Rename create_final_documents output to "documents" * Remove transient snapshots config * Move create_final_entities to finalize_entities operation * Move create_final_relationships flow to finalize_relationships operation * Reuse some community report functions * Collapse most of graph and text unit-based report generation * Unify schemas files * Move community reports extractor * Move NLP report prompt to prompts folder * Fix a few pandas warnings * Rename embeddings config to embed_text * Rename claim_extraction config to extract_claims * Remove nltk from standard graph extraction * Fix verb tests * Fix extract graph config naming * Fix moved file reference * Create v1-to-v2 migration notebook * Semver * Fix smoke test artifact count * Raise tpm/rpm on smoke tests * Update drift settings for smoke tests * Reuse project directory var in api notebook * Format * Format	2025-02-07 11:11:03 -08:00
Derek Worthen	c644338bae	Refactor config (#1593 ) * Refactor config - Add new ModelConfig to represent LLM settings - Combines LLMParameters, ParallelizationParameters, encoding_model, and async_mode - Add top level models config that is a list of available LLM ModelConfigs - Remove LLMConfig inheritance and delete LLMConfig - Replace the inheritance with a model_id reference to the ModelConfig listed in the top level models config - Remove all fallbacks and hydration logic from create_graphrag_config - This removes the automatic env variable overrides - Support env variables within config files using Templating - This requires "$" to be escaped with extra "$" so ".\\.txt$" becomes ".\\.txt$$" - Update init content to initialize new config file with the ModelConfig structure * Use dict of ModelConfig instead of list * Add model validations and unit tests * Fix ruff checks * Add semversioner change * Fix unit tests * validate root_dir in pydantic model * Rename ModelConfig to LanguageModelConfig * Rename ModelConfigMissingError to LanguageModelConfigMissingError * Add validation for unexpected API keys * Allow skipping pydantic validation for testing/mocking purposes. 
* Add default lm configs to verb tests * smoke test * remove config from flows to fix llm arg mapping * Fix embedding llm arg mapping * Remove timestamp from smoke test outputs * Remove unused "subworkflows" smoke test properties * Add models to smoke test configs * Update smoke test output path * Send logs to logs folder * Fix output path * Fix csv test file pattern * Update placeholder * Format * Instantiate default model configs * Fix unit tests for config defaults * Fix migration notebook * Remove create_pipeline_config * Remove several unused config models * Remove indexing embedding and input configs * Move embeddings function to config * Remove skip_workflows * Remove skip embeddings in favor of explicit naming * fix unit test spelling mistake * self.models[model_id] is already a language model. Remove redundant casting. * update validation errors to instruct users to rerun graphrag init * instantiate LanguageModelConfigs with validation * skip validation in unit tests * update verb tests to use default model settings instead of skipping validation * test using llm settings * cleanup verb tests * remove unsafe default model config * remove the ability to skip pydantic validation * remove None union types when default values are set * move vector_store from embeddings to top level of config and delete resolve_paths * update vector store settings * fix vector store and smoke tests * fix serializing vector_store settings * fix vector_store usage * fix vector_store type * support cli overrides for loading graphrag config * rename storage to output * Add --force flag to init * Remove run_id and resume, fix Drift config assignment * Ruff --------- Co-authored-by: Nathan Evans <github@talkswithnumbers.com> Co-authored-by: Alonso Guevara <alonsog@microsoft.com>	2025-01-21 17:52:06 -06:00
Nathan Evans	7ec9ef0261	Refactor callbacks (#1583 ) * Unify Workflow and Verb callbacks interfaces * Semver * Fix storage class instantiation (#1582) --------- Co-authored-by: Josh Bradley <joshbradley@microsoft.com>	2025-01-06 10:58:59 -08:00
Nathan Evans	a35cb12741	Remove datashaper strip code (#1581 ) Remove datashaper	2025-01-03 13:59:26 -08:00
Nathan Evans	a2647da473	Simplify flow config (#1554 ) * Flatten compute_communities config * Remove cluster strategy type * Flatten create_base_text_units config * Move cluster seed to config default, leave as None in functions * Remove "prechunked" logic * Remove hard-coded encoding model * Remove unused variables * Strongly type embed_config * Simplify layout_graph config * Semver * Fix integration test * Fix config unit tests: ignore new config defaults * Remove pipeline integ test	2024-12-27 16:38:36 -08:00
Alonso Guevara	1c3b0f34c3	Chore/lib updates (#1477 ) * Update dependencies and fix issues * Format * Semver * Fix Pyright * Pyright * More Pyright * Pyright	2024-12-06 14:08:24 -06:00
Nathan Evans	d17dfd01f9	Graph collapse (#1464 ) * Refactor graph creation * Semver * Spellcheck * Update integ pipeline * Fix cast * Improve pandas chaining * Cleaner apply * Use list comprehensions --------- Co-authored-by: Alonso Guevara <alonsog@microsoft.com>	2024-12-05 11:57:26 -06:00
Nathan Evans	634e3ed62a	Transient entity graph (#1349 ) * Make base_entity_graph transient * Add transient snapshots * Semver * Fix unit test * Fix smoke tests	2024-11-04 17:23:29 -08:00
Nathan Evans	94f1e62e5c	Rework workflow architecture (#1311 ) * Rename pipeline_storage file * Add runtime storage option to context * Fix import * Switch to memory storage for runtime * Infra for workflow runtime storage * Migrate base_text_units to runtime storage * Fix comment * Semver * Remove whitespace * Remove subflow smoke tests and ignore transient artifacts * Remove entity graph from transient list (not yet implemented) * Increase smoke runtime allotment for create_base_entity_graph * Revert format fix * Remove noqa	2024-10-24 10:20:03 -07:00
Nathan Evans	1755afbdec	Collapse create base text units (#1178 ) * Collapse non-attribute verbs * Include document_column_attributes in collapse * Remove merge_override verb * Semver * Setup initial test and config * Collapse create_base_text_units * Semver * Spelling * Fix smoke tests * Address PR comments --------- Co-authored-by: Alonso Guevara <alonsog@microsoft.com>	2024-09-23 16:55:53 -07:00

13 Commits
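
The "Refactor config" commit (c644338bae) notes that config files now support environment variables through templating, and that a literal "$" must be escaped with an extra "$". The sketch below illustrates that escaping rule using Python's standard `string.Template`. It assumes the config templating behaves like `string.Template`; the commit message does not say which mechanism is used. The helper `render_config`, the config keys, and the variable name `GRAPHRAG_API_KEY` are hypothetical and chosen only for illustration.

```python
from string import Template


def render_config(text: str, env: dict[str, str]) -> str:
    """Substitute ${NAME} placeholders; "$$" collapses to a literal "$"."""
    return Template(text).substitute(env)


# Hypothetical config fragment; the key names are illustrative only.
raw_config = r'''api_key: ${GRAPHRAG_API_KEY}
file_pattern: ".*\\.txt$$"
'''

rendered = render_config(raw_config, {"GRAPHRAG_API_KEY": "example-key"})
print(rendered)

# An unescaped "$" that does not start a valid placeholder is rejected.
try:
    render_config(r'file_pattern: ".*\\.txt$"', {})
except ValueError as err:
    print(f"unescaped '$' rejected: {err}")
```

Under this assumption, the doubled "$$" in the file pattern survives substitution as a single "$", while a lone trailing "$" fails fast instead of being silently misread as the start of a placeholder.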