graphrag

mirror of https://github.com/microsoft/graphrag.git synced 2025-06-26 23:19:58 +00:00

Author	SHA1	Message	Date
Nathan Evans	27c6de846f	Update docs for 2.0+ (#1984 ) * Update docs * Fix prompt links	2025-06-23 13:49:47 -07:00
Nathan Evans	1df89727c3	Pipeline registration (#1940 ) * Move covariate run conditional * All pipeline registration * Fix method name construction * Rename context storage -> output_storage * Rename OutputConfig as generic StorageConfig * Reuse Storage model under InputConfig * Move input storage creation out of document loading * Move document loading into workflows * Semver * Fix smoke test config for new workflows * Fix unit tests --------- Co-authored-by: Alonso Guevara <alonsog@microsoft.com>	2025-06-12 16:14:39 -07:00
Nathan Evans	17e431cf42	Update typer (#1958 )	2025-06-02 14:20:21 -07:00
Alonso Guevara	4a42ac81af	Release v2.3.0 (#1951 ) v2.3.0	2025-05-23 15:19:29 -06:00
Alonso Guevara	f1e2041f07	Fix/drift search reduce (#1948 ) * Fix Reduce Response for non streaming calls * Semver	2025-05-23 08:07:09 -06:00
Alonso Guevara	7fba9522d4	Task/raw model answer (#1947 ) * Add full_response to llm provider output * Semver * Small leftover cleanup * Add pyi to suppress Pyright errors. full_content is optional * Format * Add missing stubs	2025-05-22 08:22:44 -06:00
Alonso Guevara	fb4fe72a73	Fix/global reduce prompt (#1942 ) * Add missing string formatter * Semver	2025-05-20 17:00:32 -06:00
Copilot	f5a472ab14	Upgrade pyarrow dependency to >=17.0.0 to fix CVE-2024-52338 (#1939 )	2025-05-20 18:34:28 -04:00
Alonso Guevara	24018c6155	Task/remove dynamic retries (#1941 ) * Remove max retries. Update Typer args * Format * Semver * Fix typo * Ruff and Typos * Format	2025-05-20 11:48:27 -06:00
Nathan Evans	36948b8d2e	Various minor updates (#1932 ) * Add text unit ids to Community model * Add graph utilities * Turn off LCC for clustering by default * Simplify embeddings config/flow * Semver	2025-05-16 14:48:53 -07:00
Alonso Guevara	ee1b2db4a0	Update to latest fnllm (#1930 ) * Update to latest fnllm * Semver + smoke tests * Add --method to smoke tests indexing * format... * Adjust embeddings limiter	2025-05-15 14:57:01 -06:00
Alonso Guevara	56a865bff0	Release v2.2.1 (#1910 ) v2.2.1	2025-04-30 18:15:01 -06:00
Alonso Guevara	8fb95a6209	Fix/community report tuning (#1909 ) * Fix community report prompt tuning * Semver * Format ...	2025-04-30 17:44:31 -06:00
Andres Morales	8c81cc1563	Update Index as workflows (#1908 ) * Incremental index as workflow * Update function docs * fix state management * Remove update workflows when specifying workflows in the config * Fix ruff errors * Add semver * Remove callbacks param	2025-04-30 16:25:36 -06:00
Nathan Evans	832abf1e0c	Fix graph creation (#1905 ) * Add edge weight to all graph creation * Semver	2025-04-29 18:18:49 -07:00
Nathan Evans	25bbae8642	Docs: Add models page (#1842 ) * Add models page * Update config docs for new params * Spelling * Add comment on CoT with o-series * Add notes about managed identity * Update the viz guide * Spruce up the getting started wording * Capitalization * Add BYOG page * More BYOG edits * Update dictionary * Change example model name	2025-04-28 17:36:08 -07:00
Alonso Guevara	c8621477ed	Release/v2.2.0 (#1897 ) * Release v2.2.0 * Missing patch v2.2.0	2025-04-25 18:19:29 -06:00
Nathan Evans	fbf11f3a7b	Optional embeddings (#1890 ) * Make all tables optional for embeddings * Semver --------- Co-authored-by: Alonso Guevara <alonsog@microsoft.com>	2025-04-25 16:20:56 -07:00
Nathan Evans	56e0fad218	NLP graph parity (#1888 ) * Update stopwords config * Minor edits * Update PMI * Format * Perf improvements * Semver * Remove edge collection apply * Remove source/target apply * Add edge weight to graph snapshot * Revert breaking optimizations * Add perf fixes back in * Format/types * Update defaults * Fix source/target ordering * Fix test	2025-04-25 17:09:06 -06:00
Nathan Evans	25b605b6cd	Snapshot full graph (#1889 ) * Snapshot un-merged entities and relationships * Semver * Fix raw df modification	2025-04-25 14:14:48 -07:00
Nathan Evans	e2a448170a	Fix/minor query fixes (#1893 ) * fixed token count for drift search * basic search fixes * updated basic search prompt * fixed text splitting logic * Lint/format * Semver * Fix text splitting tests --------- Co-authored-by: ha2trinh <trinhha@microsoft.com>	2025-04-25 14:12:18 -07:00
Nathan Evans	ad4cdd685f	Support OpenAI reasoning models (#1841 ) * Update tiktoken * Add max_completion_tokens to model config * Update/remove outdated comments * Remove max_tokens from report generation * Remove max_tokens from entity summarization * Remove logit_bias from graph extraction * Remove logit_bias from claim extraction * Swap params if reasoning model * Add reasoning model support to basic search * Add reasoning model support for local and global search * Support reasoning models with dynamic community selection * Support reasoning models in DRIFT search * Remove unused num_threads entry * Semver * Update openai * Add reasoning_effort param	2025-04-22 14:15:26 -07:00
Dayenne Souza	74ad1d4a0c	Update .vsts-ci.yml (#1874 )	2025-04-10 10:31:03 -06:00
Dayenne Souza	89381296c3	fix yaml path in unified-search-app (#1873 )	2025-04-10 13:02:00 -03:00
Dayenne Souza	66aab4267e	add vsts deploy file for unified search app (#1869 ) * add vsts deploy file for un ified search app * fix file name * remove unused tasks * remove unused file	2025-04-08 17:08:02 -03:00
gaudyb	0e1a6e3770	Unified search added to graphrag (#1862 ) * unified search app added to graphrag repository * ignore print statements * update words for unified-search * fix lint errors * fix lint error * fix module name --------- Co-authored-by: Gaudy Blanco <gaudy-microsoft@MacBook-Pro-m4-Gaudy-For-Work.local>	2025-04-07 11:59:02 -06:00
KennyZhang1	61769dd47e	Vector Store Integration Tests (#1856 ) * Add vector store id reference to embeddings config. * generated initial vector store pytests * cleaned up cosmosdb vector store test * fixed class name typo and debugged cosmosdb vector store test * reset emulator connection string * remove unneccessary comments * removed extra comments from azure ai search test * ruff * semversioner * fix cicd issues * bypass diskANN policy for test env * handle floating point inprecisions --------- Co-authored-by: Derek Worthen <worthend.derek@gmail.com>	2025-04-01 11:05:04 -04:00
Gabriel Nieves-Ponce	ffd8db7104	Gnievesponce prompt tune embedd chunking (#1826 ) * Added support for embeddings chunking as defined by the config. * ran semvisor -t patch * Eliminated redunant code by using the embed_text strategy directly * Added fix to support brakets within the corpus text; For example, inline LaTeX within a markdown file --------- Co-authored-by: Gabriel Nieves <gnievesponce@microsoft.com>	2025-03-31 12:38:01 -04:00
Alonso Guevara	b7b2b562ce	fnllm version fix (#1835 ) * Fix fnllm version * Semver	2025-03-21 22:13:56 -07:00
Nathan Evans	3b1e70c06b	Update config docs (2.1.0) (#1818 ) * Align docs with config * Semver * Spelling * Format * Spelling	2025-03-18 12:39:30 -07:00
Nathan Evans	813b4de99f	Fix API key reference for gh-pages (#1821 )	2025-03-18 11:10:11 -07:00
Nathan Evans	ddc6541ab6	Add docs page about input formats (#1784 ) * Add docs page about input formats * Add json example * Spelling	2025-03-11 17:37:46 -07:00
Nathan Evans	321d479ab6	Update notebooks for 2.0 (#1785 ) * Update API overview * Fix global search example * Fix local search example * Fix global dynamic example * Fix drift example * Update multi-index example * Semver	2025-03-11 17:23:49 -07:00
Alonso Guevara	0d363e6957	Release v2.1.0 (#1800 ) v2.1.0	2025-03-11 18:16:08 -06:00
Alonso Guevara	53950f8442	Fix/model provider key injection check (#1799 ) * Check available models for type validation * Semver * Fix ruff and pyright * Apply feedback	2025-03-11 17:48:30 -06:00
Gabriel Nieves-Ponce	e39d869bed	Added support for verbose logging and csv-metadata to the prompt tune… (#1789 ) * Added support for verbose logging and csv-metadata to the prompt tune client. * Updated community report summarization file name and prompt template * updated semversioner * ran ruff linter * Ran poe format * Fix Ruff complains * Fix a new ruff complain :P * Pyright * Fix tests --------- Co-authored-by: Gabriel Nieves <gnievesponce@microsoft.com> Co-authored-by: Alonso Guevara <alonsog@microsoft.com>	2025-03-11 14:55:02 -06:00
Nathan Evans	66c2cfb3ce	Support JSON input files (#1777 ) * Add csv loader tests * Add test loader tests * Add json input support * Remove temp path constraint * Reuse loader cose * Semver * Set file pattern automatically based on type, if empty * Remove pattern from smoke test config * Spelling --------- Co-authored-by: Alonso Guevara <alonsog@microsoft.com>	2025-03-10 14:04:07 -07:00
Nathan Evans	bcb74789f1	Next release docs (#1627 ) * Wordind updates * Update yam lconfig and add notes to "deprecated" env * Add basic search section * Update versioning docs * Minor edits for clarity * Update init command * Update init to add --force in docs * Add NLP extraction params * Move vector_store to root * Add workflows to config * Add FastGraphRAG docs * add metadata column changes * Added documentation for multi index search. * Minor fixes. * Add config and table renames * Update migration notebook and comments to specify v1 * Add frequency to entity table docs * add new chunking options for metadata * Update output docs * Minor edits and cleanup * Add model ids to search configs * Spruce up migration notebook * Lint/format multi-index notebook * SpaCy model note * Update SpaCy footnote * Updated multi_index_search.ipynb to remove ruff errors. * add spacy to dictionary --------- Co-authored-by: Alonso Guevara <alonsog@microsoft.com> Co-authored-by: Dayenne Souza <ddesouza@microsoft.com> Co-authored-by: dorbaker <dorbaker@microsoft.com>	2025-03-03 14:46:00 -08:00
Nathan Evans	bd06d8b4f0	Context property bag ("state") (#1774 ) * Add pipeline state property bag to run context * Move state creation out of context util * Move callbacks into PipelineRunContext * Semver * Rename state.json to context.json to avoid confusion with stats.json * Expand smoke test row count * Add util to create storage and cache	2025-02-28 09:31:48 -08:00
Nathan Evans	a15942629b	Add more verb tests (#1773 ) * Add NLP verb test * Add finalize_graph tests * Add more thorough final column assertions	2025-02-27 09:31:46 -08:00
Alonso Guevara	b4b8b81c0a	Remove spacy model from toml (#1771 ) * Remove spacy model from toml * Semver	2025-02-26 10:58:02 -06:00
Alonso Guevara	716f93dd8b	Release v2.0.0 (#1769 ) * Release v2.0.0 * snspshots... v2.0.0	2025-02-25 17:52:30 -06:00
Alonso Guevara	facf68148a	Fix summarization and relationship grouping on Inc Indexing (#1768 ) * Finx sumarization for large descriptions on incremental indexing * Semver * Ruff	2025-02-25 17:29:55 -06:00
Nathan Evans	ede6a74546	Pipeline callbacks (#1729 ) * Add pipeline_start and pipeline_end callbacks * Collapse redundant callback/logger logic * Remove redundant reporting config classes * Remove a few out-of-date type ignores * Semver --------- Co-authored-by: Alonso Guevara <alonsog@microsoft.com>	2025-02-25 15:07:51 -08:00
Nathan Evans	e40476153d	Speed up smoke tests (#1736 ) * Move verb tests to regular CI * Clean up env vars * Update smoke runtime expectations * Rework artifact assertions * Fix plural in name * remove redundant artifact len check * Remove redundant artifact len check * Adjust graph output expectations * Update community expectations * Include all workflow output * Adjust text unit expectations * Adjust assertions per dataset * Fix test config param name * Update nan allowed for optional model fields --------- Co-authored-by: Alonso Guevara <alonsog@microsoft.com>	2025-02-25 13:24:35 -08:00
Nathan Evans	61a309b182	Incremental model alignment (#1766 ) * Used shared schema lists for all final columns * Semver	2025-02-25 13:14:42 -06:00
Alonso Guevara	0144b3fd88	Update FNLLM (#1738 ) * Add ModelProvider to Query package. * Spellcheck + others * Semver * Fix tests * Format * Fix Pyright * Fix tests * Fix for smoke tests * Update fnllm version * Semver * Ruff	2025-02-24 20:30:45 -06:00
Nathan Evans	5dd9fc53cd	Move embeddings snapshots (#1737 ) * Move embedding snapshots to the workflow runner * Semver * Rename input tables	2025-02-24 17:38:01 -08:00
Alonso Guevara	e0d233fe10	Feat/llm provider query (#1735 ) * Add ModelProvider to Query package. * Spellcheck + others * Semver * Fix tests * Format * Fix Pyright * Fix tests * Fix for smoke tests	2025-02-24 18:35:51 -06:00
Nathan Evans	faa05b691f	Fix text unit incremental ID updates (#1734 ) * Increment text_unit ids during incremental * Semver	2025-02-24 14:58:00 -08:00

389 Commits