389 Commits

Author SHA1 Message Date
Nathan Evans
27c6de846f
Update docs for 2.0+ (#1984)
* Update docs

* Fix prompt links
2025-06-23 13:49:47 -07:00
Nathan Evans
1df89727c3
Pipeline registration (#1940)
* Move covariate run conditional

* All pipeline registration

* Fix method name construction

* Rename context storage -> output_storage

* Rename OutputConfig as generic StorageConfig

* Reuse Storage model under InputConfig

* Move input storage creation out of document loading

* Move document loading into workflows

* Semver

* Fix smoke test config for new workflows

* Fix unit tests

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2025-06-12 16:14:39 -07:00
Nathan Evans
17e431cf42
Update typer (#1958) 2025-06-02 14:20:21 -07:00
Alonso Guevara
4a42ac81af
Release v2.3.0 (#1951) v2.3.0 2025-05-23 15:19:29 -06:00
Alonso Guevara
f1e2041f07
Fix/drift search reduce (#1948)
* Fix Reduce Response for non streaming calls

* Semver
2025-05-23 08:07:09 -06:00
Alonso Guevara
7fba9522d4
Task/raw model answer (#1947)
* Add full_response to llm provider output

* Semver

* Small leftover cleanup

* Add pyi to suppress Pyright errors. full_content is optional

* Format

* Add missing stubs
2025-05-22 08:22:44 -06:00
Alonso Guevara
fb4fe72a73
Fix/global reduce prompt (#1942)
* Add missing string formatter

* Semver
2025-05-20 17:00:32 -06:00
Copilot
f5a472ab14
Upgrade pyarrow dependency to >=17.0.0 to fix CVE-2024-52338 (#1939) 2025-05-20 18:34:28 -04:00
Alonso Guevara
24018c6155
Task/remove dynamic retries (#1941)
* Remove max retries. Update Typer args

* Format

* Semver

* Fix typo

* Ruff and Typos

* Format
2025-05-20 11:48:27 -06:00
Nathan Evans
36948b8d2e
Various minor updates (#1932)
* Add text unit ids to Community model

* Add graph utilities

* Turn off LCC for clustering by default

* Simplify embeddings config/flow

* Semver
2025-05-16 14:48:53 -07:00
Alonso Guevara
ee1b2db4a0
Update to latest fnllm (#1930)
* Update to latest fnllm

* Semver + smoke tests

* Add --method to smoke tests indexing

* format...

* Adjust embeddings limiter
2025-05-15 14:57:01 -06:00
Alonso Guevara
56a865bff0
Release v2.2.1 (#1910) v2.2.1 2025-04-30 18:15:01 -06:00
Alonso Guevara
8fb95a6209
Fix/community report tuning (#1909)
* Fix community report prompt tuning

* Semver

* Format ...
2025-04-30 17:44:31 -06:00
Andres Morales
8c81cc1563
Update Index as workflows (#1908)
* Incremental index as workflow

* Update function docs

* fix state management

* Remove update workflows when specifying workflows in the config

* Fix ruff errors

* Add semver

* Remove callbacks param
2025-04-30 16:25:36 -06:00
Nathan Evans
832abf1e0c
Fix graph creation (#1905)
* Add edge weight to all graph creation

* Semver
2025-04-29 18:18:49 -07:00
Nathan Evans
25bbae8642
Docs: Add models page (#1842)
* Add models page

* Update config docs for new params

* Spelling

* Add comment on CoT with o-series

* Add notes about managed identity

* Update the viz guide

* Spruce up the getting started wording

* Capitalization

* Add BYOG page

* More BYOG edits

* Update dictionary

* Change example model name
2025-04-28 17:36:08 -07:00
Alonso Guevara
c8621477ed
Release/v2.2.0 (#1897)
* Release v2.2.0

* Missing patch
v2.2.0
2025-04-25 18:19:29 -06:00
Nathan Evans
fbf11f3a7b
Optional embeddings (#1890)
* Make all tables optional for embeddings

* Semver

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2025-04-25 16:20:56 -07:00
Nathan Evans
56e0fad218
NLP graph parity (#1888)
* Update stopwords config

* Minor edits

* Update PMI

* Format

* Perf improvements

* Semver

* Remove edge collection apply

* Remove source/target apply

* Add edge weight to graph snapshot

* Revert breaking optimizations

* Add perf fixes back in

* Format/types

* Update defaults

* Fix source/target ordering

* Fix test
2025-04-25 17:09:06 -06:00
Nathan Evans
25b605b6cd
Snapshot full graph (#1889)
* Snapshot un-merged entities and relationships

* Semver

* Fix raw df modification
2025-04-25 14:14:48 -07:00
Nathan Evans
e2a448170a
Fix/minor query fixes (#1893)
* fixed token count for drift search

* basic search fixes

* updated basic search prompt

* fixed text splitting logic

* Lint/format

* Semver

* Fix text splitting tests

---------

Co-authored-by: ha2trinh <trinhha@microsoft.com>
2025-04-25 14:12:18 -07:00
Nathan Evans
ad4cdd685f
Support OpenAI reasoning models (#1841)
* Update tiktoken

* Add max_completion_tokens to model config

* Update/remove outdated comments

* Remove max_tokens from report generation

* Remove max_tokens from entity summarization

* Remove logit_bias from graph extraction

* Remove logit_bias from claim extraction

* Swap params if reasoning model

* Add reasoning model support to basic search

* Add reasoning model support for local and global search

* Support reasoning models with dynamic community selection

* Support reasoning models in DRIFT search

* Remove unused num_threads entry

* Semver

* Update openai

* Add reasoning_effort param
2025-04-22 14:15:26 -07:00
Dayenne Souza
74ad1d4a0c
Update .vsts-ci.yml (#1874) 2025-04-10 10:31:03 -06:00
Dayenne Souza
89381296c3
fix yaml path in unified-search-app (#1873) 2025-04-10 13:02:00 -03:00
Dayenne Souza
66aab4267e
add vsts deploy file for unified search app (#1869)
* add vsts deploy file for un ified search app

* fix file name

* remove unused tasks

* remove unused file
2025-04-08 17:08:02 -03:00
gaudyb
0e1a6e3770
Unified search added to graphrag (#1862)
* unified search app added to graphrag repository

* ignore print statements

* update words for unified-search

* fix lint errors

* fix lint error

* fix module name

---------

Co-authored-by: Gaudy Blanco <gaudy-microsoft@MacBook-Pro-m4-Gaudy-For-Work.local>
2025-04-07 11:59:02 -06:00
KennyZhang1
61769dd47e
Vector Store Integration Tests (#1856)
* Add vector store id reference to embeddings config.

* generated initial vector store pytests

* cleaned up cosmosdb vector store test

* fixed class name typo and debugged cosmosdb vector store test

* reset emulator connection string

* remove unneccessary comments

* removed extra comments from azure ai search test

* ruff

* semversioner

* fix cicd issues

* bypass diskANN policy for test env

* handle floating point inprecisions

---------

Co-authored-by: Derek Worthen <worthend.derek@gmail.com>
2025-04-01 11:05:04 -04:00
Gabriel Nieves-Ponce
ffd8db7104
Gnievesponce prompt tune embedd chunking (#1826)
* Added support for embeddings chunking as defined by the  config.

* ran semvisor -t patch

* Eliminated redunant code by using the embed_text strategy directly

* Added fix to support brakets within the corpus text; For example, inline LaTeX within a markdown file

---------

Co-authored-by: Gabriel Nieves <gnievesponce@microsoft.com>
2025-03-31 12:38:01 -04:00
Alonso Guevara
b7b2b562ce
fnllm version fix (#1835)
* Fix fnllm version

* Semver
2025-03-21 22:13:56 -07:00
Nathan Evans
3b1e70c06b
Update config docs (2.1.0) (#1818)
* Align docs with config

* Semver

* Spelling

* Format

* Spelling
2025-03-18 12:39:30 -07:00
Nathan Evans
813b4de99f
Fix API key reference for gh-pages (#1821) 2025-03-18 11:10:11 -07:00
Nathan Evans
ddc6541ab6
Add docs page about input formats (#1784)
* Add docs page about input formats

* Add json example

* Spelling
2025-03-11 17:37:46 -07:00
Nathan Evans
321d479ab6
Update notebooks for 2.0 (#1785)
* Update API overview

* Fix global search example

* Fix local search example

* Fix global dynamic example

* Fix drift example

* Update multi-index example

* Semver
2025-03-11 17:23:49 -07:00
Alonso Guevara
0d363e6957
Release v2.1.0 (#1800) v2.1.0 2025-03-11 18:16:08 -06:00
Alonso Guevara
53950f8442
Fix/model provider key injection check (#1799)
* Check available models for type validation

* Semver

* Fix ruff and pyright

* Apply feedback
2025-03-11 17:48:30 -06:00
Gabriel Nieves-Ponce
e39d869bed
Added support for verbose logging and csv-metadata to the prompt tune… (#1789)
* Added support for verbose logging and csv-metadata to the prompt tune client.

* Updated community report summarization file name and prompt template

* updated semversioner

* ran ruff linter

* Ran poe format

* Fix Ruff complains

* Fix a new ruff complain :P

* Pyright

* Fix tests

---------

Co-authored-by: Gabriel Nieves <gnievesponce@microsoft.com>
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2025-03-11 14:55:02 -06:00
Nathan Evans
66c2cfb3ce
Support JSON input files (#1777)
* Add csv loader tests

* Add test loader tests

* Add json input support

* Remove temp path constraint

* Reuse loader cose

* Semver

* Set file pattern automatically based on type, if empty

* Remove pattern from smoke test config

* Spelling

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2025-03-10 14:04:07 -07:00
Nathan Evans
bcb74789f1
Next release docs (#1627)
* Wordind updates

* Update yam lconfig and add notes to "deprecated" env

* Add basic search section

* Update versioning docs

* Minor edits for clarity

* Update init command

* Update init to add --force in docs

* Add NLP extraction params

* Move vector_store to root

* Add workflows to config

* Add FastGraphRAG docs

* add metadata column changes

* Added documentation for multi index search.

* Minor fixes.

* Add config and table renames

* Update migration notebook and comments to specify v1

* Add frequency to entity table docs

* add new chunking options for metadata

* Update output docs

* Minor edits and cleanup

* Add model ids to search configs

* Spruce up migration notebook

* Lint/format multi-index notebook

* SpaCy model note

* Update SpaCy footnote

* Updated multi_index_search.ipynb to remove ruff errors.

* add spacy to dictionary

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
Co-authored-by: Dayenne Souza <ddesouza@microsoft.com>
Co-authored-by: dorbaker <dorbaker@microsoft.com>
2025-03-03 14:46:00 -08:00
Nathan Evans
bd06d8b4f0
Context property bag ("state") (#1774)
* Add pipeline state property bag to run context

* Move state creation out of context util

* Move callbacks into PipelineRunContext

* Semver

* Rename state.json to context.json to avoid confusion with stats.json

* Expand smoke test row count

* Add util to create storage and cache
2025-02-28 09:31:48 -08:00
Nathan Evans
a15942629b
Add more verb tests (#1773)
* Add NLP verb test

* Add finalize_graph tests

* Add more thorough final column assertions
2025-02-27 09:31:46 -08:00
Alonso Guevara
b4b8b81c0a
Remove spacy model from toml (#1771)
* Remove spacy model from toml

* Semver
2025-02-26 10:58:02 -06:00
Alonso Guevara
716f93dd8b
Release v2.0.0 (#1769)
* Release v2.0.0

* snspshots...
v2.0.0
2025-02-25 17:52:30 -06:00
Alonso Guevara
facf68148a
Fix summarization and relationship grouping on Inc Indexing (#1768)
* Finx sumarization for large descriptions on incremental indexing

* Semver

* Ruff
2025-02-25 17:29:55 -06:00
Nathan Evans
ede6a74546
Pipeline callbacks (#1729)
* Add pipeline_start and pipeline_end callbacks

* Collapse redundant callback/logger logic

* Remove redundant reporting config classes

* Remove a few out-of-date type ignores

* Semver

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2025-02-25 15:07:51 -08:00
Nathan Evans
e40476153d
Speed up smoke tests (#1736)
* Move verb tests to regular CI

* Clean up env vars

* Update smoke runtime expectations

* Rework artifact assertions

* Fix plural in name

* remove redundant artifact len check

* Remove redundant artifact len check

* Adjust graph output expectations

* Update community expectations

* Include all workflow output

* Adjust text unit expectations

* Adjust assertions per dataset

* Fix test config param name

* Update nan allowed for optional model fields

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2025-02-25 13:24:35 -08:00
Nathan Evans
61a309b182
Incremental model alignment (#1766)
* Used shared schema lists for all final columns

* Semver
2025-02-25 13:14:42 -06:00
Alonso Guevara
0144b3fd88
Update FNLLM (#1738)
* Add ModelProvider to Query package.

* Spellcheck + others

* Semver

* Fix tests

* Format

* Fix Pyright

* Fix tests

* Fix for smoke tests

* Update fnllm version

* Semver

* Ruff
2025-02-24 20:30:45 -06:00
Nathan Evans
5dd9fc53cd
Move embeddings snapshots (#1737)
* Move embedding snapshots to the workflow runner

* Semver

* Rename input tables
2025-02-24 17:38:01 -08:00
Alonso Guevara
e0d233fe10
Feat/llm provider query (#1735)
* Add ModelProvider to Query package.

* Spellcheck + others

* Semver

* Fix tests

* Format

* Fix Pyright

* Fix tests

* Fix for smoke tests
2025-02-24 18:35:51 -06:00
Nathan Evans
faa05b691f
Fix text unit incremental ID updates (#1734)
* Increment text_unit ids during incremental

* Semver
2025-02-24 14:58:00 -08:00