403 Commits

Author SHA1 Message Date
Dayenne Souza
2f2cfa7b70
Test and unify text splitter functionality (#1547)
* add text_splitting unit test

* change folder test text splitting

* fix chunk fn

* test new function

* run formatter

* run spell check

* run semver

* remove tiktoken mocked from tests

* change progress ticker

* fix ruff check
2025-01-13 18:42:44 -03:00
Nathan Evans
0e7d22bfb0
Jan documentation updates (#1612)
* Update workflow docs

* Docs cleanup
2025-01-10 11:36:27 -08:00
Nathan Evans
63042d22f3
Limiter defaults (#1611)
* Edit rate limit defaults

* Semver
2025-01-10 10:09:12 -08:00
Alonso Guevara
e69abc7f5d
Release/v1.1.2 (#1607)
* Release v1.1.2

* Change from minor to patch
v1.1.2
2025-01-09 16:50:04 -06:00
gaudyb
37fd7a7762
fix basic search minor bug (#1606)
* fix basic search minor bug

* version change

---------

Co-authored-by: Gaudy Blanco <gaudy-microsoft@MacBook-Pro-m4-Gaudy-For-Work.local>
2025-01-09 14:48:01 -06:00
Alonso Guevara
2682c7102f
Release v1.1.1 (#1595)
v1.1.1
2025-01-08 16:18:39 -06:00
Alonso Guevara
368acc18c1
Fix/dynamic search hierarchy maps (#1591)
* Fix community hierarchy maps creation

* Semver
2025-01-08 15:40:26 -06:00
Alonso Guevara
6eca5ec69f
Chore/increase search community prop def (#1589)
* Increase LOCAL_SEARCH_COMMUNITY_PROP

* Semver
2025-01-08 09:33:36 -06:00
Alonso Guevara
f000309829
Release v1.1.0 (#1588)
v1.1.0
2025-01-07 16:16:17 -06:00
Nathan Evans
7ec9ef0261
Refactor callbacks (#1583)
* Unify Workflow and Verb callbacks interfaces

* Semver

* Fix storage class instantiation (#1582)

---------

Co-authored-by: Josh Bradley <joshbradley@microsoft.com>
2025-01-06 10:58:59 -08:00
Josh Bradley
cbb8f8788e
Fix storage class instantiation (#1582)
2025-01-03 17:39:44 -05:00
Nathan Evans
a35cb12741
Remove datashaper strip code (#1581)
Remove datashaper
2025-01-03 13:59:26 -08:00
dependabot[bot]
58f646a019
Bump ruff from 0.8.4 to 0.8.5 (#1579)
* Bump ruff from 0.8.4 to 0.8.5

Bumps [ruff](https://github.com/astral-sh/ruff) from 0.8.4 to 0.8.5.
- [Release notes](https://github.com/astral-sh/ruff/releases)
- [Changelog](https://github.com/astral-sh/ruff/blob/main/CHANGELOG.md)
- [Commits](https://github.com/astral-sh/ruff/compare/0.8.4...0.8.5)

---
updated-dependencies:
- dependency-name: ruff
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

* Fix ruff

* Semver

* Another ruff

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2025-01-02 17:45:52 -06:00
Derek Worthen
80367be018
Remove config input models (#1570)
* Remove config input models

* remove unit tests related to config input models

* add semversioner change

* Merge branch 'main' into config-remove-input-models
2025-01-02 15:25:10 -08:00
gaudyb
185f513ca7
Basic search implementation (#1563)
* basic search implementation

* basic streaming functionality

* format check

* check fix

* release change

* Chore/gleanings any encoding (#1569)

* Make claims and entities independent of encoding

* Semver

* Change semver release type

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2025-01-02 13:49:11 -06:00
Alonso Guevara
5f9ad0d003
Chore/gleanings any encoding (#1569)
* Make claims and entities independent of encoding

* Semver

* Change semver release type
2025-01-02 11:44:21 -06:00
Alonso Guevara
2abd6c5f5c
Update blog posts (#1571)
2024-12-30 17:16:08 -06:00
Alonso Guevara
5258bc5f4f
Fix/gleanings loop (#1564)
* Fix gleaning output parsing

* Semver
2024-12-30 12:57:33 -06:00
Nathan Evans
a2647da473
Simplify flow config (#1554)
* Flatten compute_communities config

* Remove cluster strategy type

* Flatten create_base_text_units config

* Move cluster seed to config default, leave as None in functions

* Remove "prechunked" logic

* Remove hard-coded encoding model

* Remove unused variables

* Strongly type embed_config

* Simplify layout_graph config

* Semver

* Fix integration test

* Fix config unit tests: ignore new config defaults

* Remove pipeline integ test
2024-12-27 16:38:36 -08:00
Theo Beigbeder
e6de713f25
Fix in load_llm.py (#1508)
Fixed an issue where the "proxy" setting was passed to the PublicOpenAPI constructor instead of the "api_base" parameter, disabling the use of on-premise OpenAI-based LLM servers.

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
Co-authored-by: Josh Bradley <joshbradley@microsoft.com>
2024-12-19 13:51:01 -06:00
joeyhacker
c450f85edd
Fix graphrag index being unable to use other LLMs (#1507)
Co-authored-by: Josh Bradley <joshbradley@microsoft.com>
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-12-19 13:49:47 -06:00
KennyZhang1
8368b12532
Add Cosmos DB storage/cache option (#1431)
* added cosmosdb constructor and database methods

* added rest of abstract method headers

* added cosmos db container methods

* implemented has and delete methods

* finished implementing abstract class methods

* integrated class into storage factory

* integrated cosmosdb class into cache factory

* added support for new config file fields

* replaced primary key cosmosdb initialization with connection strings

* modified cosmosdb setter to require json

* Fix non-default emitters

* Format

* Ruff

* ruff

* first successful run of cosmosdb indexing

* removed extraneous container_name setting

* require base_dir to be typed as str

* reverted merged changes from closed branch

* removed nested try statement

* readded initial non-parquet emitter fix

* added basic support for parquet emitter using internal conversions

* merged with main and resolved conflicts

* fixed more merge conflicts

* added cosmosdb functionality to query pipeline

* tested query for cosmosdb

* collapsed cosmosdb schema to use minimal containers and databases

* simplified create_database and create_container functions

* ruff fixes and semversioner

* spellcheck and ci fixes

* updated pyproject toml and lock file

* apply fixes after merge from main

* add temporary comments

* refactor cache factory

* refactored storage factory

* minor formatting

* update dictionary

* fix spellcheck typo

* fix default value

* fix pydantic model defaults

* update pydantic models

* fix init_content

* cleanup how factory passes parameters to file storage

* remove unnecessary output file type

* update pydantic model

* cleanup code

* implemented clear method

* fix merge from main

* add test stub for cosmosdb

* regenerate lock file

* modified set method to collapse parquet rows

* modified get method to collapse parquet rows

* updated has and delete methods and docstrings to adhere to new schema

* added prefix helper function

* replaced delimiter for prefixed id

* verified empty tests are passing

* fix merges from main

* add find test

* update cicd step name

* tested querying for new schema

* resolved errors from merge conflicts

* refactored set method to handle cache in new schema

* refactored get method to handle cache in new schema

* force unique ids to be written to cosmos for nodes

* found bug with has and delete methods

* modified has and delete to work with cache in new schema

* fix the merge from main

* minor typo fixes

* update lock file

* spellcheck fix

* fix init function signature

* minor formatting updates

* remove https protocol

* change localhost to 127.0.0.1 address

* update pytest to use bacj engine

* verified cache tests

* improved speed of has function

* resolved pytest error with find function

* added test for child method

* make container_name variable private as _container_name

* minor variable name fix

* cleanup cosmos pytest and make the cosmosdb storage class operations more efficient

* update cicd to use different cosmosdb emulator

* test with http protocol

* added pytest for clear()

* add longer timeout for cosmosdb emulator startup

* revert http connection back to https

* add comments to cicd code for future dev usage

* set container and database clients to None upon deletion

* ruff changes

* add comments to cicd code

* removed unneeded None statements and ruff fixes

* more ruff fixes

* Update test_run.py

* remove unnecessary call to delete container

* ruff format updates

* Reverted test_run.py

* fix ruff formatter errors

* cleanup variable names to be more consistent

* remove extra semversioner file

* revert pydantic model changes

* revert pydantic model change

* revert pydantic model change

* re-enable inline formatting rule

* update documentation in dev guide

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
Co-authored-by: Josh Bradley <joshbradley@microsoft.com>
2024-12-19 13:43:21 -06:00
Nathan Evans
c1c09bab80
Flow cleanup (#1510)
* Move snapshots out of flows into verbs

* Move degree compute out of extract_graph

* Move entity/relationship df merging into extract

* Move "title" to extraction source

* Move text_unit_ids agg closer to extraction

* Move data definition

* Update test data

* Semver

* Update smoke tests

* Fix empty degree field and update smoke tests and verb data

* Move extractors (#1516)

* Consolidate graph embedding and umap

* Consolidate claim extraction

* Consolidate graph extractor

* Move graph utils

* Move summarizers

* Semver

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>

* Fix syntax typo

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-12-18 18:07:44 -08:00
Nathan Evans
d0543d1fd6
Move extractors (#1516)
* Consolidate graph embedding and umap

* Consolidate claim extraction

* Consolidate graph extractor

* Move graph utils

* Move summarizers

* Semver

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-12-18 16:21:41 -08:00
ex0ns
d59b397fd2
feat: move py.typed to root (#1529)
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-12-18 17:40:45 -06:00
Alonso Guevara
aa467f462a
Release v1.0.1 (#1534)
v1.0.1
2024-12-18 17:24:43 -06:00
Alonso Guevara
cfe2082669
Fix/llm bugs empty extraction (#1533)
* Add llm singleton and check for empty extraction

* Semver

* Tests and spellcheck

* Move the singletons to a proper place

* Leftover print

* Ruff
2024-12-18 17:07:29 -06:00
Alonso Guevara
f7cd155dbc
Fix/encoding model config (#1527)
* fix: include encoding_model option when initializing LLMParameters

* chore: add semver patch description

* Fix encoding model parsing

* Fix unit tests

---------

Co-authored-by: Nico Reinartz <nico.reinartz@rwth-aachen.de>
2024-12-16 21:03:56 -06:00
Alonso Guevara
329b83cf7f
Fix on_error Callbacks (#1526)
2024-12-16 14:56:41 -06:00
Josh Bradley
983664397b
Update doc site with api overview notebook (#1509)
update doc site
2024-12-12 16:08:24 -05:00
Alonso Guevara
2d1c27d748
Release v1.0.0 (#1501)
v1.0.0
2024-12-11 17:47:28 -06:00
Nathan Evans
1d68af308b
Community workflow (#1495)
* Create separate communities workflow

* Add test for new workflow

* Rename workflows

* Collapse subflows into parents

* Rename flows, reuse variables

* Semver

* Fix integration test

* Fix smoke tests

* Fix megapipeline format

* Rename missed files

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-12-11 15:41:16 -06:00
Alonso Guevara
de12521405
Dependency updates (#1494)
* Dependency updates

* Semver
2024-12-10 17:25:38 -06:00
Josh Bradley
823342188d
Cleanup factory methods (#1482)
* cleanup factory methods to have similar design pattern across codebase

* add semversioner file

* cleanup logging factory

* update developer guide

* add comment

* typo fix

* cleanup reporter terminology

* rename reporter to logger

* fix comments

* update comment

* instantiate factory classes correctly and update index api callback parameter

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-12-10 16:11:11 -06:00
Alonso Guevara
04405803db
Add Parent to communities in data model (#1491)
* Add Parent to communities in data model

* Semver

* Pyright

* Update docs

* Use leiden cluster parent id

* Format
2024-12-10 14:38:11 -06:00
Nathan Evans
61816e076f
Migration notebook (#1492)
* Add migration notebook

* Update migration instructions

* Semver

* Rename item in relationships table

* Remove indexing vector store shim

* Remove query shims

* Remove columns from migrated data

* Format

* Add community parents
2024-12-10 14:23:26 -06:00
Alonso Guevara
1a13e0fd93
Release v0.9.0 (#1479)
* Release v0.9.0

* Spellcheck
v0.9.0
2024-12-06 14:29:55 -06:00
Alonso Guevara
1c3b0f34c3
Chore/lib updates (#1477)
* Update dependencies and fix issues

* Format

* Semver

* Fix Pyright

* Pyright

* More Pyright

* Pyright
2024-12-06 14:08:24 -06:00
volksen
b1f2ca785e
deduplicate sources in local search context (#1468)
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-12-06 13:05:00 -06:00
Chris Trevino
5ff2d3c76d
Remove graphrag.llm, replace with fnllm (#1315)
* add fnllm; remove llm folder

* remove llm unit tests

* update imports

* update imports

* formatting

* enable autosave

* update mockllm

* update community reports extractor

* move most llm usage to fnllm

* update type issues

* fix unit tests

* type updates

* update dictionary

* semver

* update llm construction, get integration tests working

* load from llmparameters model

* move ruff settings to ruff.toml

* add gitattributes file

* ignore ruff.toml spelling

* update .gitattributes

* update gitignore

* update config construction

* update prompt var usage

* add cache adapter

* use cache adapter in embeddings calls

* update embedding strategy

* add fnllm

* add pytest-dotenv

* fix some verb tests

* get verbtests running

* update ruff.toml for vscode

* enable ruff native server in vscode

* update artifact inspecting code

* remove local-test update

* use string.replace instead of string.format in community reports extractor

* bump timeout

* revert ruff.toml, vscode settings for another pr

* revert cspell config

* revert gitignore

* remove json-repair, update fnllm

* use fnllm generic type interfaces

* update load_llm to use target models

* consolidate chat parameters

* add 'extra_attributes' prop to community report response

* formatting

* update fnllm

* formatting

* formatting

* Add defaults to some llm params to avoid null on params hash

* Formatting

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
Co-authored-by: Josh Bradley <joshbradley@microsoft.com>
2024-12-05 18:07:47 -06:00
Alonso Guevara
d43124e576
Refactor Create Final Community reports to simplify code (#1456)
* Optimize prep claims

* Optimize community hierarchy restore

* Partial optimization of prepare_community_reports

* More optimization code

* Fix context string generation

* Filter community -1

* Fix cache, add more optimization fixes

* Fix local search community ids

* Cleanup

* Format

* Semver

* Remove perf counter

* Unused import

* Format

* Fix edge addition to reports

* Add edge by edge context creation

* Re-org of the optimization code

* Format

* Ruff

* Some Ruff fixes

* More pyright

* More pyright

* Pyright

* Pyright

* Update tests
2024-12-05 17:13:05 -06:00
Josh Bradley
b00142260d
Update index API + a notebook that provides a general API overview (#1454)
* update index api to accept callbacks

* fix hardcoded folder name that was creating an empty folder

* add API notebook

* add semversioner file

* filename change

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-12-05 15:34:21 -06:00
KennyZhang1
10f84c91eb
Replace md5 hash (#1470)
* switched hashing function helper to sha256

* refactored references to hashing util

* semversioner

* switched from sha256 to sha512

* new semversioner

* updated tests/verbs/data folder

* generated fresh parquet files in data folder

* moved ignore flag
2024-12-05 13:24:35 -06:00
Nathan Evans
d17dfd01f9
Graph collapse (#1464)
* Refactor graph creation

* Semver

* Spellcheck

* Update integ pipeline

* Fix cast

* Improve pandas chaining

* Cleaner apply

* Use list comprehensions

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-12-05 11:57:26 -06:00
Gijs Segerink
756f5c38a7
Update search.py (#1457)
Missing query in astream_search
2024-12-04 16:52:08 -06:00
Josh Bradley
dad2176b3c
Miscellaneous code cleanup procedures (#1452)
2024-11-27 13:27:43 -05:00
Nathan Evans
0b2120ca45
Docs and notebooks update (#1451)
* Fix local question gen and example notebook

* Update global search notebook

* Add lazy blog post

* Update breaking changes doc for migration notes

* Simplify Getting Started page

* Semver

* Spellcheck

* Fix types

* Add comments on cache-free migration

* Update wording

* Spelling

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-11-27 09:56:48 -08:00
Yuan Chai
2b7d28944d
Fix encoding issue: Ensure non-ASCII characters are correctly represe… (#1446)
Fix encoding issue: Ensure non-ASCII characters are correctly represented in entity name key

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-11-26 17:47:06 -06:00
Alonso Guevara
ae796b99cb
Fix dynamic community selection in global search (#1450)
* Fix dynamic community selection in global search

* Format

* Ruff fix
2024-11-26 15:19:50 -06:00
Alonso Guevara
6d21ef2683
Release v0.5.0 (#1415)
v0.5.0
2024-11-18 00:06:54 -06:00