47 Commits

Author SHA1 Message Date
Alonso Guevara
8a10f4a598 Fix format from main 2024-10-03 15:42:49 -06:00
Alonso Guevara
43ec92e173 merge from main 2024-10-02 13:08:02 -06:00
Nathan Evans
f5c5876dde
Reorganize flows (#1240)
* Extract base docs and entity graph

* Move extracted entities and text units

* Move communities and community reports

* Move covariates and final documents

* Move entities, nodes, relationships

* Move text_units and summarized entities

* Assert all snapshot null cases

* Remove disabled steps util

* Remove incorrect use of input "others"

* Convert text_embed_df to just return the embeddings, not update the df

* Convert snapshot functions to noops

* Semver

* Remove lingering covariates_enabled param

* Name consistency

* Syntax cleanup
2024-10-02 08:57:08 -07:00
Nathan Evans
d501813181 Collapse create base extracted entities (#1235)
* Set up base assertions

* Replace entity_extract

* Finish collapsing workflow

* Semver

* Update smoke tests
2024-10-01 15:10:02 -06:00
Nathan Evans
3103ae3435 Collapse create summarized entities (#1237)
* Collapse entity summarize

* Semver
2024-10-01 15:10:02 -06:00
Nathan Evans
f259d0c81c Collapse create base entity graph (#1233)
* Collapse create_base_entity_graph

* Format/typing

* Semver

* Fix smoke tests

* Simplify assignment
2024-10-01 15:10:02 -06:00
Nathan Evans
a44788bfad Collapse create final community reports (#1227)
* Remove extraneous param

* Add community report mocking assertions

* Collapse primary report generation

* Collapse embeddings

* Format

* Semver

* Remove extraneous check

* Move option set
2024-10-01 15:10:02 -06:00
Nathan Evans
9070ea5c3c
Collapse create base extracted entities (#1235)
* Set up base assertions

* Replace entity_extract

* Finish collapsing workflow

* Semver

* Update smoke tests
2024-09-30 17:32:56 -07:00
Nathan Evans
630679f8e3
Collapse create summarized entities (#1237)
* Collapse entity summarize

* Semver
2024-09-30 17:17:44 -07:00
Nathan Evans
5220bb7ecc
Collapse create base entity graph (#1233)
* Collapse create_base_entity_graph

* Format/typing

* Semver

* Fix smoke tests

* Simplify assignment
2024-09-30 15:39:42 -07:00
Nathan Evans
00d5e77568
Collapse create final community reports (#1227)
* Remove extraneous param

* Add community report mocking assertions

* Collapse primary report generation

* Collapse embeddings

* Format

* Semver

* Remove extraneous check

* Move option set
2024-09-30 10:46:07 -07:00
Nathan Evans
ce71bcf7fb
Collapse create final entities (#1220)
* Collapse create_final_entities

* Update smoke tests

* Semver

* Remove prints

* Update embedding assertions
2024-09-25 17:35:44 -07:00
Nathan Evans
3217013019
Revisit create final text units (#1216)
* Add embeddings to collapsed subflow

* Semver

* Fix smoke tests
2024-09-25 16:55:27 -07:00
Nathan Evans
73e709b686
Collapse create final covariates (#1215)
* Add covariate test

* Add detailed mock assertions

* Collapse create_final_covariates

* Delete unused doc_id field

* Semver

* Update smoke test

* Remove unused subject/object type columns
2024-09-25 16:30:22 -07:00
Nathan Evans
14750f4d37
Collapse create final documents (#1217)
* Collapse create_final_documents

* Semver
2024-09-25 15:50:46 -07:00
Nathan Evans
f518c8b80b
Collapse relationship embeddings (#1199)
* Merge text_embed into a single relationships subflow

* Update smoke tests

* Semver

* Spelling
2024-09-24 15:03:26 -07:00
Nathan Evans
1755afbdec
Collapse create base text units (#1178)
* Collapse non-attribute verbs

* Include document_column_attributes in collapse

* Remove merge_override verb

* Semver

* Setup initial test and config

* Collapse create_base_text_units

* Semver

* Spelling

* Fix smoke tests

* Address PR comments

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-09-23 16:55:53 -07:00
Nathan Evans
fbc483e4e5
Collapse create base documents (#1176)
* Collapse non-attribute verbs

* Include document_column_attributes in collapse

* Remove merge_override verb

* Semver

* Clean up some df/tests
2024-09-23 13:24:06 -07:00
Nathan Evans
f8ab1b30dc
Collapse create_final_nodes (#1171)
* Collapse create_final_nodes

* Update smoke tests

* Typo

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-09-20 13:48:56 -07:00
Nathan Evans
ae094bb144
Collapse create final relationships (#1158)
* Collapse pre/post embedding workflows

* Semver

* Fix smoke tests

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-09-19 17:38:01 -06:00
Derek Worthen
3b09df6e07
Migrate towards using static output directories (#1113)
* Migrate towards using static output directories

- Fixes load_config eagerly resolving directories.
    Directories are only resolved when the output
    directories are local.
- Add support for `--output` and `--reporting` flags
    for index CLI. To achieve previous output structure
    `index --output run1/artifacts --reports run1/reports`.
- Use static output directories when initializing
    a new project.
- Maintains backward compatibility for those using
    timestamp outputs locally.

* fix smoke tests

* update query cli to work with static directories

* remove eager path resolution from load_config. Support CLI overrides that can be resolved.

* add docs and output logs/artifacts to same directory

* use match statement

* switch back to if statement

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-09-18 17:36:50 -06:00
Nathan Evans
aa5b426f1d
Collapse final communities workflow (#1150)
* Collapse create_final_communities

* Semver

* Spellcheck

* Clean up filtering

* Add space in title

* Format

* Cleanup imports and format

* Spruce up the tests

* Update dictionary.txt

* Spellcheck

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-09-17 17:04:42 -07:00
Nathan Evans
a473265580
Collapse verbs: create_final_text_units (#1143)
* Load default config in verb tests

* Load proper workflow config

* Collapse text unit pre-embedding steps

* Format

* Update smoke tests

* Semver

* Format

* Merge join* subflows into create_final_text_units

* Remove join_text_units_to_covariate_ids

* Format

* Remove join_text_units_to_entity_ids

* Remove join_text_units_to_relationship_ids

* Clean up merges and aggregations

* Remove unnecessary cast
2024-09-17 10:32:25 -07:00
Nathan Evans
d22c0e7836
Covariate collapse (#1142)
* Setup basic verb test runner

* Replace join_text_units_to_entity_ids with subflow

* Update comments

* Replace join_text_units_to_relationship_ids subflow

* Roll in final select

* Reuse assertion util

* Small fix + format

* Format/typing

* Semver

* Format/typing

* Semver

* Revert format changes

* Fix smoke test subworkflow count

* Edit subworkflows for another smoke test

* Update test parquets for covariates

* Collapse covariate join

* Rework subtasks for per-flow customization

* Format

* Semver

* Fix smoke test
2024-09-16 12:35:45 -07:00
Nathan Evans
2de302ff0d
Verb merge nre1 (#1140)
* Setup basic verb test runner

* Replace join_text_units_to_entity_ids with subflow

* Update comments

* Replace join_text_units_to_relationship_ids subflow

* Roll in final select

* Reuse assertion util

* Small fix + format

* Format/typing

* Semver

* Format/typing

* Semver

* Revert format changes

* Fix smoke test subworkflow count

* Edit subworkflows for another smoke test
2024-09-16 12:10:29 -07:00
Derek Worthen
2d45ece9b6
fix setting base_dir to full paths when not using file system. (#1096)
* fix setting base_dir to full paths when not using file system.

* add general resolve_path
2024-09-04 11:33:44 -07:00
Derek Worthen
ab29cc2a7e
Consistent config load_config (#1065)
* Consistent config load_config

- Provide a consistent way to load configuration
- Resolve potential timestamp directories upfront
    upon config object creation
- Add unit tests for resolving timestamp directories
- Resolves #599
- Resolves #1049

* fix formatting issues

* remove unnecessary path resolution

* fix smoke tests

* update prompts to use load_config

* Update none checks

* Update none checks

* Update searching for config method signature

* Update unit tests

* fix formatting issues
2024-09-03 16:33:16 -06:00
Alonso Guevara
cb0aae7e6b
Add graphrag_import_neo4j_cypher Notebook (#593)
* Added graphrag_import_neo4j_cypher Notebook

* changed to procedure for setting embedding property to save disk space

* Reformat and cleanup

* semver

* Poetry lock update

* Update AAIS docs

* Rename contrib folder

* Merge from main

* Revert "Merge from main"

This reverts commit a399dde97b689a5b5c62dc2e9c2290cb2503b3a4.

* Fix ruff check

* Add readme and fix tests

* Fix community reports

---------

Co-authored-by: Michael Hunger <github@jexp.de>
2024-08-23 15:18:35 -06:00
Nathan Evans
f5b4d2fea5
Ci streamline (#988)
* Remove excess vars from gh-pages build

* Delete redundant javascript ci

* Pull apart testing CI

* Clean up integration tests build

* Move storage tests to integration CI

* Take py 3.10 out of smoke tests matrix

* Use minimum supported python version for most tests

* Re-run main CI on any test change

* Add Josh and Kenny to author list

* Update auto-resolve perms
2024-08-21 15:16:15 -06:00
Nathan Evans
98cabba38b
Notebook tests (#978)
* Fix notebook test runs

* Delete old issue template

* Add notebook CI action

* Print temp directories

* Print more env

* Move printing up

* Use runner_temp

* Try using current directory

* Try TMP env

* Re-write TMP

* Wrong yml

* Fix echo

* Only export if windows

* More logging

* Move export

* Reformat env write

* Fix braces

* Switch to in-memory execution

* Downgrade action perms

* Unused import
2024-08-20 17:19:37 -06:00
Alonso Guevara
0b7c5a6ae9
Add cast check on schema validation for community reports (#932)
* Add support for both float and int on schema validation for community report generation

* Cast instead of type check

* Add missing file

* Add prompt with ints to smoke tests

* Fix unit tests

* Fix unit tests
2024-08-14 16:40:47 -06:00
Nathan Evans
ac504e31a0
Add stricter filtering and tests for cli data directory discovery (#910)
* Add stricter filtering and tests for cli data directory discovery

* Semver

* Ignore ruff on error type

* Format

* Fix for windows paths

* Fix for windows paths

* Uncomment blob tests

* Sort by timestamp name instead of modified date

* Format

* Add additional folder name test
2024-08-13 17:34:14 -06:00
Andres Morales
5a7dbaa051
Fix sort_context max_tokens & max_tokens param in verb (#888)
* Fix sort_context max_tokens & max_tokens param in verb

* Fix sort_context for windows test

* add semversioner file

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-08-12 15:55:31 -06:00
Alonso Guevara
7fd23fa79c
Stabilize smoke tests for query community context building (#908)
* Stabilize smoke tests for query community context building

* Fix CODEOWNERS
2024-08-12 13:17:40 -06:00
Alonso Guevara
c451aa0093
Update smoke tests (#861)
* Run smoke tests on 4o

* Shorten dulce for smoke tests

* Update secrets for consistency
2024-08-08 13:07:44 -06:00
Dayenne Souza
1e10bd342e
Re-enable smoke tests (#848)
* add smoke tests again

* add smoke tests separated action

* add patch version

* disable blob test

* blob conn again

* add file as cache type

* remove cache type entirely

* increase timeout

* remove comment

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-08-07 12:23:46 -06:00
Chris Trevino
56db78ae38
system -> assistant (#773)
* system -> assistant

* semver
2024-07-29 14:56:55 -07:00
Chris Trevino
9d99f323ea
Add encoding model to entity/claim extraction config sections (#740)
* Add encoding-model configuration to entity & claim extraction

* add change note

* pr updates

* test fix

* disable GH-based smoke tests
2024-07-26 15:05:08 -07:00
Chris Trevino
4c229afec8
add encoding model to text-chunking config (#743)
* add encoding model to text-chunking config

* revert groupby fix, handled in other pr

* revert environment reader update for other pr
2024-07-26 14:15:17 -07:00
Chris Trevino
f5c9c2bee0
Add History input to cache-key, cache data (#736)
* Update caching llm to use history inputs

* formatting

* linting

* update glean sections to have continuous history
2024-07-26 09:26:37 -07:00
Chris Trevino
4e6589b614
fix config reader to allow for zero gleans (#735) 2024-07-26 09:11:34 -07:00
Chris Trevino
41451675ba
Add user input to history tracking (#734)
add user input to history tracking
2024-07-26 09:11:18 -07:00
Josh Bradley
2ddee65c29
Read/write files as binary utf-8 (#639) 2024-07-24 13:28:22 -04:00
Alonso Guevara
ce462515d8
Local search llm params (#533)
* initialize config with LocalSearchConfig and GlobalSearchConfig

* init_content LocalSearchConfig and GlobalSearchConfig

* rollback MAP_SYSTEM_PROMPT

* Small changes before merging. Notebook rollback

* Semver

---------

Co-authored-by: glide-the <2533736852@qq.com>
2024-07-15 13:01:56 -06:00
Kylin
e2572c7fab
[bug fix] Fix community_report config not working in settings.yaml (#405)
* fix community_report doesn't work in settings.yaml

* add semversioner

* fix unittest about community report to community reports of env

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-07-08 22:48:02 -06:00
Alonso Guevara
b912081f1b
Add N parameter support (#390)
* Add N parameter support

* Fix unit tests

* Add new env vars to param testing
2024-07-08 14:04:49 -06:00
Alonso Guevara
81b81cf60b Initial Release 2024-07-01 15:25:30 -06:00