217 Commits

Author SHA1 Message Date
gaudyb
493600fbab lint checks fixed 2024-10-10 16:17:21 -06:00
gaudyb
a456e01864 lint checks fixed 2024-10-10 15:39:40 -06:00
gaudyb
e859260d9d script update 2024-10-10 15:36:07 -06:00
gaudyb
f3ac603ac6 script update 2024-10-10 15:35:58 -06:00
gaudyb
5041113a21 script update 2024-10-10 15:34:47 -06:00
gaudyb
6e24503dc7 script update 2024-10-10 15:31:46 -06:00
gaudyb
5eac54ce02 update script 2024-10-10 15:29:35 -06:00
gaudyb
4d15434c7e Merge remote-tracking branch 'origin/main' into migration-scripts 2024-10-10 12:05:54 -06:00
gaudyb
375daaaf14 generic script to migrate embeddings 2024-10-10 12:05:41 -06:00
9prodhi
ce8749bd19
Fix: Add await to LLM execution for async handling (#1206)
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-10-09 17:26:28 -06:00
Sumit K Bhuttan
cd4f1fa9ba
Adding fix per comment on Issue-692 (#1255)
Co-authored-by: Josh Bradley <joshbradley@microsoft.com>
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-10-09 17:09:17 -06:00
Alonso Guevara
9fa6b91684
Chore/community context clean (#1262)
* Update community_context.py to check conversation_history_context's value

In the following code (lines 90-96), conversation_history_context is concatenated with community_context, but the case where conversation_history_context is empty ("") was not considered. When conversation_history_context is empty, concatenation should be skipped; otherwise community_context (or each element of community_context) ends up with an extra leading "\n\n".

Therefore, this change introduces a context_prefix that depends on the state of conversation_history_context, so concatenation is handled appropriately. When conversation_history_context is empty (""), the prefix is "" and nothing extra is prepended. When conversation_history_context is not empty, the behavior matches the previous code.

* Format and semver

* Code cleanup

---------

Co-authored-by: ZeyuTeng96 <96521059+ZeyuTeng96@users.noreply.github.com>
2024-10-09 17:01:54 -06:00
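A minimal Python sketch of the `context_prefix` approach described in #1262. The helper name `prepend_history` is hypothetical, and the exact handling in `community_context.py` may differ; the sketch only illustrates building the prefix conditionally so that an empty history adds no separator, for both a single context string and a list of context chunks.

```python
from __future__ import annotations


def prepend_history(
    conversation_history_context: str,
    community_context: str | list[str],
) -> str | list[str]:
    """Hypothetical helper: prepend conversation history only when it is non-empty."""
    # An empty history yields an empty prefix, so no stray "\n\n" is added.
    context_prefix = (
        f"{conversation_history_context}\n\n" if conversation_history_context else ""
    )
    if isinstance(community_context, list):
        # Apply the same prefix to every context chunk.
        return [f"{context_prefix}{context}" for context in community_context]
    return f"{context_prefix}{community_context}"
```

With an empty history the community context passes through unchanged; with a non-empty history the behavior is the same as unconditional concatenation.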
JunHo Kim (김준호)
d4a0a590f4
Change config.json references to settings.json in the configuration document. (#1221)
Updated the configuration documentation to reflect the default filename for the configuration file.

Default config files are `["settings.yaml", "settings.yml", "settings.json"]`

ce71bcf7fb/graphrag/config/config_file_loader.py (L15)

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-10-09 15:20:18 -06:00
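A minimal sketch of how the default file names listed in #1221 could be searched for in a project root, assuming they are checked in the listed order and the first existing file wins. `find_config_file` is a hypothetical helper for illustration, not the `config_file_loader.py` API.

```python
from __future__ import annotations

from pathlib import Path

# Default config file names, in the order listed in the commit message.
DEFAULT_CONFIG_FILES = ["settings.yaml", "settings.yml", "settings.json"]


def find_config_file(root: str | Path) -> Path | None:
    """Hypothetical helper: return the first default config file in root, or None."""
    root_path = Path(root)
    for name in DEFAULT_CONFIG_FILES:
        candidate = root_path / name
        if candidate.is_file():
            return candidate
    # No default config file exists in this directory.
    return None
```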
JunHo Kim (김준호)
d66901e67e
Update description of GRAPHRAG_CACHE_BASE_DIR in env_vars.md (#1213)
* Update description of GRAPHRAG_CACHE_BASE_DIR in env_vars.md

Clarified that `GRAPHRAG_CACHE_BASE_DIR` refers to the base directory path for cache files rather than reporting outputs. This improves the accuracy of the documentation and helps users understand the correct usage of this environment variable.

* Update description of `GRAPHRAG_CACHE_BASE_DIR`

Simplified the description of `GRAPHRAG_CACHE_BASE_DIR` to make it clearer. Changed "base directory path" to "base path" for conciseness.

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-10-09 15:16:50 -06:00
Nathan Evans
61b3d6d56a
Migrate helper verbs (#1248)
* Remove genid

* Move snapshot_rows

* Move snapshot

* Delete spread_json

* Delete unzip

* Delete zip

* Move unpack_graph

* Move compute_edge_combined_degree

* Delete create_graph

* Delete concat

* Delete text replace

* Delete text_translate

* Move text_split

* Inline aggregate override

* Move cluster_graph

* Move merge_graphs

* Semver

* Move text_chunk

* Move layout_graph and fix some __init__s

* Move extract_covariates

* Rename text_split -> split_text

* Move extract_entities

* Move summarize_descriptions

* Rename text_chunk -> chunk_text

* Move community report creation

* Remove verb-level packing operators

* Streamline some naming

* Streamline param name/order

* Move mock LLM data to tests

* Fixed missed rename

* Update some strategy refs

* Rename run_gi

* Inject mock responses into integ test config
2024-10-09 13:46:44 -07:00
gaudyb
6352ca400e Merge remote-tracking branch 'origin/main' into migration-scripts 2024-10-09 08:16:45 -06:00
Nathan Evans
718d1ef441
Migrate embedding operations (#1242)
* Move text_embed to verb-less operation

* Move embed_graph to verb-less operation

* Return embeddings from embed_graph instead of modifying df

* Semver

* Use config existence instead of bool for graph embedding

* Send clustering strategy directly
2024-10-03 16:01:39 -07:00
gaudyb
7eec469ee4 text and graph embeddings extraction scripts 2024-10-02 22:07:08 -06:00
gaudyb
0acea3e737 text and graph embeddings extraction scripts 2024-10-02 22:00:54 -06:00
Nathan Evans
f5c5876dde
Reorganize flows (#1240)
* Extract base docs and entity graph

* Move extracted entities and text units

* Move communities and community reports

* Move covariates and final documents

* Move entities, nodes, relationships

* Move text_units and summarized entities

* Assert all snapshot null cases

* Remove disabled steps util

* Remove incorrect use of input "others"

* Convert text_embed_df to just return the embeddings, not update the df

* Convert snapshot functions to noops

* Semver

* Remove lingering covariates_enabled param

* Name consistency

* Syntax cleanup
2024-10-02 08:57:08 -07:00
Nathan Evans
9070ea5c3c
Collapse create base extracted entities (#1235)
* Set up base assertions

* Replace entity_extract

* Finish collapsing workflow

* Semver

* Update smoke tests
2024-09-30 17:32:56 -07:00
Nathan Evans
630679f8e3
Collapse create summarized entities (#1237)
* Collapse entity summarize

* Semver
2024-09-30 17:17:44 -07:00
Nathan Evans
5220bb7ecc
Collapse create base entity graph (#1233)
* Collapse create_base_entity_graph

* Format/typing

* Semver

* Fix smoke tests

* Simplify assignment
2024-09-30 15:39:42 -07:00
Nathan Evans
00d5e77568
Collapse create final community reports (#1227)
* Remove extraneous param

* Add community report mocking assertions

* Collapse primary report generation

* Collapse embeddings

* Format

* Semver

* Remove extraneous check

* Move option set
2024-09-30 10:46:07 -07:00
Alonso Guevara
0d348d6070
Remove unused cols from final entities (#1226)
* Remove unused cols from final entities

* Move verbs test to integ

* Move verbs test to integ

* Move to smoke tests
2024-09-27 17:10:52 -06:00
Alonso Guevara
737a471d18
Pandas-ify Create Final Entities (#1225) 2024-09-26 15:09:40 -06:00
Nathan Evans
ce71bcf7fb
Collapse create final entities (#1220)
* Collapse create_final_entities

* Update smoke tests

* Semver

* Remove prints

* Update embedding assertions
2024-09-25 17:35:44 -07:00
Nathan Evans
3217013019
Revisit create final text units (#1216)
* Add embeddings to collapsed subflow

* Semver

* Fix smoke tests
2024-09-25 16:55:27 -07:00
Nathan Evans
73e709b686
Collapse create final covariates (#1215)
* Add covariate test

* Add detailed mock assertions

* Collapse create_final_covariates

* Delete unused doc_id field

* Semver

* Update smoke test

* Remove unused subject/object type columns
2024-09-25 16:30:22 -07:00
Alonso Guevara
0952014fa9
Fix issue 1173 - Nested json parsing (#1218) 2024-09-25 17:11:49 -06:00
Nathan Evans
14750f4d37
Collapse create final documents (#1217)
* Collapse create_final_documents

* Semver
2024-09-25 15:50:46 -07:00
Alonso Guevara
dda4edd0fd
Pandas-ify Create Base Documents (#1209) 2024-09-24 18:37:45 -06:00
Nathan Evans
f518c8b80b
Collapse relationship embeddings (#1199)
* Merge text_embed into a single relationships subflow

* Update smoke tests

* Semver

* Spelling
2024-09-24 15:03:26 -07:00
Nathan Evans
1755afbdec
Collapse create base text units (#1178)
* Collapse non-attribute verbs

* Include document_column_attributes in collapse

* Remove merge_override verb

* Semver

* Setup initial test and config

* Collapse create_base_text_units

* Semver

* Spelling

* Fix smoke tests

* Address PR comments

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-09-23 16:55:53 -07:00
Alonso Guevara
be7d3eb189
Remove aggregate_df from final communities and final text units (#1179)
* Remove aggregate_df from final communities and final text units

* Semver

* Ruff and format

* Format

* Format

* Fix tests, ruff and checks

* Remove some leftover prints

* Removed _final_join method
2024-09-23 16:54:15 -06:00
Nathan Evans
fbc483e4e5
Collapse create base documents (#1176)
* Collapse non-attribute verbs

* Include document_column_attributes in collapse

* Remove merge_override verb

* Semver

* Clean up some df/tests
2024-09-23 13:24:06 -07:00
JunHo Kim (김준호)
ea468204bc
Fix typo in documentation for customizability (#1160)
Corrected a misspelling of 'customizability' in the env_vars.md documentation. This change ensures clarity and accuracy in the description of input data handling configurations.

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-09-20 14:52:44 -06:00
Nathan Evans
f8ab1b30dc
Collapse create_final_nodes (#1171)
* Collapse create_final_nodes

* Update smoke tests

* Typo

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-09-20 13:48:56 -07:00
Alonso Guevara
fb65989c05
Incremental indexing/update old outputs (#1155)
* Create entrypoint for cli and api (#1067)

* Add cli and api entrypoints for update index

* Semver

* Update docs

* Run tests on feature branch main

* Better /main handling in tests

* Incremental indexing/file delta (#1123)

* Calculate new inputs and deleted inputs on update

* Semver

* Clear ruff checks

* Fix pyright

* Fix PyRight

* Ruff again

* Update Final Entities merging in new and existing entities from delta

* Update formatting

* Pyright

* Ruff

* Fix for pyright

* Yet Another Pyright test

* Pyright

* Format
2024-09-20 14:21:50 -06:00
Chris Trevino
1dbcc42b81
Remove redundant code from error-handling code in GlobalSearch (#1170)
* remove a redundant retry

* semver

* formatting

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-09-20 11:29:56 -06:00
Alonso Guevara
16b4ea5dc9
Release v0.3.6 (#1172) v0.3.6 2024-09-19 18:29:52 -06:00
dependabot[bot]
b61c4ec737
Bump JamesIves/github-pages-deploy-action from 4.6.3 to 4.6.4 (#1104)
Bumps [JamesIves/github-pages-deploy-action](https://github.com/jamesives/github-pages-deploy-action) from 4.6.3 to 4.6.4.
- [Release notes](https://github.com/jamesives/github-pages-deploy-action/releases)
- [Commits](https://github.com/jamesives/github-pages-deploy-action/compare/v4.6.3...v4.6.4)

---
updated-dependencies:
- dependency-name: JamesIves/github-pages-deploy-action
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-09-19 18:07:44 -06:00
Nathan Evans
ae094bb144
Collapse create final relationships (#1158)
* Collapse pre/post embedding workflows

* Semver

* Fix smoke tests

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-09-19 17:38:01 -06:00
dependabot[bot]
bd2c1da9a8
Bump path-to-regexp from 6.2.1 to 6.3.0 in /docsite (#1130)
Bumps [path-to-regexp](https://github.com/pillarjs/path-to-regexp) from 6.2.1 to 6.3.0.
- [Release notes](https://github.com/pillarjs/path-to-regexp/releases)
- [Changelog](https://github.com/pillarjs/path-to-regexp/blob/master/History.md)
- [Commits](https://github.com/pillarjs/path-to-regexp/compare/v6.2.1...v6.3.0)

---
updated-dependencies:
- dependency-name: path-to-regexp
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-09-19 15:55:31 -06:00
Alonso Guevara
84fb14ce4d
Chore/dependency cleanup (#1169)
* fix dependencies with deptry

* change order in pyproject.toml

* fix

* Dependency updates and cleanup

* Future required

---------

Co-authored-by: Florian Maas <fpgmaas@gmail.com>
2024-09-19 15:08:13 -06:00
Alonso Guevara
96a2460375
Release v0.3.5 (#1166) v0.3.5 2024-09-19 11:34:49 -06:00
longyunfeigu
95409ff4bf
Remove lancedb_dir redundant assignments (#1163)
Co-authored-by: wanhua.gu <wanhua.gu@wiz.ai>
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-09-19 09:25:10 -06:00
Alonso Guevara
ac234f47bd
Fix prompt tune output path on cli (#1157) 2024-09-19 09:22:17 -06:00
Derek Worthen
3b09df6e07
Migrate towards using static output directories (#1113)
* Migrate towards using static output directories

- Fixes load_config eagerly resolving directories.
    Directories are only resolved when the output
    directories are local.
- Add support for `--output` and `--reporting` flags
    for index CLI. To reproduce the previous output structure, use
    `index --output run1/artifacts --reports run1/reports`.
- Use static output directories when initializing
    a new project.
- Maintains backward compatibility for those using
    timestamp outputs locally.

* fix smoke tests

* update query cli to work with static directories

* remove eager path resolution from load_config. Support CLI overrides that can be resolved.

* add docs and output logs/artifacts to same directory

* use match statement

* switch back to if statement

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-09-18 17:36:50 -06:00
Alonso Guevara
10910797d0
Fix seed init in clustering (#1156) 2024-09-18 17:22:52 -06:00