342 Commits

Author SHA1 Message Date
Kylin
baa261c8e9
[bugfix]Fix query error with --streaming (#1368)
* fix streaming output error

* add semversioner

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-11-06 17:49:06 -06:00
Alonso Guevara
3d79de96d1
Raise error on empty deltas for incremental indexing (#1375)
* Raise error on empty deltas for incremental indexing

* Format
2024-11-06 17:33:35 -06:00
Alonso Guevara
1661672569
Fix optional covariates check in incremental indexing (#1374)
* Fix optional covariates check in incremental indexing

* Oopsie fix
2024-11-06 17:22:11 -06:00
Josh Bradley
a8ccded83c
Fix file path issue in the viz guide (#1372)
* Fix a file paths issue in the viz guide.

* fix formatting
2024-11-06 14:42:07 -08:00
Alonso Guevara
2047c1561c
Fix styling and misalignment on drift docs (#1373) 2024-11-06 16:29:53 -06:00
Josh Bradley
0394b55086
Update CI/CD - skip running unit tests on documentation-only PRs (#1371) 2024-11-06 14:19:21 -05:00
Josh Bradley
9762f33c1a
Add visualization guide (#1340) 2024-11-06 14:06:50 -05:00
Alonso Guevara
a6d9b0ce3d
Release v0.4.0 (#1361)
* Release v0.4.0

* Missing change track
v0.4.0
2024-11-05 18:44:07 -06:00
Alonso Guevara
635c21109f
Fix Community ID loading for DRIFT search over existing indexes (#1360) 2024-11-05 18:21:36 -06:00
Alonso Guevara
80c0c7bdd1
Update Incremental Indexing to new embeddings workflow (#1359) 2024-11-05 16:54:02 -06:00
Alonso Guevara
83bd5cefe5
Fix content embedding container name (#1358) 2024-11-05 15:56:32 -06:00
Alonso Guevara
1557ce34f9
Fix init defaults for vector store and img in drift docs (#1357)
* Fix init defaults for vector store and img in drift docs

* Added more doc

* Spellcheck

* Remove example
2024-11-05 14:14:17 -06:00
Alonso Guevara
d9f985ae52
Drift Search CLI, API, Docs and Example Notebook (#1348)
* Drift CLI and backwards compat

* Adding DRIFT Cli, Docs and example notebook

* Update tests and fix ruff

* Format

* Small cleanup

* Fix smoke tests

* Update notebook

* Oopsie fix

* Delete duplicate img
2024-11-05 12:05:19 -06:00
Gabriel Nieves-Ponce
68dfceef21
Updated the variable names within the for-loop to differentiate betwe… (#1356)
* Updated the variable names within the for-loop to differentiate between them and the original title variable used in the dataframe. This avoids corrupting the original column-name defined in the title variable.

* Semver and format
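
The shadowing problem described in the first bullet can be illustrated with a small sketch. The log does not show the actual code, so everything here is hypothetical: the function names, the `rows` structure, and the `entity_title` variable are assumptions chosen only to show how reusing `title` inside a loop can overwrite the column name it originally held.

```python
def collect_titles_buggy(rows, title="title"):
    """Hypothetical buggy version: the loop reuses the name `title`."""
    seen = []
    for row in rows:
        # Bug: `title` held the column name, but is now overwritten with a
        # cell value, so the next iteration looks up the wrong key.
        title = row[title]
        seen.append(title)
    return seen


def collect_titles(rows, title="title"):
    """Hypothetical fixed version: the loop uses its own variable name."""
    seen = []
    for row in rows:
        # `title` still refers to the column name on every iteration.
        entity_title = row[title]
        seen.append(entity_title)
    return seen
```

With two or more rows, the buggy version fails on the second iteration because it tries to use the first row's value as a key, while the fixed version keeps the column name intact.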

---------

Co-authored-by: Gabriel Nieves-Ponce <gnievesponce@microsoft.com>
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-11-05 11:45:29 -06:00
Nathan Evans
634e3ed62a
Transient entity graph (#1349)
* Make base_entity_graph transient

* Add transient snapshots

* Semver

* Fix unit test

* Fix smoke tests
2024-11-04 17:23:29 -08:00
gaudyb
17658c5df8
New workflow to generate embeddings in a single workflow (#1296)
* New workflow to generate embeddings in a single workflow

* New workflow to generate embeddings in a single workflow

* version change

* clean tests without any embeddings references

* clean tests without any embeddings references

* remove code

* feedback implemented

* changes in logic

* feedback implemented

* store in table bug fixed

* smoke test for generate_text_embeddings workflow

* smoke test fix

* add generate_text_embeddings to the list of transient workflows

* smoke tests

* fix

* ruff formatting updates

* fix

* smoke test fixed

* smoke test fixed

* fix lancedb import

* smoke test fix

* ignore sorting

* smoke test fixed

* smoke test fixed

* check smoke test

* smoke test fixed

* change config for vector store

* format fix

* vector store changes

* revert debug profile back to empty filepath

* merge conflict solved

* merge conflict solved

* format fixed

* format fixed

* fix return dataframe

* snapshot fix

* format fix

* embeddings param implemented

* validation fixes

* fix map

* fix map

* fix properties

* config updates

* smoke test fixed

* settings change

* Update collection config and rework back-compat

* Replace . with - for embedding store

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
Co-authored-by: Josh Bradley <joshbradley@microsoft.com>
Co-authored-by: Nathan Evans <github@talkswithnumbers.com>
2024-11-01 15:01:35 -07:00
Chris Trevino
8302920ac8
move mkdocs-typer to devdeps (#1331)
* move mkdocs-typer to devdeps

* add .gitattributes for toml parsing issues on Windows CI

* bump timeout

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-10-30 14:49:30 -07:00
Alonso Guevara
7235c6faf5
Add Incremental Indexing v1 (#1318)
* Create entrypoint for cli and api (#1067)

* Add cli and api entrypoints for update index

* Semver

* Update docs

* Run tests on feature branch main

* Better /main handling in tests

* Incremental indexing/file delta (#1123)

* Calculate new inputs and deleted inputs on update

* Semver

* Clear ruff checks

* Fix pyright

* Fix PyRight

* Ruff again

* Update relationships after inc index (#1236)

* Collapse create final community reports (#1227)

* Remove extraneous param

* Add community report mocking assertions

* Collapse primary report generation

* Collapse embeddings

* Format

* Semver

* Remove extraneous check

* Move option set

* Collapse create base entity graph (#1233)

* Collapse create_base_entity_graph

* Format/typing

* Semver

* Fix smoke tests

* Simplify assignment

* Collapse create summarized entities (#1237)

* Collapse entity summarize

* Semver

* Collapse create base extracted entities (#1235)

* Set up base assertions

* Replace entity_extract

* Finish collapsing workflow

* Semver

* Update smoke tests

* Incremental indexing/update final text units (#1241)

* Update final text units

* Format

* Address comments

* Add v1 community merge using time period (#1257)

* Add naive community merge using time period

* formatting

* Query fixes

* Add descriptions from merged_entities

* Add summarization and embeddings

* Use iso format

* Ruff

* Pyright and smoke tests

* Pyright

* Pyright

* Update parquet for verb tests

* Fix smoke tests

* Remove sorting

* Update smoke tests

* Smoke tests

* Smoke tests

* Updated verb test to ack for latest changes on covariates

* Add config for incremental index + Bug fixes (#1317)

* Add config for incremental index + Bug fixes

* Ruff

* Fix smoke tests

* Semversioner

* Small refactor

* Remove unused file

* Ruff

* Update verb tests inputs

* Update verb tests inputs

---------

Co-authored-by: Nathan Evans <github@talkswithnumbers.com>
2024-10-30 11:59:44 -06:00
Josh Bradley
0cc79b9cf7
Add backwards compatibility patch for vector store (#1334) 2024-10-29 14:54:08 -04:00
Alonso Guevara
83026bdb26
Remove duplicated entries from relationships and nodes (#1333) 2024-10-29 00:56:07 -04:00
Josh Bradley
083de12bcf
Auto-generate CLI doc pages (#1325) 2024-10-25 19:00:24 -04:00
Josh Bradley
d6e6f5c077
Convert CLI to Typer app (#1305) 2024-10-24 14:22:32 -04:00
Nathan Evans
94f1e62e5c
Rework workflow architecture (#1311)
* Rename pipeline_storage file

* Add runtime storage option to context

* Fix import

* Switch to memory storage for runtime

* Infra for workflow runtime storage

* Migrate base_text_units to runtime storage

* Fix comment

* Semver

* Remove whitespace

* Remove subflow smoke tests and ignore transient artifacts

* Remove entity graph from transient list (not yet implemented)

* Increase smoke runtime allotment for create_base_entity_graph

* Revert format fix

* Remove noqa
2024-10-24 10:20:03 -07:00
Alonso Guevara
ac09e0a740
Feature/optimize count relationships (#1312)
* refactor build text unit context for better performance

* Further optimization and styling

* Remove TODO

---------

Co-authored-by: Brad Firesheets <v-bradleyf@microsoft.com>
Co-authored-by: bfirems <162185685+bfirems@users.noreply.github.com>
Co-authored-by: Josh Bradley <joshbradley@microsoft.com>
2024-10-23 12:03:57 -06:00
Josh Bradley
3df6f8c65b
Allow ci/cd to skip draft PRs (#1314) 2024-10-23 12:46:00 -04:00
Alonso Guevara
77e77775ad
Fix drift search edge cases over small input sets (#1310)
* Fix edge cases over small input sets

* Ruff
2024-10-22 16:24:41 -06:00
JunHo Kim (김준호)
8d8c67d503
fix typo. Update documentation URLs for consistency (#1298)
Update documentation URLs for consistency

Revised links in documentation files to remove the "posts" subdirectory for consistency and correctness.

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-10-21 17:24:17 -06:00
Alonso Guevara
8a6d4e66fe
DRIFT Search (#1285)
* drift search

* args for drift global query in local search

* accept drift context in search base

* optionally parse embeddings from df when creating CommunityReport

* abstract class for drift context

* pathing for drift config

* drift config

* add defs for drift config

* formatting

* capture generated tokens in token count

* semversion

* Formatting and ruff

* Some algorithmic refactors

* Ruff

* Format

* Use asdict()

* Address comments

* Update smoke tests

* Update smoke tests

* Update smoke tests part 2

---------

Co-authored-by: Julian Whiting <j2whitin@gmail.com>
2024-10-21 17:22:11 -06:00
KennyZhang1
e0840a2dc4
Fix vector store logic and refactor audience parameter (#1259) 2024-10-21 16:56:56 -04:00
Matthieu Maitre
6aae386b30
Perf optimizations in map_query_to_entities() (#1276)
* Address perf issue in map_query_to_entities()

* Add semver

---------

Co-authored-by: Matthieu Maitre <mmaitre@microsoft.com>
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-10-21 12:03:48 -06:00
Nathan Evans
1f70d42572
Empty workflow returns (#1291)
* Skip emitting empty dataframes

* Semver

* Better empty df check
2024-10-17 09:25:36 -07:00
Andres Morales
fc502ee029
Fix cookie consent script missing (#1292) 2024-10-17 09:44:14 -06:00
Nathan Evans
ce5b1207e0
Collapse graph documents workflows (#1284)
* Copy base documents logic into final documents

* Delete create_base_documents

* Combine graph creation under create_base_entity_graph

* Delete collapsed workflows

* Migrate most graph internals to nx.Graph

* Fix None edge case

* Semver

* Remove comment typo

* Fix smoke tests
2024-10-15 13:58:58 -06:00
Andres Morales
137a5cd550
Fix/docs auto prompt img (#1283)
* Fix auto prompt tuning image path
2024-10-14 09:02:31 -06:00
Alonso Guevara
cb052a742f
Dependency updates (#1272)
* Dependency updates

* Pyright update
2024-10-11 18:06:11 -06:00
Andres Morales
fc9895f793
Replace current docs by mkdocs (#1263)
* Replace docs by mkdocs-material

* Fix markdown

* Fix versions in gh-pages workflow

* remove whitespaces

* add semver

* Add build docs check on python-ci

* Fix command in index cli

* Spellcheck

* Spellcheck

* remove docsite paths

* clear outputs from notebook

* remove dependabot npm for docsite

* remove more docsite left overs

* execute notebooks

* Update notebooks

* update poetry lock

* Remove notebook build from ci

* Revert dep update

* Navigation tabs

* Fix stylesheet

* add kwds to dictionary

* Turn on notebook execution

* Update gitignore

* Add MSR Blog posts

* spellcheck

* Accessibility Changes

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-10-11 13:39:03 -06:00
Josh Bradley
d9a005c9b8
Reorganize python package structure (#1214) 2024-10-10 17:01:42 -04:00
9prodhi
ce8749bd19
Fix: Add await to LLM execution for async handling (#1206)
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-10-09 17:26:28 -06:00
Sumit K Bhuttan
cd4f1fa9ba
Adding fix per comment on Issue-692 (#1255)
Co-authored-by: Josh Bradley <joshbradley@microsoft.com>
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-10-09 17:09:17 -06:00
Alonso Guevara
9fa6b91684
Chore/community context clean (#1262)
* Update community_context.py to check conversation_history_context's value

In community_context.py (lines 90-96), conversation_history_context is concatenated with community_context, but the case where conversation_history_context is empty ("") is not handled. When it is empty, concatenation should be skipped; otherwise community_context, or each element of community_context, gains an extra "\n\n".

Introducing a context_prefix that checks the state of conversation_history_context handles this. When conversation_history_context is empty, "" is used for concatenation. When it is not empty, the behavior is similar to the previous code.

* Format and semver

* Code cleanup
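
The conversation-history check described in the first bullet can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in community_context.py: the helper name `prepend_history` and its signature are hypothetical, and only the `context_prefix` idea comes from the commit message.

```python
def prepend_history(conversation_history_context, community_context):
    """Hypothetical helper: prefix community context with conversation history.

    `community_context` may be a single string or a list of strings.
    """
    # Only add the "\n\n" separator when there is history to prepend.
    context_prefix = (
        f"{conversation_history_context}\n\n" if conversation_history_context else ""
    )
    if isinstance(community_context, list):
        return [f"{context_prefix}{context}" for context in community_context]
    return f"{context_prefix}{community_context}"
```

With an empty history the context is returned unchanged, so no stray blank lines are added; with a non-empty history the result matches plain concatenation with a "\n\n" separator.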

---------

Co-authored-by: ZeyuTeng96 <96521059+ZeyuTeng96@users.noreply.github.com>
2024-10-09 17:01:54 -06:00
JunHo Kim (김준호)
d4a0a590f4
Change config.json references to settings.json in the configuration document. (#1221)
Updated the configuration documentation to reflect the default filename for configuration file.

Default config files are `["settings.yaml", "settings.yml", "settings.json"]`

ce71bcf7fb/graphrag/config/config_file_loader.py (L15)
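
As a rough sketch of how a list of default config filenames like the one above might be used, the snippet below returns the first matching file in a directory. Only the three filenames come from the commit message; the function name `find_config_file`, the first-match search order, and the return convention are assumptions for illustration and may not match config_file_loader.py.

```python
from pathlib import Path
from typing import Optional

# Filenames taken from the commit message above.
DEFAULT_CONFIG_FILES = ["settings.yaml", "settings.yml", "settings.json"]


def find_config_file(root: Path) -> Optional[Path]:
    """Hypothetical lookup: return the first default config file under root."""
    for name in DEFAULT_CONFIG_FILES:
        candidate = root / name
        if candidate.is_file():
            return candidate
    # No default config file exists in this directory.
    return None
```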

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-10-09 15:20:18 -06:00
JunHo Kim (김준호)
d66901e67e
Update description of GRAPHRAG_CACHE_BASE_DIR in env_vars.md (#1213)
* Update description of GRAPHRAG_CACHE_BASE_DIR in env_vars.md

Clarified that `GRAPHRAG_CACHE_BASE_DIR` refers to the base directory path for cache files rather than reporting outputs. This improves the accuracy of the documentation and helps users understand the correct usage of this environment variable.

* Update description of `GRAPHRAG_CACHE_BASE_DIR`

Simplified the description of `GRAPHRAG_CACHE_BASE_DIR` to make it clearer. Changed "base directory path" to "base path" for conciseness.

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-10-09 15:16:50 -06:00
Nathan Evans
61b3d6d56a
Migrate helper verbs (#1248)
* Remove genid

* Move snapshot_rows

* Move snapshot

* Delete spread_json

* Delete unzip

* Delete zip

* Move unpack_graph

* Move compute_edge_combined_degree

* Delete create_graph

* Delete concat

* Delete text replace

* Delete text_translate

* Move text_split

* Inline aggregate override

* Move cluster_graph

* Move merge_graphs

* Semver

* Move text_chunk

* Move layout_graph and fix some __init__s

* Move extract_covariates

* Rename text_split -> split_text

* Move extract_entities

* Move summarize_descriptions

* Rename text_chunk -> chunk_text

* Move community report creation

* Remove verb-level packing operators

* Streamline some naming

* Streamline param name/order

* Move mock LLM data to tests

* Fixed missed rename

* Update some strategy refs

* Rename run_gi

* Inject mock responses into integ test config
2024-10-09 13:46:44 -07:00
Nathan Evans
718d1ef441
Migrate embedding operations (#1242)
* Move text_embed to verb-less operation

* Move embed_graph to verb-less operation

* Return embeddings from embed_graph instead of modifying df

* Semver

* Use config existence instead of bool for graph embedding

* Send clustering strategy directly
2024-10-03 16:01:39 -07:00
Nathan Evans
f5c5876dde
Reorganize flows (#1240)
* Extract base docs and entity graph

* Move extracted entities and text units

* Move communities and community reports

* Move covariates and final documents

* Move entities, nodes, relationships

* Move text_units and summarized entities

* Assert all snapshot null cases

* Remove disabled steps util

* Remove incorrect use of input "others"

* Convert text_embed_df to just return the embeddings, not update the df

* Convert snapshot functions to noops

* Semver

* Remove lingering covariates_enabled param

* Name consistency

* Syntax cleanup
2024-10-02 08:57:08 -07:00
Nathan Evans
9070ea5c3c
Collapse create base extracted entities (#1235)
* Set up base assertions

* Replace entity_extract

* Finish collapsing workflow

* Semver

* Update smoke tests
2024-09-30 17:32:56 -07:00
Nathan Evans
630679f8e3
Collapse create summarized entities (#1237)
* Collapse entity summarize

* Semver
2024-09-30 17:17:44 -07:00
Nathan Evans
5220bb7ecc
Collapse create base entity graph (#1233)
* Collapse create_base_entity_graph

* Format/typing

* Semver

* Fix smoke tests

* Simplify assignment
2024-09-30 15:39:42 -07:00
Nathan Evans
00d5e77568
Collapse create final community reports (#1227)
* Remove extraneous param

* Add community report mocking assertions

* Collapse primary report generation

* Collapse embeddings

* Format

* Semver

* Remove extraneous check

* Move option set
2024-09-30 10:46:07 -07:00
Alonso Guevara
0d348d6070
Remove unused cols from final entities (#1226)
* Remove unused cols from final entities

* Move verbs test to integ

* Move verbs test to integ

* Move to smoke tests
2024-09-27 17:10:52 -06:00