* Update community_context.py to check conversation_history_context's value
In the affected code (lines 90-96), conversation_history_context is concatenated with community_context, but the case where conversation_history_context is empty ("") is not handled. When it is empty, the concatenation should be skipped; otherwise community_context, or each element of community_context, ends up with an extra "\n\n".
Therefore, a context_prefix is introduced that checks the state of conversation_history_context so the concatenation is handled appropriately. When conversation_history_context is empty (""), the code uses "" for concatenation. When it is not empty, the code behaves as it did before.
* Format and semver
* Code cleanup
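
The context_prefix check described above can be sketched roughly as follows. This is a minimal illustration, not the actual code in community_context.py: the function name `join_with_history` is hypothetical, and it assumes the history is prepended with a "\n\n" separator.

```python
from typing import List, Union


def join_with_history(
    conversation_history_context: str,
    community_context: Union[str, List[str]],
) -> Union[str, List[str]]:
    """Hypothetical helper: prepend the history only when it is non-empty."""
    # Assumption: history and community context are separated by a blank line.
    # An empty history yields an empty prefix, so no stray "\n\n" is added.
    context_prefix = (
        f"{conversation_history_context}\n\n" if conversation_history_context else ""
    )

    if isinstance(community_context, list):
        # Apply the same prefix to every element of the list.
        return [f"{context_prefix}{chunk}" for chunk in community_context]
    return f"{context_prefix}{community_context}"
```

With an empty history, `join_with_history("", "report")` returns `"report"` unchanged; with a non-empty history the result is the history, a blank line, then the community context.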
---------
Co-authored-by: ZeyuTeng96 <96521059+ZeyuTeng96@users.noreply.github.com>
Updated the configuration documentation to reflect the default filenames for the configuration file.
Default config files are `["settings.yaml", "settings.yml", "settings.json"]`
ce71bcf7fb/graphrag/config/config_file_loader.py (L15)
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
* Update description of GRAPHRAG_CACHE_BASE_DIR in env_vars.md
Clarified that `GRAPHRAG_CACHE_BASE_DIR` refers to the base directory path for cache files rather than reporting outputs. This improves the accuracy of the documentation and helps users understand the correct usage of this environment variable.
* Update description of `GRAPHRAG_CACHE_BASE_DIR`
Simplified the description of `GRAPHRAG_CACHE_BASE_DIR` to make it clearer. Changed "base directory path" to "base path" for conciseness.
---------
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
* Move text_embed to verb-less operation
* Move embed_graph to verb-less operation
* Return embeddings from embed_graph instead of modifying df
* Semver
* Use config existence instead of bool for graph embedding
* Send clustering strategy directly
* Extract base docs and entity graph
* Move extracted entities and text units
* Move communities and community reports
* Move covariates and final documents
* Move entities, nodes, relationships
* Move text_units and summarized entities
* Assert all snapshot null cases
* Remove disabled steps util
* Remove incorrect use of input "others"
* Convert text_embed_df to just return the embeddings, not update the df
* Convert snapshot functions to noops
* Semver
* Remove lingering covariates_enabled param
* Name consistency
* Syntax cleanup
* Remove aggregate_df from final communities and final text units
* Semver
* Ruff and format
* Format
* Format
* Fix tests, ruff and checks
* Remove some leftover prints
* Removed _final_join method
Corrected a misspelling of 'customizability' in the env_vars.md documentation. This change ensures clarity and accuracy in the description of input data handling configurations.
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
* Create entrypoint for cli and api (#1067)
* Add cli and api entrypoints for update index
* Semver
* Update docs
* Run tests on feature branch main
* Better /main handling in tests
* Incremental indexing/file delta (#1123)
* Calculate new inputs and deleted inputs on update
* Semver
* Clear ruff checks
* Fix pyright
* Fix PyRight
* Ruff again
* Update Final Entities merging in new and existing entities from delta
* Update formatting
* Pyright
* Ruff
* Fix for pyright
* Yet Another Pyright test
* Pyright
* Format
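
One way to picture the "new inputs and deleted inputs" calculation from the incremental indexing commit is as a set difference between the previously indexed documents and the current input. The sketch below is only an illustration under that assumption: `InputDelta` and `compute_delta` are hypothetical names, and the real implementation may identify documents differently.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable


@dataclass
class InputDelta:
    """Hypothetical container for the result of comparing two input sets."""

    new_inputs: list[str]
    deleted_inputs: list[str]


def compute_delta(previous_ids: Iterable[str], current_ids: Iterable[str]) -> InputDelta:
    """Compare previously indexed document ids with the current input ids."""
    previous, current = set(previous_ids), set(current_ids)
    return InputDelta(
        # Present now but not in the previous index: needs indexing.
        new_inputs=sorted(current - previous),
        # Present in the previous index but gone now: needs removal.
        deleted_inputs=sorted(previous - current),
    )
```

Sorting the results keeps the output deterministic, which makes the delta easier to log and test.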
* Migrate towards using static output directories
- Fixes load_config eagerly resolving directories.
Directories are only resolved when the output
directories are local.
- Add support for `--output` and `--reporting` flags
for index CLI. To achieve the previous output structure, use
`index --output run1/artifacts --reports run1/reports`.
- Use static output directories when initializing
a new project.
- Maintains backward compatibility for those using
timestamp outputs locally.
* fix smoke tests
* update query cli to work with static directories
* remove eager path resolution from load_config. Support CLI overrides that can be resolved.
* add docs and output logs/artifacts to same directory
* use match statement
* switch back to if statement
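
The "only resolve directories when the output is local" behavior from the static output directories commit could look something like the sketch below. Everything here is an assumption made for illustration: `resolve_output_dir`, the `"file"` storage type string, and the `${timestamp}` placeholder are hypothetical and are not taken from load_config.

```python
from pathlib import Path
from string import Template


def resolve_output_dir(root: str, base_dir: str, storage_type: str, timestamp: str) -> str:
    """Hypothetical helper: resolve base_dir to a path only for local storage."""
    if storage_type != "file":
        # Assumption: non-local storage (e.g. blob) treats base_dir as an
        # opaque prefix, so it is returned untouched.
        return base_dir

    # Assumption: older configs may embed a ${timestamp} placeholder; expanding
    # it keeps timestamped local outputs working. Static paths pass through.
    expanded = Template(base_dir).safe_substitute(timestamp=timestamp)
    return str((Path(root) / expanded).resolve())
```

Deferring resolution this way means a static directory such as `output` and a timestamped one such as `output/${timestamp}/artifacts` go through the same code path, while remote locations are never turned into filesystem paths.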
---------
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>