# Indexing Architecture
## Key Concepts
### Knowledge Model
To support the GraphRAG system, the outputs of the indexing engine (in Default Configuration Mode) are aligned to a knowledge model we call the _GraphRAG Knowledge Model_.
This model is designed as an abstraction over the underlying data-storage technology, providing a common interface for the GraphRAG system to interact with.
In typical use cases, the outputs of the GraphRAG Indexer are loaded into a database system, and GraphRAG's Query Engine interacts with that database through the knowledge-model data-store types.
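As an illustration only (the field names and types here are hypothetical; the actual knowledge model defines more tables, such as documents, text units, relationships, and community reports, with their own schemas), two of the data-store types might be sketched as plain records:

```python
from dataclasses import dataclass

# Hypothetical sketch of two knowledge-model record types; names and
# fields are illustrative, not the real GraphRAG schema.

@dataclass
class Entity:
    """A named entity extracted from source text."""
    id: str
    title: str
    type: str
    description: str = ""
    frequency: int = 0  # how many text units mention this entity

@dataclass
class Relationship:
    """A directed edge between two entities in the graph."""
    id: str
    source: str  # Entity.id
    target: str  # Entity.id
    description: str = ""
    weight: float = 1.0

# A query engine would read rows like these from the backing store:
entities = [
    Entity(id="e1", title="GraphRAG", type="SOFTWARE"),
    Entity(id="e2", title="Microsoft", type="ORGANIZATION"),
]
relationships = [
    Relationship(id="r1", source="e2", target="e1", description="develops"),
]
```

The point of the abstraction is that the query engine depends only on these record shapes, not on whichever database or file format holds them.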
### Workflows
Because of the complexity of our data indexing tasks, we needed to be able to express our data pipeline as a series of multiple, interdependent workflows.
```mermaid
---
title: Sample Workflow DAG
---
stateDiagram-v2
    [*] --> Prepare
    Prepare --> Chunk
    Chunk --> ExtractGraph
    Chunk --> EmbedDocuments
    ExtractGraph --> GenerateReports
    ExtractGraph --> EmbedEntities
    ExtractGraph --> EmbedGraph
```
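To illustrate what "interdependent workflows" means in practice, here is a minimal sketch (not the actual GraphRAG scheduler) that expresses the DAG above as a dependency map and runs each workflow only after its dependencies have completed:

```python
from graphlib import TopologicalSorter

# The sample DAG above, expressed as workflow -> list of dependencies.
dag = {
    "Prepare": [],
    "Chunk": ["Prepare"],
    "ExtractGraph": ["Chunk"],
    "EmbedDocuments": ["Chunk"],
    "GenerateReports": ["ExtractGraph"],
    "EmbedEntities": ["ExtractGraph"],
    "EmbedGraph": ["ExtractGraph"],
}

def run_pipeline(dag, run_workflow):
    """Run every workflow in an order that respects its dependencies."""
    order = list(TopologicalSorter(dag).static_order())
    for name in order:
        run_workflow(name)  # a real runner would pass data between steps
    return order

executed = run_pipeline(dag, lambda name: None)
```

Branches such as `EmbedDocuments` and `ExtractGraph` have no ordering constraint between them, so a real scheduler is free to run them concurrently once `Chunk` finishes.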
### LLM Caching
The GraphRAG library was designed with LLM interactions in mind, and a common obstacle when working with LLM APIs is transient errors caused by network latency, throttling, and the like.
Because of these potential error cases, we've added a cache layer around LLM interactions.
When completion requests are made using the same input set (prompt and tuning parameters), we return a cached result if one exists.
This allows our indexer to be more resilient to network issues, to act idempotently, and to provide a more efficient end-user experience.
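The caching idea can be sketched as follows. This is an illustrative toy, not GraphRAG's actual cache API: each completion request is keyed by a stable hash of the prompt plus its tuning parameters, and a repeat request returns the stored result instead of calling the LLM again.

```python
import hashlib
import json

class LLMCache:
    """Toy completion cache keyed on (prompt, tuning parameters)."""

    def __init__(self):
        self._store = {}

    def _key(self, prompt: str, params: dict) -> str:
        # sort_keys makes the hash stable regardless of dict ordering
        payload = json.dumps({"prompt": prompt, "params": params}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def complete(self, prompt: str, params: dict, llm_call) -> str:
        key = self._key(prompt, params)
        if key not in self._store:              # cache miss: one real call
            self._store[key] = llm_call(prompt, params)
        return self._store[key]                 # cache hit: reuse stored result

# Demonstrate with a stand-in for a real LLM call:
calls = []
def fake_llm(prompt, params):
    calls.append(prompt)
    return f"response to: {prompt}"

cache = LLMCache()
first = cache.complete("Extract entities", {"temperature": 0}, fake_llm)
second = cache.complete("Extract entities", {"temperature": 0}, fake_llm)
```

Because the cache key covers the tuning parameters too, changing a parameter such as `temperature` produces a fresh request rather than a stale hit, which is what makes re-running an indexing pipeline idempotent.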