# GraphRAG Indexing 🤖

The GraphRAG indexing package is a data pipeline and transformation suite that is designed to extract meaningful, structured data from unstructured text using LLMs.

Indexing Pipelines are configurable. They are composed of workflows, standard and custom steps, prompt templates, and input/output adapters. Our standard pipeline is designed to:

- extract entities, relationships and claims from raw text
- perform community detection on the extracted entities
- generate community summaries and reports at multiple levels of granularity
- embed entities into a graph vector space
- embed text chunks into a textual vector space

The outputs of the pipeline are stored as Parquet tables by default, and embeddings are written to your configured vector store.

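As a minimal sketch of what "stored as Parquet tables" means on disk, the snippet below lists the `.parquet` files found in an output directory. The helper name `list_output_tables` and the `output` directory name are assumptions for illustration only; the actual location depends on your configuration.

```python
from pathlib import Path


def list_output_tables(output_dir: str) -> dict[str, Path]:
    """Map each table name (the file stem) to its Parquet file path."""
    root = Path(output_dir)
    # Only top-level *.parquet files are collected; other artifacts are ignored.
    return {path.stem: path for path in sorted(root.glob("*.parquet"))}


# Example (hypothetical directory name; use the one from your config):
# for name, path in list_output_tables("output").items():
#     print(f"{name}: {path}")
```

Each listed file can then be opened with any Parquet reader, for example `pandas.read_parquet`.
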
## Getting Started

### Requirements

See the [requirements](../developing.md#requirements) section in [Get Started](../get_started.md) for details on setting up a development environment.

To configure GraphRAG, see the [configuration](../config/overview.md) documentation.
After you have a config file, you can run the pipeline using the CLI or the Python API.

## Usage

### CLI

```bash
uv run poe index --root <data_root> # default config mode
```

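If you need to trigger the same command from a script, one option is to shell out to it. The sketch below only wraps the CLI invocation shown above using `subprocess`; the helper names `build_index_command` and `run_index_cli` and the `./my_project` path are hypothetical.

```python
import subprocess


def build_index_command(data_root: str) -> list[str]:
    """Return the argv list for the indexing command shown above."""
    return ["uv", "run", "poe", "index", "--root", data_root]


def run_index_cli(data_root: str) -> int:
    """Run the indexing CLI and return its exit code."""
    # check=False: hand the exit code back to the caller instead of raising.
    completed = subprocess.run(build_index_command(data_root), check=False)
    return completed.returncode


# Example (hypothetical data root; replace with your own project directory):
# exit_code = run_index_cli("./my_project")
```

Passing the arguments as a list avoids shell quoting issues when the data root contains spaces.
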
### Python API

See the indexing API [Python file](https://github.com/microsoft/graphrag/blob/main/graphrag/api/index.py) for the recommended way to call the pipeline directly from Python code.

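As a rough sketch of what a direct call can look like, the example below assumes the `build_index` coroutine exposed by `graphrag.api` and the `load_config` helper in `graphrag.config.load_config`, as found in recent 2.x releases. Import paths and signatures may differ between versions, so treat the linked file as the source of truth. The `run_index` helper name and the `./my_project` path are hypothetical.

```python
import asyncio
from pathlib import Path


async def run_index(project_root: str) -> None:
    """Load the project's config and run the indexing pipeline once."""
    # Imported lazily so this module can be imported without graphrag installed.
    # Assumption: these import paths match your installed graphrag version.
    import graphrag.api as api
    from graphrag.config.load_config import load_config

    config = load_config(Path(project_root))
    results = await api.build_index(config=config)

    # Assumption: each result exposes `workflow` and `errors` attributes.
    for result in results:
        status = "error" if result.errors else "success"
        print(f"{result.workflow}: {status}")


# Example (hypothetical project root):
# asyncio.run(run_index("./my_project"))
```
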
## Further Reading

- To start developing within the _GraphRAG_ project, see [getting started](../developing.md)
- To understand the underlying concepts and execution model of the indexing library, see [the architecture documentation](../index/architecture.md)
- To read more about configuring the indexing engine, see [the configuration documentation](../config/overview.md)