graphrag/docs/index/overview.md

GraphRAG Indexing 🤖

The GraphRAG indexing package is a data pipeline and transformation suite designed to extract meaningful, structured data from unstructured text using LLMs.

Indexing Pipelines are configurable. They are composed of workflows, standard and custom steps, prompt templates, and input/output adapters. Our standard pipeline is designed to:

  • extract entities, relationships and claims from raw text
  • perform community detection on the extracted entities
  • generate community summaries and reports at multiple levels of granularity
  • embed entities into a graph vector space
  • embed text chunks into a textual vector space

The outputs of the pipeline are stored as Parquet tables by default, and embeddings are written to your configured vector store.
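
As a quick way to see which tables a run produced, the sketch below lists the Parquet files in an output directory. The directory name `output` and the helper `list_parquet_tables` are assumptions for illustration; the actual location depends on your configuration.

```python
from pathlib import Path


def list_parquet_tables(output_dir):
    """Map table name (file stem) to path for each .parquet file in output_dir.

    Hypothetical helper: only files directly under output_dir are considered.
    """
    root = Path(output_dir)
    return {path.stem: path for path in sorted(root.glob("*.parquet"))}


if __name__ == "__main__":
    # "output" is an assumed location; use the directory from your own config.
    for name, path in list_parquet_tables("output").items():
        print(f"{name}: {path}")
```

Each listed file can then be loaded with any Parquet reader, for example `pandas.read_parquet`, if pandas and a Parquet engine are installed.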

Getting Started

Requirements

See the requirements section in Get Started for details on setting up a development environment.

To configure GraphRAG, see the configuration documentation. After you have a config file you can run the pipeline using the CLI or the Python API.

Usage

CLI

uv run poe index --root <data_root> # default config mode
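
If you want to trigger the same command from a script, a minimal sketch using Python's standard `subprocess` module might look like the following. The helper names `build_index_command` and `run_index` are hypothetical, and the sketch assumes `uv` is on your `PATH` and that it runs from a checkout where the `poe index` task is defined.

```python
import subprocess


def build_index_command(data_root):
    """Build the argument list for the CLI call shown above (hypothetical helper)."""
    return ["uv", "run", "poe", "index", "--root", str(data_root)]


def run_index(data_root):
    """Run the indexing CLI and raise CalledProcessError on a non-zero exit code."""
    return subprocess.run(build_index_command(data_root), check=True)
```

Passing the arguments as a list avoids shell quoting issues when the data root contains spaces.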

Python API

Please see the indexing API Python file for the recommended way to call the indexer directly from Python code.

Further Reading