mirror of
https://github.com/microsoft/graphrag.git
synced 2025-11-23 05:26:53 +00:00
Indexer CLI
The GraphRAG indexer CLI allows for no-code usage of the GraphRAG Indexer.
python -m graphrag.index --verbose --root </workspace/project/root> \
--config <custom_config.yml> --resume <timestamp> \
--reporter <rich|print|none> --emit json,csv,parquet \
--nocache
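When the indexer is launched from a script rather than typed by hand, the same invocation can be assembled as an argument list. The sketch below is a minimal illustration and not part of GraphRAG: `build_index_command` is a hypothetical helper, its parameter names are assumptions, and it only emits the flags shown in the command above.

```python
import shlex
import sys


def build_index_command(root, config=None, resume=None, reporter="rich",
                        emit=("parquet",), verbose=False, nocache=False):
    """Assemble the argv list for `python -m graphrag.index`.

    Hypothetical helper for illustration; it is not provided by GraphRAG.
    """
    cmd = [sys.executable, "-m", "graphrag.index", "--root", str(root)]
    if verbose:
        cmd.append("--verbose")
    if config is not None:
        cmd += ["--config", str(config)]
    if resume is not None:
        cmd += ["--resume", resume]
    # --emit takes a single comma-separated value, e.g. "json,csv,parquet".
    cmd += ["--reporter", reporter, "--emit", ",".join(emit)]
    if nocache:
        cmd.append("--nocache")
    return cmd


if __name__ == "__main__":
    cmd = build_index_command("./project", emit=("json", "csv", "parquet"),
                              verbose=True)
    # Print the command rather than running it; pass `cmd` to subprocess.run
    # to actually start the indexer.
    print(shlex.join(cmd))
```

Building a list instead of a single string avoids shell quoting problems when the root path contains spaces.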
CLI Arguments
--verbose - Adds extra logging information during the run.
--root <data-project-dir> - The data root directory. This should contain an input directory with the input data, and an .env file with environment variables. These are described below.
--init - Initializes the data project directory at the specified root with bootstrap configuration and prompt-overrides.
--resume <output-timestamp> - If specified, the pipeline will attempt to resume a prior run. The parquet files from the prior run will be loaded into the system as inputs, and the workflows that generated those files will be skipped. The input value should be the timestamped output folder, e.g. "20240105-143721".
--config <config_file.yml> - Opts out of the Default Configuration mode and executes a custom configuration. If this is used, then none of the environment variables below will apply.
--reporter <reporter> - Specifies the progress reporter to use. The default is rich. Valid values are rich, print, and none.
--emit <types> - Specifies the table output formats the pipeline should emit. The default is parquet. Valid values are parquet, csv, and json, comma-separated.
--nocache - Disables the caching mechanism. This is useful for debugging and development, but should not be used in production.
--output <directory> - Specifies the output directory for pipeline artifacts.
--reports <directory> - Specifies the output directory for reporting.
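The value constraints described for these arguments can be checked before a run is started. The following is a rough sketch under stated assumptions, not GraphRAG's own validation: the function names are hypothetical, and the `--resume` timestamp format (`%Y%m%d-%H%M%S`) is inferred only from the example value "20240105-143721".

```python
from datetime import datetime

# Valid values as listed in the argument descriptions.
VALID_REPORTERS = {"rich", "print", "none"}
VALID_EMIT_TYPES = {"parquet", "csv", "json"}


def check_reporter(value="rich"):
    """Return the reporter name, or raise ValueError if it is not a listed value."""
    if value not in VALID_REPORTERS:
        raise ValueError(f"unknown reporter: {value!r}")
    return value


def parse_emit(value="parquet"):
    """Split a comma-separated --emit value and reject unlisted formats."""
    types = [t.strip() for t in value.split(",") if t.strip()]
    unknown = [t for t in types if t not in VALID_EMIT_TYPES]
    if not types or unknown:
        raise ValueError(f"invalid emit types: {value!r}")
    return types


def parse_resume(value):
    """Parse a timestamped output folder name such as "20240105-143721".

    The format string is an assumption based on that single example.
    """
    return datetime.strptime(value, "%Y%m%d-%H%M%S")


if __name__ == "__main__":
    print(check_reporter("print"))
    print(parse_emit("json,csv,parquet"))
    print(parse_resume("20240105-143721"))
```

Failing early on a mistyped reporter or output format is cheaper than discovering it after the pipeline has started.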