Development Guide
This document is for developers interested in contributing to GraphRAG.
Quickstart
Development is best done in a Unix-like environment (Linux, macOS, or Windows WSL).
- Clone the GraphRAG repository.
- Follow all directions in the deployment guide to install the required tools and deploy an instance of the GraphRAG service in Azure. Alternatively, this repo provides a devcontainer with all tools preinstalled.
- Create a .env file in the root of the repository (GraphRAG/.env). A detailed description of the environment variables used to configure graphrag can be found here. Add the following environment variables to the .env file:

  | Environment Variable | Description |
  | --- | --- |
  | COSMOS_URI_ENDPOINT | Azure CosmosDB connection string from the graphrag deployment |
  | STORAGE_ACCOUNT_BLOB_URL | Azure Storage blob URL from the graphrag deployment |
  | AI_SEARCH_URL | AI Search endpoint from the graphrag deployment (in the form https://<name>.search.windows.net) |
  | GRAPHRAG_API_BASE | The AOAI API base URL |
  | GRAPHRAG_API_VERSION | The AOAI API version (e.g. 2023-03-15-preview) |
  | GRAPHRAG_LLM_MODEL | The AOAI model name (e.g. gpt-4) |
  | GRAPHRAG_LLM_DEPLOYMENT_NAME | The AOAI model deployment name (e.g. gpt-4-turbo) |
  | GRAPHRAG_EMBEDDING_MODEL | The AOAI embedding model name (e.g. text-embedding-ada-002) |
  | GRAPHRAG_EMBEDDING_DEPLOYMENT_NAME | The AOAI embedding model deployment name (e.g. my-text-embedding-ada-002) |
  | REPORTERS | A comma-delimited list of the logging reporters to enable. Possible values are blob, console, file |
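Before starting the service, it can help to confirm that the .env file contains every variable from the table above. The following is a minimal sketch, not part of the GraphRAG codebase: the function names `parse_env` and `check_env` are hypothetical, and treating every listed variable as required is an assumption. It uses only the Python standard library.

```python
import re

# Variable names are taken from the table above. Treating all of them as
# required is an assumption made for this sketch.
REQUIRED_VARS = [
    "COSMOS_URI_ENDPOINT",
    "STORAGE_ACCOUNT_BLOB_URL",
    "AI_SEARCH_URL",
    "GRAPHRAG_API_BASE",
    "GRAPHRAG_API_VERSION",
    "GRAPHRAG_LLM_MODEL",
    "GRAPHRAG_LLM_DEPLOYMENT_NAME",
    "GRAPHRAG_EMBEDDING_MODEL",
    "GRAPHRAG_EMBEDDING_DEPLOYMENT_NAME",
    "REPORTERS",
]
ALLOWED_REPORTERS = {"blob", "console", "file"}


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def check_env(values: dict[str, str]) -> list[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems = [f"missing: {name}" for name in REQUIRED_VARS if not values.get(name)]

    # The table describes the endpoint as https://<name>.search.windows.net.
    search_url = values.get("AI_SEARCH_URL", "")
    pattern = r"https://[^/.]+\.search\.windows\.net/?"
    if search_url and not re.fullmatch(pattern, search_url):
        problems.append("AI_SEARCH_URL is not of the form https://<name>.search.windows.net")

    # REPORTERS is a comma-delimited list of blob, console, file.
    reporters = [r.strip() for r in values.get("REPORTERS", "").split(",") if r.strip()]
    unknown = sorted(set(reporters) - ALLOWED_REPORTERS)
    if unknown:
        problems.append(f"unknown REPORTERS values: {', '.join(unknown)}")
    return problems
```

To check a real file, pass its contents through both functions, for example `check_env(parse_env(pathlib.Path(".env").read_text()))`, and print whatever problems are returned.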
Developing inside the devcontainer
- Requirements
  - Docker
  - Visual Studio Code
  - Remote - Containers extension for VS Code (since renamed Dev Containers)
- Open VS Code in the directory containing your project.
  - Use the Command Palette (Ctrl+Shift+P on Windows/Linux, Cmd+Shift+P on macOS) and type "Remote-Containers: Open Folder in Container..." (listed as "Dev Containers: Open Folder in Container..." in newer versions of the extension).
  - Select your project folder. VS Code will start building the Docker container based on the Dockerfile and devcontainer.json in your project. This process may take a few minutes, especially on the first run.
  - A terminal prompt may appear before setup has finished. Wait for the following prompt, which indicates that the install is complete:
    vscode@<hostname>:/graphrag$
- Adding Python packages to the dev container
  - Poetry is the Python package manager in the dev container. Python packages can be added using poetry add <package-name>
  - Every time a package is added, Poetry updates poetry.lock and pyproject.toml, the two files that track all package management. Changes to these files should be checked into the repo; that is how we keep the devcontainer consistent across users.
  - It's possible to get into a situation where a package has been added but your local poetry.lock does not contain the proper hash. This is most common after resolving a merge conflict. The easiest way to resolve the issue is poetry install, which will check the status of all packages and update the hash values in poetry.lock.
- Adding dependencies to the environment
  - Most dependencies (packages or tools) should be added to the environment through the Dockerfile. This allows us to maintain a consistent development environment. If you need a tool added, please make the appropriate changes to the Dockerfile and submit a pull request.
Deploying GraphRAG
The GraphRAG service consists of two components: a backend application and a frontend UI application (coming soon). GraphRAG can be launched in multiple ways, depending on where in the application stack you are developing and debugging.
- In Azure Kubernetes Service (AKS):

  Navigate to the root directory of the repository. First, build and publish the backend Docker image to an Azure container registry.

  > az acr build --registry <my_container_registry> -f docker/Dockerfile-backend --image graphrag:backend .

  Update infra/deployment.parameters.json to use your custom graphrag Docker images and re-run the deployment script to update AKS.

  After deployment is complete, kubectl is used to log in and view the GraphRAG AKS resources, as well as to aid in other debugging use cases. The following commands provide quick access to AKS:

  > RGNAME=<your_resource_group>
  > AKSNAME=`az aks list --resource-group $RGNAME --query "[].name" --output tsv`
  > az aks get-credentials -g $RGNAME -n $AKSNAME --overwrite-existing
  > kubectl config set-context --current --namespace=graphrag

  Some example AKS commands to get started:

  > kubectl get pods                      # view a list of all deployed pods
  > kubectl get nodes                     # view a list of all deployed nodes
  > kubectl get jobs                      # view a list of all AKS jobs
  > kubectl logs <pod_name>               # print out useful logging information (print statements)
  > kubectl exec -it <pod_name> -- bash   # log in to a running container
  > kubectl describe pod <pod_name>       # retrieve detailed info about a pod
  > kubectl describe node <node_name>     # retrieve detailed info about a node
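When debugging, it is often quicker to filter the pod list than to read it in full. The sketch below is an illustration rather than project tooling: the function name `pods_not_running` is hypothetical. It takes the JSON produced by `kubectl get pods -o json` and returns the pods whose phase is neither Running nor Succeeded.

```python
import json


def pods_not_running(pods_json: str) -> list[tuple[str, str]]:
    """Return (pod name, phase) for pods that are not Running or Succeeded.

    `pods_json` is the text printed by `kubectl get pods -o json`, which is a
    list object whose `items` entries carry `metadata.name` and `status.phase`.
    """
    pods = json.loads(pods_json).get("items", [])
    return [
        (pod["metadata"]["name"], pod.get("status", {}).get("phase", "Unknown"))
        for pod in pods
        if pod.get("status", {}).get("phase") not in ("Running", "Succeeded")
    ]
```

One way to use it is to save the output with `kubectl get pods -o json > pods.json`, read the file, and pass its contents to the function. Any pod it reports is a good candidate for `kubectl describe pod <pod_name>`.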
Testing
A small collection of pytest tests has been written to exercise the API. To run the tests, add the following environment variables to a .env file in the root of the repo directory.
APIM_SUBSCRIPTION_KEY
COSMOS_URI_ENDPOINT
DEPLOYMENT_URL
STORAGE_ACCOUNT_BLOB_URL
The tests assume the solution accelerator has already been deployed and that a managed identity has been set up with RBAC access to CosmosDB and Azure Storage. To run the tests locally:
# cd to root directory of the repo
> pytest backend/src/tests/test_all_index_endpoint.py -s
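To show how DEPLOYMENT_URL and APIM_SUBSCRIPTION_KEY might fit together, here is a hedged sketch of building an authenticated request against the deployed API. It is not taken from the test suite: the function name `build_request` is hypothetical, and the header name `Ocp-Apim-Subscription-Key` is the Azure API Management default, which is assumed here rather than confirmed for this deployment. The request is only constructed, not sent.

```python
import os
import urllib.request


def build_request(path: str, env=os.environ) -> urllib.request.Request:
    """Build a GET request for `path` under DEPLOYMENT_URL.

    Assumes the API sits behind Azure API Management and accepts the
    subscription key in the default Ocp-Apim-Subscription-Key header.
    """
    base = env["DEPLOYMENT_URL"].rstrip("/")
    return urllib.request.Request(
        f"{base}/{path.lstrip('/')}",
        headers={"Ocp-Apim-Subscription-Key": env["APIM_SUBSCRIPTION_KEY"]},
    )
```

Passing the environment as a parameter keeps the function easy to exercise with a plain dictionary. With the variables from .env exported, the returned object can be handed to `urllib.request.urlopen`. The path to request depends on the API and is left to the caller.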
Deployment (CI/CD)
This repository uses GitHub Actions for continuous integration and continuous deployment (CI/CD).
Style Guide:
- We follow PEP 8 standards and naming conventions as closely as possible.
- ruff is used for linting and code formatting. A pre-commit hook has been set up to automatically apply its settings to this repo. To use the tool without calling it explicitly, install the pre-commit hook:
  > pre-commit install
Versioning
We use SemVer for semantic versioning.
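As a brief illustration of how SemVer versions are ordered, the sketch below parses the MAJOR.MINOR.PATCH core of a version string and compares versions numerically. The helper name `parse_core` is hypothetical, and the sketch deliberately rejects pre-release and build-metadata suffixes instead of implementing their precedence rules.

```python
import re

# MAJOR.MINOR.PATCH, each a non-negative integer without leading zeros.
_SEMVER_CORE = re.compile(r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)")


def parse_core(version: str) -> tuple[int, ...]:
    """Return (major, minor, patch) as integers, or raise ValueError."""
    match = _SEMVER_CORE.fullmatch(version)
    if match is None:
        raise ValueError(f"not a plain MAJOR.MINOR.PATCH version: {version!r}")
    return tuple(int(part) for part in match.groups())


# Tuples compare element by element, so 1.10.0 correctly sorts after 1.9.3,
# which a plain string comparison would get wrong.
newer = parse_core("1.10.0") > parse_core("1.9.3")
```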