# Development Guide

This document is for developers interested in contributing to GraphRAG.

## Quickstart

Development is best done in a Unix environment (Linux, macOS, or [Windows WSL](https://learn.microsoft.com/en-us/windows/wsl/install)).

1. Clone the GraphRAG repository.
1. Follow all directions in the [deployment guide](DEPLOYMENT-GUIDE.md) to install required tools and deploy an instance of the GraphRAG service in Azure. Alternatively, this repo provides a devcontainer with all tools preinstalled.
1. Create a `.env` file in the root of the repository (`GraphRAG/.env`). A detailed description of the environment variables used to configure graphrag can be found [here](https://microsoft.github.io/graphrag). Add the following environment variables to the `.env` file:

    | Environment Variable | Description |
    | :--- | ---: |
    | `COSMOS_URI_ENDPOINT` | Azure CosmosDB connection string from graphrag deployment |
    | `STORAGE_ACCOUNT_BLOB_URL` | Azure Storage blob URL from graphrag deployment |
    | `AI_SEARCH_URL` | AI Search endpoint from graphrag deployment (in the form `https://<name>.search.windows.net`) |
    | `GRAPHRAG_API_BASE` | The AOAI API base URL |
    | `GRAPHRAG_API_VERSION` | The AOAI API version (e.g. `2023-03-15-preview`) |
    | `GRAPHRAG_LLM_MODEL` | The AOAI model name (e.g. `gpt-4`) |
    | `GRAPHRAG_LLM_DEPLOYMENT_NAME` | The AOAI model deployment name (e.g. `gpt-4-turbo`) |
    | `GRAPHRAG_EMBEDDING_MODEL` | The AOAI embedding model name (e.g. `text-embedding-ada-002`) |
    | `GRAPHRAG_EMBEDDING_DEPLOYMENT_NAME` | The AOAI embedding model deployment name (e.g. `my-text-embedding-ada-002`) |
    | `REPORTERS` | A comma-delimited list of logging reporters to enable. Possible values are `blob,console,file` |

1. Developing inside the devcontainer
    1. Requirements
        - [Docker](https://www.docker.com/)
        - [Visual Studio Code](https://code.visualstudio.com/)
        - [Remote - Containers extension](https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.vscode-remote-extensionpack) for VS Code
    1. Open VS Code in the directory containing your project.
        - Use the Command Palette (Ctrl+Shift+P on Windows/Linux, Cmd+Shift+P on macOS) and type "Remote-Containers: Open Folder in Container..."
        - Select your project folder. VS Code will start building the Docker container based on the Dockerfile and devcontainer.json in your project. This process may take a few minutes, especially on the first run.
        - The terminal prompt may appear before setup has finished. Wait for the following prompt to confirm that the full install is complete: `vscode@<hostname>:/graphrag$`
    1. Adding Python packages to the dev container.
        - Poetry is the Python package manager in the dev container. Python packages can be added using `poetry add <package-name>`.
        - Every time a package is added, Poetry updates `poetry.lock` and `pyproject.toml`, the two files that track all package management. Changes to these files should be checked into the repo. That is how we keep our devcontainer consistent across users.
        - It's possible to get into a situation where a package has been added but your local `poetry.lock` does not contain the proper hash. This is most common after resolving a merge conflict. The easiest way to resolve this issue is `poetry install`, which will check the status of all packages and update hash values in `poetry.lock`.
    1. Adding dependencies to the environment
        - Most dependencies (packages or tools) should be added to the environment through the Dockerfile. This allows us to maintain a consistent development environment. If you need a tool added, please make the appropriate changes to the Dockerfile and submit a Pull Request.

### Deploying GraphRAG

The GraphRAG service consists of two components: a `backend` application and a `frontend` UI application (coming soon). GraphRAG can be launched in multiple ways depending on where in the application stack you are developing and debugging.

- In Azure Kubernetes Service (AKS):

    Navigate to the root directory of the repository. First build and publish the `backend` docker image to an Azure container registry.

    ```
    > az acr build --registry <registry-name> -f docker/Dockerfile-backend --image graphrag:backend .
    ```

    Update `infra/deployment.parameters.json` to use your custom graphrag docker images and re-run the deployment script to update AKS.

    After deployment is complete, `kubectl` is used to log in and view the GraphRAG AKS resources, as well as to aid in other debugging use cases. The following commands provide quick access to AKS:

    ```
    > RGNAME=<resource-group-name>
    > AKSNAME=`az aks list --resource-group $RGNAME --query "[].name" --output tsv`
    > az aks get-credentials -g $RGNAME -n $AKSNAME --overwrite-existing
    > kubectl config set-context --current --namespace=graphrag
    ```

    Some example AKS commands to get started:

    ```
    > kubectl get pods                     # view a list of all deployed pods
    > kubectl get nodes                    # view a list of all deployed nodes
    > kubectl get jobs                     # view a list of all AKS jobs
    > kubectl logs <pod-name>              # print out useful logging information (print statements)
    > kubectl exec -it <pod-name> -- bash  # log in to a running container
    > kubectl describe pod <pod-name>      # retrieve detailed info about a pod
    > kubectl describe node <node-name>    # retrieve detailed info about a node
    ```

### Testing

A small collection of pytest tests has been written to test the functionality of the API. To run the tests, add the following environment variables to a `.env` file in the root of the repo directory.

```shell
APIM_SUBSCRIPTION_KEY
COSMOS_URI_ENDPOINT
DEPLOYMENT_URL
STORAGE_ACCOUNT_BLOB_URL
```

The tests assume the solution accelerator has been previously deployed and that a managed identity has been set up with RBAC access to CosmosDB and Azure Storage.

To run the tests locally:

```
# cd to root directory of the repo
> pytest backend/src/tests/test_all_index_endpoint.py -s
```

### Deployment (CI/CD)

This repository uses GitHub Actions for continuous integration and continuous deployment (CI/CD).
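
Whether the tests are run locally or from a CI job, it can help to confirm up front that the four variables listed in the Testing section are set. The following is a minimal sketch, not part of the repository: the helper name `missing_env_vars` is hypothetical, and it assumes the `.env` values have already been loaded into the process environment by whatever mechanism you use.

```python
import os
from typing import Mapping, Sequence

# Variable names copied from the Testing section of this guide.
REQUIRED_TEST_VARS = (
    "APIM_SUBSCRIPTION_KEY",
    "COSMOS_URI_ENDPOINT",
    "DEPLOYMENT_URL",
    "STORAGE_ACCOUNT_BLOB_URL",
)


def missing_env_vars(
    env: Mapping[str, str],
    required: Sequence[str] = REQUIRED_TEST_VARS,
) -> list[str]:
    """Return the required names that are unset or blank in `env` (hypothetical helper)."""
    return [name for name in required if not env.get(name, "").strip()]


# Report on the current process environment without aborting.
missing = missing_env_vars(os.environ)
print("Missing variables:", ", ".join(missing) if missing else "none")
```

A check like this fails fast with a readable message instead of letting the tests fail later on an unreachable endpoint.
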
### Style Guide

* We follow [PEP 8](https://peps.python.org/pep-0008) standards and naming conventions as closely as possible.
* [ruff](https://docs.astral.sh/ruff) is used for linting and code formatting. A pre-commit hook has been set up to automatically apply the ruff settings to this repo. To run the tool on every commit without calling it explicitly, install the pre-commit hook.

```
> pre-commit install
```

### Versioning

We use [SemVer](https://aka.ms/StartRight/README-Template/semver) for semantic versioning.
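
As a rough illustration of how SemVer version numbers are ordered and incremented, the sketch below parses the `MAJOR.MINOR.PATCH` core of a version string. The names `Version`, `parse_version`, and `bump` are hypothetical and are not part of this repository; pre-release and build metadata suffixes are deliberately not handled.

```python
import re
from typing import NamedTuple


class Version(NamedTuple):
    """MAJOR.MINOR.PATCH core of a SemVer version; tuples compare field by field."""

    major: int
    minor: int
    patch: int


# Numeric identifiers must not have leading zeros.
_CORE = re.compile(r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)")


def parse_version(text: str) -> Version:
    """Parse 'MAJOR.MINOR.PATCH'; suffixes such as '-rc.1' are rejected in this sketch."""
    match = _CORE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {text!r}")
    return Version(*(int(part) for part in match.groups()))


def bump(version: Version, part: str) -> Version:
    """Increment one field and reset the lower-order fields to zero."""
    if part == "major":
        return Version(version.major + 1, 0, 0)
    if part == "minor":
        return Version(version.major, version.minor + 1, 0)
    if part == "patch":
        return Version(version.major, version.minor, version.patch + 1)
    raise ValueError(f"unknown version part: {part!r}")


# Numeric comparison, not string comparison: 1.10.0 is newer than 1.9.3.
print(parse_version("1.10.0") > parse_version("1.9.3"))
```

Comparing parsed tuples avoids the common mistake of comparing version strings lexically, where `"1.10.0"` would sort before `"1.9.3"`.
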