Notebook cleanup and update (#36)

This commit is contained in:
Josh Bradley 2024-06-28 15:55:54 -04:00 committed by americanthinker
parent 19ac4604b7
commit e83227fbd5
6 changed files with 258 additions and 148 deletions

.gitignore vendored
View File

@ -5,9 +5,11 @@
logs
logs/*
# ignore example directory created by HelloWorld.ipynb
# ignore files created by the jupyter notebooks demos
example_files/
files/
prompts
testdata
.scripts/
# Byte-compiled / optimized / DLL files

View File

@ -12,7 +12,7 @@ For FAQ, access instructions, and our roadmap, please visit `aka.ms/graphrag`
### Deployment Guide
To deploy the solution accelerator, see the [deployment guide](docs/DEPLOYMENT-GUIDE.md). This will result in a full deployment of graphrag as an API.
Afterwards, check out the [Hello World](notebooks/HelloWorld.ipynb) notebook for a demonstration of various API calls.
Afterwards, check out the [Quickstart](notebooks/1-Quickstart.ipynb) notebook for a demonstration of various API calls.
## Development Guide
Interested in contributing? Check out the [development guide](docs/DEVELOPMENT-GUIDE.md).

View File

@ -15,7 +15,7 @@ The deployment process requires the following tools to be installed:
* [kubectl](https://kubernetes.io/docs/tasks/tools) - k8s command line tool
* [yq](https://github.com/mikefarah/yq?tab=readme-ov-file#install) >= v4.40.7 - yaml file parser
TIP: If you open this repository inside a devcontainer (i.e. VSCode Dev Containers or Codespaces), all required tools for deployment will already be available. Opening a devcontainer using VS Code requires <a href="https://docs.docker.com/engine/install/" target="_blank" >Docker to be installed</a>.
The setup/deployment process has been mostly automated with a shell script and Bicep files (infrastructure as code). Azure CLI will deploy all necessary Azure resources using these Bicep files. The deployment is configurable using values defined in `infra/deploy.parameters.json`. To the utmost extent, we have provided default values but users are still expected to modify some values.
@ -25,7 +25,7 @@ You will need the following <a href="https://learn.microsoft.com/en-us/azure/rol
| Permission | Scope |
| :--- | ---: |
| Contributor | Subscription |
| Role Based Access Control (RBAC) Administrator | Subscription |
#### Resource Provider
The Azure subscription that you deploy this solution accelerator in will require the `Microsoft.OperationsManagement` resource provider to be registered.
@ -99,4 +99,4 @@ bash deploy.sh -p deploy.parameters.json
When deploying for the first time, it will take ~40-50 minutes to deploy. Subsequent runs of this command will be faster.
### 6. Use GraphRAG
Once the deployment has finished, check out our [`Hello World`](../notebooks/HelloWorld.ipynb) notebook for a demonstration of how to use the GraphRAG API. To access the API documentation, visit `<APIM_gateway_url>/manpage/docs` in your browser. You can find the `APIM_gateway_url` by looking in the Azure Portal for the deployed APIM instance.
Once the deployment has finished, check out our [`Quickstart`](../notebooks/1-Quickstart.ipynb) notebook for a demonstration of how to use the GraphRAG API. To access the API documentation, visit `<APIM_gateway_url>/manpage/docs` in your browser. You can find the `APIM_gateway_url` by looking in the Azure Portal for the deployed APIM instance.
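As a quick post-deployment sanity check, a minimal request sketch (assuming the `Ocp-Apim-Subscription-Key` header and the `GET /index` route used in the notebooks) is:

```python
# list existing indexes as a smoke test of the deployed API;
# the gateway URL and subscription key both come from the Azure Portal
import requests

endpoint = "<APIM_gateway_url>"  # the APIM Gateway URL, no trailing slash
subscription_key = "<subscription_key>"  # Built-in all-access subscription primary key

response = requests.get(
    f"{endpoint}/index",
    headers={"Ocp-Apim-Subscription-Key": subscription_key},
)
print(response.status_code)
print(response.json() if response.ok else response.reason)
```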

View File

@ -11,8 +11,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Prerequisite installs to run the quickstart notebook\n",
"Install 3rd party packages that are not part of the Python Standard Library"
"## Prerequisites\n",
"Install 3rd party packages, not part of the Python Standard Library, to run the notebook"
]
},
{
@ -21,7 +21,7 @@
"metadata": {},
"outputs": [],
"source": [
"! pip install devtools pandas python-magic requests tqdm"
"! pip install devtools python-magic requests tqdm"
]
},
{
@ -32,12 +32,10 @@
"source": [
"import getpass\n",
"import json\n",
"import sys\n",
"import time\n",
"from pathlib import Path\n",
"\n",
"import magic\n",
"import pandas as pd\n",
"import requests\n",
"from devtools import pprint\n",
"from tqdm import tqdm"
@ -47,15 +45,17 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Configuration - API Key, file directions and API endpoints"
"## (REQUIRED) User Configuration\n",
"Set the API subscription key, API base endpoint, and some file directory names that will be referenced later in the notebook."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Get API Key for API Management Service\n",
"For authentication, the API requires a *subscription key* to be passed in the header of all requests. To find this key, visit the Azure Portal. The API subscription key will be located under `<my_resource_group> --> <API Management service> --> <APIs> --> <Subscriptions> --> <Built-in all-access subscription> Primary Key`."
"#### API subscription key\n",
"\n",
"APIM supports multiple forms of authentication and access control (e.g. managed identity). For this notebook demonstration, we will use a **[subscription key](https://learn.microsoft.com/en-us/azure/api-management/api-management-subscriptions)**. To locate this key, visit the Azure Portal. The subscription key can be found under `<my_resource_group> --> <API Management service> --> <APIs> --> <Subscriptions> --> <Built-in all-access subscription> Primary Key`. For multiple API users, individual subscription keys can be generated."
]
},
{
@ -66,7 +66,15 @@
"source": [
"ocp_apim_subscription_key = getpass.getpass(\n",
" \"Enter the subscription key to the GraphRag APIM:\"\n",
")"
")\n",
"\n",
"\"\"\"\n",
"\"Ocp-Apim-Subscription-Key\": \n",
" This is a custom HTTP header used by Azure API Management service (APIM) to \n",
" authenticate API requests. The value for this key should be set to the subscription \n",
" key provided by the Azure APIM instance in your GraphRAG resource group.\n",
"\"\"\"\n",
"headers = {\"Ocp-Apim-Subscription-Key\": ocp_apim_subscription_key}"
]
},
{
@ -75,13 +83,7 @@
"source": [
"#### Setup directories and API endpoint\n",
"\n",
"The following parameters are required to access and use the GraphRAG solution accelerator API:\n",
"* file_directory\n",
"* storage_name\n",
"* index_name\n",
"* endpoint\n",
"\n",
"For demonstration purposes, you may use the provided `get-wiki-articles.py` script to download a small set of wikipedia articles or provide your own data."
"For demonstration purposes, please use the provided `get-wiki-articles.py` script to download a small set of wikipedia articles or provide your own data (graphrag requires txt files to be utf-8 encoded)."
]
},
{
@ -91,18 +93,21 @@
"outputs": [],
"source": [
"\"\"\"\n",
"These parameters must be defined by the user:\n",
"These parameters must be defined by the notebook user:\n",
"\n",
"- file_directory: local directory where data files of interest are stored.\n",
"- storage_name: unique name for an Azure blob storage container where files will be uploaded.\n",
"- index_name: unique name for a single knowledge graph construction. Multiple indexes can be created from the same blob container of data.\n",
"- apim_url: the endpoint URL for GraphRAG service (this is the Gateway URL found in the APIM resource).\n",
"- file_directory: a local directory of text files. The file structure should be flat,\n",
" with no nested directories. (i.e. file_directory/file1.txt, file_directory/file2.txt, etc.)\n",
"- storage_name: a unique name to identify a blob storage container in Azure where files\n",
" from `file_directory` will be uploaded.\n",
"- index_name: a unique name to identify a single graphrag knowledge graph index.\n",
" Note: Multiple indexes may be created from the same `storage_name` blob storage container.\n",
"- endpoint: the base/endpoint URL for the GraphRAG API (this is the Gateway URL found in the APIM resource).\n",
"\"\"\"\n",
"\n",
"file_directory = \"\"\n",
"storage_name = \"\"\n",
"index_name = \"\"\n",
"apim_url = \"\""
"endpoint = \"\""
]
},
{
@ -112,31 +117,17 @@
"outputs": [],
"source": [
"assert (\n",
" file_directory != \"\" and storage_name != \"\" and index_name != \"\" and apim_url != \"\"\n",
" file_directory != \"\" and storage_name != \"\" and index_name != \"\" and endpoint != \"\"\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"\"\"\"\n",
"\"Ocp-Apim-Subscription-Key\": \n",
" This is a custom HTTP header used by Azure API Management service (APIM) to \n",
" authenticate API requests. The value for this key should be set to the subscription \n",
" key provided by the Azure APIM instance in your GraphRAG resource group.\n",
"\"\"\"\n",
"\n",
"headers = {\"Ocp-Apim-Subscription-Key\": ocp_apim_subscription_key}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Upload Files to Storage Data"
"## Upload Files\n",
"\n",
"For a demonstration of how to index data in graphrag, we first need to ingest a few files into graphrag."
]
},
{
@ -156,16 +147,16 @@
" Upload files to a blob storage container.\n",
"\n",
" Args:\n",
" file_directory - a local directory of .txt files to upload. All files must be in utf-8 encoding.\n",
" storage_name - a unique name for the Azure storage container.\n",
" file_directory - a local directory of .txt files to upload. All files must have utf-8 encoding.\n",
" storage_name - a unique name for the Azure storage blob container.\n",
" batch_size - the number of files to upload in a single batch.\n",
" overwrite - whether or not to overwrite files if they already exist in the storage container.\n",
" overwrite - whether or not to overwrite files if they already exist in the storage blob container.\n",
" max_retries - the maximum number of times to retry uploading a batch of files if the API is busy.\n",
"\n",
" NOTE: Uploading files may sometimes fail if the blob container was recently deleted\n",
" (i.e. a few seconds before. The solution \"in practice\" is to sleep a few seconds and try again.\n",
" \"\"\"\n",
" url = apim_url + \"/data\"\n",
" url = endpoint + \"/data\"\n",
"\n",
" def upload_batch(\n",
" files: list, storage_name: str, overwrite: bool, max_retries: int\n",
@ -236,9 +227,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create an Index\n",
"## Build an Index\n",
"\n",
"After data files have been uploaded, it is now possible to construct a knowledge graph by creating a search index. If an entity configuration is not provided, a default entity configuration will be used that has been shown to generally work well."
"After data files have been uploaded, we can construct a knowledge graph by building a search index."
]
},
{
@ -252,13 +243,10 @@
" index_name: str,\n",
") -> requests.Response:\n",
" \"\"\"Create a search index.\n",
" This function kicks off a job that builds a knowledge graph (KG) index from files located in a blob storage container.\n",
" This function kicks off a job that builds a knowledge graph index from files located in a blob storage container.\n",
" \"\"\"\n",
" url = apim_url + \"/index\"\n",
" request = {\n",
" \"storage_name\": storage_name,\n",
" \"index_name\": index_name\n",
" }\n",
" url = endpoint + \"/index\"\n",
" request = {\"storage_name\": storage_name, \"index_name\": index_name}\n",
" return requests.post(url, params=request, headers=headers)"
]
},
@ -268,10 +256,7 @@
"metadata": {},
"outputs": [],
"source": [
"response = build_index(\n",
" storage_name=storage_name,\n",
" index_name=index_name\n",
")\n",
"response = build_index(storage_name=storage_name, index_name=index_name)\n",
"print(response)\n",
"if response.ok:\n",
" print(response.text)\n",
@ -283,9 +268,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Check the status of an indexing job\n",
"### Check status of an indexing job\n",
"\n",
"Please wait for your index to reach 100 percent complete before continuing on to the next section to run queries."
"Please wait for your index to reach 100 percent completion before continuing on to the next section (running queries). You may rerun the next cell multiple times to monitor status. Note: the indexing speed of graphrag is directly correlated to the TPM quota of the Azure OpenAI model you are using."
]
},
{
@ -295,18 +280,11 @@
"outputs": [],
"source": [
"def index_status(index_name: str) -> requests.Response:\n",
" url = apim_url + f\"/index/status/{index_name}\"\n",
" return requests.get(url, headers=headers)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"response = index_status(index_name)\n",
" url = endpoint + f\"/index/status/{index_name}\"\n",
" return requests.get(url, headers=headers)\n",
"\n",
"\n",
"response = index_status(index_name)\n",
"pprint(response.json())"
]
},
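For convenience, a small polling loop can wrap `index_status` — a sketch only, assuming the status payload exposes a `percent_complete` field (verify the field name against the actual response printed above):

```python
import time


def wait_for_index(index_name: str, poll_seconds: int = 30) -> None:
    # uses the index_status() helper and index_name defined above;
    # NOTE: 'percent_complete' is an assumed field name -- check the real payload
    while True:
        status = index_status(index_name).json()
        percent = status.get("percent_complete", 0)
        print(f"indexing progress: {percent}%")
        if percent >= 100:
            break
        time.sleep(poll_seconds)


# wait_for_index(index_name)
```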
@ -316,7 +294,7 @@
"source": [
"## Query\n",
"\n",
"After an indexing job has completed, the knowledge graph is ready to query. Two types of queries (global and local) are currently supported. In addition, you can issue a query over a single index or multiple indexes."
"Once an indexing job is complete, the knowledge graph is ready to query. Two types of queries (global and local) are currently supported. We encourage you to try both and experience the difference in responses. Note that query response time is also correlated to the TPM quota of the Azure OpenAI model you are using."
]
},
{
@ -325,7 +303,7 @@
"metadata": {},
"outputs": [],
"source": [
"\"\"\"Needed helper function to parse out the clear result from the query response. \"\"\"\n",
"# a helper function to parse out the result from a query response\n",
"def parse_query_response(\n",
" response: requests.Response, return_context_data: bool = False\n",
") -> requests.Response | dict[list[dict]]:\n",
@ -350,20 +328,7 @@
"source": [
"### Global Query \n",
"\n",
"Global search queries are resource-intensive, but give good responses to questions that require an understanding of the dataset as a whole."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def global_search(index_name: str | list[str], query: str) -> requests.Response:\n",
" \"\"\"Run a global query over the knowledge graph(s) associated with one or more indexes\"\"\"\n",
" url = apim_url + \"/query/global\"\n",
" request = {\"index_name\": index_name, \"query\": query}\n",
" return requests.post(url, json=request, headers=headers)"
"Global queries are resource-intensive, but provide good responses to questions that require an understanding of the dataset as a whole."
]
},
{
@ -373,11 +338,18 @@
"outputs": [],
"source": [
"%%time\n",
"# pass in a single index name as a string or to query across multiple indexes, set index_name=[myindex1, myindex2]\n",
"\n",
"\n",
"def global_search(index_name: str | list[str], query: str) -> requests.Response:\n",
" \"\"\"Run a global query over the knowledge graph(s) associated with one or more indexes\"\"\"\n",
" url = endpoint + \"/query/global\"\n",
" request = {\"index_name\": index_name, \"query\": query}\n",
" return requests.post(url, json=request, headers=headers)\n",
"\n",
"\n",
"global_response = global_search(\n",
" index_name=index_name, query=\"Summarize the main topics of this data\"\n",
")\n",
"# print the result and save context data in a variable\n",
"global_response_data = parse_query_response(global_response, return_context_data=True)\n",
"global_response_data"
]
@ -397,25 +369,20 @@
"metadata": {},
"outputs": [],
"source": [
"%%time\n",
"\n",
"\n",
"def local_search(index_name: str | list[str], query: str) -> requests.Response:\n",
" \"\"\"Run a local query over the knowledge graph(s) associated with one or more indexes\"\"\"\n",
" url = apim_url + \"/query/local\"\n",
" url = endpoint + \"/query/local\"\n",
" request = {\"index_name\": index_name, \"query\": query}\n",
" return requests.post(url, json=request, headers=headers)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%time\n",
"# pass in a single index name as a string or to query across multiple indexes, set index_name=[myindex1, myindex2]\n",
" return requests.post(url, json=request, headers=headers)\n",
"\n",
"\n",
"# perform a local query\n",
"local_response = local_search(\n",
" index_name=index_name, query=\"Who are the primary actors in these communities?\"\n",
")\n",
"# print the result and save context data in a variable\n",
"local_response_data = parse_query_response(local_response, return_context_data=True)\n",
"local_response_data"
]
@ -437,7 +404,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.14"
"version": "3.10.13"
}
},
"nbformat": 4,

View File

@ -7,7 +7,7 @@
"source": [
"# GraphRAG API Demo\n",
"\n",
"This notebook is written as a tutorial/demonstration on how to use the GraphRAG solution accelerator API."
"This notebook is written as an advanced tutorial/demonstration on how to use the GraphRAG solution accelerator API. It builds on top of the concepts covered in the `1-Quickstart` notebook."
]
},
{
@ -24,18 +24,18 @@
"| DELETE | /data/{storage_name}\n",
"| GET | /index\n",
"| POST | /index\n",
"| GET | /index/config/prompts\n",
"| DELETE | /index/{index_name}\n",
"| GET | /index/status/{index_name}\n",
"| POST | /query/global\n",
"| POST | /query/local\n",
"| GET | /graph/graphml/{index_name}\n",
"| GET | /graph/stats/{index_name}\n",
"| GET | /index/config/prompts\n",
"| GET | /source/report/{index_name}/{report_id}\n",
"| GET | /source/text/{index_name}/{text_unit_id}\n",
"| GET | /source/entity/{index_name}/{entity_id}\n",
"| GET | /source/claim/{index_name}/{claim_id}\n",
"| GET | /source/relationship/{index_name}/{relationship_id}"
"| GET | /source/relationship/{index_name}/{relationship_id}\n",
"| GET | /graph/graphml/{index_name}\n",
"| GET | /graph/stats/{index_name}"
]
},
{
@ -84,7 +84,8 @@
"id": "5",
"metadata": {},
"source": [
"## (REQUIRED) User Configuration\n"
"## (REQUIRED) User Configuration\n",
"Set the API subscription key, API base endpoint, and some file directory names that will be referenced later in this notebook."
]
},
{
@ -92,7 +93,7 @@
"id": "6",
"metadata": {},
"source": [
"#### Get API Key for API Management Service (APIM)\n",
"#### API subscription key\n",
"\n",
"APIM supports multiple forms of authentication and access control (e.g. managed identity). For this notebook demonstration, we will use a **[subscription key](https://learn.microsoft.com/en-us/azure/api-management/api-management-subscriptions)**. To locate this key, visit the Azure Portal. The subscription key can be found under `<my_resource_group> --> <API Management service> --> <APIs> --> <Subscriptions> --> <Built-in all-access subscription> Primary Key`. For multiple API users, individual subscription keys can be generated."
]
@ -108,7 +109,15 @@
"source": [
"ocp_apim_subscription_key = getpass.getpass(\n",
" \"Enter the subscription key to the GraphRag APIM:\"\n",
")"
")\n",
"\n",
"\"\"\"\n",
"\"Ocp-Apim-Subscription-Key\": \n",
" This is a custom HTTP header used by Azure API Management service (APIM) to \n",
" authenticate API requests. The value for this key should be set to the subscription \n",
" key provided by the Azure APIM instance in your GraphRAG resource group.\n",
"\"\"\"\n",
"headers = {\"Ocp-Apim-Subscription-Key\": ocp_apim_subscription_key}"
]
},
{
@ -118,13 +127,7 @@
"source": [
"#### Setup directories and API endpoint\n",
"\n",
"The following parameters are required to access and use the GraphRAG solution accelerator API:\n",
"* file_directory\n",
"* storage_name\n",
"* index_name\n",
"* endpoint\n",
"\n",
"For demonstration purposes, you may use the provided `get-wiki-articles.py` script to download a small set of wikipedia articles or provide your own data."
"For demonstration purposes, please use the provided `get-wiki-articles.py` script to download a small set of wikipedia articles or provide your own data (graphrag requires txt files to be utf-8 encoded)."
]
},
{
@ -137,12 +140,15 @@
"outputs": [],
"source": [
"\"\"\"\n",
"These parameters must be defined by the user:\n",
"These parameters must be defined by the notebook user:\n",
"\n",
"- file_directory: local directory where data files of interest are stored.\n",
"- storage_name: unique name for an Azure blob storage container where files will be uploaded.\n",
"- index_name: unique name for a single knowledge graph construction. Multiple indexes can be created from the same blob container of data.\n",
"- remote_endpoint: the endpoint URL for GraphRAG service (this is the Gateway URL found in the APIM resource).\n",
"- file_directory: a local directory of text files. The file structure should be flat,\n",
" with no nested directories. (i.e. file_directory/file1.txt, file_directory/file2.txt, etc.)\n",
"- storage_name: a unique name to identify a blob storage container in Azure where files\n",
" from `file_directory` will be uploaded.\n",
"- index_name: a unique name to identify a single graphrag knowledge graph index.\n",
" Note: Multiple indexes may be created from the same `storage_name` blob storage container.\n",
"- endpoint: the base/endpoint URL for the GraphRAG API (this is the Gateway URL found in the APIM resource).\n",
"\"\"\"\n",
"\n",
"file_directory = \"/Users/americanthinker/Downloads/Data/primer/small_subset/\"\n",
@ -168,8 +174,9 @@
"id": "11",
"metadata": {},
"source": [
"## Helper Functions\n",
"We've provided helper functions below that encapsulate http requests to make API interaction more intuitive."
"### Helper Functions\n",
"\n",
"For cleanliness, we've provided helper functions below that encapsulate http requests to make API interaction with each API endpoint more intuitive."
]
},
{
@ -181,15 +188,6 @@
},
"outputs": [],
"source": [
"\"\"\"\n",
"\"Ocp-Apim-Subscription-Key\": \n",
" This is a custom HTTP header used by Azure API Management service (APIM) to \n",
" authenticate API requests. The value for this key should be set to the subscription \n",
" key provided by the Azure APIM instance in your GraphRAG resource group.\n",
"\"\"\"\n",
"headers = {\"Ocp-Apim-Subscription-Key\": ocp_apim_subscription_key}\n",
"\n",
"\n",
"def upload_files(\n",
" file_directory: str,\n",
" storage_name: str,\n",
@ -448,7 +446,8 @@
"metadata": {},
"source": [
"## Upload files\n",
"Use the API to upload a collection of local files. The API will automatically creates a new data blob container to host these files in. For a set of large files, consider reducing the batch upload size in order to not overwhelm the API endpoint and prevent out-of-memory problems."
"\n",
"Use the API to upload a collection of local files. The API will create a new storage blob container to host these files in. For a set of large files, consider reducing the batch upload size in order to not overwhelm the API endpoint and prevent out-of-memory problems."
]
},
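As a usage sketch of the `upload_files` helper defined above, passing a smaller `batch_size` keeps individual requests light when the file set is large (the default batch size may differ from the value shown):

```python
# upload in batches of 50 files to avoid oversized requests and
# out-of-memory issues; tune batch_size to your data
# (file_directory and storage_name are the values set earlier in this notebook)
response = upload_files(
    file_directory=file_directory,
    storage_name=storage_name,
    batch_size=50,
    overwrite=True,
)
print(response.text if response.ok else response.reason)
```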
{
@ -520,13 +519,14 @@
"source": [
"## Auto-Template Generation (Optional)\n",
"\n",
"GraphRAG constructs a knowledge graph (KG) from data based on the ability to identify entities and the relationships between them. To improve the quality of the knowledge graph constructed by GraphRAG over private data, we provide a feature called \"Automatic Templating\". This capability takes user-provided data samples and generates custom-tailored prompts based on characteristics of that data. These custom prompts contain few-shot examples of entities and relationships, which can then be used to build a graphrag index."
"GraphRAG constructs a knowledge graph from data based on the ability to identify entities and the relationships between them. To improve the quality of the knowledge graph constructed by GraphRAG over private data, we provide a feature called \"Automatic Templating\". This capability takes user-provided data samples and generates custom-tailored prompts based on characteristics of that data. These custom prompts contain few-shot examples of entities and relationships, which can then be used to build a graphrag index."
]
},
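A minimal sketch of calling the prompt-generation route (`GET /index/config/prompts` from the endpoint table above); the `storage_name` query parameter is an assumption here, so confirm the exact parameters in the API docs:

```python
# request auto-generated (few-shot) prompts based on previously uploaded data;
# the 'storage_name' query parameter is assumed -- verify against the API docs
url = endpoint + "/index/config/prompts"
prompt_response = requests.get(
    url, params={"storage_name": storage_name}, headers=headers
)
if prompt_response.ok:
    prompts = prompt_response.json()  # assumed: mapping of prompt file name -> prompt text
    print(list(prompts))
else:
    print(prompt_response.reason)
```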
{
"cell_type": "code",
"execution_count": null,
"id": "20",
"metadata": {},
"outputs": [],
"source": [
@ -546,7 +548,11 @@
},
{
"cell_type": "markdown",
"id": "21",
"metadata": {},
"source": [
"After running the previous cell, a new local directory (`prompts`) will be created. Please look at the prompts (`prompts/entity_extraction.txt`, `prompts/community_report.txt`, and `prompts/summarize_descriptions.txt`) that were generated from the user-provided data. Users are encouraged to spend some time and inspect/modify these prompts, taking into account characteristics of their data and their own goals of what kind/type of knowledge they wish to extract and model with graphrag."
@ -554,17 +560,25 @@
},
{
"cell_type": "markdown",
"id": "22",
"metadata": {},
"source": [
"## Indexing\n",
"## Build an Index\n",
"\n",
"After data files have been uploaded and (optionally) custom promps have been generated, it is time to construct a knowledge graph by building an index. If custom prompts are not provided, default built-in prompts will be used that we find generally work well."
"After data files have been uploaded and (optionally) custom promps have been generated, it is time to construct a knowledge graph by building an index. If custom prompts are not provided (demonstrated in the `1-Quickstart` notebook), default built-in prompts are used that we find generally work well."
]
},
{
"cell_type": "markdown",
"id": "23",
"metadata": {},
"source": [
"#### Start a new indexing job"
@ -573,7 +587,11 @@
{
"cell_type": "code",
"execution_count": null,
"id": "24",
"metadata": {
"tags": []
},
@ -611,24 +629,38 @@
},
{
"cell_type": "markdown",
"id": "25",
"metadata": {},
"source": [
"Note: An indexing job may fail sometimes due to insufficient TPM quota of the GPT-4 turbo model. In this situation, an indexing job can be restarted by re-running the cell above with the same parameters. `graphrag` caches previous indexing results as a cost-savings measure so that restarting indexing jobs will \"pick up\" where the last job stopped."
"Note: An indexing job may fail sometimes due to insufficient TPM quota of the Azure OpenAI model. In this situation, an indexing job can be restarted by re-running the cell above with the same parameters. `graphrag` caches previous indexing results as a cost-savings measure so that restarting indexing jobs will \"pick up\" where the last job stopped."
]
},
{
"cell_type": "markdown",
"id": "26",
"metadata": {},
"source": [
"#### Check the status of an indexing job"
"#### Check the status of an indexing job\n",
"\n",
"Please wait for your index to reach 100 percent completion before continuing on to the next section (running queries). You may rerun the next cell multiple times to monitor status. Note: the indexing speed of graphrag is directly correlated to the TPM quota of the Azure OpenAI model you are using."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "27",
"metadata": {
"tags": []
},
@ -641,7 +673,11 @@
},
{
"cell_type": "markdown",
"id": "28",
"metadata": {},
"source": [
"#### List indexes\n",
@ -651,7 +687,11 @@
{
"cell_type": "code",
"execution_count": null,
"id": "29",
"metadata": {},
"outputs": [],
"source": [
@ -661,7 +701,11 @@
},
{
"cell_type": "markdown",
"id": "30",
"metadata": {},
"source": [
"#### Delete an indexing job\n",
@ -671,7 +715,11 @@
{
"cell_type": "code",
"execution_count": null,
"id": "31",
"metadata": {},
"outputs": [],
"source": [
@ -683,17 +731,25 @@
},
{
"cell_type": "markdown",
"id": "32",
"metadata": {},
"source": [
"## Query\n",
"\n",
"After an indexing job has completed, the knowledge graph is ready to query. Two types of queries (global and local) are currently supported. In addition, you can issue a query over a single index or multiple indexes."
"Once an indexing job is complete, the knowledge graph is ready to query. Two types of queries (global and local) are currently supported. We encourage you to try both and experience the difference in responses. Note that query response time is also correlated to the TPM quota of the Azure OpenAI model you are using."
]
},
{
"cell_type": "markdown",
"id": "33",
"metadata": {},
"source": [
"#### Global Search\n",
@ -704,7 +760,11 @@
{
"cell_type": "code",
"execution_count": null,
"id": "34",
"metadata": {
"tags": []
},
@ -722,16 +782,24 @@
},
{
"cell_type": "markdown",
"id": "35",
"metadata": {},
"source": [
"An *experimental* API endpoint has been designed to support streaming back the graphrag response while executing a global query (useful in chatbot applications)."
"An *experimental* API endpoint has been designed to support streaming back the graphrag response while executing a global query (useful in applications like a chatbot)."
]
},
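A hedged sketch of consuming such a streaming response with `requests` — the route shown is a placeholder, so look up the actual experimental path in the API docs (`<APIM_gateway_url>/manpage/docs`):

```python
# stream a global query response chunk by chunk; the URL path below is a
# placeholder -- replace it with the experimental streaming route from the API docs
stream_url = endpoint + "/experimental/query/global/streaming"  # placeholder path
with requests.post(
    stream_url,
    json={"index_name": index_name, "query": "Summarize the main topics of this data"},
    headers=headers,
    stream=True,
) as streaming_response:
    for line in streaming_response.iter_lines(decode_unicode=True):
        if line:
            print(line)
```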
{
"cell_type": "code",
"execution_count": null,
"id": "36",
"metadata": {},
"outputs": [],
"source": [
@ -742,7 +810,11 @@
},
{
"cell_type": "markdown",
"id": "37",
"metadata": {},
"source": [
"#### Local Search\n",
@ -753,7 +825,11 @@
{
"cell_type": "code",
"execution_count": null,
"id": "38",
"metadata": {
"tags": []
},
@ -771,7 +847,11 @@
},
{
"cell_type": "markdown",
"id": "39",
"metadata": {},
"source": [
"## Sources\n",
@ -783,7 +863,11 @@
},
{
"cell_type": "markdown",
"id": "40",
"metadata": {},
"source": [
"#### Get a Report"
@ -792,7 +876,11 @@
{
"cell_type": "code",
"execution_count": null,
"id": "41",
"metadata": {
"tags": []
},
@ -807,7 +895,11 @@
},
{
"cell_type": "markdown",
"id": "42",
"metadata": {},
"source": [
"#### Get an Entity"
@ -816,7 +908,11 @@
{
"cell_type": "code",
"execution_count": null,
"id": "43",
"metadata": {
"tags": []
},
@ -831,7 +927,11 @@
},
{
"cell_type": "markdown",
"id": "44",
"metadata": {
"tags": []
},
@ -842,7 +942,11 @@
{
"cell_type": "code",
"execution_count": null,
"id": "45",
"metadata": {
"tags": []
},
@ -857,16 +961,25 @@
},
{
"cell_type": "markdown",
"id": "46",
"metadata": {},
"source": [
"#### Get a Claim"
"#### Get a Claim\n",
"Note: claims are supported only if the solution accelerator deployment was initially configured to enable claims."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "47",
"metadata": {},
"outputs": [],
"source": [
@ -880,7 +993,11 @@
},
{
"cell_type": "markdown",
"id": "48",
"metadata": {
"tags": []
},
@ -891,14 +1008,18 @@
{
"cell_type": "code",
"execution_count": null,
"id": "49",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# get a text unit id from one of the previous Source endpoint results (look for 'text_units' in the response)\n",
"text_unit_id = \"\"\n",
"text_unit_id = \"a1ea6b4d13016fa863f4d76a0dd532e3\"\n",
"if not text_unit_id:\n",
" raise ValueError(\n",
" \"Must provide a text_unit_id from previous source results. Look for 'text_units' in the response.\"\n",
@ -913,7 +1034,11 @@
},
{
"cell_type": "markdown",
"id": "50",
"metadata": {},
"source": [
"## Exploring the GraphRAG knowledge graph\n",
@ -924,7 +1049,11 @@
},
{
"cell_type": "markdown",
"id": "51",
"metadata": {},
"source": [
"#### Basic knowledge graph statistics"
@ -933,7 +1062,11 @@
{
"cell_type": "code",
"execution_count": null,
"id": "52",
"metadata": {},
"outputs": [],
"source": [
@ -944,7 +1077,11 @@
},
{
"cell_type": "markdown",
"id": "53",
"metadata": {},
"source": [
"#### Get a GraphML file"
@ -953,7 +1090,11 @@
{
"cell_type": "code",
"execution_count": null,
"id": "54",
"metadata": {},
"outputs": [],
"source": [

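To poke at the exported graph locally, one option (a sketch assuming the `GET /graph/graphml/{index_name}` route returns raw GraphML and that `networkx` is installed) is:

```python
# download the GraphML export and load it into networkx for local exploration;
# assumes the response body is raw GraphML text (verify against the API docs)
import io

import networkx as nx
import requests

graphml_response = requests.get(
    endpoint + f"/graph/graphml/{index_name}", headers=headers
)
graph = nx.read_graphml(io.BytesIO(graphml_response.content))
print(f"nodes: {graph.number_of_nodes()}, edges: {graph.number_of_edges()}")
```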
View File

@ -10,4 +10,4 @@ For a faster example with less data
> python get-wiki-articles.py --short-summary --num-articles 1 testdata
```
2. Follow instructions in the `HelloWorld.ipynb` notebook to explore the GraphRAG API, by building an index of the data in `testdata` and executing queries.
2. Follow instructions in the `1-Quickstart.ipynb` notebook to explore the GraphRAG API by building an index of the data in `testdata` and executing queries.