docling/docs/examples/rag_azuresearch.ipynb
Michele Dolfi 5458a88464
ci: add coverage and ruff (#1383)
* add coverage calculation and push

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* new codecov version and usage of token

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* enable ruff formatter instead of black and isort

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* apply ruff lint fixes

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* apply ruff unsafe fixes

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* add removed imports

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* runs 1 on linter issues

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* finalize linter fixes

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* Update pyproject.toml

Co-authored-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
Signed-off-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com>

---------

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Signed-off-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com>
Co-authored-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
2025-04-14 18:01:26 +02:00

897 lines
94 KiB
Plaintext
Vendored
Raw Permalink Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "Ag9kcX2B_atc"
},
"source": [
"<a href=\"https://colab.research.google.com/github/DS4SD/docling/blob/main/docs/examples/rag_azuresearch.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# RAG with Azure AI Search"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"| Step | Tech | Execution |\n",
"| ------------------ | ------------------ | --------- |\n",
"| Embedding | Azure OpenAI | 🌐 Remote |\n",
"| Vector Store | Azure AI Search | 🌐 Remote |\n",
"| Gen AI | Azure OpenAI | 🌐 Remote |"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"## A recipe 🧑‍🍳 🐥 💚\n",
"\n",
"This notebook demonstrates how to build a Retrieval-Augmented Generation (RAG) system using:\n",
"- [Docling](https://docling-project.github.io/docling/) for document parsing and chunking\n",
"- [Azure AI Search](https://azure.microsoft.com/products/ai-services/ai-search/?msockid=0109678bea39665431e37323ebff6723) for vector indexing and retrieval\n",
"- [Azure OpenAI](https://azure.microsoft.com/products/ai-services/openai-service?msockid=0109678bea39665431e37323ebff6723) for embeddings and chat completion\n",
"\n",
"This sample demonstrates how to:\n",
"1. Parse a PDF with Docling.\n",
"2. Chunk the parsed text.\n",
"3. Use Azure OpenAI for embeddings.\n",
"4. Index and search in Azure AI Search.\n",
"5. Run a retrieval-augmented generation (RAG) query with Azure OpenAI GPT-4o.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# If running in a fresh environment (like Google Colab), uncomment and run this single command:\n",
"%pip install \"docling~=2.12\" azure-search-documents==11.5.2 azure-identity openai rich torch python-dotenv"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Part 0: Prerequisites\n",
" - **Azure AI Search** resource\n",
" - **Azure OpenAI** resource with a deployed embedding and chat completion model (e.g. `text-embedding-3-small` and `gpt-4o`) \n",
" - **Docling 2.12+** (installs `docling_core` automatically) Docling installed (Python 3.8+ environment)\n",
"\n",
"- A **GPU-enabled environment** is preferred for faster parsing. Docling 2.12 automatically detects GPU if present.\n",
" - If you only have CPU, parsing large PDFs can be slower. "
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"from dotenv import load_dotenv\n",
"\n",
"load_dotenv()\n",
"\n",
"\n",
"def _get_env(key, default=None):\n",
" try:\n",
" from google.colab import userdata\n",
"\n",
" try:\n",
" return userdata.get(key)\n",
" except userdata.SecretNotFoundError:\n",
" pass\n",
" except ImportError:\n",
" pass\n",
" return os.getenv(key, default)\n",
"\n",
"\n",
"AZURE_SEARCH_ENDPOINT = _get_env(\"AZURE_SEARCH_ENDPOINT\")\n",
"AZURE_SEARCH_KEY = _get_env(\"AZURE_SEARCH_KEY\") # Ensure this is your Admin Key\n",
"AZURE_SEARCH_INDEX_NAME = _get_env(\"AZURE_SEARCH_INDEX_NAME\", \"docling-rag-sample\")\n",
"AZURE_OPENAI_ENDPOINT = _get_env(\"AZURE_OPENAI_ENDPOINT\")\n",
"AZURE_OPENAI_API_KEY = _get_env(\"AZURE_OPENAI_API_KEY\")\n",
"AZURE_OPENAI_API_VERSION = _get_env(\"AZURE_OPENAI_API_VERSION\", \"2024-10-21\")\n",
"AZURE_OPENAI_CHAT_MODEL = _get_env(\n",
" \"AZURE_OPENAI_CHAT_MODEL\"\n",
") # Using a deployed model named \"gpt-4o\"\n",
"AZURE_OPENAI_EMBEDDINGS = _get_env(\n",
" \"AZURE_OPENAI_EMBEDDINGS\", \"text-embedding-3-small\"\n",
") # Using a deployed model named \"text-embeddings-3-small\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Part 1: Parse the PDF with Docling\n",
"\n",
"Well parse the **Microsoft GraphRAG Research Paper** (~15 pages). Parsing should be relatively quick, even on CPU, but it will be faster on a GPU or MPS device if available.\n",
"\n",
"*(If you prefer a different document, simply provide a different URL or local file path.)*"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #808000; text-decoration-color: #808000; font-weight: bold\">Parsing a ~</span><span style=\"color: #808000; text-decoration-color: #808000; font-weight: bold\">15</span><span style=\"color: #808000; text-decoration-color: #808000; font-weight: bold\">-page PDF. The process should be relatively quick, even on CPU...</span>\n",
"</pre>\n"
],
"text/plain": [
"\u001b[1;33mParsing a ~\u001b[0m\u001b[1;33m15\u001b[0m\u001b[1;33m-page PDF. The process should be relatively quick, even on CPU\u001b[0m\u001b[1;33m...\u001b[0m\n"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">╭─────────────────────────────────────────── Docling Markdown Preview ────────────────────────────────────────────╮\n",
"│ ## From Local to Global: A Graph RAG Approach to Query-Focused Summarization │\n",
"│ │\n",
"│ Darren Edge 1† │\n",
"│ │\n",
"│ Ha Trinh 1† │\n",
"│ │\n",
"│ Newman Cheng 2 │\n",
"│ │\n",
"│ Joshua Bradley 2 │\n",
"│ │\n",
"│ Alex Chao 3 │\n",
"│ │\n",
"│ Apurva Mody 3 │\n",
"│ │\n",
"│ Steven Truitt 2 │\n",
"│ │\n",
"│ ## Jonathan Larson 1 │\n",
"│ │\n",
"│ 1 Microsoft Research 2 Microsoft Strategic Missions and Technologies 3 Microsoft Office of the CTO │\n",
"│ │\n",
"│ { daedge,trinhha,newmancheng,joshbradley,achao,moapurva,steventruitt,jolarso } @microsoft.com │\n",
"│ │\n",
"│ † These authors contributed equally to this work │\n",
"│ │\n",
"│ ## Abstract │\n",
"│ │\n",
"│ The use of retrieval-augmented gen... │\n",
"╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\n",
"</pre>\n"
],
"text/plain": [
"╭─────────────────────────────────────────── Docling Markdown Preview ────────────────────────────────────────────╮\n",
"│ ## From Local to Global: A Graph RAG Approach to Query-Focused Summarization │\n",
"│ │\n",
"│ Darren Edge 1† │\n",
"│ │\n",
"│ Ha Trinh 1† │\n",
"│ │\n",
"│ Newman Cheng 2 │\n",
"│ │\n",
"│ Joshua Bradley 2 │\n",
"│ │\n",
"│ Alex Chao 3 │\n",
"│ │\n",
"│ Apurva Mody 3 │\n",
"│ │\n",
"│ Steven Truitt 2 │\n",
"│ │\n",
"│ ## Jonathan Larson 1 │\n",
"│ │\n",
"│ 1 Microsoft Research 2 Microsoft Strategic Missions and Technologies 3 Microsoft Office of the CTO │\n",
"│ │\n",
"│ { daedge,trinhha,newmancheng,joshbradley,achao,moapurva,steventruitt,jolarso } @microsoft.com │\n",
"│ │\n",
"│ † These authors contributed equally to this work │\n",
"│ │\n",
"│ ## Abstract │\n",
"│ │\n",
"│ The use of retrieval-augmented gen... │\n",
"╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\n"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"from rich.console import Console\n",
"from rich.panel import Panel\n",
"\n",
"from docling.document_converter import DocumentConverter\n",
"\n",
"console = Console()\n",
"\n",
"# This URL points to the Microsoft GraphRAG Research Paper (arXiv: 2404.16130), ~15 pages\n",
"source_url = \"https://arxiv.org/pdf/2404.16130\"\n",
"\n",
"console.print(\n",
" \"[bold yellow]Parsing a ~15-page PDF. The process should be relatively quick, even on CPU...[/bold yellow]\"\n",
")\n",
"converter = DocumentConverter()\n",
"result = converter.convert(source_url)\n",
"\n",
"# Optional: preview the parsed Markdown\n",
"md_preview = result.document.export_to_markdown()\n",
"console.print(Panel(md_preview[:500] + \"...\", title=\"Docling Markdown Preview\"))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Part 2: Hierarchical Chunking\n",
"We convert the `Document` into smaller chunks for embedding and indexing. The built-in `HierarchicalChunker` preserves structure. "
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">Total chunks from PDF: <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">106</span>\n",
"</pre>\n"
],
"text/plain": [
"Total chunks from PDF: \u001b[1;36m106\u001b[0m\n"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"from docling.chunking import HierarchicalChunker\n",
"\n",
"chunker = HierarchicalChunker()\n",
"doc_chunks = list(chunker.chunk(result.document))\n",
"\n",
"all_chunks = []\n",
"for idx, c in enumerate(doc_chunks):\n",
" chunk_text = c.text\n",
" all_chunks.append((f\"chunk_{idx}\", chunk_text))\n",
"\n",
"console.print(f\"Total chunks from PDF: {len(all_chunks)}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Part 3: Create Azure AI Search Index and Push Chunk Embeddings\n",
"Well define a vector index in Azure AI Search, then embed each chunk using Azure OpenAI and upload in batches."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">Index <span style=\"color: #008000; text-decoration-color: #008000\">'docling-rag-sample-2'</span> created.\n",
"</pre>\n"
],
"text/plain": [
"Index \u001b[32m'docling-rag-sample-2'\u001b[0m created.\n"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"from azure.core.credentials import AzureKeyCredential\n",
"from azure.search.documents.indexes import SearchIndexClient\n",
"from azure.search.documents.indexes.models import (\n",
" AzureOpenAIVectorizer,\n",
" AzureOpenAIVectorizerParameters,\n",
" HnswAlgorithmConfiguration,\n",
" SearchableField,\n",
" SearchField,\n",
" SearchFieldDataType,\n",
" SearchIndex,\n",
" SimpleField,\n",
" VectorSearch,\n",
" VectorSearchProfile,\n",
")\n",
"from rich.console import Console\n",
"\n",
"console = Console()\n",
"\n",
"VECTOR_DIM = 1536 # Adjust based on your chosen embeddings model\n",
"\n",
"index_client = SearchIndexClient(\n",
" AZURE_SEARCH_ENDPOINT, AzureKeyCredential(AZURE_SEARCH_KEY)\n",
")\n",
"\n",
"\n",
"def create_search_index(index_name: str):\n",
" # Define fields\n",
" fields = [\n",
" SimpleField(name=\"chunk_id\", type=SearchFieldDataType.String, key=True),\n",
" SearchableField(name=\"content\", type=SearchFieldDataType.String),\n",
" SearchField(\n",
" name=\"content_vector\",\n",
" type=SearchFieldDataType.Collection(SearchFieldDataType.Single),\n",
" searchable=True,\n",
" filterable=False,\n",
" sortable=False,\n",
" facetable=False,\n",
" vector_search_dimensions=VECTOR_DIM,\n",
" vector_search_profile_name=\"default\",\n",
" ),\n",
" ]\n",
" # Vector search config with an AzureOpenAIVectorizer\n",
" vector_search = VectorSearch(\n",
" algorithms=[HnswAlgorithmConfiguration(name=\"default\")],\n",
" profiles=[\n",
" VectorSearchProfile(\n",
" name=\"default\",\n",
" algorithm_configuration_name=\"default\",\n",
" vectorizer_name=\"default\",\n",
" )\n",
" ],\n",
" vectorizers=[\n",
" AzureOpenAIVectorizer(\n",
" vectorizer_name=\"default\",\n",
" parameters=AzureOpenAIVectorizerParameters(\n",
" resource_url=AZURE_OPENAI_ENDPOINT,\n",
" deployment_name=AZURE_OPENAI_EMBEDDINGS,\n",
" model_name=\"text-embedding-3-small\",\n",
" api_key=AZURE_OPENAI_API_KEY,\n",
" ),\n",
" )\n",
" ],\n",
" )\n",
"\n",
" # Create or update the index\n",
" new_index = SearchIndex(name=index_name, fields=fields, vector_search=vector_search)\n",
" try:\n",
" index_client.delete_index(index_name)\n",
" except Exception:\n",
" pass\n",
"\n",
" index_client.create_or_update_index(new_index)\n",
" console.print(f\"Index '{index_name}' created.\")\n",
"\n",
"\n",
"create_search_index(AZURE_SEARCH_INDEX_NAME)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Generate Embeddings and Upload to Azure AI Search\n"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">Uploaded batch <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">0</span> -&gt; <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">50</span>; all_succeeded: <span style=\"color: #00ff00; text-decoration-color: #00ff00; font-style: italic\">True</span>, first_doc_status_code: <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">201</span>\n",
"</pre>\n"
],
"text/plain": [
"Uploaded batch \u001b[1;36m0\u001b[0m -> \u001b[1;36m50\u001b[0m; all_succeeded: \u001b[3;92mTrue\u001b[0m, first_doc_status_code: \u001b[1;36m201\u001b[0m\n"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">Uploaded batch <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">50</span> -&gt; <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">100</span>; all_succeeded: <span style=\"color: #00ff00; text-decoration-color: #00ff00; font-style: italic\">True</span>, first_doc_status_code: <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">201</span>\n",
"</pre>\n"
],
"text/plain": [
"Uploaded batch \u001b[1;36m50\u001b[0m -> \u001b[1;36m100\u001b[0m; all_succeeded: \u001b[3;92mTrue\u001b[0m, first_doc_status_code: \u001b[1;36m201\u001b[0m\n"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">Uploaded batch <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">100</span> -&gt; <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">106</span>; all_succeeded: <span style=\"color: #00ff00; text-decoration-color: #00ff00; font-style: italic\">True</span>, first_doc_status_code: <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">201</span>\n",
"</pre>\n"
],
"text/plain": [
"Uploaded batch \u001b[1;36m100\u001b[0m -> \u001b[1;36m106\u001b[0m; all_succeeded: \u001b[3;92mTrue\u001b[0m, first_doc_status_code: \u001b[1;36m201\u001b[0m\n"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">All chunks uploaded to Azure Search.\n",
"</pre>\n"
],
"text/plain": [
"All chunks uploaded to Azure Search.\n"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"from azure.search.documents import SearchClient\n",
"from openai import AzureOpenAI\n",
"\n",
"search_client = SearchClient(\n",
" AZURE_SEARCH_ENDPOINT, AZURE_SEARCH_INDEX_NAME, AzureKeyCredential(AZURE_SEARCH_KEY)\n",
")\n",
"openai_client = AzureOpenAI(\n",
" api_key=AZURE_OPENAI_API_KEY,\n",
" api_version=AZURE_OPENAI_API_VERSION,\n",
" azure_endpoint=AZURE_OPENAI_ENDPOINT,\n",
")\n",
"\n",
"\n",
"def embed_text(text: str):\n",
" \"\"\"\n",
" Helper to generate embeddings with Azure OpenAI.\n",
" \"\"\"\n",
" response = openai_client.embeddings.create(\n",
" input=text, model=AZURE_OPENAI_EMBEDDINGS\n",
" )\n",
" return response.data[0].embedding\n",
"\n",
"\n",
"upload_docs = []\n",
"for chunk_id, chunk_text in all_chunks:\n",
" embedding_vector = embed_text(chunk_text)\n",
" upload_docs.append(\n",
" {\n",
" \"chunk_id\": chunk_id,\n",
" \"content\": chunk_text,\n",
" \"content_vector\": embedding_vector,\n",
" }\n",
" )\n",
"\n",
"\n",
"BATCH_SIZE = 50\n",
"for i in range(0, len(upload_docs), BATCH_SIZE):\n",
" subset = upload_docs[i : i + BATCH_SIZE]\n",
" resp = search_client.upload_documents(documents=subset)\n",
"\n",
" all_succeeded = all(r.succeeded for r in resp)\n",
" console.print(\n",
" f\"Uploaded batch {i} -> {i + len(subset)}; all_succeeded: {all_succeeded}, \"\n",
" f\"first_doc_status_code: {resp[0].status_code}\"\n",
" )\n",
"\n",
"console.print(\"All chunks uploaded to Azure Search.\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Part 4: Perform RAG over PDF\n",
"Combine retrieval from Azure AI Search with Azure OpenAI Chat Completions (aka. grounding your LLM)"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">╭──────────────────────────────────────────────────</span> RAG Prompt <span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">───────────────────────────────────────────────────╮</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ You are an AI assistant helping answering questions about Microsoft GraphRAG. │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ Use ONLY the text below to answer the user's question. │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ If the answer isn't in the text, say you don't know. │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ Context: │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ Community summaries vs. source texts. When comparing community summaries to source texts using Graph RAG, │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ community summaries generally provided a small but consistent improvement in answer comprehensiveness and │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ diversity, except for root-level summaries. Intermediate-level summaries in the Podcast dataset and low-level │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ community summaries in the News dataset achieved comprehensiveness win rates of 57% and 64%, respectively. │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ Diversity win rates were 57% for Podcast intermediate-level summaries and 60% for News low-level community │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ summaries. Table 3 also illustrates the scalability advantages of Graph RAG compared to source text │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ summarization: for low-level community summaries ( C3 ), Graph RAG required 26-33% fewer context tokens, while │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ for root-level community summaries ( C0 ), it required over 97% fewer tokens. For a modest drop in performance │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ compared with other global methods, root-level Graph RAG offers a highly efficient method for the iterative │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ question answering that characterizes sensemaking activity, while retaining advantages in comprehensiveness │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ (72% win rate) and diversity (62% win rate) over na¨ıve RAG. │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ --- │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ We have presented a global approach to Graph RAG, combining knowledge graph generation, retrieval-augmented │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ generation (RAG), and query-focused summarization (QFS) to support human sensemaking over entire text corpora. │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ Initial evaluations show substantial improvements over a na¨ıve RAG baseline for both the comprehensiveness and │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ diversity of answers, as well as favorable comparisons to a global but graph-free approach using map-reduce │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ source text summarization. For situations requiring many global queries over the same dataset, summaries of │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ root-level communities in the entity-based graph index provide a data index that is both superior to na¨ıve RAG │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ and achieves competitive performance to other global methods at a fraction of the token cost. │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ --- │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ Trade-offs of building a graph index . We consistently observed Graph RAG achieve the best headto-head results │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ against other methods, but in many cases the graph-free approach to global summarization of source texts │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ performed competitively. The real-world decision about whether to invest in building a graph index depends on │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ multiple factors, including the compute budget, expected number of lifetime queries per dataset, and value │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ obtained from other aspects of the graph index (including the generic community summaries and the use of other │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ graph-related RAG approaches). │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ --- │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ Future work . The graph index, rich text annotations, and hierarchical community structure supporting the │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ current Graph RAG approach offer many possibilities for refinement and adaptation. This includes RAG approaches │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ that operate in a more local manner, via embedding-based matching of user queries and graph annotations, as │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ well as the possibility of hybrid RAG schemes that combine embedding-based matching against community reports │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ before employing our map-reduce summarization mechanisms. This 'roll-up' operation could also be extended │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ across more levels of the community hierarchy, as well as implemented as a more exploratory 'drill down' │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ mechanism that follows the information scent contained in higher-level community summaries. │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ --- │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ Advanced RAG systems include pre-retrieval, retrieval, post-retrieval strategies designed to overcome the │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ drawbacks of Na¨ıve RAG, while Modular RAG systems include patterns for iterative and dynamic cycles of │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ interleaved retrieval and generation (Gao et al., 2023). Our implementation of Graph RAG incorporates multiple │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ concepts related to other systems. For example, our community summaries are a kind of self-memory (Selfmem, │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ Cheng et al., 2024) for generation-augmented retrieval (GAR, Mao et al., 2020) that facilitates future │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ generation cycles, while our parallel generation of community answers from these summaries is a kind of │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ iterative (Iter-RetGen, Shao et al., 2023) or federated (FeB4RAG, Wang et al., 2024) retrieval-generation │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ strategy. Other systems have also combined these concepts for multi-document summarization (CAiRE-COVID, Su et │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ al., 2020) and multi-hop question answering (ITRG, Feng et al., 2023; IR-CoT, Trivedi et al., 2022; DSP, │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ Khattab et al., 2022). Our use of a hierarchical index and summarization also bears resemblance to further │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ approaches, such as generating a hierarchical index of text chunks by clustering the vectors of text embeddings │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ (RAPTOR, Sarthi et al., 2024) or generating a 'tree of clarifications' to answer multiple interpretations of │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ ambiguous questions (Kim et al., 2023). However, none of these iterative or hierarchical approaches use the │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ kind of self-generated graph index that enables Graph RAG. │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ --- │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ The use of retrieval-augmented generation (RAG) to retrieve relevant information from an external knowledge │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ source enables large language models (LLMs) to answer questions over private and/or previously unseen document │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ collections. However, RAG fails on global questions directed at an entire text corpus, such as 'What are the │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ main themes in the dataset?', since this is inherently a queryfocused summarization (QFS) task, rather than an │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ explicit retrieval task. Prior QFS methods, meanwhile, fail to scale to the quantities of text indexed by │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ typical RAGsystems. To combine the strengths of these contrasting methods, we propose a Graph RAG approach to │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ question answering over private text corpora that scales with both the generality of user questions and the │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ quantity of source text to be indexed. Our approach uses an LLM to build a graph-based text index in two │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ stages: first to derive an entity knowledge graph from the source documents, then to pregenerate community │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ summaries for all groups of closely-related entities. Given a question, each community summary is used to │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ generate a partial response, before all partial responses are again summarized in a final response to the user. │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ For a class of global sensemaking questions over datasets in the 1 million token range, we show that Graph RAG │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ leads to substantial improvements over a na¨ıve RAG baseline for both the comprehensiveness and diversity of │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ generated answers. An open-source, Python-based implementation of both global and local Graph RAG approaches is │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ forthcoming at https://aka . ms/graphrag . │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ --- │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ Given the multi-stage nature of our Graph RAG mechanism, the multiple conditions we wanted to compare, and the │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ lack of gold standard answers to our activity-based sensemaking questions, we decided to adopt a head-to-head │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ comparison approach using an LLM evaluator. We selected three target metrics capturing qualities that are │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ desirable for sensemaking activities, as well as a control metric (directness) used as a indicator of validity. │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ Since directness is effectively in opposition to comprehensiveness and diversity, we would not expect any │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ method to win across all four metrics. │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ --- │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ Figure 1: Graph RAG pipeline using an LLM-derived graph index of source document text. This index spans nodes │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ (e.g., entities), edges (e.g., relationships), and covariates (e.g., claims) that have been detected, │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ extracted, and summarized by LLM prompts tailored to the domain of the dataset. Community detection (e.g., │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ Leiden, Traag et al., 2019) is used to partition the graph index into groups of elements (nodes, edges, │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ covariates) that the LLM can summarize in parallel at both indexing time and query time. The 'global answer' to │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ a given query is produced using a final round of query-focused summarization over all community summaries │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ reporting relevance to that query. │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ --- │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ Retrieval-augmented generation (RAG, Lewis et al., 2020) is an established approach to answering user questions │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ over entire datasets, but it is designed for situations where these answers are contained locally within │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ regions of text whose retrieval provides sufficient grounding for the generation task. Instead, a more │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ appropriate task framing is query-focused summarization (QFS, Dang, 2006), and in particular, query-focused │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ abstractive summarization that generates natural language summaries and not just concatenated excerpts (Baumel │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ et al., 2018; Laskar et al., 2020; Yao et al., 2017) . In recent years, however, such distinctions between │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ summarization tasks that are abstractive versus extractive, generic versus query-focused, and single-document │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ versus multi-document, have become less relevant. While early applications of the transformer architecture │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ showed substantial improvements on the state-of-the-art for all such summarization tasks (Goodwin et al., 2020; │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ Laskar et al., 2022; Liu and Lapata, 2019), these tasks are now trivialized by modern LLMs, including the GPT │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ (Achiam et al., 2023; Brown et al., 2020), Llama (Touvron et al., 2023), and Gemini (Anil et al., 2023) series, │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ all of which can use in-context learning to summarize any content provided in their context window. │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ --- │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ community descriptions provide complete coverage of the underlying graph index and the input documents it │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ represents. Query-focused summarization of an entire corpus is then made possible using a map-reduce approach: │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ first using each community summary to answer the query independently and in parallel, then summarizing all │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ relevant partial answers into a final global answer. │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ Question: What are the main advantages of using the Graph RAG approach for query-focused summarization compared │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ to traditional RAG methods? │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ Answer: │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">│ │</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯</span>\n",
"</pre>\n"
],
"text/plain": [
"\u001b[1;31m╭─\u001b[0m\u001b[1;31m─────────────────────────────────────────────────\u001b[0m RAG Prompt \u001b[1;31m──────────────────────────────────────────────────\u001b[0m\u001b[1;31m─╮\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mYou are an AI assistant helping answering questions about Microsoft GraphRAG.\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mUse ONLY the text below to answer the user's question.\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mIf the answer isn't in the text, say you don't know.\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mContext:\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mCommunity summaries vs. source texts. When comparing community summaries to source texts using Graph RAG, \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mcommunity summaries generally provided a small but consistent improvement in answer comprehensiveness and \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mdiversity, except for root-level summaries. Intermediate-level summaries in the Podcast dataset and low-level \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mcommunity summaries in the News dataset achieved comprehensiveness win rates of 57% and 64%, respectively. \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mDiversity win rates were 57% for Podcast intermediate-level summaries and 60% for News low-level community \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31msummaries. Table 3 also illustrates the scalability advantages of Graph RAG compared to source text \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31msummarization: for low-level community summaries ( C3 ), Graph RAG required 26-33% fewer context tokens, while \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mfor root-level community summaries ( C0 ), it required over 97% fewer tokens. For a modest drop in performance \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mcompared with other global methods, root-level Graph RAG offers a highly efficient method for the iterative \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mquestion answering that characterizes sensemaking activity, while retaining advantages in comprehensiveness \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m(72% win rate) and diversity (62% win rate) over na¨ıve RAG.\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m---\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mWe have presented a global approach to Graph RAG, combining knowledge graph generation, retrieval-augmented \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mgeneration (RAG), and query-focused summarization (QFS) to support human sensemaking over entire text corpora. \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mInitial evaluations show substantial improvements over a na¨ıve RAG baseline for both the comprehensiveness and\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mdiversity of answers, as well as favorable comparisons to a global but graph-free approach using map-reduce \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31msource text summarization. For situations requiring many global queries over the same dataset, summaries of \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mroot-level communities in the entity-based graph index provide a data index that is both superior to na¨ıve RAG\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mand achieves competitive performance to other global methods at a fraction of the token cost.\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m---\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mTrade-offs of building a graph index . We consistently observed Graph RAG achieve the best headto-head results \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31magainst other methods, but in many cases the graph-free approach to global summarization of source texts \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mperformed competitively. The real-world decision about whether to invest in building a graph index depends on \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mmultiple factors, including the compute budget, expected number of lifetime queries per dataset, and value \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mobtained from other aspects of the graph index (including the generic community summaries and the use of other \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mgraph-related RAG approaches).\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m---\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mFuture work . The graph index, rich text annotations, and hierarchical community structure supporting the \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mcurrent Graph RAG approach offer many possibilities for refinement and adaptation. This includes RAG approaches\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mthat operate in a more local manner, via embedding-based matching of user queries and graph annotations, as \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mwell as the possibility of hybrid RAG schemes that combine embedding-based matching against community reports \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mbefore employing our map-reduce summarization mechanisms. This 'roll-up' operation could also be extended \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31macross more levels of the community hierarchy, as well as implemented as a more exploratory 'drill down' \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mmechanism that follows the information scent contained in higher-level community summaries.\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m---\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mAdvanced RAG systems include pre-retrieval, retrieval, post-retrieval strategies designed to overcome the \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mdrawbacks of Na¨ıve RAG, while Modular RAG systems include patterns for iterative and dynamic cycles of \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31minterleaved retrieval and generation (Gao et al., 2023). Our implementation of Graph RAG incorporates multiple \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mconcepts related to other systems. For example, our community summaries are a kind of self-memory (Selfmem, \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mCheng et al., 2024) for generation-augmented retrieval (GAR, Mao et al., 2020) that facilitates future \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mgeneration cycles, while our parallel generation of community answers from these summaries is a kind of \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31miterative (Iter-RetGen, Shao et al., 2023) or federated (FeB4RAG, Wang et al., 2024) retrieval-generation \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mstrategy. Other systems have also combined these concepts for multi-document summarization (CAiRE-COVID, Su et \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mal., 2020) and multi-hop question answering (ITRG, Feng et al., 2023; IR-CoT, Trivedi et al., 2022; DSP, \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mKhattab et al., 2022). Our use of a hierarchical index and summarization also bears resemblance to further \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mapproaches, such as generating a hierarchical index of text chunks by clustering the vectors of text embeddings\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m(RAPTOR, Sarthi et al., 2024) or generating a 'tree of clarifications' to answer multiple interpretations of \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mambiguous questions (Kim et al., 2023). However, none of these iterative or hierarchical approaches use the \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mkind of self-generated graph index that enables Graph RAG.\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m---\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mThe use of retrieval-augmented generation (RAG) to retrieve relevant information from an external knowledge \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31msource enables large language models (LLMs) to answer questions over private and/or previously unseen document \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mcollections. However, RAG fails on global questions directed at an entire text corpus, such as 'What are the \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mmain themes in the dataset?', since this is inherently a queryfocused summarization (QFS) task, rather than an \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mexplicit retrieval task. Prior QFS methods, meanwhile, fail to scale to the quantities of text indexed by \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mtypical RAGsystems. To combine the strengths of these contrasting methods, we propose a Graph RAG approach to \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mquestion answering over private text corpora that scales with both the generality of user questions and the \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mquantity of source text to be indexed. Our approach uses an LLM to build a graph-based text index in two \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mstages: first to derive an entity knowledge graph from the source documents, then to pregenerate community \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31msummaries for all groups of closely-related entities. Given a question, each community summary is used to \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mgenerate a partial response, before all partial responses are again summarized in a final response to the user.\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mFor a class of global sensemaking questions over datasets in the 1 million token range, we show that Graph RAG \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mleads to substantial improvements over a na¨ıve RAG baseline for both the comprehensiveness and diversity of \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mgenerated answers. An open-source, Python-based implementation of both global and local Graph RAG approaches is\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mforthcoming at https://aka . ms/graphrag .\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m---\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mGiven the multi-stage nature of our Graph RAG mechanism, the multiple conditions we wanted to compare, and the \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mlack of gold standard answers to our activity-based sensemaking questions, we decided to adopt a head-to-head \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mcomparison approach using an LLM evaluator. We selected three target metrics capturing qualities that are \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mdesirable for sensemaking activities, as well as a control metric (directness) used as a indicator of validity.\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mSince directness is effectively in opposition to comprehensiveness and diversity, we would not expect any \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mmethod to win across all four metrics.\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m---\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mFigure 1: Graph RAG pipeline using an LLM-derived graph index of source document text. This index spans nodes \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m(e.g., entities), edges (e.g., relationships), and covariates (e.g., claims) that have been detected, \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mextracted, and summarized by LLM prompts tailored to the domain of the dataset. Community detection (e.g., \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mLeiden, Traag et al., 2019) is used to partition the graph index into groups of elements (nodes, edges, \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mcovariates) that the LLM can summarize in parallel at both indexing time and query time. The 'global answer' to\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31ma given query is produced using a final round of query-focused summarization over all community summaries \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mreporting relevance to that query.\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m---\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mRetrieval-augmented generation (RAG, Lewis et al., 2020) is an established approach to answering user questions\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mover entire datasets, but it is designed for situations where these answers are contained locally within \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mregions of text whose retrieval provides sufficient grounding for the generation task. Instead, a more \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mappropriate task framing is query-focused summarization (QFS, Dang, 2006), and in particular, query-focused \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mabstractive summarization that generates natural language summaries and not just concatenated excerpts (Baumel \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31met al., 2018; Laskar et al., 2020; Yao et al., 2017) . In recent years, however, such distinctions between \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31msummarization tasks that are abstractive versus extractive, generic versus query-focused, and single-document \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mversus multi-document, have become less relevant. While early applications of the transformer architecture \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mshowed substantial improvements on the state-of-the-art for all such summarization tasks (Goodwin et al., 2020;\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mLaskar et al., 2022; Liu and Lapata, 2019), these tasks are now trivialized by modern LLMs, including the GPT \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m(Achiam et al., 2023; Brown et al., 2020), Llama (Touvron et al., 2023), and Gemini (Anil et al., 2023) series,\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mall of which can use in-context learning to summarize any content provided in their context window.\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m---\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mcommunity descriptions provide complete coverage of the underlying graph index and the input documents it \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mrepresents. Query-focused summarization of an entire corpus is then made possible using a map-reduce approach: \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mfirst using each community summary to answer the query independently and in parallel, then summarizing all \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mrelevant partial answers into a final global answer.\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mQuestion: What are the main advantages of using the Graph RAG approach for query-focused summarization compared\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mto traditional RAG methods?\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mAnswer:\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m│\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m│\u001b[0m\n",
"\u001b[1;31m╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\u001b[0m\n"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #008000; text-decoration-color: #008000; font-weight: bold\">╭─────────────────────────────────────────────────</span> RAG Response <span style=\"color: #008000; text-decoration-color: #008000; font-weight: bold\">──────────────────────────────────────────────────╮</span>\n",
"<span style=\"color: #008000; text-decoration-color: #008000; font-weight: bold\">│ The main advantages of using the Graph RAG approach for query-focused summarization compared to traditional RAG │</span>\n",
"<span style=\"color: #008000; text-decoration-color: #008000; font-weight: bold\">│ methods include: │</span>\n",
"<span style=\"color: #008000; text-decoration-color: #008000; font-weight: bold\">│ │</span>\n",
"<span style=\"color: #008000; text-decoration-color: #008000; font-weight: bold\">│ 1. **Improved Comprehensiveness and Diversity**: Graph RAG shows substantial improvements over a naïve RAG │</span>\n",
"<span style=\"color: #008000; text-decoration-color: #008000; font-weight: bold\">│ baseline in terms of the comprehensiveness and diversity of answers. This is particularly beneficial for global │</span>\n",
"<span style=\"color: #008000; text-decoration-color: #008000; font-weight: bold\">│ sensemaking questions over large datasets. │</span>\n",
"<span style=\"color: #008000; text-decoration-color: #008000; font-weight: bold\">│ │</span>\n",
"<span style=\"color: #008000; text-decoration-color: #008000; font-weight: bold\">│ 2. **Scalability**: Graph RAG provides scalability advantages, achieving efficient summarization with │</span>\n",
"<span style=\"color: #008000; text-decoration-color: #008000; font-weight: bold\">│ significantly fewer context tokens required. For instance, it requires 26-33% fewer tokens for low-level │</span>\n",
"<span style=\"color: #008000; text-decoration-color: #008000; font-weight: bold\">│ community summaries and over 97% fewer tokens for root-level summaries compared to source text summarization. │</span>\n",
"<span style=\"color: #008000; text-decoration-color: #008000; font-weight: bold\">│ │</span>\n",
"<span style=\"color: #008000; text-decoration-color: #008000; font-weight: bold\">│ 3. **Efficiency in Iterative Question Answering**: Root-level Graph RAG offers a highly efficient method for │</span>\n",
"<span style=\"color: #008000; text-decoration-color: #008000; font-weight: bold\">│ iterative question answering, which is crucial for sensemaking activities, with only a modest drop in │</span>\n",
"<span style=\"color: #008000; text-decoration-color: #008000; font-weight: bold\">│ performance compared to other global methods. │</span>\n",
"<span style=\"color: #008000; text-decoration-color: #008000; font-weight: bold\">│ │</span>\n",
"<span style=\"color: #008000; text-decoration-color: #008000; font-weight: bold\">│ 4. **Global Query Handling**: It supports handling global queries effectively, as it combines knowledge graph │</span>\n",
"<span style=\"color: #008000; text-decoration-color: #008000; font-weight: bold\">│ generation, retrieval-augmented generation, and query-focused summarization, making it suitable for sensemaking │</span>\n",
"<span style=\"color: #008000; text-decoration-color: #008000; font-weight: bold\">│ over entire text corpora. │</span>\n",
"<span style=\"color: #008000; text-decoration-color: #008000; font-weight: bold\">│ │</span>\n",
"<span style=\"color: #008000; text-decoration-color: #008000; font-weight: bold\">│ 5. **Hierarchical Indexing and Summarization**: The use of a hierarchical index and summarization allows for │</span>\n",
"<span style=\"color: #008000; text-decoration-color: #008000; font-weight: bold\">│ efficient processing and summarizing of community summaries into a final global answer, facilitating a │</span>\n",
"<span style=\"color: #008000; text-decoration-color: #008000; font-weight: bold\">│ comprehensive coverage of the underlying graph index and input documents. │</span>\n",
"<span style=\"color: #008000; text-decoration-color: #008000; font-weight: bold\">│ │</span>\n",
"<span style=\"color: #008000; text-decoration-color: #008000; font-weight: bold\">│ 6. **Reduced Token Cost**: For situations requiring many global queries over the same dataset, Graph RAG │</span>\n",
"<span style=\"color: #008000; text-decoration-color: #008000; font-weight: bold\">│ achieves competitive performance to other global methods at a fraction of the token cost. │</span>\n",
"<span style=\"color: #008000; text-decoration-color: #008000; font-weight: bold\">╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯</span>\n",
"</pre>\n"
],
"text/plain": [
"\u001b[1;32m╭─\u001b[0m\u001b[1;32m────────────────────────────────────────────────\u001b[0m RAG Response \u001b[1;32m─────────────────────────────────────────────────\u001b[0m\u001b[1;32m─╮\u001b[0m\n",
"\u001b[1;32m│\u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32mThe main advantages of using the Graph RAG approach for query-focused summarization compared to traditional RAG\u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m│\u001b[0m\n",
"\u001b[1;32m│\u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32mmethods include:\u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m│\u001b[0m\n",
"\u001b[1;32m│\u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m│\u001b[0m\n",
"\u001b[1;32m│\u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m1. **Improved Comprehensiveness and Diversity**: Graph RAG shows substantial improvements over a naïve RAG \u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m│\u001b[0m\n",
"\u001b[1;32m│\u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32mbaseline in terms of the comprehensiveness and diversity of answers. This is particularly beneficial for global\u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m│\u001b[0m\n",
"\u001b[1;32m│\u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32msensemaking questions over large datasets.\u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m│\u001b[0m\n",
"\u001b[1;32m│\u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m│\u001b[0m\n",
"\u001b[1;32m│\u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m2. **Scalability**: Graph RAG provides scalability advantages, achieving efficient summarization with \u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m│\u001b[0m\n",
"\u001b[1;32m│\u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32msignificantly fewer context tokens required. For instance, it requires 26-33% fewer tokens for low-level \u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m│\u001b[0m\n",
"\u001b[1;32m│\u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32mcommunity summaries and over 97% fewer tokens for root-level summaries compared to source text summarization.\u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m│\u001b[0m\n",
"\u001b[1;32m│\u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m│\u001b[0m\n",
"\u001b[1;32m│\u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m3. **Efficiency in Iterative Question Answering**: Root-level Graph RAG offers a highly efficient method for \u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m│\u001b[0m\n",
"\u001b[1;32m│\u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32miterative question answering, which is crucial for sensemaking activities, with only a modest drop in \u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m│\u001b[0m\n",
"\u001b[1;32m│\u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32mperformance compared to other global methods.\u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m│\u001b[0m\n",
"\u001b[1;32m│\u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m│\u001b[0m\n",
"\u001b[1;32m│\u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m4. **Global Query Handling**: It supports handling global queries effectively, as it combines knowledge graph \u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m│\u001b[0m\n",
"\u001b[1;32m│\u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32mgeneration, retrieval-augmented generation, and query-focused summarization, making it suitable for sensemaking\u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m│\u001b[0m\n",
"\u001b[1;32m│\u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32mover entire text corpora.\u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m│\u001b[0m\n",
"\u001b[1;32m│\u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m│\u001b[0m\n",
"\u001b[1;32m│\u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m5. **Hierarchical Indexing and Summarization**: The use of a hierarchical index and summarization allows for \u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m│\u001b[0m\n",
"\u001b[1;32m│\u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32mefficient processing and summarizing of community summaries into a final global answer, facilitating a \u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m│\u001b[0m\n",
"\u001b[1;32m│\u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32mcomprehensive coverage of the underlying graph index and input documents.\u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m│\u001b[0m\n",
"\u001b[1;32m│\u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m│\u001b[0m\n",
"\u001b[1;32m│\u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m6. **Reduced Token Cost**: For situations requiring many global queries over the same dataset, Graph RAG \u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m│\u001b[0m\n",
"\u001b[1;32m│\u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32machieves competitive performance to other global methods at a fraction of the token cost.\u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m│\u001b[0m\n",
"\u001b[1;32m╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\u001b[0m\n"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"from typing import Optional\n",
"\n",
"from azure.search.documents.models import VectorizableTextQuery\n",
"\n",
"\n",
"def generate_chat_response(prompt: str, system_message: Optional[str] = None):\n",
" \"\"\"\n",
" Generates a single-turn chat response using Azure OpenAI Chat.\n",
" If you need multi-turn conversation or follow-up queries, you'll have to\n",
" maintain the messages list externally.\n",
" \"\"\"\n",
" messages = []\n",
" if system_message:\n",
" messages.append({\"role\": \"system\", \"content\": system_message})\n",
" messages.append({\"role\": \"user\", \"content\": prompt})\n",
"\n",
" completion = openai_client.chat.completions.create(\n",
" model=AZURE_OPENAI_CHAT_MODEL, messages=messages, temperature=0.7\n",
" )\n",
" return completion.choices[0].message.content\n",
"\n",
"\n",
"user_query = \"What are the main advantages of using the Graph RAG approach for query-focused summarization compared to traditional RAG methods?\"\n",
"user_embed = embed_text(user_query)\n",
"\n",
"vector_query = VectorizableTextQuery(\n",
" text=user_query, # passing in text for a hybrid search\n",
" k_nearest_neighbors=5,\n",
" fields=\"content_vector\",\n",
")\n",
"\n",
"search_results = search_client.search(\n",
" search_text=user_query, vector_queries=[vector_query], select=[\"content\"], top=10\n",
")\n",
"\n",
"retrieved_chunks = []\n",
"for result in search_results:\n",
" snippet = result[\"content\"]\n",
" retrieved_chunks.append(snippet)\n",
"\n",
"context_str = \"\\n---\\n\".join(retrieved_chunks)\n",
"rag_prompt = f\"\"\"\n",
"You are an AI assistant helping answering questions about Microsoft GraphRAG.\n",
"Use ONLY the text below to answer the user's question.\n",
"If the answer isn't in the text, say you don't know.\n",
"\n",
"Context:\n",
"{context_str}\n",
"\n",
"Question: {user_query}\n",
"Answer:\n",
"\"\"\"\n",
"\n",
"final_answer = generate_chat_response(rag_prompt)\n",
"\n",
"console.print(Panel(rag_prompt, title=\"RAG Prompt\", style=\"bold red\"))\n",
"console.print(Panel(final_answer, title=\"RAG Response\", style=\"bold green\"))"
]
}
],
"metadata": {
"accelerator": "GPU",
"colab": {
"gpuType": "T4",
"provenance": []
},
"kernelspec": {
"display_name": ".venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.8"
}
},
"nbformat": 4,
"nbformat_minor": 0
}