diff --git a/config/yaml/index.html b/config/yaml/index.html
index 6cf7d7c2..8bac044f 100644
--- a/config/yaml/index.html
+++ b/config/yaml/index.html
@@ -1455,7 +1455,7 @@ api_key: ${GRAPHRAG_API_KEY}

Config Sections

-

Indexing

+

Language Model Setup

models

This is a dict of model configurations. The dict key is used to reference a configuration elsewhere when a model instance is needed. In this way, you can specify as many different models as you need and reference each one individually in the workflow steps.

For example:
@@ -1473,137 +1473,155 @@

Fields

-

embed_text

-

By default, the GraphRAG indexer will only export embeddings required for our query methods. However, the model has embeddings defined for all plaintext fields, and these can be customized by setting the target and names fields.

-

Supported embeddings names are:
-- text_unit.text
-- document.text
-- entity.title
-- entity.description
-- relationship.description
-- community.title
-- community.summary
-- community.full_content

+

Input Files and Chunking

+

input

+

Our pipeline can ingest .csv, .txt, or .json data from an input folder. See the inputs page for more details and examples.

Fields

+

chunks

+

These settings configure how we parse documents into text chunks. This is necessary because very large documents may not fit into a single context window, and graph extraction accuracy can be modulated. Also note the metadata setting in the input document config, which will replicate document metadata into each chunk.
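The chunking idea described above can be sketched as follows. This is a minimal illustration, not GraphRAG's implementation: the function names `chunk_tokens` and `chunk_document`, the `size` and `overlap` parameters, and the whitespace tokenization are all assumptions made for the example. It shows a document being split into overlapping windows, with the document metadata repeated at the top of each chunk.

```python
from typing import Dict, List


def chunk_tokens(tokens: List[str], size: int, overlap: int) -> List[List[str]]:
    """Split tokens into windows of at most `size` tokens.

    Consecutive windows share `overlap` tokens (hypothetical parameter names).
    """
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        # Stop once a window reaches the end, so no redundant tail chunk is emitted.
        if start + size >= len(tokens):
            break
    return chunks


def chunk_document(text: str, metadata: Dict[str, str], size: int, overlap: int) -> List[str]:
    """Chunk on whitespace (an assumption) and prepend the metadata to every chunk."""
    header = "".join(f"{key}: {value}\n" for key, value in metadata.items())
    return [header + " ".join(chunk) for chunk in chunk_tokens(text.split(), size, overlap)]
```

Smaller windows produce more chunks per document, and each chunk carries its own copy of the metadata header, which is the replication behavior the paragraph above refers to.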

+

Fields

+ +

Outputs and Storage

+

output

+

This section controls the storage mechanism the pipeline uses for exporting output tables.

+

Fields

+ +

update_index_output

+

This section defines a secondary storage location for running incremental indexing, to preserve your original outputs.

+

Fields

+ +

cache

+

This section controls the cache mechanism used by the pipeline. This is used to cache LLM invocation results for faster performance when re-running the indexing process.

+

Fields

+ +

reporting

+

This section controls the reporting mechanism used by the pipeline, for common events and error messages. The default is to write reports to a file in the output directory. However, you can also choose to write reports to the console or to an Azure Blob Storage container.

+

Fields

+ +

vector_store

+

Where to put all vectors for the system. Configured for lancedb by default. This is a dict, with the key used to identify individual store parameters (e.g., for text embedding).

+

Fields

+ +

Workflow Configurations

+

These settings control each individual workflow as it executes.

+

workflows

+

list[str] - This is a list of workflow names to run, in order. GraphRAG has built-in pipelines to configure this, but you can run exactly and only what you want by specifying the list here. Useful if you have done part of the processing yourself.
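To illustrate what running a named list of workflows in order means, here is a minimal sketch. It is not GraphRAG's pipeline code: the step names `load`, `chunk`, and `count`, the `REGISTRY` mapping, and the `run_workflows` helper are all hypothetical. The point is only that the list selects which steps run and in what order, so steps you have already completed yourself can be left out.

```python
from typing import Callable, Dict, List, Optional

State = Dict[str, object]
Workflow = Callable[[State], State]


def load(state: State) -> State:
    # Hypothetical step: pretend to read raw documents.
    return {**state, "documents": ["alpha beta", "gamma"]}


def chunk(state: State) -> State:
    # Hypothetical step: split each document on whitespace.
    docs = state["documents"]
    return {**state, "chunks": [word for doc in docs for word in doc.split()]}


def count(state: State) -> State:
    # Hypothetical step: record how many chunks were produced.
    return {**state, "chunk_count": len(state["chunks"])}


# Hypothetical registry mapping workflow names to step functions.
REGISTRY: Dict[str, Workflow] = {"load": load, "chunk": chunk, "count": count}


def run_workflows(names: List[str], state: Optional[State] = None) -> State:
    """Run exactly the named steps, in the order given."""
    current: State = dict(state or {})
    for name in names:
        if name not in REGISTRY:
            raise KeyError(f"unknown workflow: {name}")
        current = REGISTRY[name](current)
    return current
```

For example, `run_workflows(["chunk", "count"], {"documents": ["x y"]})` skips the loading step entirely because the documents were supplied by the caller.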

+

embed_text

+

By default, the GraphRAG indexer will only export embeddings required for our query methods. However, the model has embeddings defined for all plaintext fields, and these can be customized by setting the target and names fields.

+

Supported embeddings names are:

+ +

Fields

+ -

vector_store

-

Where to put all vectors for the system. Configured for lancedb by default.

-

Fields

- -

input

-

Our pipeline can ingest .csv or .txt data from an input folder. These files can be nested within subfolders. In general, CSV-based data provides the most customizability. Each CSV should at least contain a text field. You can use the metadata list to specify additional columns from the CSV to include as headers in each text chunk, allowing you to repeat document content within each chunk for better LLM inclusion.

-

Fields

- -

chunks

-

These settings configure how we parse documents into text chunks. This is necessary because very large documents may not fit into a single context window, and graph extraction accuracy can be modulated. Also note the metadata setting in the input document config, which will replicate document metadata into each chunk.

-

Fields

- -

cache

-

This section controls the cache mechanism used by the pipeline. This is used to cache LLM invocation results.

-

Fields

- -

output

-

This section controls the storage mechanism the pipeline uses for exporting output tables.

-

Fields

- -

update_index_storage

-

This section defines a secondary storage location for running incremental indexing, to preserve your original outputs.

-

Fields

- -

reporting

-

This section controls the reporting mechanism used by the pipeline, for common events and error messages. The default is to write reports to a file in the output directory. However, you can also choose to write reports to the console or to an Azure Blob Storage container.

-

Fields

-

extract_graph

+

Tune the language model-based graph extraction process.

Fields

summarize_descriptions

Fields

@@ -1629,26 +1647,9 @@
  • noun_phrase_tags list[str] - List of noun phrase tags to ignore.
  • noun_phrase_grammars dict[str, str] - Noun phrase grammars for the model (cfg-only).
  • -

    extract_claims

    -

    Fields

    - -

    community_reports

    -

    Fields

    -

    prune_graph

    Parameters for manual graph pruning. This can be used to optimize the modularity of your graph clusters, by removing overly-connected or rare nodes.

    -

    Fields

    +

    Fields

    cluster_graph

    These are the settings used for Leiden hierarchical clustering of the graph to create communities.

    -

    Fields

    +

    Fields

    +

    extract_claims

    +

    Fields

    + +

    community_reports

    +

    Fields

    +

    embed_graph

    -

    We use node2vec to embed the graph. This is primarily used for visualization, so it is not turned on by default. However, if you do prefer to embed the graph for secondary analysis, you can turn this on and we will persist the embeddings to your configured vector store.

    +

    We use node2vec to embed the graph. This is primarily used for visualization, so it is not turned on by default.

    Fields

    -

    workflows

    -

    list[str] - This is a list of workflow names to run, in order. GraphRAG has built-in pipelines to configure this, but you can run exactly and only what you want by specifying the list here. Useful if you have done part of the processing yourself.

    diff --git a/examples_notebooks/drift_search/index.html b/examples_notebooks/drift_search/index.html
    index 7034711e..7df5a1fd 100644
    --- a/examples_notebooks/drift_search/index.html
    +++ b/examples_notebooks/drift_search/index.html
    @@ -2432,7 +2432,7 @@ search = DRIFTSearch(
    -
    100%|██████████| 1/1 [00:12<00:00, 12.57s/it]
    +
    100%|██████████| 1/1 [00:13<00:00, 13.62s/it]
    @@ -2456,13 +2456,19 @@ search = DRIFTSearch(
    +
    +
    + +
    @@ -2486,19 +2492,19 @@ search = DRIFTSearch(
    @@ -2709,9 +2713,9 @@ print(
    Build context (gpt-4o)
    -LLM calls: 2. Prompt tokens: 1761. Output tokens: 208.
    +LLM calls: 2. Prompt tokens: 1761. Output tokens: 209.
     Map-reduce (gpt-4o)
    -LLM calls: 2. Prompt tokens: 3378. Output tokens: 493.
    +LLM calls: 2. Prompt tokens: 3378. Output tokens: 591.
     
    diff --git a/examples_notebooks/local_search/index.html b/examples_notebooks/local_search/index.html
    index 0afa127c..7293c8b3 100644
    --- a/examples_notebooks/local_search/index.html
    +++ b/examples_notebooks/local_search/index.html
    @@ -3366,21 +3366,21 @@ print(result.response)
    ### Overview of Agent Alex Mercer
     
    -Agent Alex Mercer is a prominent member of the Paranormal Military Squad, an elite group tasked with executing Operation: Dulce. He plays a crucial role in the mission, providing guidance and emphasizing the importance of intuition and trust among his team members. Mercer's leadership and mentorship are particularly significant, as he serves as a mentor to Sam Rivera, offering valuable support and leadership [Data: Reports (1); Entities (0); Relationships (2, 15)].
    -
    -### Role in Operation: Dulce
    -
    -In Operation: Dulce, Alex Mercer is one of the agents exploring the Dulce base, a mysterious and secretive location associated with advanced alien technology. His involvement in the mission is critical, as he is responsible for leading the team into the Dulce base and navigating its complexities. Mercer's leadership is characterized by a balance between compliance with protocols and a natural inclination to question and explore all details, which sometimes leads to internal conflict [Data: Reports (1); Entities (0, 8); Relationships (23, 4); Claims (3, 5)].
    -
    -### Relationships and Interactions
    -
    -Agent Mercer maintains professional relationships with other key members of the Paranormal Military Squad, including Taylor Cruz, Jordan Hayes, and Sam Rivera. His relationship with Taylor Cruz is primarily professional, with Mercer acknowledging Cruz's authority while also experiencing a competitive undercurrent due to Cruz's authoritative nature. With Jordan Hayes, Mercer shares a mutual respect and understanding, particularly admiring each other's expertise and analytical abilities. His mentorship of Sam Rivera highlights his role as a guiding figure within the team [Data: Reports (1); Entities (0, 1, 2, 3); Relationships (0, 1, 2, 15)].
    +Agent Alex Mercer is a pivotal member of the Paranormal Military Squad, an elite group tasked with executing Operation: Dulce. His role is crucial to the mission's success, as he is one of the agents exploring the Dulce base, a mysterious and secretive location associated with advanced alien technology [Data: Reports (1); Entities (0, 8); Relationships (23, 4)].
     
     ### Leadership and Mentorship
     
    -Alex Mercer's leadership style is marked by his emphasis on intuition and trust, which he believes are essential for the success of their mission. His mentorship of Sam Rivera is a testament to his commitment to nurturing the skills and potential of his team members. This mentorship is not only about imparting knowledge but also about fostering a sense of confidence and readiness in facing the unknown challenges of Operation: Dulce [Data: Reports (1); Entities (0, 3); Relationships (2, 15)].
    +Alex Mercer is recognized for his leadership qualities and serves as a mentor to Sam Rivera, another key member of the squad. He provides guidance and emphasizes the importance of intuition and trust, which are essential traits for navigating the complexities of Operation: Dulce. His mentorship relationship with Sam Rivera highlights his supportive nature and his ability to foster talent within the team [Data: Reports (1); Entities (0, 3); Relationships (2, 15)].
     
    -In summary, Agent Alex Mercer is a key figure in the Paranormal Military Squad, whose leadership and mentorship are vital to the success of Operation: Dulce. His ability to balance protocol with intuition, along with his strong professional relationships, underscores his importance in the mission and his role as a mentor to his colleagues.
    +### Professional Relationships
    +
    +Mercer maintains professional relationships with other team members, including Taylor Cruz and Jordan Hayes. His interactions with Taylor Cruz are marked by a competitive undercurrent, as Cruz's authoritative nature often challenges Mercer's compliance. Despite this, Mercer acknowledges Cruz's authority and follows their lead during the mission [Data: Reports (1); Entities (0, 1, 2); Relationships (0, 1)].
    +
    +### Internal Conflict and Role in Operation: Dulce
    +
    +Agent Mercer experiences internal conflict between adhering to protocols and his natural inclination to question and explore all details. This conflict is evident in his interactions with Taylor Cruz and his own reflections on the mission. Despite these challenges, Mercer is depicted as a determined individual who is leading a mission into the Dulce base, indicating his significant role in mission leadership and decision-making [Data: Claims (3, 5); Reports (1); Entities (0)].
    +
    +In summary, Agent Alex Mercer is a key figure in the Paranormal Military Squad, known for his leadership, mentorship, and professional relationships. His role in Operation: Dulce is critical, as he navigates the complexities of the mission while managing internal conflicts and fostering teamwork among his colleagues.
     
    @@ -3424,27 +3424,27 @@ print(result.response)
    -
    ### Overview of Dr. Jordan Hayes
    +
    ## Overview of Dr. Jordan Hayes
     
     Dr. Jordan Hayes is a prominent scientist and a key member of the Paranormal Military Squad, known for their expertise in physics and composed demeanor. They play a significant role in Operation: Dulce, particularly in working with alien technology, which is a central element of the mission [Data: Entities (2); Reports (1)].
     
    -### Role in Operation: Dulce
    +## Role in Operation: Dulce
     
    -Dr. Hayes is deeply involved in the exploration of the Dulce base, where they contribute their analytical skills to the mission. Their work primarily focuses on understanding and analyzing alien technology, which is crucial for the success of Operation: Dulce. This role highlights their importance in the mission, as they provide valuable insights and expertise in dealing with the complexities of alien technology [Data: Reports (1); Entities (2, 13); Relationships (26, 48, 51)].
    +Dr. Hayes is deeply involved in the exploration of the Dulce base, where they contribute their analytical skills to the mission. Their work primarily focuses on understanding and manipulating alien technology, which is crucial for the success of Operation: Dulce. Hayes is recognized for their analytical mind and reflective nature, often contemplating the complexities of their missions [Data: Reports (1); Entities (2, 13)].
     
    -### Professional Relationships
    +## Professional Relationships
     
    -Dr. Hayes maintains professional relationships with other key members of the Paranormal Military Squad, including Taylor Cruz, Sam Rivera, and Alex Mercer. Their interactions with these team members emphasize the importance of adaptability and analytical thinking in the mission. Dr. Hayes is known for their skepticism towards strict adherence to protocols, advocating for a more flexible approach to the unknown variables encountered during the mission [Data: Reports (1); Relationships (1, 5, 9, 25)].
    +Dr. Hayes maintains professional relationships with other key members of the Paranormal Military Squad, including Taylor Cruz, Sam Rivera, and Alex Mercer. Their interactions with Taylor Cruz are marked by differing views on protocol and adaptability, highlighting a complex relationship characterized by moments of mutual respect. With Sam Rivera, Hayes shares a common belief in the importance of adaptability, which is essential for the mission's success [Data: Reports (1); Relationships (5, 9, 25)].
     
    -### Analytical and Skeptical Nature
    +## Analytical and Skeptical Nature
     
    -Dr. Hayes is portrayed as a skeptical and analytical member of the team, often contemplating the layers of data and the complexities of their missions. This skepticism is particularly evident in their interactions with Taylor Cruz, where they emphasize the need for adaptability over rigid protocols. Their analytical insights are crucial in identifying hidden elements within the Dulce base, such as a suspicious panel that seemed out of place [Data: Claims (2, 6, 10); Sources (0, 2)].
    +Dr. Hayes is portrayed as skeptical of strict adherence to protocols, emphasizing the need for adaptability and acknowledging the unknown variables that exceed the known. This skepticism is evident in their interactions and comments during mission briefings. They provide analytical insights and express concerns about the mission, indicating a role in analytical assessment [Data: Claims (2, 6); Sources (0)].
     
    -### Contribution to the Team
    +## Contributions and Discoveries
     
    -Dr. Hayes' contribution to the Paranormal Military Squad is significant, as they bring a reflective and analytical perspective to the mission. Their ability to navigate the complexities of alien technology and their composed demeanor make them a valuable asset to the team. Their role in Operation: Dulce underscores the importance of scientific expertise and adaptability in dealing with the unknown challenges posed by the mission [Data: Reports (1); Entities (2); Relationships (10, 18, 21)].
    +During the mission, Dr. Hayes identified a suspicious panel that seemed out of place, indicating a hidden element within the Dulce base. This discovery underscores their role in uncovering critical aspects of the mission and highlights their importance in the team's efforts to navigate the complexities of the Dulce base [Data: Claims (10); Sources (2)].
     
    -In summary, Dr. Jordan Hayes is a critical member of the Paranormal Military Squad, whose expertise in physics and alien technology plays a vital role in the success of Operation: Dulce. Their analytical nature and professional relationships with other team members highlight the importance of adaptability and scientific insight in navigating the complexities of the mission.
    +In summary, Dr. Jordan Hayes is a vital member of the Paranormal Military Squad, contributing their scientific expertise and analytical skills to Operation: Dulce. Their role in working with alien technology and their professional relationships with other team members are crucial to the mission's success.
     
    @@ -3968,7 +3968,7 @@ print(candidate_questions.response)
    -
    ['- What is the role of Agent Alex Mercer in Operation: Dulce?', "- How does Agent Taylor Cruz's leadership style impact the team's mission at Dulce base?", '- What expertise does Dr. Jordan Hayes bring to the exploration of the Dulce base?', '- How does Sam Rivera contribute to the mission at Dulce base with their cybersecurity skills?', '- What are the dynamics and relationships among the Paranormal Military Squad members during Operation: Dulce?']
    +
    ['- What is the role of Agent Alex Mercer in Operation: Dulce?', "- How does Agent Taylor Cruz's leadership style impact the Paranormal Military Squad's mission?", '- What expertise does Dr. Jordan Hayes bring to the exploration of the Dulce base?', '- How does Sam Rivera contribute to the success of Operation: Dulce?', '- What are the dynamics and relationships among the members of the Paranormal Military Squad during their mission at the Dulce base?']
     
    diff --git a/search/search_index.json b/search/search_index.json
    index 62524a29..56c8e6d4 100644
    --- a/search/search_index.json
    +++ b/search/search_index.json
    @@ -1 +1 @@
    -{"config": {"lang": ["en"], "separator": "[\\s\\-]+", "pipeline": ["stopWordFilter"]}, "docs": [{"location": "", "title": "Welcome to GraphRAG", "text": "

    \ud83d\udc49 Microsoft Research Blog Post \ud83d\udc49 GraphRAG Accelerator \ud83d\udc49 GraphRAG Arxiv

    Figure 1: An LLM-generated knowledge graph built using GPT-4 Turbo.

    GraphRAG is a structured, hierarchical approach to Retrieval Augmented Generation (RAG), as opposed to naive semantic-search approaches using plain text snippets. The GraphRAG process involves extracting a knowledge graph out of raw text, building a community hierarchy, generating summaries for these communities, and then leveraging these structures when performing RAG-based tasks.

    To learn more about GraphRAG and how it can be used to enhance your language model's ability to reason about your private data, please visit the Microsoft Research Blog Post.

    "}, {"location": "#solution-accelerator", "title": "Solution Accelerator \ud83d\ude80", "text": "

    To quickstart the GraphRAG system we recommend trying the Solution Accelerator package. This provides a user-friendly end-to-end experience with Azure resources.

    "}, {"location": "#get-started-with-graphrag", "title": "Get Started with GraphRAG \ud83d\ude80", "text": "

    To start using GraphRAG, check out the Get Started guide. For a deeper dive into the main sub-systems, please visit the docpages for the Indexer and Query packages.

    "}, {"location": "#graphrag-vs-baseline-rag", "title": "GraphRAG vs Baseline RAG \ud83d\udd0d", "text": "

    Retrieval-Augmented Generation (RAG) is a technique to improve LLM outputs using real-world information. This technique is an important part of most LLM-based tools and the majority of RAG approaches use vector similarity as the search technique, which we call Baseline RAG. GraphRAG uses knowledge graphs to provide substantial improvements in question-and-answer performance when reasoning about complex information. RAG techniques have shown promise in helping LLMs to reason about private datasets - data that the LLM is not trained on and has never seen before, such as an enterprise\u2019s proprietary research, business documents, or communications. Baseline RAG was created to help solve this problem, but we observe situations where baseline RAG performs very poorly. For example:

    To address this, the tech community is working to develop methods that extend and enhance RAG. Microsoft Research\u2019s new approach, GraphRAG, creates a knowledge graph based on an input corpus. This graph, along with community summaries and graph machine learning outputs, is used to augment prompts at query time. GraphRAG shows substantial improvement in answering the two classes of questions described above, demonstrating intelligence or mastery that outperforms other approaches previously applied to private datasets.

    "}, {"location": "#the-graphrag-process", "title": "The GraphRAG Process \ud83e\udd16", "text": "

    GraphRAG builds upon our prior research and tooling using graph machine learning. The basic steps of the GraphRAG process are as follows:

    "}, {"location": "#index", "title": "Index", "text": ""}, {"location": "#query", "title": "Query", "text": "

    At query time, these structures are used to provide materials for the LLM context window when answering a question. The primary query modes are:

    "}, {"location": "#prompt-tuning", "title": "Prompt Tuning", "text": "

    Using GraphRAG with your data out of the box may not yield the best possible results. We strongly recommend fine-tuning your prompts following the Prompt Tuning Guide in our documentation.

    "}, {"location": "#versioning", "title": "Versioning", "text": "

    Please see the breaking changes document for notes on our approach to versioning the project.

    Always run graphrag init --root [path] --force between minor version bumps to ensure you have the latest config format. Run the provided migration notebook between major version bumps if you want to avoid re-indexing prior datasets. Note that graphrag init --force will overwrite your configuration and prompts, so back them up first if necessary.

    "}, {"location": "blog_posts/", "title": "Microsoft Research Blog", "text": "