mirror of
https://github.com/microsoft/graphrag.git
synced 2025-10-02 11:37:59 +00:00
Deploying to gh-pages from @ microsoft/graphrag@425dbc60e3 🚀
parent a36727c12d
commit 235cd282d2
@ -1518,7 +1518,7 @@
|
|||||||
<h1 id="configuration-template">Configuration Template</h1>
|
<h1 id="configuration-template">Configuration Template</h1>
|
||||||
<p>The following template can be used and stored as a <code>.env</code> in the directory where you're pointing
|
<p>The following template can be used and stored as a <code>.env</code> in the directory where you're pointing
|
||||||
the <code>--root</code> parameter on your Indexing Pipeline execution.</p>
|
the <code>--root</code> parameter on your Indexing Pipeline execution.</p>
|
||||||
<p>For details about how to run the Indexing Pipeline, refer to the <a href="../index/cli.md">Index CLI</a> documentation.</p>
|
<p>For details about how to run the Indexing Pipeline, refer to the <a href="../../cli/">Index CLI</a> documentation.</p>
|
||||||
<h2 id="env-file-template">.env File Template</h2>
|
<h2 id="env-file-template">.env File Template</h2>
|
||||||
<p>Required variables are uncommented. All the optional configuration can be turned on or off as needed.</p>
|
<p>Required variables are uncommented. All the optional configuration can be turned on or off as needed.</p>
|
||||||
<h3 id="minimal-configuration">Minimal Configuration</h3>
|
<h3 id="minimal-configuration">Minimal Configuration</h3>
|
||||||
|
@ -1479,7 +1479,7 @@
|
|||||||
<p>Make sure you have python3.10-dev installed or more generally <code>python&lt;version&gt;-dev</code></p>
|
<p>Make sure you have python3.10-dev installed or more generally <code>python&lt;version&gt;-dev</code></p>
|
||||||
<p><code>sudo apt-get install python3.10-dev</code></p>
|
<p><code>sudo apt-get install python3.10-dev</code></p>
|
||||||
<h3 id="llm-call-constantly-exceeds-tpm-rpm-or-time-limits">LLM call constantly exceeds TPM, RPM or time limits</h3>
|
<h3 id="llm-call-constantly-exceeds-tpm-rpm-or-time-limits">LLM call constantly exceeds TPM, RPM or time limits</h3>
|
||||||
<p><code>GRAPHRAG_LLM_THREAD_COUNT</code> and <code>GRAPHRAG_EMBEDDING_THREAD_COUNT</code> are both set to 50 by default. You can modify this values
|
<p><code>GRAPHRAG_LLM_THREAD_COUNT</code> and <code>GRAPHRAG_EMBEDDING_THREAD_COUNT</code> are both set to 50 by default. You can modify these values
|
||||||
to reduce concurrency. Please refer to the <a href="../config/overview/">Configuration Documents</a></p>
|
to reduce concurrency. Please refer to the <a href="../config/overview/">Configuration Documents</a></p>
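As a rough illustration of what lowering these values does, the sketch below reads a thread-count setting from the environment and uses it to cap a worker pool. The variable names and the default of 50 come from the text above; `read_thread_count`, `run_bounded`, and the use of `ThreadPoolExecutor` are assumptions made for illustration and are not GraphRAG's actual implementation.

```python
import os
from concurrent.futures import ThreadPoolExecutor

DEFAULT_THREAD_COUNT = 50  # default stated in the documentation above


def read_thread_count(name, env=None):
    """Return the configured thread count, falling back to the default."""
    env = os.environ if env is None else env
    raw = env.get(name, "")
    return int(raw) if raw.strip() else DEFAULT_THREAD_COUNT


def run_bounded(tasks, thread_count):
    """Run callables with at most `thread_count` of them in flight at once."""
    with ThreadPoolExecutor(max_workers=thread_count) as pool:
        # pool.map preserves the order of the submitted tasks.
        return list(pool.map(lambda task: task(), tasks))


if __name__ == "__main__":
    # Hypothetical: pretend the .env file lowered the value to ease rate limits.
    env = {"GRAPHRAG_LLM_THREAD_COUNT": "4"}
    workers = read_thread_count("GRAPHRAG_LLM_THREAD_COUNT", env)
    results = run_bounded([lambda i=i: i * i for i in range(8)], workers)
    print(workers, results)
```

Fewer workers means fewer simultaneous requests, which is why reducing the thread count can help when calls keep exceeding TPM or RPM limits.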
|
||||||
|
|
||||||
|
|
||||||
|
File diff suppressed because one or more lines are too long
@ -2606,19 +2606,19 @@ print(result.response)</div>
|
|||||||
<div class="jp-RenderedText jp-OutputArea-output" data-mime-type="text/plain" tabindex="0">
|
<div class="jp-RenderedText jp-OutputArea-output" data-mime-type="text/plain" tabindex="0">
|
||||||
<pre>### Overview of Cosmic Vocalization
|
<pre>### Overview of Cosmic Vocalization
|
||||||
|
|
||||||
Cosmic Vocalization is a phenomenon that has garnered significant attention within various communities, including military and civilian groups. It is perceived as a cosmic event that may have implications for security and interstellar communication. The phenomenon is central to discussions and strategic considerations, particularly due to its potential impact and the mystery surrounding its nature.
|
Cosmic Vocalization is a phenomenon that has garnered significant attention within various communities, including military and civilian groups. It is perceived as a cosmic event that may have implications for security and interstellar communication.
|
||||||
|
|
||||||
### Involvement and Perspectives
|
### Involvement and Perspectives
|
||||||
|
|
||||||
1. **Paranormal Military Squad**: This group is actively engaged with Cosmic Vocalization, indicating that it is considered a matter of strategic importance in terms of security measures. Their involvement suggests that the phenomenon is not only of scientific interest but also of potential national or global security concern [Data: Reports (6)].
|
1. **Paranormal Military Squad**: This group is actively engaged with Cosmic Vocalization, indicating that it is considered a strategic element in their security measures. Their involvement underscores the potential significance of these cosmic phenomena in defense and security contexts [Data: Reports (6)].
|
||||||
|
|
||||||
2. **Community Interest**: Cosmic Vocalization has captured the interest of various individuals and groups within the community. Alex Mercer, for instance, views it as part of an "interstellar duet," which implies a belief in its communicative or responsive nature. This perspective highlights the broader curiosity and speculative interpretations that surround the phenomenon [Data: Reports (6)].
|
2. **Community Interest**: Cosmic Vocalization has become a focal point of interest for various individuals and groups. Alex Mercer, for instance, views it as part of an interstellar duet, suggesting a responsive and perhaps communicative aspect to these cosmic events [Data: Reports (6)].
|
||||||
|
|
||||||
3. **Concerns and Speculations**: Taylor Cruz has expressed concerns about Cosmic Vocalization, fearing it might be a "homing tune." This adds a layer of urgency and potential threat to the discussions, as it suggests the possibility of the phenomenon being a signal or beacon with unknown intentions or consequences [Data: Reports (6)].
|
3. **Concerns and Speculations**: Taylor Cruz has expressed concerns about the nature of Cosmic Vocalization, fearing it might function as a homing tune. This perspective adds a layer of urgency and potential threat to the phenomenon, highlighting the diverse interpretations and concerns surrounding it [Data: Reports (6)].
|
||||||
|
|
||||||
### Implications
|
### Implications
|
||||||
|
|
||||||
The involvement of diverse groups and the range of perspectives on Cosmic Vocalization underscore its complexity and the need for further investigation. The phenomenon's potential implications for security, communication, and interstellar relations make it a subject of both intrigue and caution. As such, it remains a focal point for ongoing research and strategic planning.
|
The involvement of both military and civilian entities in Cosmic Vocalization suggests that it may have far-reaching implications. The strategic interest from the Paranormal Military Squad indicates potential security concerns, while the community's varied interpretations reflect a broader curiosity and apprehension about the phenomenon's nature and purpose.
|
||||||
</pre>
|
</pre>
|
||||||
</div>
|
</div>
|
||||||
</div>
|
</div>
|
||||||
@ -2869,7 +2869,7 @@ print(
|
|||||||
<div class="jp-OutputArea-child">
|
<div class="jp-OutputArea-child">
|
||||||
<div class="jp-OutputPrompt jp-OutputArea-prompt"></div>
|
<div class="jp-OutputPrompt jp-OutputArea-prompt"></div>
|
||||||
<div class="jp-RenderedText jp-OutputArea-output" data-mime-type="text/plain" tabindex="0">
|
<div class="jp-RenderedText jp-OutputArea-output" data-mime-type="text/plain" tabindex="0">
|
||||||
<pre>LLM calls: 2. Prompt tokens: 11237. Output tokens: 533.
|
<pre>LLM calls: 2. Prompt tokens: 11237. Output tokens: 474.
|
||||||
</pre>
|
</pre>
|
||||||
</div>
|
</div>
|
||||||
</div>
|
</div>
|
||||||
|
@ -2566,19 +2566,19 @@ print(result.response)</div>
|
|||||||
<div class="jp-RenderedText jp-OutputArea-output" data-mime-type="text/plain" tabindex="0">
|
<div class="jp-RenderedText jp-OutputArea-output" data-mime-type="text/plain" tabindex="0">
|
||||||
<pre>### Overview of Cosmic Vocalization
|
<pre>### Overview of Cosmic Vocalization
|
||||||
|
|
||||||
Cosmic Vocalization is a phenomenon that has captured the attention of various individuals and groups within the community. It is perceived as a significant cosmic event, with different interpretations and implications depending on the observer's perspective.
|
Cosmic Vocalization is a phenomenon that has captured the attention of various individuals and groups within the community. It is perceived as a significant event with potential implications for both cosmic and terrestrial activities. The concept of Cosmic Vocalization is central to the community's focus, suggesting its importance in ongoing discussions and actions [Data: Reports (6)].
|
||||||
|
|
||||||
### Key Perspectives and Concerns
|
### Perspectives and Concerns
|
||||||
|
|
||||||
Alex Mercer views Cosmic Vocalization as part of an interstellar duet, suggesting that it may be a responsive or interactive cosmic event. This perspective highlights the potential for Cosmic Vocalization to be part of a larger cosmic communication or interaction [Data: Reports (6)].
|
Alex Mercer views Cosmic Vocalization as part of an interstellar duet, indicating a belief that it may be a responsive or interactive event. This perspective suggests that Cosmic Vocalization could be a form of communication or interaction with cosmic entities or phenomena [Data: Reports (6)].
|
||||||
|
|
||||||
On the other hand, Taylor Cruz expresses concerns that Cosmic Vocalization might be a homing tune. This interpretation adds a layer of urgency and potential threat, as it implies that the phenomenon could be signaling or attracting attention from unknown entities [Data: Reports (6)].
|
On the other hand, Taylor Cruz raises concerns about the implications of Cosmic Vocalization, fearing it might serve as a homing tune. This perspective introduces a layer of urgency and potential threat, as it implies that the vocalization could attract attention or entities from beyond our planet [Data: Reports (6)].
|
||||||
|
|
||||||
### Involvement of the Paranormal Military Squad
|
### Strategic Engagement
|
||||||
|
|
||||||
The Paranormal Military Squad is actively engaged with Cosmic Vocalization, indicating its importance in strategic and security measures. Their involvement suggests that Cosmic Vocalization is not only a subject of scientific or philosophical interest but also a matter of security concern. The squad metaphorically treats the Universe as a concert hall, which reflects a broader perspective on how cosmic events are interpreted and responded to by human entities [Data: Reports (6)].
|
The Paranormal Military Squad's involvement with Cosmic Vocalization highlights its significance in terms of security and strategic response. Their engagement suggests that Cosmic Vocalization is not only a subject of scientific or philosophical interest but also a matter of national or global security. This involvement underscores the need for a coordinated and strategic approach to understanding and potentially mitigating any risks associated with the phenomenon [Data: Reports (6)].
|
||||||
|
|
||||||
In summary, Cosmic Vocalization is a multifaceted phenomenon with varying interpretations and implications. It involves key figures like Alex Mercer and Taylor Cruz, as well as strategic entities like the Paranormal Military Squad, each bringing their unique perspectives and concerns to the table.
|
In summary, Cosmic Vocalization is a multifaceted phenomenon involving various stakeholders, each with their own perspectives and concerns. Its implications range from potential communication with cosmic entities to security threats, necessitating a comprehensive and strategic response.
|
||||||
</pre>
|
</pre>
|
||||||
</div>
|
</div>
|
||||||
</div>
|
</div>
|
||||||
@ -2672,10 +2672,10 @@ result.context_data["reports"]</div>
|
|||||||
</tr>
|
</tr>
|
||||||
<tr>
|
<tr>
|
||||||
<th>3</th>
|
<th>3</th>
|
||||||
<td>18</td>
|
<td>17</td>
|
||||||
<td>Paranormal Military Squad Team and Dulce Base'...</td>
|
<td>Dulce Team and Underground Command Center: Int...</td>
|
||||||
<td>0.04</td>
|
<td>0.02</td>
|
||||||
<td># Paranormal Military Squad Team and Dulce Bas...</td>
|
<td># Dulce Team and Underground Command Center: I...</td>
|
||||||
<td>8.5</td>
|
<td>8.5</td>
|
||||||
</tr>
|
</tr>
|
||||||
<tr>
|
<tr>
|
||||||
@ -2778,9 +2778,9 @@ print(
|
|||||||
<div class="jp-OutputPrompt jp-OutputArea-prompt"></div>
|
<div class="jp-OutputPrompt jp-OutputArea-prompt"></div>
|
||||||
<div class="jp-RenderedText jp-OutputArea-output" data-mime-type="text/plain" tabindex="0">
|
<div class="jp-RenderedText jp-OutputArea-output" data-mime-type="text/plain" tabindex="0">
|
||||||
<pre>Build context (gpt-4o-mini)
|
<pre>Build context (gpt-4o-mini)
|
||||||
LLM calls: 12. Prompt tokens: 8565. Output tokens: 1095.
|
LLM calls: 12. Prompt tokens: 8565. Output tokens: 1098.
|
||||||
Map-reduce (gpt-4o)
|
Map-reduce (gpt-4o)
|
||||||
LLM calls: 2. Prompt tokens: 5772. Output tokens: 549.
|
LLM calls: 2. Prompt tokens: 5864. Output tokens: 506.
|
||||||
</pre>
|
</pre>
|
||||||
</div>
|
</div>
|
||||||
</div>
|
</div>
|
||||||
|
@ -3376,23 +3376,21 @@ print(result.response)</div>
|
|||||||
<div class="jp-RenderedText jp-OutputArea-output" data-mime-type="text/plain" tabindex="0">
|
<div class="jp-RenderedText jp-OutputArea-output" data-mime-type="text/plain" tabindex="0">
|
||||||
<pre>### Overview of Agent Mercer
|
<pre>### Overview of Agent Mercer
|
||||||
|
|
||||||
Agent Alex Mercer is a prominent figure within the Paranormal Military Squad, playing a crucial role in Operation: Dulce at the Dulce Base. He is recognized for his strategic and commanding presence, particularly in overseeing operations related to communication with extraterrestrial intelligence. His responsibilities include initiating broadcasts to communicate with extraterrestrial beings, decoding alien messages, and leading the team in understanding and responding to these communications [Data: Entities (0)].
|
Agent Alex Mercer is a prominent figure within the Paranormal Military Squad, playing a crucial role in Operation: Dulce at the Dulce Base. He is recognized for his commanding and strategic presence, particularly in overseeing operations related to communication with extraterrestrial intelligence. His responsibilities include initiating broadcasts to communicate with extraterrestrial beings, decoding alien messages, and leading the team in understanding and responding to these communications [Data: Entities (0)].
|
||||||
|
|
||||||
### Role and Responsibilities
|
### Role and Responsibilities
|
||||||
|
|
||||||
Mercer is deeply involved in the philosophical and strategic aspects of interstellar communication, viewing these interactions as a form of cosmic dialogue. His leadership style is characterized by a blend of determination, compliance with mission protocols, and a protective approach towards his team. He collaborates with team members like Jordan Hayes, exploring secured areas and engaging in high-stakes, secretive operations. Mercer is also known for his intellectual curiosity and deep involvement in the philosophical and strategic aspects of interstellar communication [Data: Entities (0)].
|
Mercer is deeply involved in the philosophical and strategic aspects of interstellar communication, viewing these interactions as a form of cosmic dialogue. His leadership style is characterized by a blend of determination, compliance with mission protocols, and a protective approach towards his team. He collaborates with team members like Jordan Hayes, exploring secured areas and engaging in high-stakes, secretive operations. Mercer is also known for his intellectual curiosity and deep involvement in the mission's objectives, which include analyzing cosmic signals and strategizing contact efforts [Data: Entities (0)].
|
||||||
|
|
||||||
### Mentorship and Influence
|
### Relationships and Influence
|
||||||
|
|
||||||
Mercer is depicted as a thoughtful mentor, particularly to Sam Rivera, emphasizing the importance of intuition and trust beyond protocol. His experiences during encounters with alien signals have led to profound changes, reinforcing his role as a key decision-maker and guardian in missions that transcend traditional boundaries. His mentorship is evident in the influence he has on team members like Sam Rivera, whose actions and confidence are shaped by Mercer's guidance [Data: Entities (0); Relationships (167)].
|
Agent Mercer is depicted as a thoughtful mentor, particularly to Sam Rivera, emphasizing the importance of intuition and trust beyond protocol. His mentorship has a significant influence on Rivera's actions and confidence [Data: Relationships (167)]. Additionally, Mercer shares a professional relationship with other key figures at Dulce Base, such as Taylor Cruz and Jordan Hayes, working closely with them to navigate the complexities of alien communication and technology [Data: Relationships (71, 116, 336)].
|
||||||
|
|
||||||
### Interactions and Relationships
|
### Challenges and Philosophical Approach
|
||||||
|
|
||||||
Within the team, Mercer shares a complex dynamic with other members, such as Taylor Cruz and Jordan Hayes. He is involved in discussions and operations at Dulce Base, often navigating the tension between strict military discipline and the imperative to confront unknown challenges. His interactions with Jordan Hayes highlight a shared commitment to discovery and understanding, often in the face of authoritarian oversight from figures like Taylor Cruz [Data: Entities (0); Relationships (116, 336)].
|
Despite his outward compliance, Mercer harbors a subtle form of determination that resists easy submission to authority, as observed during the briefing for Operation: Dulce [Data: Claims (1)]. He is involved in high-stakes operations that require a balance between strict military discipline and the imperative to confront unknown challenges. This duality underscores his complex role within the squad, as he navigates the tension between adhering to protocols and exploring the broader implications of their mission [Data: Entities (0, 8)].
|
||||||
|
|
||||||
### Conclusion
|
In summary, Agent Alex Mercer is a pivotal figure in the Paranormal Military Squad, known for his strategic leadership, mentorship, and philosophical approach to interstellar communication. His role at Dulce Base is marked by a commitment to understanding and engaging with extraterrestrial intelligence, making him a key decision-maker in missions that transcend traditional boundaries.
|
||||||
|
|
||||||
Agent Alex Mercer is a pivotal character in the narrative of Operation: Dulce, embodying the intersection of leadership, mentorship, and strategic communication with extraterrestrial intelligence. His role is essential in guiding the Paranormal Military Squad through the complexities of interstellar dialogue and the challenges posed by their mission at Dulce Base.
|
|
||||||
</pre>
|
</pre>
|
||||||
</div>
|
</div>
|
||||||
</div>
|
</div>
|
||||||
@ -3438,23 +3436,21 @@ print(result.response)</div>
|
|||||||
<div class="jp-RenderedText jp-OutputArea-output" data-mime-type="text/plain" tabindex="0">
|
<div class="jp-RenderedText jp-OutputArea-output" data-mime-type="text/plain" tabindex="0">
|
||||||
<pre>### Overview of Dr. Jordan Hayes
|
<pre>### Overview of Dr. Jordan Hayes
|
||||||
|
|
||||||
Dr. Jordan Hayes is a prominent scientist at Dulce Base, playing a crucial role in the Paranormal Military Squad's efforts to understand and communicate with extraterrestrial entities. Dr. Hayes's work is primarily focused on decoding and analyzing alien signals and codes, which are essential for interstellar communication and potentially interspecies interaction. This involves decrypting algorithms, interpreting cosmic signals, and analyzing the implications of alien society [Data: Entities (2, 17, 47); Relationships (201, 322, 283)].
|
Dr. Jordan Hayes is a prominent scientist at Dulce Base, playing a crucial role in the Paranormal Military Squad's efforts to understand and communicate with extraterrestrial entities. Dr. Hayes's work is primarily focused on decoding and analyzing alien signals and codes, which are essential for interstellar communication and potentially interspecies interaction. This involves decrypting algorithms, interpreting cosmic signals, and analyzing the implications of alien society [Data: Entities (2, 17); Relationships (201, 322, 283)].
|
||||||
|
|
||||||
### Key Contributions and Expertise
|
### Key Contributions and Expertise
|
||||||
|
|
||||||
Dr. Hayes is noted for their adaptability and skepticism, qualities that are essential given the uncertainties and unknown challenges of their mission. They are deeply involved in the scientific exploration aspects of Operation: Dulce, where they focus on critical decryption tasks. Their expertise plays a vital role in understanding and interpreting alien messages, aiding the squad in their paranormal operations [Data: Entities (2, 17); Relationships (201, 246, 322)].
|
Dr. Hayes is noted for their adaptability and skepticism, qualities that are essential given the uncertainties and unknown challenges of their mission. They are deeply involved in the scientific exploration aspects of Operation: Dulce, where their efforts are on the verge of a significant scientific breakthrough. Dr. Hayes leads efforts in isolating and understanding complex alien signals that resemble human cognition, suggesting that these signals are artificial and patterned, indicating a tandem evolution with humanity [Data: Entities (2); Claims (60, 91, 134)].
|
||||||
|
|
||||||
Dr. Hayes's efforts are on the verge of a significant scientific breakthrough, as they lead efforts in isolating and understanding complex alien signals that resemble human cognition. Their work suggests that these signals are artificial and patterned, indicating a tandem evolution with humanity. This breakthrough is crucial for crafting humanity's responses to cosmic alignments with stars and responsive galactic signals [Data: Entities (2); Claims (60, 83, 91, 134)].
|
|
||||||
|
|
||||||
### Collaborative Efforts and Leadership
|
### Collaborative Efforts and Leadership
|
||||||
|
|
||||||
In addition to their scientific endeavors, Dr. Hayes is involved in setting up lab stations, operating the mainframe, and playing a crucial role in the command center at Dulce Base. They work closely with colleagues like Alex Mercer, engaging in thoughtful dialogue and showing analytical thinking about the mission's uncertainties. Dr. Hayes's attention to detail is also evident in their discovery of significant panels among secured doorways and their contemplation of the mission's broader ramifications [Data: Entities (2); Relationships (26, 270, 254)].
|
In addition to their scientific endeavors, Dr. Hayes is involved in setting up lab stations, operating the mainframe, and playing a crucial role in the command center at Dulce Base. They work closely with colleagues like Alex Mercer, engaging in thoughtful dialogue and showing analytical thinking about the mission's uncertainties. Dr. Hayes's attention to detail is also evident in their discovery of significant panels among secured doorways and their contemplation of the mission's broader ramifications [Data: Entities (2); Relationships (26, 270, 254)].
|
||||||
|
|
||||||
Dr. Hayes emphasizes the importance of adaptability in leadership, especially when facing unknown challenges. This perspective is crucial in navigating the unknown variables in their field of research, making them a vital asset to the team's efforts in decoding extraterrestrial messages [Data: Claims (2, 13); Entities (2, 17)].
|
### Challenges and Philosophical Reflections
|
||||||
|
|
||||||
### Conclusion
|
Dr. Hayes's role is not without its challenges. They often reflect on their own skepticism and its potential as a blind spot, indicating a moment of self-awareness and growth. This introspection is crucial as they navigate the profound implications of their findings on physics and the possibilities of their mission. Dr. Hayes's work suggests that the signals they are deciphering are not just messages but structured, intentional interstellar communications, which could lead to a technological breakthrough [Data: Claims (2, 13, 83, 96)].
|
||||||
|
|
||||||
Dr. Jordan Hayes is a central figure in the efforts to understand and communicate with extraterrestrial entities at Dulce Base. Their work in decoding alien signals and their leadership in scientific exploration are pivotal to the success of Operation: Dulce. Dr. Hayes's contributions are not only advancing the mission's objectives but also potentially reshaping humanity's understanding of its place in the universe [Data: Entities (2, 17); Relationships (201, 246, 322)].
|
In summary, Dr. Jordan Hayes is a vital asset to the Paranormal Military Squad, contributing significantly to the understanding of extraterrestrial communications and the potential for interspecies interaction. Their work at Dulce Base is characterized by a blend of scientific rigor, adaptability, and philosophical reflection, making them a key figure in the ongoing exploration of the unknown.
|
||||||
</pre>
|
</pre>
|
||||||
</div>
|
</div>
|
||||||
</div>
|
</div>
|
||||||
|
@ -1538,7 +1538,7 @@ Since we have already configured a directory named <code>./ragtest</code> in the
|
|||||||
<ul>
|
<ul>
|
||||||
<li>For more details about configuring GraphRAG, see the <a href="../config/overview/">configuration documentation</a>.</li>
|
<li>For more details about configuring GraphRAG, see the <a href="../config/overview/">configuration documentation</a>.</li>
|
||||||
<li>To learn more about Initialization, refer to the <a href="../config/init/">Initialization documentation</a>.</li>
|
<li>To learn more about Initialization, refer to the <a href="../config/init/">Initialization documentation</a>.</li>
|
||||||
<li>For more details about using the CLI, refer to the <a href="query/cli.md">CLI documentation</a>.</li>
|
<li>For more details about using the CLI, refer to the <a href="../cli/">CLI documentation</a>.</li>
|
||||||
</ul>
|
</ul>
|
||||||
<h2 id="running-the-indexing-pipeline">Running the Indexing pipeline</h2>
|
<h2 id="running-the-indexing-pipeline">Running the Indexing pipeline</h2>
|
||||||
<p>Finally we'll run the pipeline!</p>
|
<p>Finally we'll run the pipeline!</p>
|
||||||
|
@ -675,6 +675,42 @@
|
|||||||
</span>
|
</span>
|
||||||
</a>
|
</a>
|
||||||
|
|
||||||
|
</li>
|
||||||
|
|
||||||
|
<li class="md-nav__item">
|
||||||
|
<a href="#create_final_entities" class="md-nav__link">
|
||||||
|
<span class="md-ellipsis">
|
||||||
|
create_final_entities
|
||||||
|
</span>
|
||||||
|
</a>
|
||||||
|
|
||||||
|
</li>
|
||||||
|
|
||||||
|
<li class="md-nav__item">
|
||||||
|
<a href="#create_final_nodes" class="md-nav__link">
|
||||||
|
<span class="md-ellipsis">
|
||||||
|
create_final_nodes
|
||||||
|
</span>
|
||||||
|
</a>
|
||||||
|
|
||||||
|
</li>
|
||||||
|
|
||||||
|
<li class="md-nav__item">
|
||||||
|
<a href="#create_final_relationships" class="md-nav__link">
|
||||||
|
<span class="md-ellipsis">
|
||||||
|
create_final_relationships
|
||||||
|
</span>
|
||||||
|
</a>
|
||||||
|
|
||||||
|
</li>
|
||||||
|
|
||||||
|
<li class="md-nav__item">
|
||||||
|
<a href="#create_final_text_units" class="md-nav__link">
|
||||||
|
<span class="md-ellipsis">
|
||||||
|
create_final_text_units
|
||||||
|
</span>
|
||||||
|
</a>
|
||||||
|
|
||||||
</li>
|
</li>
|
||||||
|
|
||||||
</ul>
|
</ul>
|
||||||
@ -1517,6 +1553,42 @@
|
|||||||
</span>
|
</span>
|
||||||
</a>
|
</a>
|
||||||
|
|
||||||
|
</li>
|
||||||
|
|
||||||
|
<li class="md-nav__item">
|
||||||
|
<a href="#create_final_entities" class="md-nav__link">
|
||||||
|
<span class="md-ellipsis">
|
||||||
|
create_final_entities
|
||||||
|
</span>
|
||||||
|
</a>
|
||||||
|
|
||||||
|
</li>
|
||||||
|
|
||||||
|
<li class="md-nav__item">
|
||||||
|
<a href="#create_final_nodes" class="md-nav__link">
|
||||||
|
<span class="md-ellipsis">
|
||||||
|
create_final_nodes
|
||||||
|
</span>
|
||||||
|
</a>
|
||||||
|
|
||||||
|
</li>
|
||||||
|
|
||||||
|
<li class="md-nav__item">
|
||||||
|
<a href="#create_final_relationships" class="md-nav__link">
|
||||||
|
<span class="md-ellipsis">
|
||||||
|
create_final_relationships
|
||||||
|
</span>
|
||||||
|
</a>
|
||||||
|
|
||||||
|
</li>
|
||||||
|
|
||||||
|
<li class="md-nav__item">
|
||||||
|
<a href="#create_final_text_units" class="md-nav__link">
|
||||||
|
<span class="md-ellipsis">
|
||||||
|
create_final_text_units
|
||||||
|
</span>
|
||||||
|
</a>
|
||||||
|
|
||||||
</li>
|
</li>
|
||||||
|
|
||||||
</ul>
|
</ul>
|
||||||
@ -1540,81 +1612,408 @@
|
|||||||
<h1 id="outputs">Outputs</h1>
|
<h1 id="outputs">Outputs</h1>
|
||||||
<p>The default pipeline produces a series of output tables that align with the <a href="../default_dataflow/">conceptual knowledge model</a>. This page describes the detailed output table schemas. By default we write these tables out as parquet files on disk.</p>
|
<p>The default pipeline produces a series of output tables that align with the <a href="../default_dataflow/">conceptual knowledge model</a>. This page describes the detailed output table schemas. By default we write these tables out as parquet files on disk.</p>
|
||||||
<h2 id="shared-fields">Shared fields</h2>
|
<h2 id="shared-fields">Shared fields</h2>
|
||||||
<p>All tables have two identifier fields:
|
<p>All tables have two identifier fields:</p>
|
||||||
- id: str - Generated UUID, assuring global uniqueness
|
<table>
|
||||||
- human_readable_id: int - This is an incremented short ID created per-run. For example, we use this short ID with generated summaries that print citations so they are easy to cross-reference visually.</p>
|
<thead>
|
||||||
|
<tr>
|
||||||
|
<th>name</th>
|
||||||
|
<th>type</th>
|
||||||
|
<th>description</th>
|
||||||
|
</tr>
|
||||||
|
</thead>
|
||||||
|
<tbody>
|
||||||
|
<tr>
|
||||||
|
<td>id</td>
|
||||||
|
<td>str</td>
|
||||||
|
<td>Generated UUID, assuring global uniqueness</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>human_readable_id</td>
|
||||||
|
<td>int</td>
|
||||||
|
<td>This is an incremented short ID created per-run. For example, we use this short ID with generated summaries that print citations so they are easy to cross-reference visually.</td>
|
||||||
|
</tr>
|
||||||
|
</tbody>
|
||||||
|
</table>
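The two shared fields can be pictured with a small sketch. It assumes random UUIDs and a counter starting at 0; the source only says the `id` is a generated UUID and the `human_readable_id` is an incremented short ID created per run, so `assign_ids` and both of those choices are hypothetical.

```python
import uuid


def assign_ids(rows):
    """Attach an `id` (UUID string) and a per-run `human_readable_id` to each row."""
    tagged = []
    for short_id, row in enumerate(rows):
        tagged.append({
            **row,
            "id": str(uuid.uuid4()),        # assumption: random (version 4) UUIDs
            "human_readable_id": short_id,  # assumption: counter starts at 0
        })
    return tagged


if __name__ == "__main__":
    for row in assign_ids([{"title": "ALEX MERCER"}, {"title": "DULCE BASE"}]):
        print(row["human_readable_id"], row["id"], row["title"])
```

The short ID is what makes a citation such as `[Data: Entities (0)]` easy to cross-reference, while the UUID stays unique across runs.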
|
||||||
<h2 id="create_final_communities">create_final_communities</h2>
|
<h2 id="create_final_communities">create_final_communities</h2>
|
||||||
<p>This is a list of the final communities generated by Leiden. Communities are strictly hierarchical, subdividing into children as the cluster affinity is narrowed.
|
<p>This is a list of the final communities generated by Leiden. Communities are strictly hierarchical, subdividing into children as the cluster affinity is narrowed.</p>
|
||||||
- community: int - Leiden-generated cluster ID for the community. Note that these increment with depth, so they are unique through all levels of the community hierarchy. For this table, human_readable_id is a copy of the community ID rather than a plain increment.
|
<table>
|
||||||
- level: int - Depth of the community in the hierarchy.
|
<thead>
|
||||||
- title: str - Friendly name of the community.
|
<tr>
|
||||||
- entity_ids - List of entities that are members of the community.
|
<th>name</th>
|
||||||
- relationship_ids - List of relationships that are wholly within the community (source and target are both in the community).
|
<th>type</th>
|
||||||
- text_unit_ids - List of text units represented within the community.
|
<th>description</th>
|
||||||
- period - Date of ingest, used for incremental update merges.
|
</tr>
|
||||||
- size - Size of the community (entity count), used for incremental update merges.</p>
|
</thead>
|
||||||
|
<tbody>
|
||||||
|
<tr>
|
||||||
|
<td>community</td>
|
||||||
|
<td>int</td>
|
||||||
|
<td>Leiden-generated cluster ID for the community. Note that these increment with depth, so they are unique through all levels of the community hierarchy. For this table, human_readable_id is a copy of the community ID rather than a plain increment.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>level</td>
|
||||||
|
<td>int</td>
|
||||||
|
<td>Depth of the community in the hierarchy.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>title</td>
|
||||||
|
<td>str</td>
|
||||||
|
<td>Friendly name of the community.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>entity_ids</td>
|
||||||
|
<td>str[]</td>
|
||||||
|
<td>List of entities that are members of the community.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>relationship_ids</td>
|
||||||
|
<td>str[]</td>
|
||||||
|
<td>List of relationships that are wholly within the community (source and target are both in the community).</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>text_unit_ids</td>
|
||||||
|
<td>str[]</td>
|
||||||
|
<td>List of text units represented within the community.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>period</td>
|
||||||
|
<td>str</td>
|
||||||
|
<td>Date of ingest in ISO 8601 format, used for incremental update merges.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>size</td>
|
||||||
|
<td>int</td>
|
||||||
|
<td>Size of the community (entity count), used for incremental update merges.</td>
|
||||||
|
</tr>
|
||||||
|
</tbody>
|
||||||
|
</table>
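<p>As a rough illustration of the hierarchy described above, the sketch below groups community IDs by <code>level</code> and checks that <code>size</code> agrees with the number of member entities. The rows are made-up values shaped like this table, not real pipeline output, and the helper names are hypothetical.</p>

```python
from collections import defaultdict

# Hypothetical rows shaped like create_final_communities (values are made up).
communities = [
    {"community": 0, "level": 0, "title": "Community 0", "entity_ids": ["e1", "e2", "e3"], "size": 3},
    {"community": 1, "level": 0, "title": "Community 1", "entity_ids": ["e4", "e5"], "size": 2},
    {"community": 2, "level": 1, "title": "Community 2", "entity_ids": ["e1", "e2"], "size": 2},
    {"community": 3, "level": 1, "title": "Community 3", "entity_ids": ["e3"], "size": 1},
]


def communities_by_level(rows):
    """Group community IDs by their depth in the hierarchy."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["level"]].append(row["community"])
    return dict(grouped)


def size_matches_members(row):
    """Check that the size column equals the number of member entities."""
    return row["size"] == len(row["entity_ids"])


by_level = communities_by_level(communities)
print(by_level)
```

<p>Because community IDs increment with depth, the same grouping can be used to confirm that no ID is reused across levels.</p>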
|
||||||
<h2 id="create_final_community_reports">create_final_community_reports</h2>
|
<h2 id="create_final_community_reports">create_final_community_reports</h2>
|
||||||
<p>This is the list of summarized reports for each community.
|
<p>This is the list of summarized reports for each community.</p>
|
||||||
- community: int - Short ID of the community this report applies to.
|
<table>
|
||||||
- level: int - Level of the community this report applies to.
|
<thead>
|
||||||
- title: str - LM-generated title for the report.
|
<tr>
|
||||||
- summary: str - LM-generated summary of the report.
|
<th>name</th>
|
||||||
- full_content: str - LM-generated full report.
|
<th>type</th>
|
||||||
- rank: float - LM-derived relevance ranking of the report based on member entity salience
|
<th>description</th>
|
||||||
- rank_explanation - LM-derived explanation of the rank.
|
</tr>
|
||||||
- findings: dict - LM-derived list of the top 5-10 insights from the community. Contains <code>summary</code> and <code>explanation</code> values.
|
</thead>
|
||||||
- full_content_json - Full JSON output as returned by the LM. Most fields are extracted into columns, but this JSON is sent for query summarization so we leave it to allow for prompt tuning to add fields/content by end users.
|
<tbody>
|
||||||
- period - Date of ingest, used for incremental update merges.
|
<tr>
|
||||||
- size - Size of the community (entity count), used for incremental update merges.</p>
|
<td>community</td>
|
||||||
|
<td>int</td>
|
||||||
|
<td>Short ID of the community this report applies to.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>level</td>
|
||||||
|
<td>int</td>
|
||||||
|
<td>Level of the community this report applies to.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>title</td>
|
||||||
|
<td>str</td>
|
||||||
|
<td>LM-generated title for the report.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>summary</td>
|
||||||
|
<td>str</td>
|
||||||
|
<td>LM-generated summary of the report.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>full_content</td>
|
||||||
|
<td>str</td>
|
||||||
|
<td>LM-generated full report.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>rank</td>
|
||||||
|
<td>float</td>
|
||||||
|
<td>LM-derived relevance ranking of the report based on member entity salience</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>rank_explanation</td>
|
||||||
|
<td>str</td>
|
||||||
|
<td>LM-derived explanation of the rank.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>findings</td>
|
||||||
|
<td>dict</td>
|
||||||
|
<td>LM-derived list of the top 5-10 insights from the community. Contains <code>summary</code> and <code>explanation</code> values.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>full_content_json</td>
|
||||||
|
<td>json</td>
|
||||||
|
<td>Full JSON output as returned by the LM. Most fields are extracted into columns, but this JSON is sent for query summarization so we leave it to allow for prompt tuning to add fields/content by end users.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>period</td>
|
||||||
|
<td>str</td>
|
||||||
|
<td>Date of ingest in ISO 8601 format, used for incremental update merges.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>size</td>
|
||||||
|
<td>int</td>
|
||||||
|
<td>Size of the community (entity count), used for incremental update merges.</td>
|
||||||
|
</tr>
|
||||||
|
</tbody>
|
||||||
|
</table>
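<p>The sketch below shows one way the <code>full_content_json</code> column might be read back to list the <code>summary</code> of each finding. The JSON layout used here is an assumption for illustration only; the actual structure depends on the prompt and on what the LM returns, and <code>finding_summaries</code> is a hypothetical helper.</p>

```python
import json

# Hypothetical report row; the JSON layout is an assumption for illustration.
report = {
    "community": 2,
    "level": 1,
    "title": "Example community",
    "rank": 7.5,
    "full_content_json": json.dumps(
        {
            "title": "Example community",
            "summary": "A made-up summary.",
            "findings": [
                {"summary": "First insight", "explanation": "Why it matters."},
                {"summary": "Second insight", "explanation": "More detail."},
            ],
        }
    ),
}


def finding_summaries(row):
    """Return the summary of every finding stored in the raw JSON column."""
    payload = json.loads(row["full_content_json"])
    # Tolerate reports without findings, e.g. after prompt tuning.
    return [finding["summary"] for finding in payload.get("findings", [])]


print(finding_summaries(report))
```

<p>Reading from the raw JSON rather than the extracted columns keeps any extra fields added through prompt tuning available.</p>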
|
||||||
<h2 id="create_final_covariates">create_final_covariates</h2>
|
<h2 id="create_final_covariates">create_final_covariates</h2>
|
||||||
<p>(Optional) If claim extraction is turned on, this is a list of the extracted covariates. Note that claims are typically oriented around identifying malicious behavior such as fraud, so they are not useful for all datasets.
|
<p>(Optional) If claim extraction is turned on, this is a list of the extracted covariates. Note that claims are typically oriented around identifying malicious behavior such as fraud, so they are not useful for all datasets.</p>
|
||||||
- covariate_type: str - This is always "claim" with our default covariates.
|
<table>
|
||||||
- type: str - Nature of the claim type.
|
<thead>
|
||||||
- description: str - LM-generated description of the behavior.
|
<tr>
|
||||||
- subject_id: str - Name of the source entity (that is performing the claimed behavior).
|
<th>name</th>
|
||||||
- object_id: str - Name of the target entity (that the claimed behavior is performed on).
|
<th>type</th>
|
||||||
- status: str [TRUE, FALSE, SUSPECTED] - LM-derived assessment of the correctness of the claim.
|
<th>description</th>
|
||||||
- start_date: str (ISO8601) - LM-derived start of the claimed activity.
|
</tr>
|
||||||
- end_date: str (ISO8601) - LM-derived end of the claimed activity.
|
</thead>
|
||||||
- source_text: str - Short string of text containing the claimed behavior.
|
<tbody>
|
||||||
- text_unit_id: str - ID of the text unit the claim text was extracted from.</p>
|
<tr>
|
||||||
|
<td>covariate_type</td>
|
||||||
|
<td>str</td>
|
||||||
|
<td>This is always "claim" with our default covariates.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>type</td>
|
||||||
|
<td>str</td>
|
||||||
|
<td>Nature of the claim type.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>description</td>
|
||||||
|
<td>str</td>
|
||||||
|
<td>LM-generated description of the behavior.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>subject_id</td>
|
||||||
|
<td>str</td>
|
||||||
|
<td>Name of the source entity (that is performing the claimed behavior).</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>object_id</td>
|
||||||
|
<td>str</td>
|
||||||
|
<td>Name of the target entity (that the claimed behavior is performed on).</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>status</td>
|
||||||
|
<td>str</td>
|
||||||
|
<td>LM-derived assessment of the correctness of the claim. One of TRUE, FALSE, or SUSPECTED.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>start_date</td>
|
||||||
|
<td>str</td>
|
||||||
|
<td>LM-derived start of the claimed activity, in ISO 8601 format.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>end_date</td>
|
||||||
|
<td>str</td>
|
||||||
|
<td>LM-derived end of the claimed activity, in ISO 8601 format.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>source_text</td>
|
||||||
|
<td>str</td>
|
||||||
|
<td>Short string of text containing the claimed behavior.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>text_unit_id</td>
|
||||||
|
<td>str</td>
|
||||||
|
<td>ID of the text unit the claim text was extracted from.</td>
|
||||||
|
</tr>
|
||||||
|
</tbody>
|
||||||
|
</table>
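<p>A minimal sketch of filtering claims by <code>status</code> and by the <code>start_date</code>/<code>end_date</code> window follows. The rows are invented, the claim <code>type</code> values are placeholders, and <code>claims_active_on</code> is a hypothetical helper; it assumes both dates are present and parseable by Python's <code>datetime.fromisoformat</code>.</p>

```python
from datetime import datetime

# Hypothetical rows shaped like create_final_covariates (values are made up).
claims = [
    {
        "covariate_type": "claim",
        "type": "EXAMPLE_TYPE_A",
        "subject_id": "ACME",
        "object_id": "Globex",
        "status": "TRUE",
        "start_date": "2023-01-01T00:00:00",
        "end_date": "2023-06-30T00:00:00",
    },
    {
        "covariate_type": "claim",
        "type": "EXAMPLE_TYPE_B",
        "subject_id": "Jane Doe",
        "object_id": "ACME",
        "status": "SUSPECTED",
        "start_date": "2024-02-01T00:00:00",
        "end_date": "2024-02-15T00:00:00",
    },
]


def claims_active_on(rows, day, statuses=("TRUE", "SUSPECTED")):
    """Return claims with an accepted status whose date window contains `day`."""
    moment = datetime.fromisoformat(day)
    return [
        row
        for row in rows
        if row["status"] in statuses
        and datetime.fromisoformat(row["start_date"])
        <= moment
        <= datetime.fromisoformat(row["end_date"])
    ]


active = claims_active_on(claims, "2023-03-01")
print([row["subject_id"] for row in active])
```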
|
||||||
<h2 id="create_final_documents">create_final_documents</h2>
|
<h2 id="create_final_documents">create_final_documents</h2>
|
||||||
<p>List of document content after import.
|
<p>List of document content after import.</p>
|
||||||
- title: str - Filename, unless otherwise configured during CSV import.
|
<table>
|
||||||
- text: str - Full text of the document.
|
<thead>
|
||||||
- text_unit_ids: str[] - List of text units (chunks) that were parsed from the document.
|
<tr>
|
||||||
- attributes: dict (optional) - If specified during CSV import, this is a dict of attributes for the document.</p>
|
<th>name</th>
|
||||||
<h1 id="create_final_entities">create_final_entities</h1>
|
<th>type</th>
|
||||||
<p>List of all entities found in the data by the LM.
|
<th>description</th>
|
||||||
- title: str - Name of the entity.
|
</tr>
|
||||||
- type: str - Type of the entity. By default this will be "organization", "person", "geo", or "event" unless configured differently or auto-tuning is used.
|
</thead>
|
||||||
- description: str - Textual description of the entity. Entities may be found in many text units, so this is an LM-derived summary of all descriptions.
|
<tbody>
|
||||||
- text_unit_ids: str[] - List of the text units containing the entity.</p>
|
<tr>
|
||||||
<h1 id="create_final_nodes">create_final_nodes</h1>
|
<td>title</td>
|
||||||
|
<td>str</td>
|
||||||
|
<td>Filename, unless otherwise configured during CSV import.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>text</td>
|
||||||
|
<td>str</td>
|
||||||
|
<td>Full text of the document.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>text_unit_ids</td>
|
||||||
|
<td>str[]</td>
|
||||||
|
<td>List of text units (chunks) that were parsed from the document.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>attributes</td>
|
||||||
|
<td>dict</td>
|
||||||
|
<td>(optional) If specified during CSV import, this is a dict of attributes for the document.</td>
|
||||||
|
</tr>
|
||||||
|
</tbody>
|
||||||
|
</table>
|
||||||
|
<h2 id="create_final_entities">create_final_entities</h2>
|
||||||
|
<p>List of all entities found in the data by the LM.</p>
|
||||||
|
<table>
|
||||||
|
<thead>
|
||||||
|
<tr>
|
||||||
|
<th>name</th>
|
||||||
|
<th>type</th>
|
||||||
|
<th>description</th>
|
||||||
|
</tr>
|
||||||
|
</thead>
|
||||||
|
<tbody>
|
||||||
|
<tr>
|
||||||
|
<td>title</td>
|
||||||
|
<td>str</td>
|
||||||
|
<td>Name of the entity.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>type</td>
|
||||||
|
<td>str</td>
|
||||||
|
<td>Type of the entity. By default this will be "organization", "person", "geo", or "event" unless configured differently or auto-tuning is used.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>description</td>
|
||||||
|
<td>str</td>
|
||||||
|
<td>Textual description of the entity. Entities may be found in many text units, so this is an LM-derived summary of all descriptions.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>text_unit_ids</td>
|
||||||
|
<td>str[]</td>
|
||||||
|
<td>List of the text units containing the entity.</td>
|
||||||
|
</tr>
|
||||||
|
</tbody>
|
||||||
|
</table>
|
||||||
|
<h2 id="create_final_nodes">create_final_nodes</h2>
|
||||||
<p>This is graph-related information for the entities. It contains only information relevant to the graph such as community. There is an entry for each entity at every community level it is found within, so you may see "duplicate" entities.</p>
|
<p>This is graph-related information for the entities. It contains only information relevant to the graph such as community. There is an entry for each entity at every community level it is found within, so you may see "duplicate" entities.</p>
|
||||||
<p>Note that the ID fields match those in create_final_entities and can be used for joining if additional information about a node is required.
|
<p>Note that the ID fields match those in create_final_entities and can be used for joining if additional information about a node is required.</p>
|
||||||
- title: str - Name of the referenced entity. Duplicated from create_final_entities for convenient cross-referencing.
|
<table>
|
||||||
- community: int - Leiden community the node is found within. Entities are not always assigned a community (they may not be close enough to any), so they may have an ID of -1.
|
<thead>
|
||||||
- level: int - Level of the community the entity is in.
|
<tr>
|
||||||
- degree: int - Node degree (connectedness) in the graph.
|
<th>name</th>
|
||||||
- x: float - X position of the node for visual layouts. If graph embeddings and UMAP are not turned on, this will be 0.
|
<th>type</th>
|
||||||
- y: float - Y position of the node for visual layouts. If graph embeddings and UMAP are not turned on, this will be 0.</p>
|
<th>description</th>
|
||||||
|
</tr>
|
||||||
|
</thead>
|
||||||
|
<tbody>
|
||||||
|
<tr>
|
||||||
|
<td>title</td>
|
||||||
|
<td>str</td>
|
||||||
|
<td>Name of the referenced entity. Duplicated from create_final_entities for convenient cross-referencing.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>community</td>
|
||||||
|
<td>int</td>
|
||||||
|
<td>Leiden community the node is found within. Entities are not always assigned a community (they may not be close enough to any), so they may have an ID of -1.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>level</td>
|
||||||
|
<td>int</td>
|
||||||
|
<td>Level of the community the entity is in.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>degree</td>
|
||||||
|
<td>int</td>
|
||||||
|
<td>Node degree (connectedness) in the graph.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>x</td>
|
||||||
|
<td>float</td>
|
||||||
|
<td>X position of the node for visual layouts. If graph embeddings and UMAP are not turned on, this will be 0.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>y</td>
|
||||||
|
<td>float</td>
|
||||||
|
<td>Y position of the node for visual layouts. If graph embeddings and UMAP are not turned on, this will be 0.</td>
|
||||||
|
</tr>
|
||||||
|
</tbody>
|
||||||
|
</table>
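<p>The join mentioned above could look roughly like the sketch below. It assumes each row carries the shared <code>id</code> field, uses invented rows, and picks a single level so that the per-level "duplicate" entries collapse to one row per entity; nodes with a community of -1 are skipped. The function name is hypothetical.</p>

```python
# Hypothetical rows shaped like create_final_entities and create_final_nodes.
entities = [
    {"id": "a1", "title": "ACME", "type": "organization", "description": "A made-up company."},
    {"id": "b2", "title": "Jane Doe", "type": "person", "description": "A made-up person."},
]
nodes = [
    {"id": "a1", "title": "ACME", "community": 0, "level": 0, "degree": 3},
    {"id": "a1", "title": "ACME", "community": 2, "level": 1, "degree": 3},
    {"id": "b2", "title": "Jane Doe", "community": -1, "level": 0, "degree": 1},
]


def join_nodes_to_entities(node_rows, entity_rows, level):
    """Attach entity details to the nodes of one community level."""
    by_id = {entity["id"]: entity for entity in entity_rows}
    joined = []
    for node in node_rows:
        # Skip other levels and nodes that were not assigned a community.
        if node["level"] != level or node["community"] == -1:
            continue
        entity = by_id[node["id"]]
        joined.append({**node, "type": entity["type"], "description": entity["description"]})
    return joined


for row in join_nodes_to_entities(nodes, entities, level=0):
    print(row["title"], row["community"], row["type"])
```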
|
||||||
<h2 id="create_final_relationships">create_final_relationships</h2>
|
<h2 id="create_final_relationships">create_final_relationships</h2>
|
||||||
<p>List of all entity-to-entity relationships found in the data by the LM. This is also the <em>edge list</em> for the graph.
|
<p>List of all entity-to-entity relationships found in the data by the LM. This is also the <em>edge list</em> for the graph.</p>
|
||||||
- source: str - Name of the source entity.
|
<table>
|
||||||
- target: str - Name of the target entity.
|
<thead>
|
||||||
- description: str - LM-derived description of the relationship. Also see note for entity descriptions.
|
<tr>
|
||||||
- weight: float - Weight of the edge in the graph. This is summed from an LM-derived "strength" measure for each relationship instance.
|
<th>name</th>
|
||||||
- combined_degree: int - Sum of source and target node degrees.
|
<th>type</th>
|
||||||
- text_unit_ids: str[] - List of text units the relationship was found within.</p>
|
<th>description</th>
|
||||||
|
</tr>
|
||||||
|
</thead>
|
||||||
|
<tbody>
|
||||||
|
<tr>
|
||||||
|
<td>source</td>
|
||||||
|
<td>str</td>
|
||||||
|
<td>Name of the source entity.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>target</td>
|
||||||
|
<td>str</td>
|
||||||
|
<td>Name of the target entity.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>description</td>
|
||||||
|
<td>str</td>
|
||||||
|
<td>LM-derived description of the relationship. Also see note for entity descriptions.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>weight</td>
|
||||||
|
<td>float</td>
|
||||||
|
<td>Weight of the edge in the graph. This is summed from an LM-derived "strength" measure for each relationship instance.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>combined_degree</td>
|
||||||
|
<td>int</td>
|
||||||
|
<td>Sum of source and target node degrees.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>text_unit_ids</td>
|
||||||
|
<td>str[]</td>
|
||||||
|
<td>List of text units the relationship was found within.</td>
|
||||||
|
</tr>
|
||||||
|
</tbody>
|
||||||
|
</table>
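<p>To make the <code>weight</code> and <code>combined_degree</code> columns concrete, the sketch below sums a per-instance strength into an edge weight and adds the two endpoint degrees. It is only an approximation of the idea: the input tuples are invented, and counting degree over distinct edges of an undirected graph is an assumption rather than a statement about how the pipeline computes it.</p>

```python
from collections import defaultdict

# Hypothetical relationship instances as (source, target, strength).
instances = [
    ("ACME", "Jane Doe", 2.0),
    ("ACME", "Jane Doe", 3.0),
    ("ACME", "Globex", 1.0),
]


def build_edges(rows):
    """Collapse relationship instances into weighted edges with combined degree."""
    # Sum the strength of every instance of the same (source, target) pair.
    weights = defaultdict(float)
    for source, target, strength in rows:
        weights[(source, target)] += strength

    # Assumption: degree is the number of distinct edges touching a node.
    degree = defaultdict(int)
    for source, target in weights:
        degree[source] += 1
        degree[target] += 1

    return [
        {
            "source": source,
            "target": target,
            "weight": weight,
            "combined_degree": degree[source] + degree[target],
        }
        for (source, target), weight in weights.items()
    ]


for edge in build_edges(instances):
    print(edge)
```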
|
||||||
<h2 id="create_final_text_units">create_final_text_units</h2>
|
<h2 id="create_final_text_units">create_final_text_units</h2>
|
||||||
<p>List of all text chunks parsed from the input documents.
|
<p>List of all text chunks parsed from the input documents.</p>
|
||||||
- text: str - Raw full text of the chunk.
|
<table>
|
||||||
- n_tokens: int - Number of tokens in the chunk. This should normally match the <code>chunk_size</code> config parameter, except for the last chunk which is often shorter.
|
<thead>
|
||||||
- document_ids: str[] - List of document IDs the chunk came from. This is normally only 1 due to our default groupby, but for very short text documents (e.g., microblogs) it can be configured so text units span multiple documents.
|
<tr>
|
||||||
- entity_ids: str[] - List of entities found in the text unit.
|
<th>name</th>
|
||||||
- relationships_ids: str[] - List of relationships found in the text unit.
|
<th>type</th>
|
||||||
- covariate_ids: str[] - Optional list of covariates found in the text unit.</p>
|
<th>description</th>
|
||||||
|
</tr>
|
||||||
|
</thead>
|
||||||
|
<tbody>
|
||||||
|
<tr>
|
||||||
|
<td>text</td>
|
||||||
|
<td>str</td>
|
||||||
|
<td>Raw full text of the chunk.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>n_tokens</td>
|
||||||
|
<td>int</td>
|
||||||
|
<td>Number of tokens in the chunk. This should normally match the <code>chunk_size</code> config parameter, except for the last chunk which is often shorter.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>document_ids</td>
|
||||||
|
<td>str[]</td>
|
||||||
|
<td>List of document IDs the chunk came from. This is normally only 1 due to our default groupby, but for very short text documents (e.g., microblogs) it can be configured so text units span multiple documents.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>entity_ids</td>
|
||||||
|
<td>str[]</td>
|
||||||
|
<td>List of entities found in the text unit.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>relationships_ids</td>
|
||||||
|
<td>str[]</td>
|
||||||
|
<td>List of relationships found in the text unit.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>covariate_ids</td>
|
||||||
|
<td>str[]</td>
|
||||||
|
<td>Optional list of covariates found in the text unit.</td>
|
||||||
|
</tr>
|
||||||
|
</tbody>
|
||||||
|
</table>
|
||||||
@ -1607,34 +1607,7 @@ After you have a config file you can run the pipeline using the CLI or the Pytho
|
|||||||
<a id="__codelineno-0-7" name="__codelineno-0-7" href="#__codelineno-0-7"></a>yarn<span class="w"> </span>run:index<span class="w"> </span>--config<span class="w"> </span>your_pipeline.yml<span class="w"> </span><span class="c1"># custom config mode</span>
|
<a id="__codelineno-0-7" name="__codelineno-0-7" href="#__codelineno-0-7"></a>yarn<span class="w"> </span>run:index<span class="w"> </span>--config<span class="w"> </span>your_pipeline.yml<span class="w"> </span><span class="c1"># custom config mode</span>
|
||||||
</code></pre></div>
|
</code></pre></div>
|
||||||
<h3 id="python-api">Python API</h3>
|
<h3 id="python-api">Python API</h3>
|
||||||
<div class="highlight"><pre><span></span><code><a id="__codelineno-1-1" name="__codelineno-1-1" href="#__codelineno-1-1"></a><span class="kn">from</span> <span class="nn">graphrag.index</span> <span class="kn">import</span> <span class="n">run_pipeline</span>
|
<p>Please see the <a href="https://github.com/microsoft/graphrag/blob/main/examples/README.md">examples folder</a> for a handful of functional pipelines that illustrate how to create and run a pipeline, either via a custom settings.yml or through custom Python scripts.</p>
|
||||||
<a id="__codelineno-1-2" name="__codelineno-1-2" href="#__codelineno-1-2"></a><span class="kn">from</span> <span class="nn">graphrag.index.config</span> <span class="kn">import</span> <span class="n">PipelineWorkflowReference</span>
|
|
||||||
<a id="__codelineno-1-3" name="__codelineno-1-3" href="#__codelineno-1-3"></a>
|
|
||||||
<a id="__codelineno-1-4" name="__codelineno-1-4" href="#__codelineno-1-4"></a><span class="n">workflows</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="n">PipelineWorkflowReference</span><span class="p">]</span> <span class="o">=</span> <span class="p">[</span>
|
|
||||||
<a id="__codelineno-1-5" name="__codelineno-1-5" href="#__codelineno-1-5"></a> <span class="n">PipelineWorkflowReference</span><span class="p">(</span>
|
|
||||||
<a id="__codelineno-1-6" name="__codelineno-1-6" href="#__codelineno-1-6"></a> <span class="n">steps</span><span class="o">=</span><span class="p">[</span>
|
|
||||||
<a id="__codelineno-1-7" name="__codelineno-1-7" href="#__codelineno-1-7"></a> <span class="p">{</span>
|
|
||||||
<a id="__codelineno-1-8" name="__codelineno-1-8" href="#__codelineno-1-8"></a> <span class="c1"># built-in verb</span>
|
|
||||||
<a id="__codelineno-1-9" name="__codelineno-1-9" href="#__codelineno-1-9"></a> <span class="s2">"verb"</span><span class="p">:</span> <span class="s2">"derive"</span><span class="p">,</span> <span class="c1"># https://github.com/microsoft/datashaper/blob/main/python/datashaper/datashaper/verbs/derive.py</span>
|
|
||||||
<a id="__codelineno-1-10" name="__codelineno-1-10" href="#__codelineno-1-10"></a> <span class="s2">"args"</span><span class="p">:</span> <span class="p">{</span>
|
|
||||||
<a id="__codelineno-1-11" name="__codelineno-1-11" href="#__codelineno-1-11"></a> <span class="s2">"column1"</span><span class="p">:</span> <span class="s2">"col1"</span><span class="p">,</span> <span class="c1"># from above</span>
|
|
||||||
<a id="__codelineno-1-12" name="__codelineno-1-12" href="#__codelineno-1-12"></a> <span class="s2">"column2"</span><span class="p">:</span> <span class="s2">"col2"</span><span class="p">,</span> <span class="c1"># from above</span>
|
|
||||||
<a id="__codelineno-1-13" name="__codelineno-1-13" href="#__codelineno-1-13"></a> <span class="s2">"to"</span><span class="p">:</span> <span class="s2">"col_multiplied"</span><span class="p">,</span> <span class="c1"># new column name</span>
|
|
||||||
<a id="__codelineno-1-14" name="__codelineno-1-14" href="#__codelineno-1-14"></a> <span class="s2">"operator"</span><span class="p">:</span> <span class="s2">"*"</span><span class="p">,</span> <span class="c1"># multiply the two columns</span>
|
|
||||||
<a id="__codelineno-1-15" name="__codelineno-1-15" href="#__codelineno-1-15"></a> <span class="p">},</span>
|
|
||||||
<a id="__codelineno-1-16" name="__codelineno-1-16" href="#__codelineno-1-16"></a> <span class="c1"># Since we're trying to act on the default input, we don't need explicitly to specify an input</span>
|
|
||||||
<a id="__codelineno-1-17" name="__codelineno-1-17" href="#__codelineno-1-17"></a> <span class="p">}</span>
|
|
||||||
<a id="__codelineno-1-18" name="__codelineno-1-18" href="#__codelineno-1-18"></a> <span class="p">]</span>
|
|
||||||
<a id="__codelineno-1-19" name="__codelineno-1-19" href="#__codelineno-1-19"></a> <span class="p">),</span>
|
|
||||||
<a id="__codelineno-1-20" name="__codelineno-1-20" href="#__codelineno-1-20"></a><span class="p">]</span>
|
|
||||||
<a id="__codelineno-1-21" name="__codelineno-1-21" href="#__codelineno-1-21"></a>
|
|
||||||
<a id="__codelineno-1-22" name="__codelineno-1-22" href="#__codelineno-1-22"></a><span class="n">dataset</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="p">([{</span><span class="s2">"col1"</span><span class="p">:</span> <span class="mi">2</span><span class="p">,</span> <span class="s2">"col2"</span><span class="p">:</span> <span class="mi">4</span><span class="p">},</span> <span class="p">{</span><span class="s2">"col1"</span><span class="p">:</span> <span class="mi">5</span><span class="p">,</span> <span class="s2">"col2"</span><span class="p">:</span> <span class="mi">10</span><span class="p">}])</span>
|
|
||||||
<a id="__codelineno-1-23" name="__codelineno-1-23" href="#__codelineno-1-23"></a><span class="n">outputs</span> <span class="o">=</span> <span class="p">[]</span>
|
|
||||||
<a id="__codelineno-1-24" name="__codelineno-1-24" href="#__codelineno-1-24"></a><span class="k">async</span> <span class="k">for</span> <span class="n">output</span> <span class="ow">in</span> <span class="k">await</span> <span class="n">run_pipeline</span><span class="p">(</span><span class="n">dataset</span><span class="o">=</span><span class="n">dataset</span><span class="p">,</span> <span class="n">workflows</span><span class="o">=</span><span class="n">workflows</span><span class="p">):</span>
|
|
||||||
<a id="__codelineno-1-25" name="__codelineno-1-25" href="#__codelineno-1-25"></a> <span class="n">outputs</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">output</span><span class="p">)</span>
|
|
||||||
<a id="__codelineno-1-26" name="__codelineno-1-26" href="#__codelineno-1-26"></a><span class="n">pipeline_result</span> <span class="o">=</span> <span class="n">outputs</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span>
|
|
||||||
<a id="__codelineno-1-27" name="__codelineno-1-27" href="#__codelineno-1-27"></a><span class="nb">print</span><span class="p">(</span><span class="n">pipeline_result</span><span class="p">)</span>
|
|
||||||
</code></pre></div>
|
|
||||||
<h2 id="further-reading">Further Reading</h2>
|
<h2 id="further-reading">Further Reading</h2>
|
||||||
<ul>
|
<ul>
|
||||||
<li>To start developing within the <em>GraphRAG</em> project, see <a href="../../developing/">getting started</a></li>
|
<li>To start developing within the <em>GraphRAG</em> project, see <a href="../../developing/">getting started</a></li>
|
||||||
|
File diff suppressed because one or more lines are too long
BIN
sitemap.xml.gz
Binary file not shown.
@ -3,6 +3,8 @@
|
|||||||
--md-code-hl-color: #3772d9;
|
--md-code-hl-color: #3772d9;
|
||||||
--md-code-hl-comment-color: #6b6b6b;
|
--md-code-hl-comment-color: #6b6b6b;
|
||||||
--md-code-hl-operator-color: #6b6b6b;
|
--md-code-hl-operator-color: #6b6b6b;
|
||||||
|
--md-footer-fg-color--light: #ffffff;
|
||||||
|
--md-footer-fg-color--lighter: #ffffff;
|
||||||
}
|
}
|
||||||
|
|
||||||
[data-md-color-scheme="slate"] {
|
[data-md-color-scheme="slate"] {
|
||||||
@ -10,6 +12,8 @@
|
|||||||
--md-code-hl-color: #246be5;
|
--md-code-hl-color: #246be5;
|
||||||
--md-code-hl-constant-color: #9a89ed;
|
--md-code-hl-constant-color: #9a89ed;
|
||||||
--md-code-hl-number-color: #f16e5f;
|
--md-code-hl-number-color: #f16e5f;
|
||||||
|
--md-footer-fg-color--light: #ffffff;
|
||||||
|
--md-footer-fg-color--lighter: #ffffff;
|
||||||
}
|
}
|
||||||
|
|
||||||
.md-tabs__item--active {
|
.md-tabs__item--active {
|
||||||
|