graphrag/docs/examples_notebooks/multi_index_search.ipynb


{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Copyright (c) 2024 Microsoft Corporation.\n",
"# Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Multi-Index Search\n",
"This notebook demonstrates multi-index search using the GraphRAG API.\n",
"\n",
"The examples query six indexes built from the Wikipedia articles for Alaska, California, DC, Maryland, New York, and Washington."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['alaska', 'california', 'dc', 'maryland', 'ny', 'washington']\n"
]
}
],
"source": [
"import asyncio\n",
"\n",
"import pandas as pd\n",
"\n",
"from graphrag.api.query import (\n",
"    multi_index_basic_search,\n",
"    multi_index_drift_search,\n",
"    multi_index_global_search,\n",
"    multi_index_local_search,\n",
")\n",
"from graphrag.config.create_graphrag_config import create_graphrag_config\n",
"\n",
"indexes = [\"alaska\", \"california\", \"dc\", \"maryland\", \"ny\", \"washington\"]\n",
"indexes = sorted(indexes)\n",
"\n",
"print(indexes)\n",
"\n",
"vector_store_configs = {\n",
"    index: {\n",
"        \"type\": \"lancedb\",\n",
"        \"db_uri\": f\"inputs/{index}/lancedb\",\n",
"        \"container_name\": \"default\",\n",
"        \"overwrite\": True,\n",
"        \"index_name\": f\"{index}\",\n",
"    }\n",
"    for index in indexes\n",
"}"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"config_data = {\n",
"    \"models\": {\n",
"        \"default_chat_model\": {\n",
"            \"model_supports_json\": True,\n",
"            \"parallelization_num_threads\": 50,\n",
"            \"parallelization_stagger\": 0.3,\n",
"            \"async_mode\": \"threaded\",\n",
"            \"type\": \"azure_openai_chat\",\n",
"            \"model\": \"gpt-4o\",\n",
"            \"auth_type\": \"azure_managed_identity\",\n",
"            \"api_base\": \"<API_BASE_URL>\",\n",
"            \"api_version\": \"2024-02-15-preview\",\n",
"            \"deployment_name\": \"gpt-4o\",\n",
"        },\n",
"        \"default_embedding_model\": {\n",
"            \"parallelization_num_threads\": 50,\n",
"            \"parallelization_stagger\": 0.3,\n",
"            \"async_mode\": \"threaded\",\n",
"            \"type\": \"azure_openai_embedding\",\n",
"            \"model\": \"text-embedding-3-large\",\n",
"            \"auth_type\": \"azure_managed_identity\",\n",
"            \"api_base\": \"<API_BASE_URL>\",\n",
"            \"api_version\": \"2024-02-15-preview\",\n",
"            \"deployment_name\": \"text-embedding-3-large\",\n",
"        },\n",
"    },\n",
"    \"vector_store\": vector_store_configs,\n",
"    \"local_search\": {\n",
"        \"prompt\": \"prompts/local_search_system_prompt.txt\",\n",
"        \"llm_max_tokens\": 12000,\n",
"    },\n",
"    \"global_search\": {\n",
"        \"map_prompt\": \"prompts/global_search_map_system_prompt.txt\",\n",
"        \"reduce_prompt\": \"prompts/global_search_reduce_system_prompt.txt\",\n",
"        \"knowledge_prompt\": \"prompts/global_search_knowledge_system_prompt.txt\",\n",
"    },\n",
"    \"drift_search\": {\n",
"        \"prompt\": \"prompts/drift_search_system_prompt.txt\",\n",
"        \"reduce_prompt\": \"prompts/drift_search_reduce_prompt.txt\",\n",
"    },\n",
"    \"basic_search\": {\"prompt\": \"prompts/basic_search_system_prompt.txt\"},\n",
"}\n",
"parameters = create_graphrag_config(config_data, \".\")\n",
"loop = asyncio.get_event_loop()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Multi-index Global Search"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"entities = [pd.read_parquet(f\"inputs/{index}/entities.parquet\") for index in indexes]\n",
"communities = [\n",
"    pd.read_parquet(f\"inputs/{index}/communities.parquet\") for index in indexes\n",
"]\n",
"community_reports = [\n",
"    pd.read_parquet(f\"inputs/{index}/community_reports.parquet\") for index in indexes\n",
"]\n",
"\n",
"task = loop.create_task(\n",
"    multi_index_global_search(\n",
"        parameters,\n",
"        entities,\n",
"        communities,\n",
"        community_reports,\n",
"        indexes,\n",
"        1,  # community level\n",
"        False,  # dynamic community selection\n",
"        \"Multiple Paragraphs\",  # response type\n",
"        False,  # streaming\n",
"        \"Describe this dataset.\",  # query\n",
"    )\n",
")\n",
"results = await task"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Print report"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"## Overview of the Dataset\n",
"\n",
"The dataset is a comprehensive collection of reports that cover a wide array of topics, including historical events, cultural dynamics, economic influences, geographical regions, and environmental issues across various regions in the United States. Each report is uniquely identified by an ID and includes a title, occurrence weight, content, and rank. These elements help to organize the dataset and provide insights into the significance and relevance of each report.\n",
"\n",
"## Content and Structure\n",
"\n",
"The reports provide detailed information about specific entities and their relationships, highlighting their importance and impact in different contexts. Topics range from the historical significance of regions like Maryland and Washington D.C., to the cultural and economic landscapes of areas such as Washington State and Los Angeles. The dataset also delves into significant events and figures, such as the Good Friday Earthquake, the Trans-Alaska Pipeline, and the role of Jimi Hendrix in Seattle's cultural heritage [Data: Reports (120, 129, 40, 16, +more)].\n",
"\n",
"## Key Features\n",
"\n",
"Each report is structured into sections that provide insights into the main topics discussed, supported by data references to entities and relationships. The occurrence weight and rank of each report may indicate its relevance and significance within the dataset. This structure allows for a comprehensive understanding of the topics discussed, emphasizing the interconnectedness of various entities and their roles in broader socio-economic and cultural contexts.\n",
"\n",
"## Topics Covered\n",
"\n",
"The dataset includes a diverse range of topics, such as the strategic geopolitical position of Alaska, the cultural and economic significance of California, the historical and geographical significance of New York State, and the environmental health concerns in Washington. It also covers political transitions, such as the governorship change in Maryland and the 2022 special election in Alaska [Data: Reports (204, 143, 85, 122, 83, +more)].\n",
"\n",
"## Conclusion\n",
"\n",
"Overall, the dataset serves as a valuable resource for understanding the complexities and interdependencies of historical, cultural, economic, and geographical factors in shaping the identity and development of various regions in the United States. The detailed narratives and data references provide a multifaceted perspective on each topic, making it a rich source of information for research and analysis.\n"
]
}
],
"source": [
"print(results[0])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Show context links back to original index"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"120 dc 26\n",
"Washington D.C. Founders and Influences\n",
"Washington D.C. Founders and Influences\n",
"129 dc 35\n",
"Smithsonian Institution and Its Museums\n",
"Smithsonian Institution and Its Museums\n",
"40 alaska 40\n",
"Good Friday Earthquake and Its Global Impact\n",
"Good Friday Earthquake and Its Global Impact\n",
"16 alaska 16\n",
"Trans-Alaska Pipeline and Prudhoe Bay\n",
"Trans-Alaska Pipeline and Prudhoe Bay\n",
"204 ny 36\n",
"Long Island and its Educational and Cultural Landscape\n",
"Long Island and its Educational and Cultural Landscape\n",
"143 maryland 5\n",
"Western Maryland and Appalachian Region\n",
"Western Maryland and Appalachian Region\n",
"85 california 38\n",
"California and Its Historical and Geopolitical Context\n",
"California and Its Historical and Geopolitical Context\n",
"122 dc 28\n",
"District of Columbia and Legal Framework\n",
"District of Columbia and Legal Framework\n",
"83 california 36\n",
"Southern California and Key Geographical Entities\n",
"Southern California and Key Geographical Entities\n"
]
}
],
"source": [
"for report_id in [120, 129, 40, 16, 204, 143, 85, 122, 83]:\n",
"    # Look up the merged report record once, then read its fields.\n",
"    report = next(r for r in results[1][\"reports\"] if r[\"id\"] == str(report_id))\n",
"    index_name = report[\"index_name\"]\n",
"    index_id = report[\"index_id\"]\n",
"    print(report_id, index_name, index_id)\n",
"    # Load the report table of the source index to compare titles.\n",
"    index_reports = pd.read_parquet(\n",
"        f\"inputs/{index_name}/create_final_community_reports.parquet\"\n",
"    )\n",
"    print(report[\"title\"])\n",
"    print(\n",
"        index_reports[index_reports[\"community\"] == int(index_id)][\"title\"].to_numpy()[\n",
"            0\n",
"        ]\n",
"    )"
]
},
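{
"cell_type": "markdown",
"metadata": {},
"source": [
"The lookup from a merged ID back to its source index recurs throughout this notebook, so it can be wrapped in a small helper. The cell below is a minimal sketch, not part of the GraphRAG API: `find_context_record` is a hypothetical name, and it assumes the context data is a dict mapping a record type such as `reports` to a list of dicts with `id`, `index_name`, and `index_id` keys, as the lookups in this notebook suggest. The sample record reuses the values printed for report 120 above; storing `index_id` as a string is an assumption."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def find_context_record(context, kind, merged_id):\n",
"    \"\"\"Return the first record of the given kind whose ID matches, or None.\"\"\"\n",
"    return next(\n",
"        (rec for rec in context.get(kind, []) if rec[\"id\"] == str(merged_id)),\n",
"        None,\n",
"    )\n",
"\n",
"\n",
"# Hypothetical sample mirroring the assumed shape of results[1].\n",
"sample_context = {\n",
"    \"reports\": [\n",
"        {\n",
"            \"id\": \"120\",\n",
"            \"index_name\": \"dc\",\n",
"            \"index_id\": \"26\",\n",
"            \"title\": \"Washington D.C. Founders and Influences\",\n",
"        }\n",
"    ]\n",
"}\n",
"\n",
"record = find_context_record(sample_context, \"reports\", 120)\n",
"print(record[\"index_name\"], record[\"index_id\"])"
]
},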
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Multi-index Local Search"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"nodes = [\n",
"    pd.read_parquet(f\"inputs/{index}/create_final_nodes.parquet\") for index in indexes\n",
"]\n",
"entities = [\n",
"    pd.read_parquet(f\"inputs/{index}/create_final_entities.parquet\")\n",
"    for index in indexes\n",
"]\n",
"community_reports = [\n",
"    pd.read_parquet(f\"inputs/{index}/create_final_community_reports.parquet\")\n",
"    for index in indexes\n",
"]\n",
"covariates = [\n",
"    pd.read_parquet(f\"inputs/{index}/create_final_covariates.parquet\")\n",
"    for index in indexes\n",
"]\n",
"text_units = [\n",
"    pd.read_parquet(f\"inputs/{index}/create_final_text_units.parquet\")\n",
"    for index in indexes\n",
"]\n",
"relationships = [\n",
"    pd.read_parquet(f\"inputs/{index}/create_final_relationships.parquet\")\n",
"    for index in indexes\n",
"]\n",
"\n",
"task = loop.create_task(\n",
"    multi_index_local_search(\n",
"        parameters,\n",
"        nodes,\n",
"        entities,\n",
"        community_reports,\n",
"        text_units,\n",
"        relationships,\n",
"        covariates,\n",
"        indexes,\n",
"        1,  # community level\n",
"        \"Multiple Paragraphs\",  # response type\n",
"        False,  # streaming\n",
"        \"weather\",  # query\n",
"    )\n",
")\n",
"results = await task"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Print report"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"### Weather Patterns in California and Washington\n",
"\n",
"#### California's Climate\n",
"\n",
"California exhibits a wide range of climates due to its diverse geography, which includes coastal areas, mountains, and deserts. The state experiences a Mediterranean climate in the Central Valley and coastal regions, characterized by wet winters and dry summers. The Sierra Nevada mountains have an alpine climate with snow in winter and mild summers, while the eastern side of the mountains creates rain shadows, leading to desert conditions in areas like Death Valley, which is one of the hottest places on Earth [Data: Reports (47); Entities (500, 502, 506)].\n",
"\n",
"The state's climate diversity results in varying weather patterns, with northern regions receiving more rainfall than the south. The demand for water is high due to these climatic variations, and droughts have become more frequent, exacerbated by climate change and overextraction of water resources [Data: Reports (47); Claims (100)].\n",
"\n",
"#### Washington's Climate\n",
"\n",
"Washington State's climate is influenced by its location in the Pacific Northwest and its varied topography, including the Cascade Range and the Olympic Mountains. Western Washington has a marine climate with mild temperatures and significant rainfall, especially on the windward side of the mountains. The region is known for its cloudy and rainy weather, particularly in the winter months [Data: Reports (213); Sources (89)].\n",
"\n",
"Eastern Washington, in contrast, experiences a semi-arid climate due to the rain shadow effect of the Cascades. This area has less precipitation and more extreme temperature variations, with hot summers and cold winters. The state is also affected by climate patterns such as the Southern Oscillation, which includes El Niño and La Niña phases, impacting precipitation and temperature [Data: Entities (1960, 1961, 1962); Relationships (1805, 1806)].\n",
"\n",
"#### Environmental Impacts\n",
"\n",
"Both states face environmental challenges related to their weather patterns. California's diverse ecosystems are threatened by urbanization, logging, and climate change, which have led to increased wildfire risks and water scarcity. Efforts to manage these issues include water conservation projects and initiatives to revive traditional land management practices, such as controlled burns [Data: Reports (47)].\n",
"\n",
"Washington's commitment to environmental sustainability is reflected in its conservation efforts and the protection of natural areas like national parks. The state's weather patterns, influenced by atmospheric phenomena like the Pineapple Express, bring heavy rainfall, which can lead to flooding and other environmental impacts [Data: Reports (213); Entities (1950)].\n",
"\n",
"In summary, both California and Washington have unique weather patterns shaped by their geography and climate influences. These patterns have significant implications for environmental management and resource conservation in each state.\n"
]
}
],
"source": [
"print(results[0])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Show context links back to original index"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"47 california 0\n",
"California: A Hub of Cultural, Economic, and Environmental Significance\n",
"California: A Hub of Cultural, Economic, and Environmental Significance\n",
"213 washington 0\n",
"Washington State: Economic and Cultural Hub\n",
"Washington State: Economic and Cultural Hub\n",
"500 california 161\n",
"Boca is a location in California where the lowest temperature in the state, −45 °F, was recorded on \n",
"Boca is a location in California where the lowest temperature in the state, −45 °F, was recorded on \n",
"502 california 163\n",
"Mammoth is a location in the Sierra Nevada, California, known for its mountain climate\n",
"Mammoth is a location in the Sierra Nevada, California, known for its mountain climate\n",
"506 california 167\n",
"Eureka is a city in California known for its cool summers in the Humboldt Bay region\n",
"Eureka is a city in California known for its cool summers in the Humboldt Bay region\n",
"1960 washington 104\n",
"The Southern Oscillation is a climate pattern that influences weather during the cold season, affect\n",
"The Southern Oscillation is a climate pattern that influences weather during the cold season, affect\n",
"1961 washington 105\n",
"El Niño is a phase of the Southern Oscillation that causes drier and less snowy conditions in Washin\n",
"El Niño is a phase of the Southern Oscillation that causes drier and less snowy conditions in Washin\n",
"1962 washington 106\n",
"La Niña is a phase of the Southern Oscillation that causes more rain and snow in Washington\n",
"La Niña is a phase of the Southern Oscillation that causes more rain and snow in Washington\n",
"1805 washington 92\n",
"El Niño is a phase of the Southern Oscillation\n",
"El Niño is a phase of the Southern Oscillation\n",
"1806 washington 93\n",
"La Niña is a phase of the Southern Oscillation\n",
"La Niña is a phase of the Southern Oscillation\n",
"1806 california 35\n",
"The lowest temperature in California was −45 °F (−43 °C) recorded in Boca on January 20, 1937.\n",
"The lowest temperature in California was −45 °F (−43 °C) recorded in Boca on January 20, 1937.\n"
]
}
],
"source": [
"for report_id in [47, 213]:\n",
"    report = next(r for r in results[1][\"reports\"] if r[\"id\"] == str(report_id))\n",
"    index_name = report[\"index_name\"]\n",
"    index_id = report[\"index_id\"]\n",
"    print(report_id, index_name, index_id)\n",
"    index_reports = pd.read_parquet(\n",
"        f\"inputs/{index_name}/create_final_community_reports.parquet\"\n",
"    )\n",
"    print(report[\"title\"])\n",
"    print(\n",
"        index_reports[index_reports[\"community\"] == int(index_id)][\"title\"].to_numpy()[\n",
"            0\n",
"        ]\n",
"    )\n",
"for entity_id in [500, 502, 506, 1960, 1961, 1962]:\n",
"    entity = next(e for e in results[1][\"entities\"] if e[\"id\"] == str(entity_id))\n",
"    index_name = entity[\"index_name\"]\n",
"    index_id = entity[\"index_id\"]\n",
"    print(entity_id, index_name, index_id)\n",
"    index_entities = pd.read_parquet(\n",
"        f\"inputs/{index_name}/create_final_entities.parquet\"\n",
"    )\n",
"    print(entity[\"description\"][:100])\n",
"    print(\n",
"        index_entities[index_entities[\"human_readable_id\"] == int(index_id)][\n",
"            \"description\"\n",
"        ].to_numpy()[0][:100]\n",
"    )\n",
"for relationship_id in [1805, 1806]:\n",
"    relationship = next(\n",
"        r for r in results[1][\"relationships\"] if r[\"id\"] == str(relationship_id)\n",
"    )\n",
"    index_name = relationship[\"index_name\"]\n",
"    index_id = relationship[\"index_id\"]\n",
"    print(relationship_id, index_name, index_id)\n",
"    index_relationships = pd.read_parquet(\n",
"        f\"inputs/{index_name}/create_final_relationships.parquet\"\n",
"    )\n",
"    print(relationship[\"description\"])\n",
"    print(\n",
"        index_relationships[index_relationships[\"human_readable_id\"] == int(index_id)][\n",
"            \"description\"\n",
"        ].to_numpy()[0]\n",
"    )\n",
"for claim_id in [100]:\n",
"    claim = next(c for c in results[1][\"claims\"] if c[\"id\"] == str(claim_id))\n",
"    index_name = claim[\"index_name\"]\n",
"    index_id = claim[\"index_id\"]\n",
"    print(claim_id, index_name, index_id)\n",
"    index_claims = pd.read_parquet(\n",
"        f\"inputs/{index_name}/create_final_covariates.parquet\"\n",
"    )\n",
"    print(claim[\"description\"])\n",
"    print(\n",
"        index_claims[index_claims[\"human_readable_id\"] == int(index_id)][\n",
"            \"description\"\n",
"        ].to_numpy()[0]\n",
"    )"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Multi-index Drift Search"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"nodes = [\n",
"    pd.read_parquet(f\"inputs/{index}/create_final_nodes.parquet\") for index in indexes\n",
"]\n",
"entities = [\n",
"    pd.read_parquet(f\"inputs/{index}/create_final_entities.parquet\")\n",
"    for index in indexes\n",
"]\n",
"community_reports = [\n",
"    pd.read_parquet(f\"inputs/{index}/create_final_community_reports.parquet\")\n",
"    for index in indexes\n",
"]\n",
"text_units = [\n",
"    pd.read_parquet(f\"inputs/{index}/create_final_text_units.parquet\")\n",
"    for index in indexes\n",
"]\n",
"relationships = [\n",
"    pd.read_parquet(f\"inputs/{index}/create_final_relationships.parquet\")\n",
"    for index in indexes\n",
"]\n",
"\n",
"task = loop.create_task(\n",
"    multi_index_drift_search(\n",
"        parameters,\n",
"        nodes,\n",
"        entities,\n",
"        community_reports,\n",
"        text_units,\n",
"        relationships,\n",
"        indexes,\n",
"        1,  # community level\n",
"        \"Multiple Paragraphs\",  # response type\n",
"        False,  # streaming\n",
"        \"agriculture\",  # query\n",
"    )\n",
")\n",
"results = await task"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Print report"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"### Overview of Agriculture in Key U.S. Regions\n",
"\n",
"Agriculture in the United States is a diverse and regionally varied industry, with different areas specializing in specific crops and facing unique challenges. This overview highlights the agricultural dynamics in several key regions, including California, Washington, and Alaska, as well as the role of agriculture in the broader economic and environmental context.\n",
"\n",
"#### California's Agricultural Landscape\n",
"\n",
"California is a powerhouse in U.S. agriculture, with the Central Valley being a critical area for crop production. The region is known for producing a wide variety of crops, including almonds, grapes, and dairy products, supported by fertile soil and a favorable climate [Data: Sources (16, 29)]. However, water management is a significant challenge due to the state's dry climate and frequent droughts. The Sacramento and San Joaquin Rivers are vital for irrigation, but water scarcity remains a persistent issue, impacting crop yields and farming costs [Data: Sources (24, 21)].\n",
"\n",
"In Southern California, the agricultural sector is characterized by the production of citrus fruits, avocados, and strawberries. The region's Mediterranean climate is ideal for these crops, but water scarcity and urbanization pose challenges to agricultural expansion [Data: Reports (47); Sources (19, 20, 21, 22, 24)].\n",
"\n",
"#### Washington's Agricultural Contributions\n",
"\n",
"Washington State is a leading producer of apples, with the Yakima and Wenatchee–Okanogan regions being major contributors to the state's agricultural output. The state's climate, with dry, warm summers and cold winters, is ideal for apple cultivation, supported by extensive irrigation systems from the Columbia River [Data: Sources (93)]. Washington also produces significant quantities of hops, cherries, and potatoes, contributing to its diverse agricultural economy [Data: Sources (93)].\n",
"\n",
"The Columbia River plays a crucial role in supporting agriculture in Washington, providing essential irrigation for the Columbia Basin. However, environmental challenges such as water quality and climate change impact agricultural practices, necessitating sustainable water management strategies [Data: Reports (236); Sources (95)].\n",
"\n",
"#### Alaska's Agricultural Scene\n",
"\n",
"In Alaska, the Tanana Valley, particularly the Delta Junction area, is a notable agricultural region known for producing barley and hay. The region's short growing season is offset by long summer days, which provide ample sunlight for crop growth [Data: Sources (10)]. The development of local agriculture is supported by state programs and initiatives like the Alaska Grown program, which promotes local produce and supports farmers [Data: Sources (10)].\n",
"\n",
"#### Environmental and Economic Interplay\n",
"\n",
"Agriculture in these regions is deeply intertwined with environmental and economic factors. Water management is a common challenge across all areas, with efforts focused on improving irrigation efficiency and adopting sustainable practices to mitigate the impacts of climate change and water scarcity. Additionally, the economic contributions of agriculture are significant, providing employment and supporting local economies, but they also require balancing with environmental conservation efforts to ensure long-term sustainability.\n",
"\n",
"Overall, agriculture in the U.S. is a complex and dynamic sector, shaped by regional characteristics and broader environmental and economic trends. The ongoing challenges of water management, climate change, and economic diversification highlight the need for innovative solutions and adaptive strategies to sustain agricultural productivity and support rural communities.\n"
]
}
],
"source": [
"print(results[0])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Show context links back to original index"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"What strategies is the USDA implementing in California to combat drought effects on agriculture? 47 california 0\n",
"California: A Hub of Cultural, Economic, and Environmental Significance\n",
"California: A Hub of Cultural, Economic, and Environmental Significance\n",
"What environmental challenges affect agriculture around the Columbia River? 236 washington 23\n",
"Columbia River and Its Regional Impact\n",
"Columbia River and Its Regional Impact\n",
"How does agriculture in the Tanana Valley impact the local economy? 10 alaska 10\n",
" Fort Greely. This area was largely set aside and developed under a state program spearheaded by Hammond during his second term as governor. Delta-area crops consist predominantly of barley and hay. West of Fairbanks lies another concentration of sma\n",
" Fort Greely. This area was largely set aside and developed under a state program spearheaded by Hammond during his second term as governor. Delta-area crops consist predominantly of barley and hay. West of Fairbanks lies another concentration of sma\n",
"What are the major crops produced in California's Central Valley, and how are they impacted by river water management? 16 california 0\n",
"California is a state in the Western United States, lying on the American Pacific Coast. It borders Oregon to the north, Nevada and Arizona to the east, and an international border with the Mexican state of Baja California to the south. With nearly 3\n",
"California is a state in the Western United States, lying on the American Pacific Coast. It borders Oregon to the north, Nevada and Arizona to the east, and an international border with the Mexican state of Baja California to the south. With nearly 3\n",
"What are the major crops produced in California's Central Valley, and how are they impacted by river water management? 19 california 3\n",
" population of San Francisco increased from 500 to 150,000. \n",
"\n",
"The seat of government for California under Spanish and later Mexican rule had been located in Monterey from 1777 until 1845. Pio Pico, the last Mexican governor of Alta California, had br\n",
" population of San Francisco increased from 500 to 150,000. \n",
"\n",
"The seat of government for California under Spanish and later Mexican rule had been located in Monterey from 1777 until 1845. Pio Pico, the last Mexican governor of Alta California, had br\n",
"What strategies is the USDA implementing in California to combat drought effects on agriculture? 20 california 4\n",
" Alien Land Act, excluding Asian immigrants from owning land. During World War II, Japanese Americans in California were interned in concentration camps; in 2020, California apologized.\n",
"Migration to California accelerated during the early 20th centur\n",
" Alien Land Act, excluding Asian immigrants from owning land. During World War II, Japanese Americans in California were interned in concentration camps; in 2020, California apologized.\n",
"Migration to California accelerated during the early 20th centur\n",
"What strategies is the USDA implementing in California to combat drought effects on agriculture? 21 california 5\n",
"ias region of North America, alongside Baja California Sur).\n",
"In the middle of the state lies the California Central Valley, bounded by the Sierra Nevada in the east, the coastal mountain ranges in the west, the Cascade Range to the north and by the T\n",
"ias region of North America, alongside Baja California Sur).\n",
"In the middle of the state lies the California Central Valley, bounded by the Sierra Nevada in the east, the coastal mountain ranges in the west, the Cascade Range to the north and by the T\n",
"What strategies is the USDA implementing in California to combat drought effects on agriculture? 22 california 6\n",
" seen in the climate of the Bay Area, where areas sheltered from the ocean experience significantly hotter summers and colder winters in contrast with nearby areas closer to the ocean.\n",
"\n",
"Northern parts of the state have more rain than the south. Calif\n",
" seen in the climate of the Bay Area, where areas sheltered from the ocean experience significantly hotter summers and colder winters in contrast with nearby areas closer to the ocean.\n",
"\n",
"Northern parts of the state have more rain than the south. Calif\n",
"What strategies is the USDA implementing in California to combat drought effects on agriculture? 24 california 8\n",
" and Trinity Rivers drain a large area in far northwestern California. The Eel River and Salinas River each drain portions of the California coast, north and south of San Francisco Bay, respectively. The Mojave River is the primary watercourse in the\n",
" and Trinity Rivers drain a large area in far northwestern California. The Eel River and Salinas River each drain portions of the California coast, north and south of San Francisco Bay, respectively. The Mojave River is the primary watercourse in the\n",
"What strategies is the USDA implementing in California to combat drought effects on agriculture? 29 california 13\n",
" Los Angeles and the Port of Long Beach in Southern California collectively play a pivotal role in the global supply chain, together hauling in about 40% of all imports to the United States by TEU volume. The Port of Oakland and Port of Hueneme are t\n",
" Los Angeles and the Port of Long Beach in Southern California collectively play a pivotal role in the global supply chain, together hauling in about 40% of all imports to the United States by TEU volume. The Port of Oakland and Port of Hueneme are t\n",
"How has Hanford's historical role affected current environmental policies in Eastern Washington? 93 washington 8\n",
"Washington is a leading agricultural state. For 2018, the total value of Washington's agricultural products was $10.6 billion. In 2014, Washington ranked first in the nation in production of red raspberries (90.5 percent of total U.S. production), ho\n",
"Washington is a leading agricultural state. For 2018, the total value of Washington's agricultural products was $10.6 billion. In 2014, Washington ranked first in the nation in production of red raspberries (90.5 percent of total U.S. production), ho\n",
"How has Hanford's historical role affected current environmental policies in Eastern Washington? 95 washington 10\n",
", dioxins, two chlorinated pesticides, DDE, dieldrin and PBDEs. As a result of the study, the department will investigate the sources of PCBs in the Wenatchee River, where unhealthy levels of PCBs were found in mountain whitefish. Based on the 2007 i\n",
", dioxins, two chlorinated pesticides, DDE, dieldrin and PBDEs. As a result of the study, the department will investigate the sources of PCBs in the Wenatchee River, where unhealthy levels of PCBs were found in mountain whitefish. Based on the 2007 i\n"
]
}
],
"source": [
"for report_id in [47, 236]:\n",
"    for question in results[1]:\n",
"        resq = results[1][question]\n",
"        # Find the first report in this question's context with the requested id.\n",
"        report = next(\n",
"            (r for r in resq[\"reports\"] if r[\"id\"] == str(report_id)), None\n",
"        )\n",
"        if report is None:\n",
"            continue\n",
"        index_name = report[\"index_name\"]\n",
"        index_id = report[\"index_id\"]\n",
"        print(question, report_id, index_name, index_id)\n",
"        index_reports = pd.read_parquet(\n",
"            f\"inputs/{index_name}/create_final_community_reports.parquet\"\n",
"        )\n",
"        # Title in the merged context, then the title stored in the original index.\n",
"        print(report[\"title\"])\n",
"        print(\n",
"            index_reports[index_reports[\"community\"] == int(index_id)][\n",
"                \"title\"\n",
"            ].to_numpy()[0]\n",
"        )\n",
"        break\n",
"for source_id in [10, 16, 19, 20, 21, 22, 24, 29, 93, 95]:\n",
"    for question in results[1]:\n",
"        resq = results[1][question]\n",
"        # Find the first source in this question's context with the requested id.\n",
"        source = next(\n",
"            (s for s in resq[\"sources\"] if s[\"id\"] == str(source_id)), None\n",
"        )\n",
"        if source is None:\n",
"            continue\n",
"        index_name = source[\"index_name\"]\n",
"        index_id = source[\"index_id\"]\n",
"        print(question, source_id, index_name, index_id)\n",
"        index_sources = pd.read_parquet(\n",
"            f\"inputs/{index_name}/create_final_text_units.parquet\"\n",
"        )\n",
"        # Text in the merged context, then the text stored in the original index.\n",
"        print(source[\"text\"][:250])\n",
"        print(index_sources.loc[int(index_id)][\"text\"][:250])\n",
"        break"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Multi-index Basic Search"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Load the text units of every index, in the same order as `indexes`.\n",
"text_units = [\n",
"    pd.read_parquet(f\"inputs/{index}/create_final_text_units.parquet\")\n",
"    for index in indexes\n",
"]\n",
"\n",
"# The fourth positional argument (False) disables streaming.\n",
"task = loop.create_task(\n",
"    multi_index_basic_search(\n",
"        parameters, text_units, indexes, False, \"industry in maryland\"\n",
"    )\n",
")\n",
"results = await task"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Print report"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"# Industry in Maryland\n",
"\n",
"Maryland's economy is diverse and robust, with significant contributions from various sectors, including manufacturing, biotechnology, transportation, and agriculture. The state's strategic location near Washington, D.C., and its access to major transportation hubs like the Port of Baltimore, play a crucial role in its industrial landscape.\n",
"\n",
"## Manufacturing\n",
"\n",
"Manufacturing in Maryland is highly diversified, with no single sub-sector contributing more than 20% of the total. Key manufacturing industries include electronics, computer equipment, and chemicals. Historically, the primary metals sub-sector was significant, with the Sparrows Point steel factory once being the largest in the world. However, this sector has faced challenges from foreign competition, bankruptcies, and mergers [Data: Sources (0, 1)].\n",
"\n",
"## Biotechnology\n",
"\n",
"Maryland is a major center for life sciences research and development, hosting more than 400 biotechnology companies, making it the fourth largest nexus in this field in the United States. The state is home to prominent institutions and government agencies involved in research and development, such as Johns Hopkins University, the National Institutes of Health (NIH), and the Food and Drug Administration (FDA) [Data: Sources (0, 1)].\n",
"\n",
"## Transportation and the Port of Baltimore\n",
"\n",
"Transportation is a significant service activity in Maryland, centered around the Port of Baltimore. The port is a major hub for imports, particularly raw materials and bulk commodities, and is the number one auto port in the U.S. The port's strategic location allows for efficient distribution to manufacturing centers in the inland Midwest [Data: Sources (0, 1)].\n",
"\n",
"## Agriculture and Food Production\n",
"\n",
"Agriculture remains an important part of Maryland's economy, with large areas of fertile land in the coastal and Piedmont zones. The state is known for dairy farming, specialty horticulture crops, and a significant chicken-farming sector. Maryland's food-processing plants are the most significant type of manufacturing by value in the state [Data: Sources (0, 1)].\n",
"\n",
"## Conclusion\n",
"\n",
"Maryland's industrial landscape is characterized by its diversity and strategic advantages, including proximity to federal government operations and major transportation routes. The state's economy benefits from a mix of traditional industries like manufacturing and agriculture, alongside cutting-edge sectors such as biotechnology and transportation. This combination positions Maryland as a dynamic player in the national economy.\n"
]
}
],
"source": [
"print(results[0])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Show context links back to original text\n",
"\n",
"Note that the original index name is not saved in the context data for basic search."
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" highly diversified with no sub-sector contributing over 20 percent of the total. Typical forms of manufacturing include electronics, computer equipment, and chemicals. The once-mighty primary metals sub-sector, which once included what was then the \n",
"20%. Demographically, both Protestants and those identifying with no religion are more numerous than Catholics.\n",
"According to the Pew Research Center in 2014, 69 percent of Maryland's population identifies themselves as Christian. Nearly 52% of the ad\n"
]
}
],
"source": [
"for source_id in [0, 1]:\n",
"    print(results[1][\"sources\"][source_id][\"text\"][:250])"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "base",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.5"
}
},
"nbformat": 4,
"nbformat_minor": 2
}