haystack/tutorials/Tutorial10_Knowledge_Graph.ipynb
Sara Zan 13510aa753
Refactoring of the haystack package (#1624)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-25 15:50:23 +02:00


{
"cells": [
{
"cell_type": "markdown",
"source": [
"# Question Answering on a Knowledge Graph\n",
"\n",
"[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/deepset-ai/haystack/blob/master/tutorials/Tutorial10_Knowledge_Graph.ipynb)\n",
"\n",
"Haystack lets you store and query knowledge graphs, using pre-trained models that translate text queries into SPARQL queries.\n",
"This tutorial demonstrates how to load an existing knowledge graph into Haystack, load a pre-trained retriever, and execute text queries on the knowledge graph.\n",
"Training models that translate text queries into SPARQL queries is currently not supported."
],
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%% md\n"
}
}
},
{
"cell_type": "code",
"execution_count": null,
"source": [
"# Install the latest release of Haystack in your own environment\n",
"#! pip install farm-haystack\n",
"\n",
"# Install the latest master of Haystack\n",
"!pip install grpcio-tools==1.34.1\n",
"!pip install git+https://github.com/deepset-ai/haystack.git\n",
"\n",
"# If you run this notebook on Google Colab, you might need to\n",
"# restart the runtime after installing Haystack."
],
"outputs": [],
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%%\n"
}
}
},
{
"cell_type": "code",
"execution_count": null,
"source": [
"# Here are some imports that we'll need\n",
"\n",
"import subprocess\n",
"import time\n",
"from pathlib import Path\n",
"\n",
"from haystack.nodes import Text2SparqlRetriever\n",
"from haystack.document_stores import GraphDBKnowledgeGraph\n",
"from haystack.utils import fetch_archive_from_http"
],
"outputs": [],
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%%\n"
}
}
},
{
"cell_type": "markdown",
"source": [
"## Downloading Knowledge Graph and Model"
],
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%% md\n"
}
}
},
{
"cell_type": "code",
"execution_count": null,
"source": [
"# Let's first fetch some triples that we want to store in our knowledge graph.\n",
"# Here: example triples from the Harry Potter wizarding world\n",
"graph_dir = \"../data/tutorial10_knowledge_graph/\"\n",
"s3_url = \"https://fandom-qa.s3-eu-west-1.amazonaws.com/triples_and_config.zip\"\n",
"fetch_archive_from_http(url=s3_url, output_dir=graph_dir)\n",
"\n",
"# Fetch a pre-trained BART model that translates text queries to SPARQL queries\n",
"model_dir = \"../saved_models/tutorial10_knowledge_graph/\"\n",
"s3_url = \"https://fandom-qa.s3-eu-west-1.amazonaws.com/saved_models/hp_v3.4.zip\"\n",
"fetch_archive_from_http(url=s3_url, output_dir=model_dir)"
],
"outputs": [],
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%%\n"
}
}
},
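{
"cell_type": "markdown",
"source": [
"`fetch_archive_from_http` downloads an archive from a URL and unpacks it into `output_dir`. As a rough illustration of that step, here is a standard-library sketch. It is not Haystack's implementation: the helper name `download_and_unzip` is hypothetical, it assumes the archive is a zip file, and skipping the download when `output_dir` already contains files is a choice made for this sketch."
],
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%% md\n"
}
}
},
{
"cell_type": "code",
"execution_count": null,
"source": [
"# Hypothetical helper: a rough stdlib approximation of downloading and extracting a zip archive\n",
"# (a sketch, not Haystack's implementation of fetch_archive_from_http)\n",
"import io\n",
"import urllib.request\n",
"import zipfile\n",
"from pathlib import Path\n",
"\n",
"\n",
"def download_and_unzip(url, output_dir):\n",
"    target = Path(output_dir)\n",
"    # Skip the download if the output directory already contains files (a choice made for this sketch)\n",
"    if target.exists() and any(target.iterdir()):\n",
"        return False\n",
"    target.mkdir(parents=True, exist_ok=True)\n",
"    # Read the whole archive into memory, then extract all of its members into the output directory\n",
"    with urllib.request.urlopen(url) as response:\n",
"        data = response.read()\n",
"    with zipfile.ZipFile(io.BytesIO(data)) as archive:\n",
"        archive.extractall(target)\n",
"    return True"
],
"outputs": [],
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%%\n"
}
}
},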
{
"cell_type": "markdown",
"source": [
"## Launching a GraphDB instance"
],
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%% md\n"
}
}
},
{
"cell_type": "code",
"execution_count": null,
"source": [
"# Unfortunately, there seems to be no good way to run GraphDB in Colab environments.\n",
"# In your local environment, you can start a GraphDB server with Docker.\n",
"# The free version of GraphDB is described at https://www.ontotext.com/products/graphdb/graphdb-free/\n",
"print(\"Starting GraphDB ...\")\n",
"status = subprocess.run(\n",
" \"docker run -d -p 7200:7200 --name graphdb-instance-tutorial docker-registry.ontotext.com/graphdb-free:9.4.1-adoptopenjdk11\", shell=True\n",
")\n",
"if status.returncode:\n",
" raise Exception(\"Failed to launch GraphDB. It may already be running, or a stopped container named graphdb-instance-tutorial may already exist; in that case, start it with `docker start graphdb-instance-tutorial`.\")\n",
"# Give the server a few seconds to start up\n",
"time.sleep(5)"
],
"outputs": [],
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%%\n"
}
}
},
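{
"cell_type": "markdown",
"source": [
"The cell that launches GraphDB waits a fixed five seconds, which may not be enough on a slower machine. A more robust option is to poll the server until it answers. This sketch assumes GraphDB's REST API is reachable at `http://localhost:7200/rest/repositories` (port 7200 is the one mapped by the `docker run` command); the helper `wait_until_ready` and its parameters are hypothetical. You could call `wait_until_ready('http://localhost:7200/rest/repositories')` in place of `time.sleep(5)` and raise an error if it returns `False`."
],
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%% md\n"
}
}
},
{
"cell_type": "code",
"execution_count": null,
"source": [
"# Hypothetical helper: poll a URL until the server answers, instead of sleeping for a fixed time\n",
"import time\n",
"import urllib.request\n",
"\n",
"\n",
"def wait_until_ready(url, timeout=60.0, interval=1.0):\n",
"    deadline = time.monotonic() + timeout\n",
"    while time.monotonic() < deadline:\n",
"        try:\n",
"            with urllib.request.urlopen(url, timeout=interval) as response:\n",
"                if 200 <= response.status < 300:\n",
"                    return True\n",
"        except OSError:\n",
"            # Connection refused, reset, timed out, or an HTTP error status: the server is not ready yet\n",
"            pass\n",
"        time.sleep(interval)\n",
"    return False"
],
"outputs": [],
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%%\n"
}
}
},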
{
"cell_type": "markdown",
"source": [
"## Creating a new GraphDB repository (called an index in Haystack's document stores)"
],
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%% md\n"
}
}
},
{
"cell_type": "code",
"execution_count": null,
"source": [
"# Initialize a knowledge graph connected to GraphDB and use \"tutorial_10_index\" as the name of the index\n",
"kg = GraphDBKnowledgeGraph(index=\"tutorial_10_index\")\n",
"\n",
"# Delete the index, since it might already exist from a previous run\n",
"kg.delete_index()\n",
"\n",
"# Create the index based on a configuration file\n",
"kg.create_index(config_path=Path(graph_dir+\"repo-config.ttl\"))\n",
"\n",
"# Import triples (subject, predicate, object statements) from a Turtle (.ttl) file\n",
"kg.import_from_ttl_file(index=\"tutorial_10_index\", path=Path(graph_dir+\"triples.ttl\"))\n",
"\n",
"# Fetch all triples once and reuse the result for both printouts\n",
"triples = kg.get_all_triples()\n",
"print(f\"The last triple stored in the knowledge graph is: {triples[-1]}\")\n",
"print(f\"There are {len(triples)} triples stored in the knowledge graph.\")"
],
"outputs": [],
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%%\n"
}
}
},
{
"cell_type": "code",
"execution_count": null,
"source": [
"# Define prefixes for names of resources so that we can use shorter resource names in queries\n",
"prefixes = \"\"\"PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>\n",
"PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>\n",
"PREFIX hp: <https://deepset.ai/harry_potter/>\n",
"\"\"\"\n",
"kg.prefixes = prefixes\n",
"\n",
"# Load a pre-trained model that translates text queries to SPARQL queries\n",
"kgqa_retriever = Text2SparqlRetriever(knowledge_graph=kg, model_name_or_path=model_dir+\"hp_v3.4\")"
],
"outputs": [],
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%%\n"
}
}
},
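{
"cell_type": "markdown",
"source": [
"A prefixed name such as `hp:Hermione_granger` is shorthand for a full IRI: the prefix label `hp` is replaced by the namespace declared in the matching `PREFIX` line. The sketch below spells out that expansion in plain Python. The helper names `parse_prefixes` and `expand_prefixed_name` are hypothetical, and the parsing only handles simple one-line declarations like the ones in `prefixes`, not the full SPARQL grammar. Expanding `hp:Hermione_granger` this way gives `<https://deepset.ai/harry_potter/Hermione_granger>`, the full IRI written out in one of the example queries in the next section."
],
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%% md\n"
}
}
},
{
"cell_type": "code",
"execution_count": null,
"source": [
"# Hypothetical helpers: expand a prefixed name into a full IRI using simple one-line PREFIX declarations\n",
"def parse_prefixes(prefix_block):\n",
"    # Map each prefix label (e.g. 'hp') to its namespace IRI, from lines of the form 'PREFIX label: <iri>'\n",
"    namespaces = {}\n",
"    for line in prefix_block.splitlines():\n",
"        parts = line.split()\n",
"        if len(parts) == 3 and parts[0].upper() == 'PREFIX':\n",
"            namespaces[parts[1].rstrip(':')] = parts[2].strip('<>')\n",
"    return namespaces\n",
"\n",
"\n",
"def expand_prefixed_name(name, namespaces):\n",
"    # Replace the prefix label with its namespace IRI and wrap the result in angle brackets\n",
"    label, local_part = name.split(':', 1)\n",
"    return '<' + namespaces[label] + local_part + '>'\n",
"\n",
"\n",
"hp_namespaces = parse_prefixes('PREFIX hp: <https://deepset.ai/harry_potter/>')\n",
"print(expand_prefixed_name('hp:Hermione_granger', hp_namespaces))"
],
"outputs": [],
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%%\n"
}
}
},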
{
"cell_type": "markdown",
"source": [
"## Query Execution\n",
"\n",
"We can now ask questions that will be answered by our knowledge graph!\n",
"One limitation, though: our pre-trained model can only handle questions about resources it has seen during training.\n",
"Otherwise, it cannot translate the name of a resource to the identifier used in the knowledge graph.\n",
"For example, it needs to have learned that \"Harry\" maps to \"hp:Harry_potter\"."
],
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%% md\n"
}
}
},
{
"cell_type": "code",
"execution_count": null,
"source": [
"query = \"In which house is Harry Potter?\"\n",
"print(f\"Translating the text query \\\"{query}\\\" to a SPARQL query and executing it on the knowledge graph...\")\n",
"result = kgqa_retriever.retrieve(query=query)\n",
"print(result)\n",
"# Correct SPARQL query: select ?a { hp:Harry_potter hp:house ?a . }\n",
"# Correct answer: Gryffindor\n",
"\n",
"print(\"Executing a SPARQL query with prefixed names of resources...\")\n",
"result = kgqa_retriever._query_kg(sparql_query=\"select distinct ?sbj where { ?sbj hp:job hp:Keeper_of_keys_and_grounds . }\")\n",
"print(result)\n",
"# Paraphrased question: Who is the keeper of keys and grounds?\n",
"# Correct answer: Rubeus Hagrid\n",
"\n",
"print(\"Executing a SPARQL query with full names of resources...\")\n",
"result = kgqa_retriever._query_kg(sparql_query=\"select distinct ?obj where { <https://deepset.ai/harry_potter/Hermione_granger> <https://deepset.ai/harry_potter/patronus> ?obj . }\")\n",
"print(result)\n",
"# Paraphrased question: What is the patronus of Hermione?\n",
"# Correct answer: Otter"
],
"outputs": [],
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%%\n"
}
}
},
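{
"cell_type": "markdown",
"source": [
"Each of the queries above contains a single triple pattern: a subject, a predicate, and an object, where terms starting with `?` are variables. The sketch below shows, in plain Python, what matching such a pattern against stored triples means. The helper `match_pattern` and the toy triples are made up for illustration (the tutorial's dataset may encode these resources differently), and a real SPARQL engine such as GraphDB does much more, including joins across several patterns, `DISTINCT`, and index lookups."
],
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%% md\n"
}
}
},
{
"cell_type": "code",
"execution_count": null,
"source": [
"# Hypothetical helper: match one SPARQL-style triple pattern against an in-memory list of triples\n",
"def match_pattern(triples, pattern):\n",
"    # Terms starting with '?' are variables; all other terms must match the stored value exactly\n",
"    results = []\n",
"    for triple in triples:\n",
"        binding = {}\n",
"        for term, value in zip(pattern, triple):\n",
"            if term.startswith('?'):\n",
"                if binding.get(term, value) != value:\n",
"                    break  # the same variable would be bound to two different values\n",
"                binding[term] = value\n",
"            elif term != value:\n",
"                break  # a fixed term does not match\n",
"        else:\n",
"            results.append(binding)\n",
"    return results\n",
"\n",
"\n",
"# Toy triples made up for illustration; the tutorial's dataset may encode these resources differently\n",
"toy_triples = [\n",
"    ('hp:Harry_potter', 'hp:house', 'hp:Gryffindor'),\n",
"    ('hp:Rubeus_hagrid', 'hp:job', 'hp:Keeper_of_keys_and_grounds'),\n",
"]\n",
"print(match_pattern(toy_triples, ('?sbj', 'hp:job', 'hp:Keeper_of_keys_and_grounds')))"
],
"outputs": [],
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%%\n"
}
}
},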
{
"cell_type": "markdown",
"source": [
"## About us\n",
"\n",
"This [Haystack](https://github.com/deepset-ai/haystack/) notebook was made with love by [deepset](https://deepset.ai/) in Berlin, Germany.\n",
"\n",
"We bring NLP to the industry via open source! \n",
"Our focus: Industry-specific language models & large-scale QA systems. \n",
" \n",
"Some of our other work: \n",
"- [German BERT](https://deepset.ai/german-bert)\n",
"- [GermanQuAD and GermanDPR](https://deepset.ai/germanquad)\n",
"- [FARM](https://github.com/deepset-ai/FARM)\n",
"\n",
"Get in touch:\n",
"[Twitter](https://twitter.com/deepset_ai) | [LinkedIn](https://www.linkedin.com/company/deepset-ai/) | [Slack](https://haystack.deepset.ai/community/join) | [GitHub Discussions](https://github.com/deepset-ai/haystack/discussions) | [Website](https://deepset.ai)\n",
"\n",
"By the way: [we're hiring!](https://www.deepset.ai/jobs)"
],
"metadata": {
"collapsed": false
}
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 2
}