haystack/tutorials/Tutorial10_Knowledge_Graph.ipynb
Julian Risch d38c07e0ee
knowledge graph example (#934)
* Add knowledge graph module

* Fix type hint

* Add graph retriever module

* Change type annotations, change return format

* Add graph retriever that executes questions as sparql queries

* Linking only those entities that are in the knowledge graph

* Added logging and using relations extracted from Knowledge graph for linking

* Preventing entity linking from linking the same token to multiple entities

* Pruning triples that have no variables for select and count queries

* Support knowledge graphs with Pipelines

* Add text2sparql

* Entity linking and relation linking consider more special cases now based on evaluation on labelled data

* Separating example code from KGQA implementation

* Add eval on combined extractive and kg questions

* Remove references to hp-test

* Add fields sparql_query and long_answer_list to metadata

* Removing modular Question2SPARQL approach

* Removing additional classes used for modular kgqa approach

* preparing lcquad data

* change graph db

* Translating namespaces in knowledge graph queries

* Creating graphdb index and loading triples from .ttl file

* Fetching graph config files, triples and model from S3

* Fix incompatibility issues with BaseGraphRetriever and BaseComponent

* Removing unused utility functions

* Adding doc strings and tutorial header

* Adding sparqlwrapper dependency

* Moving tutorial header

* Sorting tutorials by number within name of notebook

* Add latest docstring and tutorial changes

* Creating test cases for knowledge graph

* Changing knowledge graph example to harry potter

* Add latest docstring and tutorial changes

* Adapting the tutorial notebook to harry potter example

* Add GraphDB fixture for tests

* Add latest docstring and tutorial changes

* Added GraphDB docker launch to CI

* Use correct GraphDB fixture

* Check if GraphDB instance is already running

* Renaming question/query and incorporating other feedback from Timo and Tanay

* Removed type annotation

* Add latest docstring and tutorial changes

Co-authored-by: oryx1729 <oryx1729@protonmail.com>
Co-authored-by: Timo Moeller <timo.moeller@deepset.ai>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-04-08 14:05:33 +02:00

{
"cells": [
{
"cell_type": "markdown",
"source": [
"# Question Answering on a Knowledge Graph\n",
"\n",
"[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/deepset-ai/haystack/blob/master/tutorials/Tutorial10_Knowledge_Graph.ipynb)\n",
"\n",
"Haystack allows storing and querying knowledge graphs with the help of pre-trained models that translate text queries to SPARQL queries.\n",
"This tutorial demonstrates how to load an existing knowledge graph into Haystack, load a pre-trained retriever, and execute text queries on the knowledge graph.\n",
"Training models that translate text queries into SPARQL queries is currently not supported."
],
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%% md\n"
}
}
},
{
"cell_type": "code",
"execution_count": null,
"outputs": [],
"source": [
"# Install the latest release of Haystack in your own environment\n",
"#! pip install farm-haystack\n",
"\n",
"# Install the latest master of Haystack\n",
"!pip install git+https://github.com/deepset-ai/haystack.git"
],
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%%\n"
}
}
},
{
"cell_type": "code",
"execution_count": null,
"outputs": [],
"source": [
"# Here are some imports that we'll need\n",
"\n",
"import subprocess\n",
"import time\n",
"from pathlib import Path\n",
"\n",
"from haystack.graph_retriever.text_to_sparql import Text2SparqlRetriever\n",
"from haystack.knowledge_graph.graphdb import GraphDBKnowledgeGraph\n",
"from haystack.preprocessor.utils import fetch_archive_from_http"
],
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%%\n"
}
}
},
{
"cell_type": "markdown",
"source": [
"## Downloading Knowledge Graph and Model"
],
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%% md\n"
}
}
},
{
"cell_type": "code",
"execution_count": null,
"outputs": [],
"source": [
"# Let's first fetch some triples that we want to store in our knowledge graph\n",
"# Here: exemplary triples from the wizarding world\n",
"graph_dir = \"../data/tutorial10_knowledge_graph/\"\n",
"s3_url = \"https://fandom-qa.s3-eu-west-1.amazonaws.com/triples_and_config.zip\"\n",
"fetch_archive_from_http(url=s3_url, output_dir=graph_dir)\n",
"\n",
"# Fetch a pre-trained BART model that translates text queries to SPARQL queries\n",
"model_dir = \"../saved_models/tutorial10_knowledge_graph/\"\n",
"s3_url = \"https://fandom-qa.s3-eu-west-1.amazonaws.com/saved_models/hp_v3.4.zip\"\n",
"fetch_archive_from_http(url=s3_url, output_dir=model_dir)"
],
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%%\n"
}
}
},
{
"cell_type": "markdown",
"source": [
"## Launching a GraphDB instance"
],
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%% md\n"
}
}
},
{
"cell_type": "code",
"execution_count": null,
"outputs": [],
"source": [
"# Unfortunately, there seems to be no good way to run GraphDB in Colab environments.\n",
"# In your local environment, you can start a GraphDB server with Docker.\n",
"# The free version is available on GraphDB's website: https://www.ontotext.com/products/graphdb/graphdb-free/\n",
"print(\"Starting GraphDB ...\")\n",
"# With shell=True, the command is passed as a single string\n",
"status = subprocess.run(\n",
"    \"docker run -d -p 7200:7200 --name graphdb-instance-tutorial docker-registry.ontotext.com/graphdb-free:9.4.1-adoptopenjdk11\",\n",
"    shell=True\n",
")\n",
"if status.returncode:\n",
"    raise Exception(\"Failed to launch GraphDB. Maybe it is already running, or a container with that name already exists that you could start instead?\")\n",
"# Give the server a few seconds to start up\n",
"time.sleep(5)"
],
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%%\n"
}
}
},
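{
"cell_type": "markdown",
"source": [
"A fixed `time.sleep(5)` may be too short on slower machines. As a minimal sketch, one could instead poll the port published by the Docker command above (7200) until the server answers HTTP requests. The helper name `wait_for_graphdb`, the URL `http://localhost:7200`, and the timeout values are assumptions made for illustration; they are not part of Haystack or GraphDB."
],
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%% md\n"
}
}
},
{
"cell_type": "code",
"execution_count": null,
"outputs": [],
"source": [
"import time\n",
"import urllib.error\n",
"import urllib.request\n",
"\n",
"\n",
"def wait_for_graphdb(url=\"http://localhost:7200\", timeout=60.0, interval=2.0):\n",
"    \"\"\"Hypothetical helper: poll url until it answers or timeout seconds have passed.\"\"\"\n",
"    deadline = time.monotonic() + timeout\n",
"    while time.monotonic() < deadline:\n",
"        try:\n",
"            with urllib.request.urlopen(url, timeout=interval):\n",
"                return True  # the server answered with a successful HTTP response\n",
"        except urllib.error.HTTPError:\n",
"            return True  # the server is reachable, even though it returned an error status\n",
"        except (urllib.error.URLError, OSError):\n",
"            time.sleep(interval)  # not reachable yet, try again shortly\n",
"    return False\n",
"\n",
"\n",
"if not wait_for_graphdb():\n",
"    raise Exception(\"GraphDB did not become reachable in time.\")"
],
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%%\n"
}
}
},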
{
"cell_type": "markdown",
"source": [
"## Creating a new GraphDB repository (also known as an index in Haystack's document stores)"
],
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%% md\n"
}
}
},
{
"cell_type": "code",
"execution_count": null,
"outputs": [],
"source": [
"# Initialize a knowledge graph connected to GraphDB and use \"tutorial_10_index\" as the name of the index\n",
"kg = GraphDBKnowledgeGraph(index=\"tutorial_10_index\")\n",
"\n",
"# Delete the index as it might have been already created in previous runs\n",
"kg.delete_index()\n",
"\n",
"# Create the index based on a configuration file\n",
"kg.create_index(config_path=Path(graph_dir+\"repo-config.ttl\"))\n",
"\n",
"# Import triples of subject, predicate, and object statements from a ttl file\n",
"kg.import_from_ttl_file(index=\"tutorial_10_index\", path=Path(graph_dir+\"triples.ttl\"))\n",
"print(f\"The last triple stored in the knowledge graph is: {kg.get_all_triples()[-1]}\")\n",
"print(f\"There are {len(kg.get_all_triples())} triples stored in the knowledge graph.\")"
],
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%%\n"
}
}
},
{
"cell_type": "code",
"execution_count": null,
"outputs": [],
"source": [
"# Define prefixes for names of resources so that we can use shorter resource names in queries\n",
"prefixes = \"\"\"PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>\n",
"PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>\n",
"PREFIX hp: <https://deepset.ai/harry_potter/>\n",
"\"\"\"\n",
"kg.prefixes = prefixes\n",
"\n",
"# Load a pre-trained model that translates text queries to SPARQL queries\n",
"kgqa_retriever = Text2SparqlRetriever(knowledge_graph=kg, model_name_or_path=model_dir+\"hp_v3.4\")"
],
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%%\n"
}
}
},
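{
"cell_type": "markdown",
"source": [
"To make the effect of the prefixes more concrete, the following sketch expands a prefixed name such as `hp:Harry_potter` into its full form by hand. It only illustrates the naming scheme: the helper `expand_prefixed_name` and the dictionary `PREFIX_MAP` are hypothetical and not part of Haystack."
],
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%% md\n"
}
}
},
{
"cell_type": "code",
"execution_count": null,
"outputs": [],
"source": [
"# Same namespaces as in the prefixes defined above (illustrative copy)\n",
"PREFIX_MAP = {\n",
"    \"rdf\": \"http://www.w3.org/1999/02/22-rdf-syntax-ns#\",\n",
"    \"xsd\": \"http://www.w3.org/2001/XMLSchema#\",\n",
"    \"hp\": \"https://deepset.ai/harry_potter/\",\n",
"}\n",
"\n",
"\n",
"def expand_prefixed_name(name, prefix_map=PREFIX_MAP):\n",
"    \"\"\"Hypothetical helper: replace the prefix of a name like hp:Harry_potter with its namespace.\"\"\"\n",
"    prefix, _, local_name = name.partition(\":\")\n",
"    if prefix not in prefix_map:\n",
"        raise ValueError(f\"Unknown prefix: {prefix}\")\n",
"    return f\"<{prefix_map[prefix]}{local_name}>\"\n",
"\n",
"\n",
"# Prints <https://deepset.ai/harry_potter/Hermione_granger>\n",
"print(expand_prefixed_name(\"hp:Hermione_granger\"))"
],
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%%\n"
}
}
},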
{
"cell_type": "markdown",
"source": [
"## Query Execution\n",
"\n",
"We can now ask questions that will be answered by our knowledge graph!\n",
"One limitation though: our pre-trained model can only translate questions about resources it has seen during training.\n",
"Otherwise, it cannot map the name of the resource to the identifier used in the knowledge graph,\n",
"e.g. \"Harry\" -> \"hp:Harry_potter\"."
],
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%% md\n"
}
}
},
{
"cell_type": "code",
"source": [
"query = \"In which house is Harry Potter?\"\n",
"print(f\"Translating the text query \\\"{query}\\\" to a SPARQL query and executing it on the knowledge graph...\")\n",
"result = kgqa_retriever.retrieve(query=query)\n",
"print(result)\n",
"# Correct SPARQL query: select ?a { hp:Harry_potter hp:house ?a . }\n",
"# Correct answer: Gryffindor\n",
"\n",
"print(\"Executing a SPARQL query with prefixed names of resources...\")\n",
"result = kgqa_retriever._query_kg(sparql_query=\"select distinct ?sbj where { ?sbj hp:job hp:Keeper_of_keys_and_grounds . }\")\n",
"print(result)\n",
"# Paraphrased question: Who is the keeper of keys and grounds?\n",
"# Correct answer: Rubeus Hagrid\n",
"\n",
"print(\"Executing a SPARQL query with full names of resources...\")\n",
"result = kgqa_retriever._query_kg(sparql_query=\"select distinct ?obj where { <https://deepset.ai/harry_potter/Hermione_granger> <https://deepset.ai/harry_potter/patronus> ?obj . }\")\n",
"print(result)\n",
"# Paraphrased question: What is the patronus of Hermione?\n",
"# Correct answer: Otter"
],
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%%\n"
}
},
"execution_count": null,
"outputs": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 0
}