mirror of https://github.com/deepset-ai/haystack.git
synced 2025-10-31 01:39:45 +00:00
commit ac5617e757
* add basic telemetry features
* change pipeline_config to _component_config
* Update Documentation & Code Style
* add super().__init__() calls to error classes
* make posthog mock work with python 3.7
* Update Documentation & Code Style
* update link to docs web page
* log exceptions, send event for raised HaystackErrors, refactor Path(CONFIG_PATH)
* add comment on send_event in BaseComponent.init() and fix mypy
* mock NonPrivateParameters and fix pylint undefined-variable
* Update Documentation & Code Style
* check model path contains multiple /
* add test for writing to file
* add test for en-/disable telemetry
* Update Documentation & Code Style
* merge file deletion methods and ignore pylint global statement
* Update Documentation & Code Style
* set env variable in demo to activate telemetry
* fix mock of HAYSTACK_TELEMETRY_ENABLED
* fix mypy and linter
* add CI as env variable to execution contexts
* remove threading, add test for custom error event
* Update Documentation & Code Style
* simplify config/log file deletion
* add test for final event being sent
* force writing config file in test
* make test compatible with python 3.7
* switch to posthog production server
* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
		
			
				
	
	
		
295 lines · 9.3 KiB · Plaintext
		
	
	
	
	
	
			
		
		
	
	
		
	
	
	
	
	
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "# Question Answering on a Knowledge Graph\n",
    "\n",
    "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/deepset-ai/haystack/blob/master/tutorials/Tutorial10_Knowledge_Graph.ipynb)\n",
    "\n",
    "Haystack allows storing and querying knowledge graphs with the help of pre-trained models that translate text queries to SPARQL queries.\n",
    "This tutorial demonstrates how to load an existing knowledge graph into Haystack, load a pre-trained retriever, and execute text queries on the knowledge graph.\n",
    "Training models that translate text queries into SPARQL queries is currently not supported."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "# Install the latest release of Haystack in your own environment\n",
    "#! pip install farm-haystack\n",
    "\n",
    "# Install the latest master of Haystack\n",
    "!pip install --upgrade pip\n",
    "!pip install git+https://github.com/deepset-ai/haystack.git#egg=farm-haystack[colab,graphdb]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "# Here are some imports that we'll need\n",
    "\n",
    "import subprocess\n",
    "import time\n",
    "from pathlib import Path\n",
    "\n",
    "from haystack.nodes import Text2SparqlRetriever\n",
    "from haystack.document_stores import GraphDBKnowledgeGraph\n",
    "from haystack.utils import fetch_archive_from_http"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "## Downloading Knowledge Graph and Model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "# Let's first fetch some triples that we want to store in our knowledge graph\n",
    "# Here: exemplary triples from the wizarding world\n",
    "graph_dir = \"data/tutorial10\"\n",
    "s3_url = \"https://fandom-qa.s3-eu-west-1.amazonaws.com/triples_and_config.zip\"\n",
    "fetch_archive_from_http(url=s3_url, output_dir=graph_dir)\n",
    "\n",
    "# Fetch a pre-trained BART model that translates text queries to SPARQL queries\n",
    "model_dir = \"../saved_models/tutorial10_knowledge_graph/\"\n",
    "s3_url = \"https://fandom-qa.s3-eu-west-1.amazonaws.com/saved_models/hp_v3.4.zip\"\n",
    "fetch_archive_from_http(url=s3_url, output_dir=model_dir)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "## Launching a GraphDB instance"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "# Unfortunately, there seems to be no good way to run GraphDB in colab environments\n",
    "# In your local environment, you could start a GraphDB server with docker\n",
    "# Feel free to check GraphDB's website for the free version https://www.ontotext.com/products/graphdb/graphdb-free/\n",
    "print(\"Starting GraphDB ...\")\n",
    "status = subprocess.run(\n",
    "    [\n",
    "        \"docker run -d -p 7200:7200 --name graphdb-instance-tutorial docker-registry.ontotext.com/graphdb-free:9.4.1-adoptopenjdk11\"\n",
    "    ],\n",
    "    shell=True,\n",
    ")\n",
    "if status.returncode:\n",
    "    raise Exception(\n",
    "        \"Failed to launch GraphDB. It may already be running, or a container with that name may already exist (start that one instead).\"\n",
    "    )\n",
    "time.sleep(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "## Creating a new GraphDB repository (also known as index in Haystack's document stores)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "# Initialize a knowledge graph connected to GraphDB and use \"tutorial_10_index\" as the name of the index\n",
    "kg = GraphDBKnowledgeGraph(index=\"tutorial_10_index\")\n",
    "\n",
    "# Delete the index as it might have been already created in previous runs\n",
    "kg.delete_index()\n",
    "\n",
    "# Create the index based on a configuration file\n",
    "kg.create_index(config_path=Path(graph_dir) / \"repo-config.ttl\")\n",
    "\n",
    "# Import triples of subject, predicate, and object statements from a ttl file\n",
    "kg.import_from_ttl_file(index=\"tutorial_10_index\", path=Path(graph_dir) / \"triples.ttl\")\n",
    "print(f\"The last triple stored in the knowledge graph is: {kg.get_all_triples()[-1]}\")\n",
    "print(f\"There are {len(kg.get_all_triples())} triples stored in the knowledge graph.\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "# Define prefixes for names of resources so that we can use shorter resource names in queries\n",
    "prefixes = \"\"\"PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>\n",
    "PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>\n",
    "PREFIX hp: <https://deepset.ai/harry_potter/>\n",
    "\"\"\"\n",
    "kg.prefixes = prefixes\n",
    "\n",
    "# Load a pre-trained model that translates text queries to SPARQL queries\n",
    "kgqa_retriever = Text2SparqlRetriever(knowledge_graph=kg, model_name_or_path=model_dir + \"hp_v3.4\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "## Query Execution\n",
    "\n",
    "We can now ask questions that will be answered by our knowledge graph!\n",
    "One limitation though: our pre-trained model can only generate SPARQL queries about resources it has seen during training.\n",
    "Otherwise, it cannot translate the name of the resource to the identifier used in the knowledge graph.\n",
    "E.g. \"Harry\" -> \"hp:Harry_potter\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "query = \"In which house is Harry Potter?\"\n",
    "print(f'Translating the text query \"{query}\" to a SPARQL query and executing it on the knowledge graph...')\n",
    "result = kgqa_retriever.retrieve(query=query)\n",
    "print(result)\n",
    "# Correct SPARQL query: select ?a { hp:Harry_potter hp:house ?a . }\n",
    "# Correct answer: Gryffindor\n",
    "\n",
    "print(\"Executing a SPARQL query with prefixed names of resources...\")\n",
    "result = kgqa_retriever._query_kg(\n",
    "    sparql_query=\"select distinct ?sbj where { ?sbj hp:job hp:Keeper_of_keys_and_grounds . }\"\n",
    ")\n",
    "print(result)\n",
    "# Paraphrased question: Who is the keeper of keys and grounds?\n",
    "# Correct answer: Rubeus Hagrid\n",
    "\n",
    "print(\"Executing a SPARQL query with full names of resources...\")\n",
    "result = kgqa_retriever._query_kg(\n",
    "    sparql_query=\"select distinct ?obj where { <https://deepset.ai/harry_potter/Hermione_granger> <https://deepset.ai/harry_potter/patronus> ?obj . }\"\n",
    ")\n",
    "print(result)\n",
    "# Paraphrased question: What is the patronus of Hermione?\n",
    "# Correct answer: Otter"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "## About us\n",
    "\n",
    "This [Haystack](https://github.com/deepset-ai/haystack/) notebook was made with love by [deepset](https://deepset.ai/) in Berlin, Germany\n",
    "\n",
    "We bring NLP to the industry via open source!  \n",
    "Our focus: Industry specific language models & large scale QA systems.  \n",
    "  \n",
    "Some of our other work: \n",
    "- [German BERT](https://deepset.ai/german-bert)\n",
    "- [GermanQuAD and GermanDPR](https://deepset.ai/germanquad)\n",
    "- [FARM](https://github.com/deepset-ai/FARM)\n",
    "\n",
    "Get in touch:\n",
    "[Twitter](https://twitter.com/deepset_ai) | [LinkedIn](https://www.linkedin.com/company/deepset-ai/) | [Slack](https://haystack.deepset.ai/community/join) | [GitHub Discussions](https://github.com/deepset-ai/haystack/discussions) | [Website](https://deepset.ai)\n",
    "\n",
    "By the way: [we're hiring!](https://www.deepset.ai/jobs)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
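A note on the query cells above: the tutorial runs the same kind of query once with prefixed names (`hp:Keeper_of_keys_and_grounds`) and once with full IRIs (`<https://deepset.ai/harry_potter/...>`). The two are equivalent because SPARQL prefixed names expand to full IRIs via the `PREFIX` declarations set on `kg.prefixes`. As a minimal, self-contained sketch of that expansion (illustrative only; the `expand` helper is not part of Haystack or GraphDB):

```python
# Minimal sketch of SPARQL prefixed-name expansion (hypothetical helper,
# mirroring the PREFIX declarations used in the tutorial).
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "hp": "https://deepset.ai/harry_potter/",
}


def expand(prefixed_name: str, prefixes: dict = PREFIXES) -> str:
    """Expand a prefixed name like 'hp:Harry_potter' to its full IRI."""
    prefix, _, local_part = prefixed_name.partition(":")
    if prefix not in prefixes:
        raise ValueError(f"Unknown prefix: {prefix}")
    # Concatenate the namespace IRI with the local part
    return prefixes[prefix] + local_part


print(expand("hp:Harry_potter"))  # https://deepset.ai/harry_potter/Harry_potter
```

This is why the second and third `_query_kg` calls return the same kind of result: the retriever only needs `kg.prefixes` when the query uses the short form.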