2020-05-07 10:19:26 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								{
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 "cells": [
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								  {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   "cell_type": "markdown",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   "metadata": {},
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   "source": [
							 
						 
					
						
							
								
									
										
										
										
											2020-09-18 12:57:32 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    "# Utilizing existing FAQs for Question Answering\n",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    "\n",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    "EXECUTABLE VERSION: [colab](https://colab.research.google.com/github/deepset-ai/haystack/blob/master/tutorials/Tutorial4_Tutorial4_FAQ_style_QA.ipynb)\n",
							 
						 
					
						
							
								
									
										
										
										
											2020-05-07 10:19:26 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								    "\n",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    "While *extractive Question Answering* works on pure texts and is therefore more generalizable, there's also a common alternative that utilizes existing FAQ data.\n",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    "\n",
							 
						 
					
						
							
								
									
										
										
										
											2020-09-18 12:57:32 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    "**Pros**:\n",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    "\n",
							 
						 
					
						
							
								
									
										
										
										
											2020-05-07 10:19:26 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								    "- Very fast at inference time\n",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    "- Utilize existing FAQ data\n",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    "- Quite good control over answers\n",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    "\n",
							 
						 
					
						
							
								
									
										
										
										
											2020-09-18 12:57:32 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    "**Cons**:\n",
							 
						 
					
						
							
								
									
										
										
										
											2020-05-07 10:19:26 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								    "\n",
							 
						 
					
						
							
								
									
										
										
										
											2020-09-18 12:57:32 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    "- Generalizability: We can only answer questions that are similar to existing ones in FAQ\n",
							 
						 
					
						
							
								
									
										
										
										
											2020-05-07 10:19:26 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								    "\n",
							 
						 
					
						
							
								
									
										
										
										
											2020-09-18 12:57:32 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    "In some use cases, a combination of extractive QA and FAQ-style can also be an interesting option.\n"
							 
						 
					
						
							
								
									
										
										
										
											2020-05-07 10:19:26 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								   ]
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								  },
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								  {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   "cell_type": "code",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   "execution_count": null,
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   "metadata": {},
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   "outputs": [],
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   "source": [
							 
						 
					
						
							
								
									
										
										
										
											2020-08-19 14:52:50 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    "# Install the latest release of Haystack in your own environment \n",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    "#! pip install farm-haystack\n",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    "\n",
							 
						 
					
						
							
								
									
										
										
										
											2020-09-21 10:26:12 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    "# Install the latest master of Haystack\n",
							 
						 
					
						
							
								
									
										
										
										
											2020-10-23 10:43:58 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    "!pip install git+https://github.com/deepset-ai/haystack.git\n",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    "!pip install urllib3==1.25.4"
							 
						 
					
						
							
								
									
										
										
										
											2020-05-07 10:19:26 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								   ]
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								  },
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								  {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   "cell_type": "code",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   "execution_count": null,
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   "metadata": {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    "pycharm": {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								     "is_executing": false
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    }
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   },
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   "outputs": [],
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   "source": [
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    "from haystack import Finder\n",
							 
						 
					
						
							
								
									
										
										
										
											2020-09-16 18:33:23 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    "from haystack.document_store.elasticsearch import ElasticsearchDocumentStore\n",
							 
						 
					
						
							
								
									
										
										
										
											2020-05-07 10:19:26 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								    "\n",
							 
						 
					
						
							
								
									
										
										
										
											2020-07-07 14:59:01 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    "from haystack.retriever.dense import EmbeddingRetriever\n",
							 
						 
					
						
							
								
									
										
										
										
											2020-05-07 10:19:26 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								    "from haystack.utils import print_answers\n",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    "import pandas as pd\n",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    "import requests\n"
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   ]
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								  },
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								  {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   "cell_type": "markdown",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   "metadata": {},
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   "source": [
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    "### Start an Elasticsearch server\n",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    "You can start Elasticsearch on your local machine instance using Docker. If Docker is not readily available in your environment (eg., in Colab notebooks), then you can manually download and execute Elasticsearch from source."
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   ]
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								  },
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								  {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   "cell_type": "code",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   "execution_count": null,
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   "metadata": {},
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   "outputs": [],
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   "source": [
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    "# Recommended: Start Elasticsearch using Docker\n",
							 
						 
					
						
							
								
									
										
										
										
											2020-10-23 17:50:49 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    "# ! docker run -d -p 9200:9200 -e \"discovery.type=single-node\" elasticsearch:7.9.2"
							 
						 
					
						
							
								
									
										
										
										
											2020-05-07 10:19:26 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								   ]
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								  },
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								  {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   "cell_type": "code",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   "execution_count": null,
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   "metadata": {},
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   "outputs": [],
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   "source": [
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    "# In Colab / No Docker environments: Start Elasticsearch from source\n",
							 
						 
					
						
							
								
									
										
										
										
											2020-10-23 17:50:49 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    "! wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.9.2-linux-x86_64.tar.gz -q\n",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    "! tar -xzf elasticsearch-7.9.2-linux-x86_64.tar.gz\n",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    "! chown -R daemon:daemon elasticsearch-7.9.2\n",
							 
						 
					
						
							
								
									
										
										
										
											2020-05-07 10:19:26 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								    "\n",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    "import os\n",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    "from subprocess import Popen, PIPE, STDOUT\n",
							 
						 
					
						
							
								
									
										
										
										
											2020-10-23 17:50:49 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    "es_server = Popen(['elasticsearch-7.9.2/bin/elasticsearch'],\n",
							 
						 
					
						
							
								
									
										
										
										
											2020-05-07 10:19:26 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								    "                   stdout=PIPE, stderr=STDOUT,\n",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    "                   preexec_fn=lambda: os.setuid(1)  # as daemon\n",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    "                  )\n",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    "# wait until ES has started\n",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    "! sleep 30\n"
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   ]
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								  },
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								  {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   "cell_type": "markdown",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   "source": [
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    "### Init the DocumentStore\n",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    "In contrast to Tutorial 1 (extractive QA), we:\n",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    "\n",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    "* specify the name of our `text_field` in Elasticsearch that we want to return as an answer\n",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    "* specify the name of our `embedding_field` in Elasticsearch where we'll store the embedding of our question and that is used later for calculating our similarity to the incoming user question\n",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    "* set `excluded_meta_data=[\"question_emb\"]` so that we don't return the huge embedding vectors in our search results"
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   ],
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   "metadata": {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    "collapsed": false
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   }
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								  },
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								  {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   "cell_type": "code",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   "execution_count": 2,
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   "metadata": {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    "pycharm": {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								     "name": "#%%\n"
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    }
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   },
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   "outputs": [
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								     "name": "stderr",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								     "output_type": "stream",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								     "text": [
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								      "04/28/2020 12:27:32 - INFO - elasticsearch -   PUT http://localhost:9200/document [status:400 request:0.010s]\n"
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								     ]
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    }
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   ],
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   "source": [
							 
						 
					
						
							
								
									
										
										
										
											2020-09-16 18:33:23 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    "from haystack.document_store.elasticsearch import ElasticsearchDocumentStore\n",
							 
						 
					
						
							
								
									
										
										
										
											2020-05-07 10:19:26 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								    "document_store = ElasticsearchDocumentStore(host=\"localhost\", username=\"\", password=\"\",\n",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    "                                            index=\"document\",\n",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    "                                            embedding_field=\"question_emb\",\n",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    "                                            embedding_dim=768,\n",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    "                                            excluded_meta_data=[\"question_emb\"])"
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   ]
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								  },
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								  {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   "cell_type": "markdown",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   "source": [
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    "### Create a Retriever using embeddings\n",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    "Instead of retrieving via Elasticsearch's plain BM25, we want to use vector similarity of the questions (user question vs. FAQ ones).\n",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    "We can use the `EmbeddingRetriever` for this purpose and specify a model that we use for the embeddings."
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   ],
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   "metadata": {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    "collapsed": false
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   }
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								  },
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								  {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   "cell_type": "code",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   "execution_count": null,
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   "outputs": [],
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   "source": [
							 
						 
					
						
							
								
									
										
										
										
											2020-09-18 18:10:50 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    "retriever = EmbeddingRetriever(document_store=document_store, embedding_model=\"deepset/sentence_bert\", use_gpu=True)"
							 
						 
					
						
							
								
									
										
										
										
											2020-05-07 10:19:26 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								   ],
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   "metadata": {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    "collapsed": false,
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    "pycharm": {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								     "name": "#%%\n"
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    }
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   }
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								  },
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								  {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   "cell_type": "markdown",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   "source": [
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    "### Prepare & Index FAQ data\n",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    "We create a pandas dataframe containing some FAQ data (i.e curated pairs of question + answer) and index those in elasticsearch.\n",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    "Here: We download some question-answer pairs related to COVID-19"
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   ],
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   "metadata": {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    "collapsed": false
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   }
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								  },
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								  {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   "cell_type": "code",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   "execution_count": null,
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   "outputs": [],
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   "source": [
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    "# Download\n",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    "temp = requests.get(\"https://raw.githubusercontent.com/deepset-ai/COVID-QA/master/data/faqs/faq_covidbert.csv\")\n",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    "open('small_faq_covid.csv', 'wb').write(temp.content)\n",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    "\n",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    "# Get dataframe with columns \"question\", \"answer\" and some custom metadata\n",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    "df = pd.read_csv(\"small_faq_covid.csv\")\n",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    "# Minimal cleaning\n",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    "df.fillna(value=\"\", inplace=True)\n",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    "df[\"question\"] = df[\"question\"].apply(lambda x: x.strip())\n",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    "print(df.head())\n",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    "\n",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    "# Get embeddings for our questions from the FAQs\n",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    "questions = list(df[\"question\"].values)\n",
							 
						 
					
						
							
								
									
										
										
										
											2020-07-13 12:38:01 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    "df[\"question_emb\"] = retriever.embed_queries(texts=questions)\n",
							 
						 
					
						
							
								
									
										
										
										
											2020-07-31 11:34:06 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    "df = df.rename(columns={\"answer\": \"text\"})\n",
							 
						 
					
						
							
								
									
										
										
										
											2020-05-07 10:19:26 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								    "\n",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    "# Convert Dataframe to list of dicts and index them in our DocumentStore\n",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    "docs_to_index = df.to_dict(orient=\"records\")\n",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    "document_store.write_documents(docs_to_index)"
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   ],
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   "metadata": {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    "collapsed": false,
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    "pycharm": {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								     "name": "#%%\n"
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    }
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   }
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								  },
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								  {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   "cell_type": "markdown",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   "source": [
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    "### Ask questions\n",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    "Initialize a Finder (this time without a reader) and ask questions"
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   ],
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   "metadata": {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    "collapsed": false
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   }
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								  },
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								  {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   "cell_type": "code",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   "execution_count": null,
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   "outputs": [],
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   "source": [
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    "finder = Finder(reader=None, retriever=retriever)\n",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    "prediction = finder.get_answers_via_similar_questions(question=\"How is the virus spreading?\", top_k_retriever=10)\n",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    "print_answers(prediction, details=\"all\")"
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   ],
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   "metadata": {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    "collapsed": false,
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    "pycharm": {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								     "name": "#%%\n"
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    }
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   }
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								  }
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 ],
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 "metadata": {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								  "kernelspec": {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   "display_name": "Python 3",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   "language": "python",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   "name": "python3"
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								  },
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								  "language_info": {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   "codemirror_mode": {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    "name": "ipython",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    "version": 3
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   },
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   "file_extension": ".py",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   "mimetype": "text/x-python",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   "name": "python",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   "nbconvert_exporter": "python",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   "pygments_lexer": "ipython3",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								   "version": "3.7.6"
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								  }
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 },
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 "nbformat": 4,
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 "nbformat_minor": 2
							 
						 
					
						
							
								
									
										
										
										
											2020-10-23 10:43:58 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								}