"## \"FAQ-Style QA\": Utilizing existing FAQs for Question Answering\n",
"\n",
"While *extractive Question Answering* works on pure texts and is therefore more generalizable, there's also a common alternative that utilizes existing FAQ data.\n",
"\n",
"Pros:\n",
"- Very fast at inference time\n",
"- Utilize existing FAQ data\n",
"- Quite good control over answers\n",
"\n",
"Cons:\n",
"- Generalizability: We can only answer questions that are similar to existing ones in FAQ\n",
"\n",
"In some use cases, a combination of extractive QA and FAQ-style can also be an interesting option.\n",
"\n",
"*Use this [link](https://colab.research.google.com/github/deepset-ai/haystack/blob/master/tutorials/Tutorial4_Tutorial4_FAQ_style_QA.ipynb) to open the notebook in Google Colab.*\n"
"You can start Elasticsearch on your local machine instance using Docker. If Docker is not readily available in your environment (eg., in Colab notebooks), then you can manually download and execute Elasticsearch from source."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Recommended: Start Elasticsearch using Docker\n",
"# ! docker run -d -p 9200:9200 -e \"discovery.type=single-node\" elasticsearch:7.6.2"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# In Colab / No Docker environments: Start Elasticsearch from source\n",
"In contrast to Tutorial 1 (extractive QA), we:\n",
"\n",
"* specify the name of our `text_field` in Elasticsearch that we want to return as an answer\n",
"* specify the name of our `embedding_field` in Elasticsearch where we'll store the embedding of our question and that is used later for calculating our similarity to the incoming user question\n",
"* set `excluded_meta_data=[\"question_emb\"]` so that we don't return the huge embedding vectors in our search results"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"04/28/2020 12:27:32 - INFO - elasticsearch - PUT http://localhost:9200/document [status:400 request:0.010s]\n"