haystack/tutorials/Tutorial12_LFQA.ipynb

{
 "nbformat": 4,
 "nbformat_minor": 0,
 "metadata": {
  "accelerator": "GPU",
  "colab": {
   "name": "LFQA_via_Haystack.ipynb",
   "provenance": [],
   "collapsed_sections": []
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.9"
  }
 },
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "bEH-CRbeA6NU"
   },
   "source": [
    "# Long-Form Question Answering\n",
    "\n",
    "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/deepset-ai/haystack/blob/master/tutorials/Tutorial12_LFQA.ipynb)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "3K27Y5FbA6NV"
   },
   "source": [
    "### Prepare environment\n",
    "\n",
    "#### Colab: Enable the GPU runtime\n",
    "Make sure you enable the GPU runtime to experience decent speed in this tutorial.  \n",
    "**Runtime -> Change Runtime type -> Hardware accelerator -> GPU**\n",
    "\n",
    "<img src=\"https://raw.githubusercontent.com/deepset-ai/haystack/master/docs/_src/img/colab_gpu_runtime.jpg\">"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "id": "JlZgP8q1A6NW"
   },
   "source": [
    "# Make sure you have a GPU running\n",
    "!nvidia-smi"
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "code",
   "metadata": {
    "id": "NM36kbRFA6Nc"
   },
   "source": [
    "# Install the latest master of Haystack\n",
    "!pip install git+https://github.com/deepset-ai/haystack.git\n",
    "\n",
    "# If you run this notebook on Google Colab, you might need to\n",
    "# restart the runtime after installing haystack."
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "code",
   "metadata": {
    "id": "xmRuhTQ7A6Nh"
   },
   "source": [
    "from haystack.preprocessor.cleaning import clean_wiki_text\n",
    "from haystack.preprocessor.utils import convert_files_to_dicts, fetch_archive_from_http\n",
    "from haystack.generator.transformers import Seq2SeqGenerator"
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "q3dSo7ZtA6Nl"
   },
   "source": [
    "### Document Store\n",
    "\n",
    "FAISS is a library for efficient similarity search on a cluster of dense vectors.\n",
    "The `FAISSDocumentStore` uses a SQL(SQLite in-memory be default) database under-the-hood\n",
    "to store the document text and other meta data. The vector embeddings of the text are\n",
    "indexed on a FAISS Index that later is queried for searching answers.\n",
    "The default flavour of FAISSDocumentStore is \"Flat\" but can also be set to \"HNSW\" for\n",
    "faster search at the expense of some accuracy. Just set the faiss_index_factor_str argument in the constructor.\n",
    "For more info on which suits your use case: https://github.com/facebookresearch/faiss/wiki/Guidelines-to-choose-an-index"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "id": "1cYgDJmrA6Nv",
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "source": [
    "from haystack.document_store.faiss import FAISSDocumentStore\n",
    "\n",
    "document_store = FAISSDocumentStore(vector_dim=128, faiss_index_factory_str=\"Flat\")"
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "06LatTJBA6N0",
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "### Cleaning & indexing documents\n",
    "\n",
    "Similarly to the previous tutorials, we download, convert and index some Game of Thrones articles to our DocumentStore"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "id": "iqKnu6wxA6N1",
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "source": [
    "# Let's first get some files that we want to use\n",
    "doc_dir = \"data/article_txt_got\"\n",
    "s3_url = \"https://s3.eu-central-1.amazonaws.com/deepset.ai-farm-qa/datasets/documents/wiki_gameofthrones_txt.zip\"\n",
    "fetch_archive_from_http(url=s3_url, output_dir=doc_dir)\n",
    "\n",
    "# Convert files to dicts\n",
    "dicts = convert_files_to_dicts(dir_path=doc_dir, clean_func=clean_wiki_text, split_paragraphs=True)\n",
    "    \n",
    "# Now, let's write the dicts containing documents to our DB.\n",
    "document_store.write_documents(dicts)"
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "wgjedxx_A6N6"
   },
   "source": [
    "### Initalize Retriever and Reader/Generator\n",
    "\n",
    "#### Retriever\n",
    "\n",
    "**Here:** We use a `RetribertRetriever` and we invoke `update_embeddings` to index the embeddings of documents in the `FAISSDocumentStore`\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "id": "kFwiPP60A6N7",
    "pycharm": {
     "is_executing": true
    }
   },
   "source": [
    "from haystack.retriever.dense import EmbeddingRetriever\n",
    "\n",
    "retriever = EmbeddingRetriever(document_store=document_store,\n",
    "                               embedding_model=\"yjernite/retribert-base-uncased\",\n",
    "                               model_format=\"retribert\")\n",
    "\n",
    "document_store.update_embeddings(retriever)"
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "sMlVEnJ2NkZZ"
   },
   "source": [
    "Before we blindly use the `RetribertRetriever` let's empirically test it to make sure a simple search indeed finds the relevant documents."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "id": "qpu-t9rndgpe"
   },
   "source": [
    "from haystack.utils import print_documents\n",
    "from haystack.pipeline import DocumentSearchPipeline\n",
    "\n",
    "p_retrieval = DocumentSearchPipeline(retriever)\n",
    "res = p_retrieval.run(\n",
    "    query=\"Tell me something about Arya Stark?\",\n",
    "    params={\"Retriever\": {\"top_k\": 10}}\n",
    ")\n",
    "print_documents(res, max_text_len=512)\n"
   ],
   "execution_count": 1,
   "outputs": [
    {
     "ename": "SyntaxError",
     "evalue": "EOL while scanning string literal (<ipython-input-1-cc681f017dc5>, line 7)",
     "output_type": "error",
     "traceback": [
      "\u001B[0;36m  File \u001B[0;32m\"<ipython-input-1-cc681f017dc5>\"\u001B[0;36m, line \u001B[0;32m7\u001B[0m\n\u001B[0;31m    params={\"top_k_retriever=5\u001B[0m\n\u001B[0m                              ^\u001B[0m\n\u001B[0;31mSyntaxError\u001B[0m\u001B[0;31m:\u001B[0m EOL while scanning string literal\n"
     ]
    }
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "rnVR28OXA6OA"
   },
   "source": [
    "#### Reader/Generator\n",
    "\n",
    "Similar to previous Tutorials we now initalize our reader/generator.\n",
    "\n",
    "Here we use a `Seq2SeqGenerator` with the *yjernite/bart_eli5* model (see: https://huggingface.co/yjernite/bart_eli5)\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "id": "fyIuWVwhA6OB"
   },
   "source": [
    "generator = Seq2SeqGenerator(model_name_or_path=\"yjernite/bart_eli5\")"
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "unhLD18yA6OF"
   },
   "source": [
    "### Pipeline\n",
    "\n",
    "With a Haystack `Pipeline` you can stick together your building blocks to a search pipeline.\n",
    "Under the hood, `Pipelines` are Directed Acyclic Graphs (DAGs) that you can easily customize for your own use cases.\n",
    "To speed things up, Haystack also comes with a few predefined Pipelines. One of them is the `GenerativeQAPipeline` that combines a retriever and a reader/generator to answer our questions.\n",
    "You can learn more about `Pipelines` in the [docs](https://haystack.deepset.ai/docs/latest/pipelinesmd)."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "id": "TssPQyzWA6OG"
   },
   "source": [
    "from haystack.pipeline import GenerativeQAPipeline\n",
    "pipe = GenerativeQAPipeline(generator, retriever)"
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "bXlBBxKXA6OL"
   },
   "source": [
    "## Voilà! Ask a question!"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "id": "Zi97Hif2A6OM"
   },
   "source": [
    "pipe.run(\n",
    "    query=\"Why did Arya Stark's character get portrayed in a television adaptation?\",\n",
    "    params={\"Retriever\": {\"top_k\": 1}}\n",
    ")"
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "code",
   "metadata": {
    "id": "zvHb8SvMblw9"
   },
   "source": [
    "pipe.run(query=\"What kind of character does Arya Stark play?\", params={\"Retriever\": {\"top_k\": 1}})"
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "source": [
    "## About us\n",
    "\n",
    "This [Haystack](https://github.com/deepset-ai/haystack/) notebook was made with love by [deepset](https://deepset.ai/) in Berlin, Germany\n",
    "\n",
    "We bring NLP to the industry via open source!\n",
    "Our focus: Industry specific language models & large scale QA systems.\n",
    "\n",
    "Some of our other work:\n",
    "- [German BERT](https://deepset.ai/german-bert)\n",
    "- [GermanQuAD and GermanDPR](https://deepset.ai/germanquad)\n",
    "- [FARM](https://github.com/deepset-ai/FARM)\n",
    "\n",
    "Get in touch:\n",
    "[Twitter](https://twitter.com/deepset_ai) | [LinkedIn](https://www.linkedin.com/company/deepset-ai/) | [Slack](https://haystack.deepset.ai/community/join) | [GitHub Discussions](https://github.com/deepset-ai/haystack/discussions) | [Website](https://deepset.ai)\n",
    "\n",
    "By the way: [we're hiring!](https://www.deepset.ai/jobs)"
   ],
   "metadata": {
    "collapsed": false
   }
  }
 ]
}
Add Longform-QA (LFQA), Seq2SeqGenerator for generative QA and Retribert Retriever (#1086) * Integrate LFQA with Haystack * Integrate LFQA with Haystack - unit tests * Properly initialize conftest default value for vector_dim * Update PR after inital feedback * Fix conftest.py import * Seq2SeqGenerator uses Callables instead of subclasses for custom model input * Update docstring * Fix Callable use * Add LFQA tutorials * Improve type error reporting for invalid input converter Callable * Generate docstrings * Format comments in tutorial script * Generate tutorial md * Add usage page Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai> Co-authored-by: brandenchan <brandenchan@icloud.com> 2021-06-14 17:53:43 +02:00			`{`
			`"nbformat": 4,`
			`"nbformat_minor": 0,`
			`"metadata": {`
			`"accelerator": "GPU",`
			`"colab": {`
			`"name": "LFQA_via_Haystack.ipynb",`
			`"provenance": [],`
			`"collapsed_sections": []`
			`},`
			`"kernelspec": {`
			`"display_name": "Python 3",`
			`"language": "python",`
			`"name": "python3"`
			`},`
			`"language_info": {`
			`"codemirror_mode": {`
			`"name": "ipython",`
			`"version": 3`
			`},`
			`"file_extension": ".py",`
			`"mimetype": "text/x-python",`
			`"name": "python",`
			`"nbconvert_exporter": "python",`
			`"pygments_lexer": "ipython3",`
			`"version": "3.6.9"`
			`}`
			`},`
			`"cells": [`
			`{`
			`"cell_type": "markdown",`
			`"metadata": {`
			`"id": "bEH-CRbeA6NU"`
			`},`
			`"source": [`
			`"# Long-Form Question Answering\n",`
			`"\n",`
			`"[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/deepset-ai/haystack/blob/master/tutorials/Tutorial12_LFQA.ipynb)"`
			`]`
			`},`
			`{`
			`"cell_type": "markdown",`
			`"metadata": {`
			`"id": "3K27Y5FbA6NV"`
			`},`
			`"source": [`
			`"### Prepare environment\n",`
			`"\n",`
			`"#### Colab: Enable the GPU runtime\n",`
			`"Make sure you enable the GPU runtime to experience decent speed in this tutorial. \n",`
			`"Runtime -> Change Runtime type -> Hardware accelerator -> GPU\n",`
			`"\n",`
			`"<img src=\"https://raw.githubusercontent.com/deepset-ai/haystack/master/docs/_src/img/colab_gpu_runtime.jpg\">"`
			`]`
			`},`
			`{`
			`"cell_type": "code",`
			`"metadata": {`
			`"id": "JlZgP8q1A6NW"`
			`},`
			`"source": [`
			`"# Make sure you have a GPU running\n",`
			`"!nvidia-smi"`
			`],`
			`"execution_count": null,`
			`"outputs": []`
			`},`
			`{`
			`"cell_type": "code",`
			`"metadata": {`
			`"id": "NM36kbRFA6Nc"`
			`},`
			`"source": [`
			`"# Install the latest master of Haystack\n",`
Add comment to tutorial notebooks about restarting runtime in colab (#1486) * Add comment to tutorial notebooks about restarting runtime in colab * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> 2021-09-23 14:36:20 +02:00			`"!pip install git+https://github.com/deepset-ai/haystack.git\n",`
			`"\n",`
			`"# If you run this notebook on Google Colab, you might need to\n",`
			`"# restart the runtime after installing haystack."`
Add Longform-QA (LFQA), Seq2SeqGenerator for generative QA and Retribert Retriever (#1086) * Integrate LFQA with Haystack * Integrate LFQA with Haystack - unit tests * Properly initialize conftest default value for vector_dim * Update PR after inital feedback * Fix conftest.py import * Seq2SeqGenerator uses Callables instead of subclasses for custom model input * Update docstring * Fix Callable use * Add LFQA tutorials * Improve type error reporting for invalid input converter Callable * Generate docstrings * Format comments in tutorial script * Generate tutorial md * Add usage page Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai> Co-authored-by: brandenchan <brandenchan@icloud.com> 2021-06-14 17:53:43 +02:00			`],`
			`"execution_count": null,`
			`"outputs": []`
			`},`
			`{`
			`"cell_type": "code",`
			`"metadata": {`
			`"id": "xmRuhTQ7A6Nh"`
			`},`
			`"source": [`
			`"from haystack.preprocessor.cleaning import clean_wiki_text\n",`
			`"from haystack.preprocessor.utils import convert_files_to_dicts, fetch_archive_from_http\n",`
			`"from haystack.generator.transformers import Seq2SeqGenerator"`
			`],`
			`"execution_count": null,`
			`"outputs": []`
			`},`
			`{`
			`"cell_type": "markdown",`
			`"metadata": {`
			`"id": "q3dSo7ZtA6Nl"`
			`},`
			`"source": [`
			`"### Document Store\n",`
			`"\n",`
			`"FAISS is a library for efficient similarity search on a cluster of dense vectors.\n",`
			"The `FAISSDocumentStore` uses a SQL(SQLite in-memory be default) database under-the-hood\n",
			`"to store the document text and other meta data. The vector embeddings of the text are\n",`
			`"indexed on a FAISS Index that later is queried for searching answers.\n",`
			`"The default flavour of FAISSDocumentStore is \"Flat\" but can also be set to \"HNSW\" for\n",`
			`"faster search at the expense of some accuracy. Just set the faiss_index_factor_str argument in the constructor.\n",`
			`"For more info on which suits your use case: https://github.com/facebookresearch/faiss/wiki/Guidelines-to-choose-an-index"`
			`]`
			`},`
			`{`
			`"cell_type": "code",`
			`"metadata": {`
			`"id": "1cYgDJmrA6Nv",`
			`"pycharm": {`
			`"name": "#%%\n"`
			`}`
			`},`
			`"source": [`
			`"from haystack.document_store.faiss import FAISSDocumentStore\n",`
			`"\n",`
			`"document_store = FAISSDocumentStore(vector_dim=128, faiss_index_factory_str=\"Flat\")"`
			`],`
			`"execution_count": null,`
			`"outputs": []`
			`},`
			`{`
			`"cell_type": "markdown",`
			`"metadata": {`
			`"id": "06LatTJBA6N0",`
			`"pycharm": {`
			`"name": "#%% md\n"`
			`}`
			`},`
			`"source": [`
			`"### Cleaning & indexing documents\n",`
			`"\n",`
			`"Similarly to the previous tutorials, we download, convert and index some Game of Thrones articles to our DocumentStore"`
			`]`
			`},`
			`{`
			`"cell_type": "code",`
			`"metadata": {`
			`"id": "iqKnu6wxA6N1",`
			`"pycharm": {`
			`"name": "#%%\n"`
			`}`
			`},`
			`"source": [`
			`"# Let's first get some files that we want to use\n",`
			`"doc_dir = \"data/article_txt_got\"\n",`
			`"s3_url = \"https://s3.eu-central-1.amazonaws.com/deepset.ai-farm-qa/datasets/documents/wiki_gameofthrones_txt.zip\"\n",`
			`"fetch_archive_from_http(url=s3_url, output_dir=doc_dir)\n",`
			`"\n",`
			`"# Convert files to dicts\n",`
			`"dicts = convert_files_to_dicts(dir_path=doc_dir, clean_func=clean_wiki_text, split_paragraphs=True)\n",`
			`" \n",`
			`"# Now, let's write the dicts containing documents to our DB.\n",`
			`"document_store.write_documents(dicts)"`
			`],`
			`"execution_count": null,`
			`"outputs": []`
			`},`
			`{`
			`"cell_type": "markdown",`
			`"metadata": {`
			`"id": "wgjedxx_A6N6"`
			`},`
			`"source": [`
			`"### Initalize Retriever and Reader/Generator\n",`
			`"\n",`
			`"#### Retriever\n",`
			`"\n",`
			"Here: We use a `RetribertRetriever` and we invoke `update_embeddings` to index the embeddings of documents in the `FAISSDocumentStore`\n",
			`"\n"`
			`]`
			`},`
			`{`
			`"cell_type": "code",`
			`"metadata": {`
			`"id": "kFwiPP60A6N7",`
			`"pycharm": {`
			`"is_executing": true`
			`}`
			`},`
			`"source": [`
			`"from haystack.retriever.dense import EmbeddingRetriever\n",`
			`"\n",`
			`"retriever = EmbeddingRetriever(document_store=document_store,\n",`
			`" embedding_model=\"yjernite/retribert-base-uncased\",\n",`
			`" model_format=\"retribert\")\n",`
			`"\n",`
			`"document_store.update_embeddings(retriever)"`
			`],`
			`"execution_count": null,`
			`"outputs": []`
			`},`
			`{`
			`"cell_type": "markdown",`
			`"metadata": {`
			`"id": "sMlVEnJ2NkZZ"`
			`},`
			`"source": [`
			"Before we blindly use the `RetribertRetriever` let's empirically test it to make sure a simple search indeed finds the relevant documents."
			`]`
			`},`
			`{`
			`"cell_type": "code",`
			`"metadata": {`
			`"id": "qpu-t9rndgpe"`
			`},`
			`"source": [`
Fix parameter names in tutorial 5 and 12 (#1639) * Fix parameter names in tutorial 5 * Update parameters in tutorial notebook * Add latest docstring and tutorial changes * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> 2021-10-22 17:22:51 +02:00			`"from haystack.utils import print_documents\n",`
Add Longform-QA (LFQA), Seq2SeqGenerator for generative QA and Retribert Retriever (#1086) * Integrate LFQA with Haystack * Integrate LFQA with Haystack - unit tests * Properly initialize conftest default value for vector_dim * Update PR after inital feedback * Fix conftest.py import * Seq2SeqGenerator uses Callables instead of subclasses for custom model input * Update docstring * Fix Callable use * Add LFQA tutorials * Improve type error reporting for invalid input converter Callable * Generate docstrings * Format comments in tutorial script * Generate tutorial md * Add usage page Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai> Co-authored-by: brandenchan <brandenchan@icloud.com> 2021-06-14 17:53:43 +02:00			`"from haystack.pipeline import DocumentSearchPipeline\n",`
			`"\n",`
			`"p_retrieval = DocumentSearchPipeline(retriever)\n",`
			`"res = p_retrieval.run(\n",`
			`" query=\"Tell me something about Arya Stark?\",\n",`
Fix parameter names in tutorial 5 and 12 (#1639) * Fix parameter names in tutorial 5 * Update parameters in tutorial notebook * Add latest docstring and tutorial changes * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> 2021-10-22 17:22:51 +02:00			`" params={\"Retriever\": {\"top_k\": 10}}\n",`
Add Longform-QA (LFQA), Seq2SeqGenerator for generative QA and Retribert Retriever (#1086) * Integrate LFQA with Haystack * Integrate LFQA with Haystack - unit tests * Properly initialize conftest default value for vector_dim * Update PR after inital feedback * Fix conftest.py import * Seq2SeqGenerator uses Callables instead of subclasses for custom model input * Update docstring * Fix Callable use * Add LFQA tutorials * Improve type error reporting for invalid input converter Callable * Generate docstrings * Format comments in tutorial script * Generate tutorial md * Add usage page Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai> Co-authored-by: brandenchan <brandenchan@icloud.com> 2021-06-14 17:53:43 +02:00			`")\n",`
			`"print_documents(res, max_text_len=512)\n"`
			`],`
Refactor communication between Pipeline Components (#1321) 2021-09-10 11:41:16 +02:00			`"execution_count": 1,`
			`"outputs": [`
			`{`
			`"ename": "SyntaxError",`
			`"evalue": "EOL while scanning string literal (<ipython-input-1-cc681f017dc5>, line 7)",`
			`"output_type": "error",`
			`"traceback": [`
			`"\u001B[0;36m File \u001B[0;32m\"<ipython-input-1-cc681f017dc5>\"\u001B[0;36m, line \u001B[0;32m7\u001B[0m\n\u001B[0;31m params={\"top_k_retriever=5\u001B[0m\n\u001B[0m ^\u001B[0m\n\u001B[0;31mSyntaxError\u001B[0m\u001B[0;31m:\u001B[0m EOL while scanning string literal\n"`
			`]`
			`}`
			`]`
Add Longform-QA (LFQA), Seq2SeqGenerator for generative QA and Retribert Retriever (#1086) * Integrate LFQA with Haystack * Integrate LFQA with Haystack - unit tests * Properly initialize conftest default value for vector_dim * Update PR after inital feedback * Fix conftest.py import * Seq2SeqGenerator uses Callables instead of subclasses for custom model input * Update docstring * Fix Callable use * Add LFQA tutorials * Improve type error reporting for invalid input converter Callable * Generate docstrings * Format comments in tutorial script * Generate tutorial md * Add usage page Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai> Co-authored-by: brandenchan <brandenchan@icloud.com> 2021-06-14 17:53:43 +02:00			`},`
			`{`
			`"cell_type": "markdown",`
			`"metadata": {`
			`"id": "rnVR28OXA6OA"`
			`},`
			`"source": [`
			`"#### Reader/Generator\n",`
			`"\n",`
			`"Similar to previous Tutorials we now initalize our reader/generator.\n",`
			`"\n",`
			"Here we use a `Seq2SeqGenerator` with the yjernite/bart_eli5 model (see: https://huggingface.co/yjernite/bart_eli5)\n",
			`"\n"`
			`]`
			`},`
			`{`
			`"cell_type": "code",`
			`"metadata": {`
			`"id": "fyIuWVwhA6OB"`
			`},`
			`"source": [`
			`"generator = Seq2SeqGenerator(model_name_or_path=\"yjernite/bart_eli5\")"`
			`],`
			`"execution_count": null,`
			`"outputs": []`
			`},`
			`{`
			`"cell_type": "markdown",`
			`"metadata": {`
			`"id": "unhLD18yA6OF"`
			`},`
			`"source": [`
			`"### Pipeline\n",`
			`"\n",`
			"With a Haystack `Pipeline` you can stick together your building blocks to a search pipeline.\n",
			"Under the hood, `Pipelines` are Directed Acyclic Graphs (DAGs) that you can easily customize for your own use cases.\n",
			"To speed things up, Haystack also comes with a few predefined Pipelines. One of them is the `GenerativeQAPipeline` that combines a retriever and a reader/generator to answer our questions.\n",
			"You can learn more about `Pipelines` in the [docs](https://haystack.deepset.ai/docs/latest/pipelinesmd)."
			`]`
			`},`
			`{`
			`"cell_type": "code",`
			`"metadata": {`
			`"id": "TssPQyzWA6OG"`
			`},`
			`"source": [`
			`"from haystack.pipeline import GenerativeQAPipeline\n",`
			`"pipe = GenerativeQAPipeline(generator, retriever)"`
			`],`
			`"execution_count": null,`
			`"outputs": []`
			`},`
			`{`
			`"cell_type": "markdown",`
			`"metadata": {`
			`"id": "bXlBBxKXA6OL"`
			`},`
			`"source": [`
			`"## Voilà! Ask a question!"`
			`]`
			`},`
			`{`
			`"cell_type": "code",`
			`"metadata": {`
			`"id": "Zi97Hif2A6OM"`
			`},`
			`"source": [`
Refactor communication between Pipeline Components (#1321) 2021-09-10 11:41:16 +02:00			`"pipe.run(\n",`
			`" query=\"Why did Arya Stark's character get portrayed in a television adaptation?\",\n",`
			`" params={\"Retriever\": {\"top_k\": 1}}\n",`
			`")"`
Add Longform-QA (LFQA), Seq2SeqGenerator for generative QA and Retribert Retriever (#1086) * Integrate LFQA with Haystack * Integrate LFQA with Haystack - unit tests * Properly initialize conftest default value for vector_dim * Update PR after inital feedback * Fix conftest.py import * Seq2SeqGenerator uses Callables instead of subclasses for custom model input * Update docstring * Fix Callable use * Add LFQA tutorials * Improve type error reporting for invalid input converter Callable * Generate docstrings * Format comments in tutorial script * Generate tutorial md * Add usage page Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai> Co-authored-by: brandenchan <brandenchan@icloud.com> 2021-06-14 17:53:43 +02:00			`],`
			`"execution_count": null,`
			`"outputs": []`
			`},`
			`{`
			`"cell_type": "code",`
			`"metadata": {`
			`"id": "zvHb8SvMblw9"`
			`},`
			`"source": [`
Refactor communication between Pipeline Components (#1321) 2021-09-10 11:41:16 +02:00			`"pipe.run(query=\"What kind of character does Arya Stark play?\", params={\"Retriever\": {\"top_k\": 1}})"`
Add Longform-QA (LFQA), Seq2SeqGenerator for generative QA and Retribert Retriever (#1086) * Integrate LFQA with Haystack * Integrate LFQA with Haystack - unit tests * Properly initialize conftest default value for vector_dim * Update PR after inital feedback * Fix conftest.py import * Seq2SeqGenerator uses Callables instead of subclasses for custom model input * Update docstring * Fix Callable use * Add LFQA tutorials * Improve type error reporting for invalid input converter Callable * Generate docstrings * Format comments in tutorial script * Generate tutorial md * Add usage page Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai> Co-authored-by: brandenchan <brandenchan@icloud.com> 2021-06-14 17:53:43 +02:00			`],`
			`"execution_count": null,`
			`"outputs": []`
Add about sections (#1195) 2021-06-14 18:37:00 +02:00			`},`
			`{`
			`"cell_type": "markdown",`
			`"source": [`
			`"## About us\n",`
			`"\n",`
			`"This [Haystack](https://github.com/deepset-ai/haystack/) notebook was made with love by [deepset](https://deepset.ai/) in Berlin, Germany\n",`
			`"\n",`
			`"We bring NLP to the industry via open source!\n",`
			`"Our focus: Industry specific language models & large scale QA systems.\n",`
			`"\n",`
			`"Some of our other work:\n",`
			`"- [German BERT](https://deepset.ai/german-bert)\n",`
			`"- [GermanQuAD and GermanDPR](https://deepset.ai/germanquad)\n",`
			`"- [FARM](https://github.com/deepset-ai/FARM)\n",`
			`"\n",`
			`"Get in touch:\n",`
			`"[Twitter](https://twitter.com/deepset_ai) \| [LinkedIn](https://www.linkedin.com/company/deepset-ai/) \| [Slack](https://haystack.deepset.ai/community/join) \| [GitHub Discussions](https://github.com/deepset-ai/haystack/discussions) \| [Website](https://deepset.ai)\n",`
			`"\n",`
Update jobs link to personio (#1611) * Update jobs link to personio * Add latest docstring and tutorial changes * Change jobs link to main website * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> 2021-10-21 11:42:32 +02:00			`"By the way: [we're hiring!](https://www.deepset.ai/jobs)"`
Add about sections (#1195) 2021-06-14 18:37:00 +02:00			`],`
			`"metadata": {`
			`"collapsed": false`
			`}`
Add Longform-QA (LFQA), Seq2SeqGenerator for generative QA and Retribert Retriever (#1086) * Integrate LFQA with Haystack * Integrate LFQA with Haystack - unit tests * Properly initialize conftest default value for vector_dim * Update PR after inital feedback * Fix conftest.py import * Seq2SeqGenerator uses Callables instead of subclasses for custom model input * Update docstring * Fix Callable use * Add LFQA tutorials * Improve type error reporting for invalid input converter Callable * Generate docstrings * Format comments in tutorial script * Generate tutorial md * Add usage page Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai> Co-authored-by: brandenchan <brandenchan@icloud.com> 2021-06-14 17:53:43 +02:00			`}`
			`]`
			`}`