"[](https://colab.research.google.com/github/deepset-ai/haystack/blob/master/tutorials/Tutorial11_Pipelines.ipynb)\n",
"\n",
"In this tutorial, you will learn how the `Pipeline` class acts as a connector between all the different\n",
"building blocks that are found in FARM. Whether you are using a Reader, Generator, Summarizer\n",
"or Retriever (or 2), the `Pipeline` class will help you build a Directed Acyclic Graph (DAG) that\n",
"determines how to route the output of one component into the input of another.\n"
],
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%% md\n"
}
}
},
{
"cell_type": "markdown",
"source": [
"## Setting Up the Environment\n",
"\n",
"Let's start by ensuring we have a GPU running to ensure decent speed in this tutorial.\n",
"In Google colab, you can change to a GPU runtime in the menu:\n",
" top_k_retriever=5 #This is top_k per retriever\n",
")\n",
"print_answers(res, details=\"minimal\")"
],
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%%\n"
}
}
},
{
"cell_type": "markdown",
"source": [
"## Custom Nodes\n",
"\n",
"Nodes are relatively simple objects\n",
"and we encourage our users to design their own if they don't see on that fits their use case\n",
"\n",
"The only requirements are:\n",
"- Add a method run(self, **kwargs) to your class. **kwargs will contain the output from the previous node in your graph.\n",
"- Do whatever you want within run() (e.g. reformatting the query)\n",
"- Return a tuple that contains your output data (for the next node)\n",
"and the name of the outgoing edge (by default \"output_1\" for nodes that have one output)\n",
"- Add a class attribute outgoing_edges = 1 that defines the number of output options from your node. You only need a higher number here if you have a decision node (see below).\n",
"\n",
"Here we have a template for a Node:"
],
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%% md\n"
}
}
},
{
"cell_type": "code",
"execution_count": null,
"outputs": [],
"source": [
"class NodeTemplate():\n",
" outgoing_edges = 1\n",
"\n",
" def run(self, **kwargs):\n",
" # Insert code here to manipulate the variables in kwarg\n",
" return (kwargs, \"output_1\")"
],
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%%\n"
}
}
},
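{
"cell_type": "markdown",
"source": [
"As a minimal sketch of how the template might be filled in, the hypothetical `LowercaseQueryNode` below reformats the query before passing it on.\n",
"It assumes that the incoming `kwargs` contain a `query` key, as in the `run(query=...)` calls used in this tutorial. The class name and the normalization step are illustrative only and are not part of Haystack.\n",
"\n",
"```python\n",
"class LowercaseQueryNode:\n",
"    \"\"\"Hypothetical custom node that normalizes the query text.\"\"\"\n",
"\n",
"    # One outgoing edge, so run() always returns \"output_1\"\n",
"    outgoing_edges = 1\n",
"\n",
"    def run(self, **kwargs):\n",
"        # kwargs holds the previous node's output; copy it so the input is not mutated\n",
"        output = dict(kwargs)\n",
"        # Assumption: the previous node passed a \"query\" string\n",
"        output[\"query\"] = output[\"query\"].strip().lower()\n",
"        return (output, \"output_1\")\n",
"\n",
"\n",
"node = LowercaseQueryNode()\n",
"output, edge = node.run(query=\"  Who is the father of Arya Stark?  \", top_k_retriever=5)\n",
"print(output[\"query\"], edge)\n",
"```\n",
"\n",
"Copying `kwargs` before changing it leaves the previous node's output intact, which is a design choice of this sketch rather than a requirement."
],
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%% md\n"
}
}
},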
{
"cell_type": "markdown",
"source": [
"## Decision Nodes\n",
"\n",
"Decision Nodes help you route your data so that only certain branches of your `Pipeline` are run.\n",
"One popular use case for such query classifiers is routing keyword queries to Elasticsearch and questions to DPR + Reader.\n",
"With this approach you keep optimal speed and simplicity for keywords while going deep with transformers when it's most helpful.\n",
"# Run only the dense retriever on the full sentence query\n",
"res_1 = p_classifier.run(\n",
" query=\"Who is the father of Arya Stark?\",\n",
" top_k_retriever=10\n",
")\n",
"print(\"DPR Results\" + \"\\n\" + \"=\"*15)\n",
"print_answers(res_1)\n",
"\n",
"# Run only the sparse retriever on a keyword based query\n",
"res_2 = p_classifier.run(\n",
" query=\"Arya Stark father\",\n",
" top_k_retriever=10\n",
")\n",
"print(\"ES Results\" + \"\\n\" + \"=\"*15)\n",
"print_answers(res_2)"
],
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%%\n"
}
}
},
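{
"cell_type": "markdown",
"source": [
"The routing idea behind a decision node can be sketched in plain Python, without Haystack installed.\n",
"The hypothetical `QuestionOrKeywordClassifier` below has two outgoing edges and picks one per query. The question-mark heuristic and the mapping of edges to retrievers are assumptions made for illustration, not the classifier used by Haystack.\n",
"\n",
"```python\n",
"class QuestionOrKeywordClassifier:\n",
"    \"\"\"Hypothetical decision node with two outgoing edges.\"\"\"\n",
"\n",
"    # Two possible branches, so run() returns either \"output_1\" or \"output_2\"\n",
"    outgoing_edges = 2\n",
"\n",
"    def run(self, **kwargs):\n",
"        # Assumed heuristic: a question mark signals a natural-language question\n",
"        if \"?\" in kwargs[\"query\"]:\n",
"            return (kwargs, \"output_2\")  # e.g. the branch with the dense retriever\n",
"        return (kwargs, \"output_1\")  # e.g. the branch with the sparse retriever\n",
"\n",
"\n",
"classifier = QuestionOrKeywordClassifier()\n",
"_, edge_question = classifier.run(query=\"Who is the father of Arya Stark?\")\n",
"_, edge_keyword = classifier.run(query=\"Arya Stark father\")\n",
"print(edge_question, edge_keyword)\n",
"```\n",
"\n",
"Only the nodes connected to the returned edge name would run for a given query, so each query pays the cost of a single branch."
],
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%% md\n"
}
}
},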
{
"cell_type": "markdown",
"source": [
"## Evaluation Nodes\n",
"\n",
"We have also designed a set of nodes that can be used to evaluate the performance of a system.\n",
"Have a look at our [tutorial](https://haystack.deepset.ai/docs/latest/tutorial5md) to get hands on with the code and learn more about Evaluation Nodes!\n"
],
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%% md\n"
}
}
},
{
"cell_type": "markdown",
"source": [
"## YAML Configs\n",
"\n",
"A full `Pipeline` can be defined in a YAML file and simply loaded.\n",
"Having your pipeline available in a YAML is particularly useful\n",
"when you move between experimentation and production environments.\n",
"Just export the YAML from your notebook / IDE and import it into your production environment.\n",
"It also helps with version control of pipelines,\n",
"allows you to share your pipeline easily with colleagues,\n",
"and simplifies the configuration of pipeline parameters in production.\n",
"\n",
"It consists of two main sections: you define all objects (e.g. a reader) in components\n",
"and then stick them together to a pipeline in pipelines.\n",
"You can also set one component to be multiple nodes of a pipeline or to be a node across multiple pipelines.\n",
"It will be loaded just once in memory and therefore doesn't hurt your resources more than actually needed.\n",
"\n",
"The contents of a YAML file should look something like this:"
],
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%% md\n"
}
}
},
{
"cell_type": "markdown",
"source": [
"```yaml\n",
"version: '0.7'\n",
"components: # define all the building-blocks for Pipeline\n",
"- name: MyReader # custom-name for the component; helpful for visualization & debugging\n",
" type: FARMReader # Haystack Class name for the component\n",