mirror of
https://github.com/deepset-ai/haystack.git
synced 2025-07-20 23:41:36 +00:00

* Upgrade torch to v1.10.0
* Adapt torch version for torch-scatter in TableQA tutorial
* Add latest docstring and tutorial changes
* Make torch version more flexible

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
761 lines
37 KiB
Plaintext
{
|
||
"nbformat": 4,
|
||
"nbformat_minor": 2,
|
||
"metadata": {
|
||
"colab": {
|
||
"name": "Tutorial15_TableQA.ipynb",
|
||
"provenance": []
|
||
},
|
||
"kernelspec": {
|
||
"name": "python3",
|
||
"display_name": "Python 3"
|
||
},
|
||
"language_info": {
|
||
"name": "python"
|
||
},
|
||
"accelerator": "GPU"
|
||
},
|
||
"cells": [
|
||
{
|
||
"cell_type": "markdown",
|
||
"source": [
|
||
"# Open-Domain QA on Tables\n",
|
||
"[](https://colab.research.google.com/github/deepset-ai/haystack/blob/master/tutorials/Tutorial15_TableQA.ipynb)\n",
|
||
"\n",
|
||
"This tutorial shows you how to perform question-answering on tables using the `TableTextRetriever` or `ElasticsearchRetriever` as retriever node and the `TableReader` as reader node."
|
||
],
|
||
"metadata": {
|
||
"id": "DeAkZwDhufYA"
|
||
}
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"source": [
|
||
"### Prepare environment\n",
|
||
"\n",
|
||
"#### Colab: Enable the GPU runtime\n",
|
||
"Make sure you enable the GPU runtime to experience decent speed in this tutorial.\n",
|
||
"**Runtime -> Change Runtime type -> Hardware accelerator -> GPU**\n",
|
||
"\n",
|
||
"<img src=\"https://raw.githubusercontent.com/deepset-ai/haystack/master/docs/_src/img/colab_gpu_runtime.jpg\">"
|
||
],
|
||
"metadata": {
|
||
"id": "vbR3bETlvi-3"
|
||
}
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"source": [
|
||
"# Make sure you have a GPU running\n",
|
||
"!nvidia-smi"
|
||
],
|
||
"outputs": [],
|
||
"metadata": {
|
||
"id": "HW66x0rfujyO"
|
||
}
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"source": [
|
||
"# Install the latest release of Haystack in your own environment \n",
|
||
"#! pip install farm-haystack\n",
|
||
"\n",
|
||
"# Install the latest master of Haystack\n",
|
||
"!pip install grpcio-tools==1.34.1\n",
|
||
"!pip install git+https://github.com/deepset-ai/haystack.git\n",
|
||
"\n",
|
||
"# The TaPAs-based TableReader requires the torch-scatter library\n",
|
||
"!pip install torch-scatter -f https://data.pyg.org/whl/torch-1.10.0+cu113.html\n",
|
||
"\n",
|
||
"# If you run this notebook on Google Colab, you might need to\n",
|
||
"# restart the runtime after installing haystack."
|
||
],
|
||
"outputs": [],
|
||
"metadata": {
|
||
"id": "_ZXoyhOAvn7M"
|
||
}
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"source": [
|
||
"### Start an Elasticsearch server\n",
|
||
"You can start Elasticsearch on your local machine instance using Docker. If Docker is not readily available in your environment (e.g. in Colab notebooks), then you can manually download and execute Elasticsearch from source."
|
||
],
|
||
"metadata": {
|
||
"id": "K_XJhluXwF5_"
|
||
}
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"source": [
|
||
"# Recommended: Start Elasticsearch using Docker via the Haystack utility function\n",
|
||
"from haystack.utils import launch_es\n",
|
||
"\n",
|
||
"launch_es()"
|
||
],
|
||
"outputs": [],
|
||
"metadata": {
|
||
"id": "frDqgzK7v2i1"
|
||
}
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 1,
|
||
"source": [
|
||
"# In Colab / No Docker environments: Start Elasticsearch from source\n",
|
||
"! wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.9.2-linux-x86_64.tar.gz -q\n",
|
||
"! tar -xzf elasticsearch-7.9.2-linux-x86_64.tar.gz\n",
|
||
"! chown -R daemon:daemon elasticsearch-7.9.2\n",
|
||
"\n",
|
||
"import os\n",
|
||
"from subprocess import Popen, PIPE, STDOUT\n",
|
||
"es_server = Popen(['elasticsearch-7.9.2/bin/elasticsearch'],\n",
|
||
" stdout=PIPE, stderr=STDOUT,\n",
|
||
" preexec_fn=lambda: os.setuid(1) # as daemon\n",
|
||
" )\n",
|
||
"# wait until ES has started\n",
|
||
"! sleep 30"
|
||
],
|
||
"outputs": [],
|
||
"metadata": {
|
||
"id": "S4PGj1A6wKWu"
|
||
}
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"source": [
|
||
"# Connect to Elasticsearch\n",
|
||
"from haystack.document_stores import ElasticsearchDocumentStore\n",
|
||
"\n",
|
||
"# We want to use a small model producing 512-dimensional embeddings, so we need to set embedding_dim to 512\n",
|
||
"document_store = ElasticsearchDocumentStore(host=\"localhost\",\n",
|
||
" username=\"\",\n",
|
||
" password=\"\",\n",
|
||
" index=\"document\",\n",
|
||
" embedding_dim=512)"
|
||
],
|
||
"outputs": [],
|
||
"metadata": {
|
||
"id": "RmxepXZtwQ0E"
|
||
}
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"source": [
|
||
"## Add Tables to DocumentStore\n",
|
||
"To quickly demonstrate the capabilities of the `TableTextRetriever` and the `TableReader` we use a subset of 1000 tables of the [Open Table-and-Text Question Answering (OTT-QA) dataset](https://github.com/wenhuchen/OTT-QA).\n",
|
||
"\n",
|
||
"Just as text passages, tables are represented as `Document` objects in Haystack. The content field, though, is a pandas DataFrame instead of a string."
|
||
],
|
||
"metadata": {
|
||
"id": "fFh26LIlxldw"
|
||
}
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"source": [
|
||
"# Let's first fetch some tables that we want to query\n",
|
||
"# Here: 1000 tables from OTT-QA\n",
|
||
"from haystack.utils import fetch_archive_from_http\n",
|
||
"\n",
|
||
"doc_dir = \"data\"\n",
|
||
"s3_url = \"https://s3.eu-central-1.amazonaws.com/deepset.ai-farm-qa/datasets/documents/ottqa_tables_sample.json.zip\"\n",
|
||
"fetch_archive_from_http(url=s3_url, output_dir=doc_dir)"
|
||
],
|
||
"outputs": [],
|
||
"metadata": {
|
||
"id": "nM63uwbd8zd6"
|
||
}
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 4,
|
||
"source": [
|
||
"# Add the tables to the DocumentStore\n",
|
||
"\n",
|
||
"import json\n",
|
||
"from haystack import Document\n",
|
||
"import pandas as pd\n",
|
||
"\n",
|
||
"def read_ottqa_tables(filename):\n",
|
||
" processed_tables = []\n",
|
||
" with open(filename) as tables:\n",
|
||
" tables = json.load(tables)\n",
|
||
" for key, table in tables.items():\n",
|
||
" current_columns = table[\"header\"]\n",
|
||
" current_rows = table[\"data\"]\n",
|
||
" current_df = pd.DataFrame(columns=current_columns, data=current_rows)\n",
|
||
" current_doc_title = table[\"title\"]\n",
|
||
" current_section_title = table[\"section_title\"]\n",
|
||
" document = Document(\n",
|
||
" content=current_df,\n",
|
||
" content_type=\"table\",\n",
|
||
" meta={\"title\": current_doc_title, \"section_title\": current_section_title},\n",
|
||
" id=key\n",
|
||
" )\n",
|
||
" processed_tables.append(document)\n",
|
||
"\n",
|
||
" return processed_tables\n",
|
||
"\n",
|
||
"\n",
|
||
"tables = read_ottqa_tables(\"data/ottqa_tables_sample.json\")\n",
|
||
"document_store.write_documents(tables, index=\"document\")\n",
|
||
"\n",
|
||
"# Showing content field and meta field of one of the Documents of content_type 'table'\n",
|
||
"print(tables[0].content)\n",
|
||
"print(tables[0].meta)"
|
||
],
|
||
"outputs": [
|
||
{
|
||
"output_type": "stream",
|
||
"name": "stdout",
|
||
"text": [
|
||
" Result ... Score\n",
|
||
"0 Winner ... 6-1 , 6-1\n",
|
||
"1 Winner ... 6-2 , 4-6 , 6-3\n",
|
||
"2 Winner ... 6-2 , 6-2\n",
|
||
"3 Runner-up ... 3-6 , 2-6\n",
|
||
"4 Winner ... 6-7 , 6-3 , 6-0\n",
|
||
"5 Winner ... 6-1 , 6-0\n",
|
||
"6 Winner ... 6-2 , 2-6 , 6-2\n",
|
||
"7 Winner ... 6-0 , 6-4\n",
|
||
"\n",
|
||
"[8 rows x 8 columns]\n",
|
||
"{'title': 'Rewa Hudson', 'section_title': 'ITF finals ( 7–3 ) -- Doubles ( 7–1 )'}\n"
|
||
]
|
||
}
|
||
],
|
||
"metadata": {
|
||
"id": "SKjw2LuXxlGh",
|
||
"colab": {
|
||
"base_uri": "https://localhost:8080/"
|
||
},
|
||
"outputId": "c24f8ca0-1a58-44ea-f01d-414db4c8f1f4"
|
||
}
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"source": [
|
||
"## Initalize Retriever, Reader, & Pipeline\n",
|
||
"\n",
|
||
"### Retriever\n",
|
||
"\n",
|
||
"Retrievers help narrowing down the scope for the Reader to a subset of tables where a given question could be answered.\n",
|
||
"They use some simple but fast algorithm.\n",
|
||
"\n",
|
||
"**Here:** We use the `TableTextRetriever` capable of retrieving relevant content among a database\n",
|
||
"of texts and tables using dense embeddings. It is an extension of the `DensePassageRetriever` and consists of three encoders (one query encoder, one text passage encoder and one table encoder) that create embeddings in the same vector space. More details on the `TableTextRetriever` and how it is trained can be found in [this paper](https://arxiv.org/abs/2108.04049).\n",
|
||
"\n",
|
||
"**Alternatives:**\n",
|
||
"\n",
|
||
"- `ElasticsearchRetriever` that uses BM25 algorithm\n"
|
||
],
|
||
"metadata": {
|
||
"id": "hmQC1sDmw3d7"
|
||
}
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"source": [
|
||
"from haystack.nodes.retriever import TableTextRetriever\n",
|
||
"\n",
|
||
"retriever = TableTextRetriever(\n",
|
||
" document_store=document_store,\n",
|
||
" query_embedding_model=\"deepset/bert-small-mm_retrieval-question_encoder\",\n",
|
||
" passage_embedding_model=\"deepset/bert-small-mm_retrieval-passage_encoder\",\n",
|
||
" table_embedding_model=\"deepset/bert-small-mm_retrieval-table_encoder\",\n",
|
||
" embed_meta_fields=[\"title\", \"section_title\"]\n",
|
||
")"
|
||
],
|
||
"outputs": [],
|
||
"metadata": {
|
||
"id": "EY_qvdV6wyK5"
|
||
}
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"source": [
|
||
"# Add table embeddings to the tables in DocumentStore\n",
|
||
"document_store.update_embeddings(retriever=retriever)"
|
||
],
|
||
"outputs": [],
|
||
"metadata": {
|
||
"id": "jasi1RM2zIJ7"
|
||
}
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"source": [
|
||
"## Alternative: ElasticsearchRetriever\n",
|
||
"#from haystack.nodes.retriever import ElasticsearchRetriever\n",
|
||
"#retriever = ElasticsearchRetriever(document_store=document_store)"
|
||
],
|
||
"outputs": [],
|
||
"metadata": {
|
||
"id": "XM-ijy6Zz11L"
|
||
}
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 7,
|
||
"source": [
|
||
"# Try the Retriever\n",
|
||
"from haystack.utils import print_documents\n",
|
||
"\n",
|
||
"retrieved_tables = retriever.retrieve(\"How many twin buildings are under construction?\", top_k=5)\n",
|
||
"# Get highest scored table\n",
|
||
"print(retrieved_tables[0].content)"
|
||
],
|
||
"outputs": [
|
||
{
|
||
"output_type": "stream",
|
||
"name": "stdout",
|
||
"text": [
|
||
" Name ... Status\n",
|
||
"0 Twin Towers II ... Never built\n",
|
||
"1 World Trade Center ... Destroyed\n",
|
||
"2 Three Sixty West ... Under construction\n",
|
||
"3 Gateway Towers ... Under construction\n",
|
||
"4 Rustomjee Crown ... Under construction\n",
|
||
"5 Orchid Heights ... On-hold\n",
|
||
"6 Hermitage Towers ... Proposed\n",
|
||
"7 Lokhandwala Minerva ... Under construction\n",
|
||
"8 Lamar Towers ... Under construction\n",
|
||
"9 Indonesia One Towers ... Under construction\n",
|
||
"10 Sky link ... Approved\n",
|
||
"11 Vida Za'abeel ... Proposed\n",
|
||
"12 Broadway Corridor Twin Towers ... Never built\n",
|
||
"13 India Bulls Sky Forest Tower ... Under construction\n",
|
||
"14 Capital Towers ... Under construction\n",
|
||
"15 One Avighna Park ... Under construction\n",
|
||
"16 NEB Towers ... On hold\n",
|
||
"17 The Destiny ( Tower ) ... Under construction\n",
|
||
"18 Oberoi Esquire Towers ... Under construction\n",
|
||
"19 Bhoomi Celestia ... Under construction\n",
|
||
"\n",
|
||
"[20 rows x 6 columns]\n"
|
||
]
|
||
}
|
||
],
|
||
"metadata": {
|
||
"id": "YHfQWxVI0N2e",
|
||
"colab": {
|
||
"base_uri": "https://localhost:8080/"
|
||
},
|
||
"outputId": "05976ac9-bee3-4eb8-b36d-01f1db5250db"
|
||
}
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"source": [
|
||
"### Reader\n",
|
||
"The `TableReader` is based on TaPas, a transformer-based language model capable of grasping the two-dimensional structure of a table. It scans the tables returned by the retriever and extracts the anser. The available TableReader models can be found [here](https://huggingface.co/models?pipeline_tag=table-question-answering&sort=downloads).\n",
|
||
"\n",
|
||
"**Notice**: The `TableReader` will return an answer for each table, even if the query cannot be answered by the table. Furthermore, the confidence scores are not useful as of now, given that they will *always* be very high (i.e. 1 or close to 1)."
|
||
],
|
||
"metadata": {
|
||
"id": "zbwkXScm2-gy"
|
||
}
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"source": [
|
||
"from haystack.nodes import TableReader\n",
|
||
"\n",
|
||
"reader = TableReader(model_name_or_path=\"google/tapas-base-finetuned-wtq\", max_seq_len=512)"
|
||
],
|
||
"outputs": [],
|
||
"metadata": {
|
||
"id": "4APcRoio2RxG"
|
||
}
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 9,
|
||
"source": [
|
||
"# Try the TableReader on one Table (highest-scored retrieved table from previous section)\n",
|
||
"\n",
|
||
"table_doc = document_store.get_document_by_id(\"List_of_tallest_twin_buildings_and_structures_in_the_world_1\")\n",
|
||
"print(table_doc.content)"
|
||
],
|
||
"outputs": [
|
||
{
|
||
"output_type": "stream",
|
||
"name": "stdout",
|
||
"text": [
|
||
" Name ... Status\n",
|
||
"0 Twin Towers II ... Never built\n",
|
||
"1 World Trade Center ... Destroyed\n",
|
||
"2 Three Sixty West ... Under construction\n",
|
||
"3 Gateway Towers ... Under construction\n",
|
||
"4 Rustomjee Crown ... Under construction\n",
|
||
"5 Orchid Heights ... On-hold\n",
|
||
"6 Hermitage Towers ... Proposed\n",
|
||
"7 Lokhandwala Minerva ... Under construction\n",
|
||
"8 Lamar Towers ... Under construction\n",
|
||
"9 Indonesia One Towers ... Under construction\n",
|
||
"10 Sky link ... Approved\n",
|
||
"11 Vida Za'abeel ... Proposed\n",
|
||
"12 Broadway Corridor Twin Towers ... Never built\n",
|
||
"13 India Bulls Sky Forest Tower ... Under construction\n",
|
||
"14 Capital Towers ... Under construction\n",
|
||
"15 One Avighna Park ... Under construction\n",
|
||
"16 NEB Towers ... On hold\n",
|
||
"17 The Destiny ( Tower ) ... Under construction\n",
|
||
"18 Oberoi Esquire Towers ... Under construction\n",
|
||
"19 Bhoomi Celestia ... Under construction\n",
|
||
"\n",
|
||
"[20 rows x 6 columns]\n"
|
||
]
|
||
}
|
||
],
|
||
"metadata": {
|
||
"id": "ILuAXkyN4F7x",
|
||
"colab": {
|
||
"base_uri": "https://localhost:8080/"
|
||
},
|
||
"outputId": "7bdb7190-fcf8-4296-c237-cffc78dac4aa"
|
||
}
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 12,
|
||
"source": [
|
||
"from haystack.utils import print_answers\n",
|
||
"\n",
|
||
"prediction = reader.predict(query=\"How many twin buildings are under construction?\", documents=[table_doc])\n",
|
||
"print_answers(prediction, details=\"all\")"
|
||
],
|
||
"outputs": [
|
||
{
|
||
"output_type": "stream",
|
||
"name": "stdout",
|
||
"text": [
|
||
"{ 'answers': [ Answer(answer='12', type='extractive', score=1.0, context= Name ... Status\n",
|
||
"0 Twin Towers II ... Never built\n",
|
||
"1 World Trade Center ... Destroyed\n",
|
||
"2 Three Sixty West ... Under construction\n",
|
||
"3 Gateway Towers ... Under construction\n",
|
||
"4 Rustomjee Crown ... Under construction\n",
|
||
"5 Orchid Heights ... On-hold\n",
|
||
"6 Hermitage Towers ... Proposed\n",
|
||
"7 Lokhandwala Minerva ... Under construction\n",
|
||
"8 Lamar Towers ... Under construction\n",
|
||
"9 Indonesia One Towers ... Under construction\n",
|
||
"10 Sky link ... Approved\n",
|
||
"11 Vida Za'abeel ... Proposed\n",
|
||
"12 Broadway Corridor Twin Towers ... Never built\n",
|
||
"13 India Bulls Sky Forest Tower ... Under construction\n",
|
||
"14 Capital Towers ... Under construction\n",
|
||
"15 One Avighna Park ... Under construction\n",
|
||
"16 NEB Towers ... On hold\n",
|
||
"17 The Destiny ( Tower ) ... Under construction\n",
|
||
"18 Oberoi Esquire Towers ... Under construction\n",
|
||
"19 Bhoomi Celestia ... Under construction\n",
|
||
"\n",
|
||
"[20 rows x 6 columns], offsets_in_document=[Span(start=12, end=13), Span(start=18, end=19), Span(start=24, end=25), Span(start=42, end=43), Span(start=48, end=49), Span(start=54, end=55), Span(start=78, end=79), Span(start=84, end=85), Span(start=90, end=91), Span(start=102, end=103), Span(start=108, end=109), Span(start=114, end=115)], offsets_in_context=[Span(start=12, end=13), Span(start=18, end=19), Span(start=24, end=25), Span(start=42, end=43), Span(start=48, end=49), Span(start=54, end=55), Span(start=78, end=79), Span(start=84, end=85), Span(start=90, end=91), Span(start=102, end=103), Span(start=108, end=109), Span(start=114, end=115)], document_id='List_of_tallest_twin_buildings_and_structures_in_the_world_1', meta={'aggregation_operator': 'COUNT', 'answer_cells': ['Three Sixty West', 'Gateway Towers', 'Rustomjee Crown', 'Lokhandwala Minerva', 'Lamar Towers', 'Indonesia One Towers', 'India Bulls Sky Forest Tower', 'Capital Towers', 'One Avighna Park', 'The Destiny ( Tower )', 'Oberoi Esquire Towers', 'Bhoomi Celestia']})],\n",
|
||
" 'query': 'How many twin buildings are under construction?'}\n"
|
||
]
|
||
}
|
||
],
|
||
"metadata": {
|
||
"id": "ilbsecgA4vfN",
|
||
"colab": {
|
||
"base_uri": "https://localhost:8080/"
|
||
},
|
||
"outputId": "5f4e8f0b-bc9e-485b-c933-546fcad2b411"
|
||
}
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"source": [
|
||
"The offsets in the `offsets_in_document` and `offsets_in_context` field indicate the table cells that the model predicts to be part of the answer. They need to be interpreted on the linearized table, i.e., a flat list containing all of the table cells.\n",
|
||
"\n",
|
||
"In the `Answer`'s meta field, you can find the aggreagtion operator used to construct the answer (in this case `COUNT`) and the answer cells as strings."
|
||
],
|
||
"metadata": {
|
||
"id": "jkAYNMb7R9qu"
|
||
}
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 16,
|
||
"source": [
|
||
"print(f\"Predicted answer: {prediction['answers'][0].answer}\")\n",
|
||
"print(f\"Meta field: {prediction['answers'][0].meta}\")"
|
||
],
|
||
"outputs": [
|
||
{
|
||
"output_type": "stream",
|
||
"name": "stdout",
|
||
"text": [
|
||
"Predicted answer: 12\n",
|
||
"Meta field: {'aggregation_operator': 'COUNT', 'answer_cells': ['Three Sixty West', 'Gateway Towers', 'Rustomjee Crown', 'Lokhandwala Minerva', 'Lamar Towers', 'Indonesia One Towers', 'India Bulls Sky Forest Tower', 'Capital Towers', 'One Avighna Park', 'The Destiny ( Tower )', 'Oberoi Esquire Towers', 'Bhoomi Celestia']}\n"
|
||
]
|
||
}
|
||
],
|
||
"metadata": {
|
||
"colab": {
|
||
"base_uri": "https://localhost:8080/"
|
||
},
|
||
"id": "It8XYT2ZTVJs",
|
||
"outputId": "5bd712a0-9f22-4fc0-a4f1-b01b15cb9916"
|
||
}
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"source": [
|
||
"### Pipeline\n",
|
||
"The Retriever and the Reader can be sticked together to a pipeline in order to first retrieve relevant tables and then extract the answer.\n",
|
||
"\n",
|
||
"**Notice**: Given that the `TableReader` does not provide useful confidence scores and returns an answer for each of the tables, the sorting of the answers might be not helpful."
|
||
],
|
||
"metadata": {
|
||
"id": "pgmG7pzL5ceh"
|
||
}
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 17,
|
||
"source": [
|
||
"# Initialize pipeline\n",
|
||
"from haystack import Pipeline\n",
|
||
"\n",
|
||
"table_qa_pipeline = Pipeline()\n",
|
||
"table_qa_pipeline.add_node(component=retriever, name=\"TableTextRetriever\", inputs=[\"Query\"])\n",
|
||
"table_qa_pipeline.add_node(component=reader, name=\"TableReader\", inputs=[\"TableTextRetriever\"])"
|
||
],
|
||
"outputs": [],
|
||
"metadata": {
|
||
"id": "G-aZZvyv4-Mf"
|
||
}
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 18,
|
||
"source": [
|
||
"prediction = table_qa_pipeline.run(\"How many twin buildings are under construction?\")\n",
|
||
"print_answers(prediction, details=\"minimum\")"
|
||
],
|
||
"outputs": [
|
||
{
|
||
"output_type": "stream",
|
||
"name": "stdout",
|
||
"text": [
|
||
"[ { 'answer': '12',\n",
|
||
" 'context': Name ... Status\n",
|
||
"0 Twin Towers II ... Never built\n",
|
||
"1 World Trade Center ... Destroyed\n",
|
||
"2 Three Sixty West ... Under construction\n",
|
||
"3 Gateway Towers ... Under construction\n",
|
||
"4 Rustomjee Crown ... Under construction\n",
|
||
"5 Orchid Heights ... On-hold\n",
|
||
"6 Hermitage Towers ... Proposed\n",
|
||
"7 Lokhandwala Minerva ... Under construction\n",
|
||
"8 Lamar Towers ... Under construction\n",
|
||
"9 Indonesia One Towers ... Under construction\n",
|
||
"10 Sky link ... Approved\n",
|
||
"11 Vida Za'abeel ... Proposed\n",
|
||
"12 Broadway Corridor Twin Towers ... Never built\n",
|
||
"13 India Bulls Sky Forest Tower ... Under construction\n",
|
||
"14 Capital Towers ... Under construction\n",
|
||
"15 One Avighna Park ... Under construction\n",
|
||
"16 NEB Towers ... On hold\n",
|
||
"17 The Destiny ( Tower ) ... Under construction\n",
|
||
"18 Oberoi Esquire Towers ... Under construction\n",
|
||
"19 Bhoomi Celestia ... Under construction\n",
|
||
"\n",
|
||
"[20 rows x 6 columns]},\n",
|
||
" { 'answer': '7',\n",
|
||
" 'context': Building or structure ... Listing\n",
|
||
"0 Ford Assembly Plant Building Now Public Storage ... Seattle landmark\n",
|
||
"1 Immanuel Lutheran Church ... Seattle landmark NRHP\n",
|
||
"2 Jensen Block ... Seattle landmark\n",
|
||
"3 Lake Union Steam Plant and Hydro House Now Zymogenetics ... Seattle landmark\n",
|
||
"4 New Richmond Laundry Now part of the Alley24 development ... Seattle landmark\n",
|
||
"5 St. Spiridon Russian Orthodox Cathedral ... Seattle landmark\n",
|
||
"6 Supply Laundry Building Now part of the Stackhouse development ... Seattle landmark NRHP\n",
|
||
"\n",
|
||
"[7 rows x 3 columns]},\n",
|
||
" { 'answer': '8',\n",
|
||
" 'context': Years Venue Location\n",
|
||
"0 1989 Bamm Hollow Country Club Lincroft , New Jersey\n",
|
||
"1 1987-88 Navesink Country Club Middletown , New Jersey\n",
|
||
"2 1985-86 Fairmount Country Club Chatham , New Jersey\n",
|
||
"3 1983-84 Upper Montclair Country Club Clifton , New Jersey\n",
|
||
"4 1982 Wykagyl Country Club New Rochelle , New York\n",
|
||
"5 1981 Ridgewood Country Club Paramus , New Jersey\n",
|
||
"6 1979-80 Upper Montclair Country Club Clifton , New Jersey\n",
|
||
"7 1976-78 Forsgate Country Club Monroe Township , New Jersey},\n",
|
||
" { 'answer': '8',\n",
|
||
" 'context': Model Specification ... Prime mover Power output\n",
|
||
"0 RS-1 E-1641A ... 6-539T 1,000 hp ( 0.75 MW )\n",
|
||
"1 RS-2 E-1661 , E-1661A , E-1661B ... 12-244 1,500 hp ( 1.12 MW )\n",
|
||
"2 RS-2 E-1661C ... 12-244 1,600 hp ( 1.19 MW )\n",
|
||
"3 RS-3 E-1662 , E-1662A , E-1662B ... 12-244 1,600 hp ( 1.19 MW )\n",
|
||
"4 RS-11 DL-701 ... 12-251 1,800 hp ( 1.34 MW )\n",
|
||
"5 RS-27 DL-640 ... 16-251 2,400 hp ( 1.79 MW )\n",
|
||
"6 RS-32 DL-721 ... 12-251 2,000 hp ( 1.49 MW )\n",
|
||
"7 RS-36 DL-701XAP ... 12-251 1,800 hp ( 1.34 MW )\n",
|
||
"\n",
|
||
"[8 rows x 7 columns]},\n",
|
||
" { 'answer': '10',\n",
|
||
" 'context': Name or designation ... Notes\n",
|
||
"0 Aluminum Overcast ... One of only ten flyable B-17s\n",
|
||
"1 Avro Lancaster PA474 ... One of only two Lancasters in flying condition in the world\n",
|
||
"2 Avro Vulcan XH558 , aka Spirit of Great Britain ... The only Cold War / Falklands War -era Vulcan bomber to fly after 1986 . Res...\n",
|
||
"3 Douglas DC-7B N836D ... \n",
|
||
"4 Douglas R4D-3 N763A ... Used by the US Navy during World War II . Placed on the National Register of...\n",
|
||
"5 FIFI ... One of only two B-29s flying\n",
|
||
"6 Glacier Girl ... Forced to land in Greenland in 1942 along with five other P-38s and two B-17...\n",
|
||
"7 Hawker Hurricane PZ865 ... Last Hurricane produced . Retained by Hawker Aircraft for trials work . Give...\n",
|
||
"8 My Gal Sal ... Forced to land on the Greenland icecap during World War II and abandoned , a...\n",
|
||
"9 Piccadilly Lilly II ... Last B-17 to serve in the US Air Force , flying her last mission in 1959 . U...\n",
|
||
"10 The Pink Lady ... Only flying B-17 survivor to have seen action in Europe during World War II\n",
|
||
"11 Sally B ... Only airworthy B-17 left in Europe . Used in the 1990 film Memphis Belle\n",
|
||
"12 Sentimental Journey ... Based at the Commemorative Air Force Museum in Mesa , Arizona , and regularl...\n",
|
||
"13 Shoo Shoo Baby ... Crash-landed in Sweden in 1944 . Restored from 1978 to 1988\n",
|
||
"14 Swamp Ghost ... Ran out of fuel and crash-landed in a swamp in Papua New Guinea . Recovered ...\n",
|
||
"15 Texas Raiders ... Maintained and flown by the Commemorative Air Force ( formerly Confederate A...\n",
|
||
"16 Thunderbird ... Housed at the Lone Star Flight Museum in Galveston , Texas\n",
|
||
"17 Worry Bird ... Served in World War II and the Korean War before being retired in 1957 and p...\n",
|
||
"18 Yankee Lady ... Flyable\n",
|
||
"\n",
|
||
"[19 rows x 6 columns]},\n",
|
||
" { 'answer': '13',\n",
|
||
" 'context': N Year Country ... Link Remark K\n",
|
||
"0 003+ 2013 INDIA ... LK RK K\n",
|
||
"1 005 2006 USA ... LK RK K\n",
|
||
"2 010 2014 ZAF ... LK RK K\n",
|
||
"3 020 2010 USA ... LK RK K\n",
|
||
"4 030 201 ? USA ... LK RK K\n",
|
||
"5 040 2007 USA ... LK RK K\n",
|
||
"6 042 2004 USA ... LK Only G-S With Large Battery K\n",
|
||
"7 050 201 ? USA ... LK RK K\n",
|
||
"8 100 20 ? ? USA ... LK RK K\n",
|
||
"9 200 20 ? ? USA ... LK RK K\n",
|
||
"10 300 2013 EUR ... LK RK K\n",
|
||
"11 400 20 ? ? USA ... LK RK K\n",
|
||
"12 995 20 ? ? USA ... LK RK K\n",
|
||
"\n",
|
||
"[13 rows x 12 columns]},\n",
|
||
" { 'answer': '5',\n",
|
||
" 'context': Team ... Capacity\n",
|
||
"0 Barnsley ... 23,009\n",
|
||
"1 Blackpool ... 16,750\n",
|
||
"2 Bradford City ... 25,136\n",
|
||
"3 Burton Albion ... 6,912\n",
|
||
"4 Bury ... 11,840\n",
|
||
"5 Chesterfield ... 10,400\n",
|
||
"6 Colchester United ... 10,105\n",
|
||
"7 Coventry City ... 32,500\n",
|
||
"8 Crewe Alexandra ... 10,066\n",
|
||
"9 Doncaster Rovers ... 15,231\n",
|
||
"10 Fleetwood Town ... 5,311\n",
|
||
"11 Gillingham ... 11,582\n",
|
||
"12 Millwall ... 20,146\n",
|
||
"13 Oldham Athletic ... 13,512\n",
|
||
"14 Peterborough United ... 14,319\n",
|
||
"15 Port Vale ... 18,947\n",
|
||
"16 Rochdale ... 10,249\n",
|
||
"17 Scunthorpe United ... 9,183\n",
|
||
"18 Sheffield United ... 32,702\n",
|
||
"19 Shrewsbury Town ... 9,875\n",
|
||
"\n",
|
||
"[20 rows x 4 columns]},\n",
|
||
" { 'answer': '7',\n",
|
||
" 'context': Resource Name ... Added\n",
|
||
"0 Whitfield Estates-Broughton Street Historic District ... October 29 , 1993\n",
|
||
"1 John M. Beasley House ... March 5 , 1996\n",
|
||
"2 Whitfield Estates-Lantana Avenue Historic District ... March 8 , 1997\n",
|
||
"3 Austin House ... February 5 , 1998\n",
|
||
"4 Reid-Woods House ... August 31 , 2000\n",
|
||
"5 Villa Serena Apartments ... September 29 , 2000\n",
|
||
"6 Paul M. Souder House ... November 2 , 2000\n",
|
||
"7 Stevens-Gilchrist House ... August 17 , 2001\n",
|
||
"\n",
|
||
"[8 rows x 3 columns]},\n",
|
||
" { 'answer': '19',\n",
|
||
" 'context': Name ( Alternative names in parenthesis ) ... Carries\n",
|
||
"0 Arboretum Sewer Trestle ... Sewer and a footpath\n",
|
||
"1 Ballard Bridge ( 15th Avenue Bridge ) ... 15th Avenue NW\n",
|
||
"2 Cowen Park Bridge ... 15th Avenue NE\n",
|
||
"3 First Avenue South Bridge ... State Route 99\n",
|
||
"4 Fremont Bridge ( Fremont Avenue Bridge ) ... Road connecting Fremont Avenue N and 4th Avenue N\n",
|
||
"5 George Washington Memorial Bridge ( Aurora Bridge ) ... State Route 99\n",
|
||
"6 Homer M. Hadley Memorial Bridge ( Third Lake Washington Bridge ) ... Interstate 90\n",
|
||
"7 Jeanette Williams Memorial Bridge ( West Seattle Bridge ) ... Road connecting Fauntleroy Way SW and the Spokane Street Viaduct\n",
|
||
"8 Jose Rizal Bridge ( 12th Avenue South Bridge ) ... 12th Avenue S and Interstate 90\n",
|
||
"9 Lacey V. Murrow Memorial Bridge ... Interstate 90\n",
|
||
"10 Magnolia Bridge ... W Garfield Street\n",
|
||
"11 Montlake Bridge ... State Route 513\n",
|
||
"12 North Queen Anne Drive Bridge ... N Queen Anne Drive\n",
|
||
"13 Salmon Bay Bridge ... BNSF Railway\n",
|
||
"14 Ship Canal Bridge ... Interstate 5\n",
|
||
"15 Schmitz Park Bridge ... SW Admiral Way\n",
|
||
"16 Spokane Street Bridge ... SW Spokane Street\n",
|
||
"17 SR 520 Albert D. Rosellini Evergreen Point Floating Bridge ( Evergreen Point... ... State Route 520\n",
|
||
"18 20th Avenue NE Bridge ( Ravenna Park Bridge ) ... 20th Avenue NE ( pedestrian access only )\n",
|
||
"19 University Bridge ... Eastlake Avenue NE\n",
|
||
"\n",
|
||
"[20 rows x 6 columns]},\n",
|
||
" { 'answer': '8',\n",
|
||
" 'context': Location ... Comments\n",
|
||
"0 Ayr ... Known as Wonderwest World 1988-1998 ; operated as Craig Tara by Haven since ...\n",
|
||
"1 Bahamas ... The site is now occupied by a new hotel and marina complex known as Old Baha...\n",
|
||
"2 Barry Island ... Operated independently until closure in 1996 . Demolished in 2005\n",
|
||
"3 Bognor Regis ... Known as Southcoast World 1987-1998 . Still open as Butlins Bognor Regis\n",
|
||
"4 Clacton ... Demolished , now a housing estate . Small area yet to be redeveloped\n",
|
||
"5 Filey Holiday Camp ... Operated independently for six weeks in 1986 , but the venture failed and it...\n",
|
||
"6 Minehead ... Known as Somerwest World 1986-1998 . Still open as Butlins Minehead 30 April...\n",
|
||
"7 Mosney ... Operated independently until closure and conversion into an Irish Government...\n",
|
||
"8 Pwllheli ... Known as Starcoast World 1990-1998 ; operated as Hafan Y Mor by Haven since ...\n",
|
||
"9 Skegness ... Known as Funcoast World 1987-1998 . Still open as Butlins Skegness\n",
|
||
"\n",
|
||
"[10 rows x 4 columns]}]\n"
|
||
]
|
||
}
|
||
],
|
||
"metadata": {
|
||
"id": "m8evexnW6dev",
|
||
"colab": {
|
||
"base_uri": "https://localhost:8080/"
|
||
},
|
||
"outputId": "290168b1-294e-42ed-c970-e5ddfefb3396"
|
||
}
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"source": [
|
||
"## About us\n",
|
||
"\n",
|
||
"This [Haystack](https://github.com/deepset-ai/haystack/) notebook was made with love by [deepset](https://deepset.ai/) in Berlin, Germany\n",
|
||
"\n",
|
||
"We bring NLP to the industry via open source! \n",
|
||
"Our focus: Industry specific language models & large scale QA systems. \n",
|
||
" \n",
|
||
"Some of our other work: \n",
|
||
"- [German BERT](https://deepset.ai/german-bert)\n",
|
||
"- [GermanQuAD and GermanDPR](https://deepset.ai/germanquad)\n",
|
||
"- [FARM](https://github.com/deepset-ai/FARM)\n",
|
||
"\n",
|
||
"Get in touch:\n",
|
||
"[Twitter](https://twitter.com/deepset_ai) | [LinkedIn](https://www.linkedin.com/company/deepset-ai/) | [Slack](https://haystack.deepset.ai/community/join) | [GitHub Discussions](https://github.com/deepset-ai/haystack/discussions) | [Website](https://deepset.ai)\n",
|
||
"\n",
|
||
"By the way: [we're hiring!](https://www.deepset.ai/jobs)\n"
|
||
],
|
||
"metadata": {
|
||
"id": "RyeK3s28_X1C"
|
||
}
|
||
}
|
||
]
|
||
}