{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Better retrieval via \"Dense Passage Retrieval\"\n",
"\n",
"\n",
"### Importance of Retrievers\n",
"\n",
"The Retriever has a huge impact on the performance of our overall search pipeline.\n",
"\n",
"\n",
"### Different types of Retrievers\n",
"#### Sparse\n",
"Family of algorithms based on counting the occurrences of words (bag-of-words), resulting in very sparse vectors with length = vocab size. \n",
"\n",
"Examples: BM25, TF-IDF \n",
"Pros: Simple, fast, well explainable \n",
"Cons: Relies on exact keyword matches between query and text \n",
" \n",
"\n",
"#### Dense\n",
"These retrievers use neural network models to create \"dense\" embedding vectors. Within this family there are two different approaches: \n",
"\n",
"a) Single encoder: Use a **single model** to embed both query and passage. \n",
"b) Dual-encoder: Use **two models**, one to embed the query and one to embed the passage.\n",
"\n",
"Recent work suggests that dual encoders work better, likely because they can deal better with the different nature of query and passage (length, style, syntax ...). \n",
"\n",
"Examples: REALM, DPR, Sentence-Transformers ...\n",
"Pros: Captures semantic similarity instead of \"word matches\" (e.g. synonyms, related topics ...) \n",
"Cons: Computationally heavier; requires initial training of a model \n",
"\n",
"\n",
"### \"Dense Passage Retrieval\"\n",
"\n",
"In this tutorial, we want to highlight one \"Dense Dual-Encoder\" called Dense Passage Retriever. \n",
"It was introduced by Karpukhin et al. (2020, https://arxiv.org/abs/2004.04906). \n",
"\n",
"Original Abstract: \n",
"\n",
"_\"Open-domain question answering relies on efficient passage retrieval to select candidate contexts, where traditional sparse vector space models, such as TF-IDF or BM25, are the de facto method. In this work, we show that retrieval can be practically implemented using dense representations alone, where embeddings are learned from a small number of questions and passages by a simple dual-encoder framework. When evaluated on a wide range of open-domain QA datasets, our dense retriever outperforms a strong Lucene-BM25 system largely by 9%-19% absolute in terms of top-20 passage retrieval accuracy, and helps our end-to-end QA system establish new state-of-the-art on multiple open-domain QA benchmarks.\"_\n",
"\n",
"Paper: https://arxiv.org/abs/2004.04906 \n",
"Original Code: https://fburl.com/qa-dpr \n",
"\n",
"\n",
"*Use this [link](https://colab.research.google.com/github/deepset-ai/haystack/blob/master/tutorials/Tutorial6_Better_Retrieval_via_DPR.ipynb) to open the notebook in Google Colab.*\n"
]
},
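{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the dual-encoder idea concrete, here is a minimal sketch in plain NumPy (this is *not* Haystack's actual DPR implementation): two separate encoders map query and passage into the same vector space, and relevance is scored by the dot product of the two vectors. The toy hash-based \"encoders\" below are stand-ins for the two BERT models that DPR uses."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Stand-ins for DPR's two encoders (in reality: two separate BERT models)\n",
"rng = np.random.default_rng(42)\n",
"W_query = rng.normal(size=(8, 4))    # hypothetical query-encoder weights\n",
"W_passage = rng.normal(size=(8, 4))  # hypothetical passage-encoder weights\n",
"\n",
"def embed(text, W):\n",
"    # Toy featurizer: hash tokens into a fixed-size bag-of-words vector,\n",
"    # then project it with the encoder's weights. Real encoders use BERT.\n",
"    x = np.zeros(8)\n",
"    for tok in text.lower().split():\n",
"        x[hash(tok) % 8] += 1\n",
"    return x @ W\n",
"\n",
"query_vec = embed(\"Who is the father of Arya Stark?\", W_query)\n",
"passages = [\"Ned Stark is Arya's father.\", \"Winterfell is a castle.\"]\n",
"passage_vecs = [embed(p, W_passage) for p in passages]\n",
"\n",
"# DPR scores query-passage pairs by dot-product similarity\n",
"scores = [query_vec @ p for p in passage_vecs]\n",
"print(scores)"
]
},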
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Prepare environment\n",
"\n",
"### Colab: Enable the GPU runtime \n",
"Make sure you enable the GPU runtime to experience decent speed in this tutorial. \n",
"**Runtime -> Change Runtime type -> Hardware accelerator -> GPU**\n",
"\n",
"<img src=\"https://raw.githubusercontent.com/deepset-ai/haystack/master/docs/img/colab_gpu_runtime.jpg\">"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Make sure you have a GPU running\n",
"!nvidia-smi"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"! pip install git+https://github.com/deepset-ai/haystack.git"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"pycharm": {
"is_executing": false
}
},
"outputs": [],
"source": [
"from haystack import Finder\n",
"from haystack.indexing.cleaning import clean_wiki_text\n",
"from haystack.indexing.utils import convert_files_to_dicts, fetch_archive_from_http\n",
"from haystack.reader.farm import FARMReader\n",
"from haystack.reader.transformers import TransformersReader\n",
"from haystack.utils import print_answers"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Document Store\n",
"\n",
"FAISS is a library for efficient similarity search and clustering of dense vectors.\n",
"The `FAISSDocumentStore` uses a SQL database (SQLite in-memory by default) under the hood\n",
"to store the document text and other metadata. The vector embeddings of the text are\n",
"indexed in a FAISS index that is later queried to retrieve answers."
]
},
{
"cell_type": "code",
"execution_count": 9,
"outputs": [],
"source": [
"from haystack.database.faiss import FAISSDocumentStore\n",
"\n",
"document_store = FAISSDocumentStore()"
],
"metadata": {
"pycharm": {
"name": "#%%\n"
}
}
},
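{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a side note: if you want the stored texts and metadata to survive a restart, you can back the underlying SQL database with a file instead of keeping it in memory. A minimal sketch, assuming your installed Haystack version exposes a `sql_url` parameter on `FAISSDocumentStore` (verify against the actual signature, printed below):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical persistent variant (parameter name assumed -- verify it\n",
"# against your installed Haystack version before relying on it):\n",
"# document_store = FAISSDocumentStore(sql_url=\"sqlite:///haystack_docs.db\")\n",
"\n",
"# Inspect the actual constructor signature of your installed version:\n",
"import inspect\n",
"from haystack.database.faiss import FAISSDocumentStore\n",
"print(inspect.signature(FAISSDocumentStore.__init__))"
]
},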
{
"cell_type": "markdown",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"## Cleaning & indexing documents\n",
"\n",
"As in the previous tutorials, we download, convert and index some Game of Thrones articles to our DocumentStore."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"07/03/2020 11:46:28 - INFO - haystack.indexing.utils - Found data stored in `data/article_txt_got`. Delete this first if you really want to fetch new data.\n",
"07/03/2020 11:46:28 - INFO - elasticsearch - POST http://localhost:9200/_bulk [status:200 request:0.300s]\n",
"07/03/2020 11:46:28 - INFO - elasticsearch - POST http://localhost:9200/_bulk [status:200 request:0.209s]\n",
"07/03/2020 11:46:28 - INFO - elasticsearch - POST http://localhost:9200/_bulk [status:200 request:0.124s]\n",
"07/03/2020 11:46:29 - INFO - elasticsearch - POST http://localhost:9200/_bulk [status:200 request:0.101s]\n",
"07/03/2020 11:46:29 - INFO - elasticsearch - POST http://localhost:9200/_bulk [status:200 request:0.125s]\n",
"07/03/2020 11:46:29 - INFO - elasticsearch - POST http://localhost:9200/_bulk [status:200 request:0.096s]\n"
]
}
],
"source": [
"# Let's first get some files that we want to use\n",
"doc_dir = \"data/article_txt_got\"\n",
"s3_url = \"https://s3.eu-central-1.amazonaws.com/deepset.ai-farm-qa/datasets/documents/wiki_gameofthrones_txt.zip\"\n",
"fetch_archive_from_http(url=s3_url, output_dir=doc_dir)\n",
"\n",
"# Convert files to dicts\n",
"dicts = convert_files_to_dicts(dir_path=doc_dir, clean_func=clean_wiki_text, split_paragraphs=True)\n",
"\n",
"# Now, let's write the dicts containing documents to our DB.\n",
"document_store.write_documents(dicts)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Initialize Retriever, Reader & Finder\n",
"\n",
"### Retriever\n",
"\n",
"**Here:** We use a `DensePassageRetriever`\n",
"\n",
"**Alternatives:**\n",
"\n",
"- Use the `ElasticsearchRetriever` with custom queries (e.g. boosting) and filters\n",
"- Use `EmbeddingRetriever` to find candidate documents based on the similarity of embeddings (e.g. created via Sentence-BERT)\n",
"- Use `TfidfRetriever` in combination with a SQL or InMemory DocumentStore for simple prototyping and debugging"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"07/03/2020 11:46:29 - INFO - haystack.retriever.dpr_utils - Loading saved model from models/dpr/checkpoint/retriever/single/nq/bert-base-encoder.cp\n",
"07/03/2020 11:46:29 - INFO - haystack.retriever.dense - Loaded encoder params: {'do_lower_case': True, 'pretrained_model_cfg': 'bert-base-uncased', 'encoder_model_type': 'hf_bert', 'pretrained_file': None, 'projection_dim': 0, 'sequence_length': 256}\n",
"07/03/2020 11:46:36 - INFO - haystack.retriever.dense - Loading saved model state ...\n",
"07/03/2020 11:46:36 - INFO - haystack.retriever.dense - Loading saved model state ...\n",
"07/03/2020 11:46:37 - INFO - elasticsearch - GET http://localhost:9200/document/_search?scroll=5m&size=1000 [status:200 request:0.177s]\n",
"07/03/2020 11:46:37 - INFO - elasticsearch - GET http://localhost:9200/_search/scroll [status:200 request:0.053s]\n",
"07/03/2020 11:46:37 - INFO - elasticsearch - GET http://localhost:9200/_search/scroll [status:200 request:0.047s]\n",
"07/03/2020 11:46:37 - INFO - elasticsearch - GET http://localhost:9200/_search/scroll [status:200 request:0.003s]\n",
"07/03/2020 11:46:37 - INFO - elasticsearch - DELETE http://localhost:9200/_search/scroll [status:200 request:0.005s]\n",
"07/03/2020 11:46:37 - INFO - haystack.database.elasticsearch - Updating embeddings for 2811 docs ...\n",
"07/03/2020 11:46:55 - INFO - haystack.retriever.dense - Embedded 80 / 2811 texts\n",
"07/03/2020 11:47:13 - INFO - haystack.retriever.dense - Embedded 160 / 2811 texts\n",
"07/03/2020 11:47:35 - INFO - haystack.retriever.dense - Embedded 240 / 2811 texts\n",
"07/03/2020 11:47:55 - INFO - haystack.retriever.dense - Embedded 320 / 2811 texts\n",
"07/03/2020 11:48:15 - INFO - haystack.retriever.dense - Embedded 400 / 2811 texts\n",
"07/03/2020 11:48:34 - INFO - haystack.retriever.dense - Embedded 480 / 2811 texts\n",
"07/03/2020 11:48:53 - INFO - haystack.retriever.dense - Embedded 560 / 2811 texts\n",
"07/03/2020 11:49:15 - INFO - haystack.retriever.dense - Embedded 640 / 2811 texts\n",
"07/03/2020 11:49:35 - INFO - haystack.retriever.dense - Embedded 720 / 2811 texts\n",
"07/03/2020 11:49:57 - INFO - haystack.retriever.dense - Embedded 800 / 2811 texts\n",
"07/03/2020 11:50:20 - INFO - haystack.retriever.dense - Embedded 880 / 2811 texts\n",
"07/03/2020 11:50:44 - INFO - haystack.retriever.dense - Embedded 960 / 2811 texts\n",
"07/03/2020 11:51:07 - INFO - haystack.retriever.dense - Embedded 1040 / 2811 texts\n",
"07/03/2020 11:51:29 - INFO - haystack.retriever.dense - Embedded 1120 / 2811 texts\n",
"07/03/2020 11:51:52 - INFO - haystack.retriever.dense - Embedded 1200 / 2811 texts\n",
"07/03/2020 11:52:14 - INFO - haystack.retriever.dense - Embedded 1280 / 2811 texts\n",
"07/03/2020 11:52:38 - INFO - haystack.retriever.dense - Embedded 1360 / 2811 texts\n",
"07/03/2020 11:53:00 - INFO - haystack.retriever.dense - Embedded 1440 / 2811 texts\n",
"07/03/2020 11:53:23 - INFO - haystack.retriever.dense - Embedded 1520 / 2811 texts\n",
"07/03/2020 11:53:48 - INFO - haystack.retriever.dense - Embedded 1600 / 2811 texts\n",
"07/03/2020 11:54:09 - INFO - haystack.retriever.dense - Embedded 1680 / 2811 texts\n",
"07/03/2020 11:54:33 - INFO - haystack.retriever.dense - Embedded 1760 / 2811 texts\n",
"07/03/2020 11:54:56 - INFO - haystack.retriever.dense - Embedded 1840 / 2811 texts\n",
"07/03/2020 11:55:18 - INFO - haystack.retriever.dense - Embedded 1920 / 2811 texts\n",
"07/03/2020 11:55:41 - INFO - haystack.retriever.dense - Embedded 2000 / 2811 texts\n",
"07/03/2020 11:56:08 - INFO - haystack.retriever.dense - Embedded 2080 / 2811 texts\n",
"07/03/2020 11:56:32 - INFO - haystack.retriever.dense - Embedded 2160 / 2811 texts\n",
"07/03/2020 11:56:54 - INFO - haystack.retriever.dense - Embedded 2240 / 2811 texts\n",
"07/03/2020 11:57:18 - INFO - haystack.retriever.dense - Embedded 2320 / 2811 texts\n",
"07/03/2020 11:57:40 - INFO - haystack.retriever.dense - Embedded 2400 / 2811 texts\n",
"07/03/2020 11:58:04 - INFO - haystack.retriever.dense - Embedded 2480 / 2811 texts\n",
"07/03/2020 11:58:39 - INFO - haystack.retriever.dense - Embedded 2560 / 2811 texts\n",
"07/03/2020 11:59:16 - INFO - haystack.retriever.dense - Embedded 2640 / 2811 texts\n",
"07/03/2020 11:59:53 - INFO - haystack.retriever.dense - Embedded 2720 / 2811 texts\n",
"07/03/2020 12:00:33 - INFO - haystack.retriever.dense - Embedded 2800 / 2811 texts\n",
"07/03/2020 12:00:42 - INFO - elasticsearch - POST http://localhost:9200/_bulk [status:200 request:2.577s]\n",
"07/03/2020 12:00:44 - INFO - elasticsearch - POST http://localhost:9200/_bulk [status:200 request:2.085s]\n",
"07/03/2020 12:00:46 - INFO - elasticsearch - POST http://localhost:9200/_bulk [status:200 request:1.918s]\n",
"07/03/2020 12:00:49 - INFO - elasticsearch - POST http://localhost:9200/_bulk [status:200 request:1.933s]\n",
"07/03/2020 12:00:51 - INFO - elasticsearch - POST http://localhost:9200/_bulk [status:200 request:1.562s]\n",
"07/03/2020 12:00:52 - INFO - elasticsearch - POST http://localhost:9200/_bulk [status:200 request:1.150s]\n"
]
}
],
"source": [
"from haystack.retriever.dense import DensePassageRetriever\n",
"retriever = DensePassageRetriever(document_store=document_store, embedding_model=\"dpr-bert-base-nq\",\n",
"                                  do_lower_case=True, use_gpu=True)\n",
"\n",
"# Important: \n",
"# Now that we have the DPR initialized, we need to call update_embeddings() to iterate over all\n",
"# previously indexed documents and update their embedding representation. \n",
"# While this can be a time-consuming operation (depending on corpus size), it only needs to be done once. \n",
"# At query time, we only need to embed the query and compare it to the existing doc embeddings, which is very fast.\n",
"document_store.update_embeddings(retriever)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Reader\n",
"\n",
"As in the previous tutorials, we now initialize our reader.\n",
"\n",
"Here we use a FARMReader with the *deepset/roberta-base-squad2* model (see: https://huggingface.co/deepset/roberta-base-squad2)\n",
"\n",
"\n",
"\n",
"#### FARMReader"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"pycharm": {
"is_executing": false
}
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"07/03/2020 12:00:52 - INFO - farm.utils - device: cpu n_gpu: 0, distributed training: False, automatic mixed precision training: None\n",
"07/03/2020 12:00:52 - INFO - farm.infer - Could not find `deepset/roberta-base-squad2` locally. Try to download from model hub ...\n",
"07/03/2020 12:00:59 - WARNING - farm.modeling.language_model - Could not automatically detect from language model name what language it is. \n",
"\t We guess it's an *ENGLISH* model ... \n",
"\t If not: Init the language model by supplying the 'language' param.\n",
"07/03/2020 12:01:07 - WARNING - farm.modeling.prediction_head - Some unused parameters are passed to the QuestionAnsweringHead. Might not be a problem. Params: {\"loss_ignore_index\": -1}\n",
"/home/mp/miniconda3/envs/py37/lib/python3.7/site-packages/transformers/tokenization_utils.py:831: FutureWarning: Parameter max_len is deprecated and will be removed in a future release. Use model_max_length instead.\n",
" category=FutureWarning,\n",
"07/03/2020 12:01:11 - INFO - farm.utils - device: cpu n_gpu: 0, distributed training: False, automatic mixed precision training: None\n",
"07/03/2020 12:01:11 - INFO - farm.infer - Got ya 7 parallel workers to do inference ...\n",
"07/03/2020 12:01:11 - INFO - farm.infer - 0 0 0 0 0 0 0 \n",
"07/03/2020 12:01:11 - INFO - farm.infer - /w\\ /w\\ /w\\ /w\\ /w\\ /w\\ /w\\\n",
"07/03/2020 12:01:11 - INFO - farm.infer - /'\\ / \\ /'\\ /'\\ / \\ / \\ /'\\\n",
"07/03/2020 12:01:11 - INFO - farm.infer - \n"
]
}
],
"source": [
"# Load a local model or any of the QA models on\n",
"# Hugging Face's model hub (https://huggingface.co/models)\n",
"\n",
"reader = FARMReader(model_name_or_path=\"deepset/roberta-base-squad2\", use_gpu=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Finder\n",
"\n",
"The Finder sticks together reader and retriever in a pipeline to answer our actual questions."
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"pycharm": {
"is_executing": false
}
},
"outputs": [],
"source": [
"finder = Finder(reader, retriever)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Voilà! Ask a question!"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"pycharm": {
"is_executing": false
}
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"07/03/2020 12:07:16 - INFO - elasticsearch - GET http://localhost:9200/document/_search [status:200 request:0.018s]\n",
"07/03/2020 12:07:16 - INFO - haystack.finder - Reader is looking for detailed answer in 11362 chars ...\n",
"Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00, 1.09 Batches/s]\n",
"Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00, 2.05 Batches/s]\n",
"Inferencing Samples: 100%|██████████| 1/1 [00:01<00:00, 1.76s/ Batches]\n",
"Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00, 1.50 Batches/s]\n",
"Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00, 1.93 Batches/s]\n",
"Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00, 2.15 Batches/s]\n",
"Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00, 1.64 Batches/s]\n",
"Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00, 1.93 Batches/s]\n",
"Inferencing Samples: 100%|██████████| 1/1 [00:02<00:00, 2.60s/ Batches]\n",
"Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00, 1.10 Batches/s]\n"
]
}
],
"source": [
"# You can configure how many candidates the reader and retriever shall return\n",
"# The higher top_k_retriever, the better (but also the slower) your answers. \n",
"prediction = finder.get_answers(question=\"Who created the Dothraki vocabulary?\", top_k_retriever=10, top_k_reader=5)\n",
"\n",
"#prediction = finder.get_answers(question=\"Who is the father of Arya Stark?\", top_k_retriever=10, top_k_reader=5)\n",
"#prediction = finder.get_answers(question=\"Who is the sister of Sansa?\", top_k_retriever=10, top_k_reader=5)"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
"pycharm": {
"is_executing": false,
"name": "#%%\n"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[ { 'answer': 'David J. Peterson',\n",
" 'context': \"ge for ''Game of Thrones''\\n\"\n",
" 'The Dothraki vocabulary was created by David J. Peterson '\n",
" 'well in advance of the adaptation. HBO hired the Language '\n",
" 'Creation'},\n",
" { 'answer': 'David J. Peterson',\n",
" 'context': '\\n'\n",
" '===Creation===\\n'\n",
" 'David J. Peterson, creator of the spoken Valyrian '\n",
" \"languages for ''Game of Thrones''\\n\"\n",
" 'To create the Dothraki and Valyrian languages to b'},\n",
" { 'answer': 'David J. Peterson',\n",
" 'context': 'orld. The language was developed for the TV series by the '\n",
" 'linguist David J. Peterson, working off the Dothraki words '\n",
" \"and phrases in Martin's novels.\\n\"\n",
" ','},\n",
" { 'answer': 'Peterson',\n",
" 'context': \"e does not exist in the fictional world of ''A Song of Ice \"\n",
" \"and Fire'', Peterson chose to treat the similarity as \"\n",
" \"coincidental and made ''dracarys'' an\"},\n",
" { 'answer': 'Peterson',\n",
" 'context': 'ystem. Another word, \\'\\'trēsy\\'\\', meaning \"son\", was '\n",
" \"coined in honour of Peterson's 3000th Twitter follower.\\n\"\n",
" 'Peterson did not create a High Valyrian wri'}]\n"
]
}
],
"source": [
"print_answers(prediction, details=\"minimal\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 2
}