update tutorials

This commit is contained in:
Malte Pietsch 2020-04-30 19:00:41 +02:00
parent 92429a40e6
commit 7972038afc
2 changed files with 28 additions and 15 deletions

View File

@@ -56,9 +56,10 @@
"Haystack finds answers to queries within the documents stored in a `DocumentStore`. The current implementations of `DocumentStore` include `ElasticsearchDocumentStore`, `SQLDocumentStore`, and `InMemoryDocumentStore`.\n",
"\n",
"**Here:** We recommend Elasticsearch as it comes preloaded with features like [full-text queries](https://www.elastic.co/guide/en/elasticsearch/reference/current/full-text-queries.html), [BM25 retrieval](https://www.elastic.co/elasticon/conf/2016/sf/improved-text-scoring-with-bm25), and [vector storage for text embeddings](https://www.elastic.co/guide/en/elasticsearch/reference/7.6/dense-vector.html).\n",
"\n",
"**Alternatives:** If you are unable to set up an Elasticsearch instance, follow [Tutorial 3](https://github.com/deepset-ai/haystack/blob/master/tutorials/Tutorial3_Basic_QA_Pipeline_without_Elasticsearch.ipynb) to use SQL/InMemory document stores.\n",
"\n",
"**Hint**: This tutorial creates a new document store instance with Wikipedia articles on Game of Thrones. However, you can configure Haystack to work with your existing document stores.\n",
"\n",
"### Start an Elasticsearch server\n",
"You can start Elasticsearch on your local machine using Docker. If Docker is not available in your environment (e.g., in Colab notebooks), you can manually download and run Elasticsearch from source."
@@ -70,8 +71,8 @@
"metadata": {},
"outputs": [],
"source": [
"# Recommended: Start Elasticsearch using Docker\n",
"# ! docker run -d -p 9200:9200 -e \"discovery.type=single-node\" elasticsearch:7.6.2"
]
},
{
@@ -80,18 +81,19 @@
"metadata": {},
"outputs": [],
"source": [
"# In Colab / No Docker environments: Start Elasticsearch from source\n",
"! wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.6.2-linux-x86_64.tar.gz -q\n",
"! tar -xzf elasticsearch-7.6.2-linux-x86_64.tar.gz\n",
"! chown -R daemon:daemon elasticsearch-7.6.2\n",
"\n",
"import os\n",
"from subprocess import Popen, PIPE, STDOUT\n",
"es_server = Popen(['elasticsearch-7.6.2/bin/elasticsearch'],\n",
" stdout=PIPE, stderr=STDOUT,\n",
" preexec_fn=lambda: os.setuid(1) # as daemon\n",
" )\n",
"# wait until ES has started\n",
"! sleep 30"
]
},
{
@@ -178,8 +180,11 @@
"\n",
"Retrievers help narrow down the scope for the Reader to smaller units of text where a given question could be answered.\n",
"They use simple but fast algorithms.\n",
"\n",
"**Here:** We use Elasticsearch's default BM25 algorithm\n",
"\n",
"**Alternatives:**\n",
"\n",
"- Customize the `ElasticsearchRetriever` with custom queries (e.g. boosting) and filters\n",
"- Use `EmbeddingRetriever` to find candidate documents based on the similarity of embeddings (e.g. created via Sentence-BERT)\n",
"- Use `TfidfRetriever` in combination with a SQL or InMemory Document store for simple prototyping and debugging"
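
The default BM25 ranking mentioned above can be sketched in plain Python. This is illustrative only — Elasticsearch's actual implementation differs in details such as analysis, parameter defaults, and length normalization — but it shows why term rarity (IDF) and term frequency with saturation drive the score:

```python
import math

def bm25_score(query_terms, doc, corpus, k1=1.2, b=0.75):
    """Illustrative BM25: score one tokenized document against a query."""
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / N  # average document length
    score = 0.0
    for term in query_terms:
        df = sum(1 for d in corpus if term in d)  # document frequency
        if df == 0:
            continue
        idf = math.log(1 + (N - df + 0.5) / (df + 0.5))  # rare terms weigh more
        tf = doc.count(term)
        # term-frequency saturation with document-length normalization
        score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avgdl))
    return score

corpus = [
    ["arya", "stark", "is", "a", "character"],
    ["winterfell", "is", "a", "castle"],
]
print(bm25_score(["arya", "stark"], corpus[0], corpus))  # positive: terms match
print(bm25_score(["arya", "stark"], corpus[1], corpus))  # 0.0: no query term present
```

A document containing the query terms scores above one that does not, which is exactly how the Retriever shortlists candidate passages for the Reader.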
@@ -223,9 +228,13 @@
"\n",
"Haystack currently supports Readers based on the frameworks FARM and Transformers.\n",
"With both you can either load a local model or one from Hugging Face's model hub (https://huggingface.co/models).\n",
"\n",
"**Here:** a medium-sized RoBERTa QA model using a Reader based on FARM (https://huggingface.co/deepset/roberta-base-squad2)\n",
"\n",
"**Alternatives (Reader):** TransformersReader (leveraging the `pipeline` of the Transformers package)\n",
"\n",
"**Alternatives (Models):** e.g. \"distilbert-base-uncased-distilled-squad\" (fast) or \"deepset/bert-large-uncased-whole-word-masking-squad2\" (good accuracy)\n",
"\n",
"**Hint:** You can adjust the model to return \"no answer possible\" with the `no_ans_boost` parameter. Higher values mean the model prefers \"no answer possible\".\n",
"\n",
"#### FARMReader"
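
The `no_ans_boost` hint above can be pictured with a simplified sketch. This is not FARM's actual prediction code (the real logic lives in the model's QA prediction head), just the intuition: the no-answer option competes with the best span, and the boost shifts that comparison:

```python
def pick_answer(best_span_score, no_answer_score, no_ans_boost=0.0):
    """Schematic: return the span answer unless the boosted no-answer score wins."""
    if no_answer_score + no_ans_boost > best_span_score:
        return "no answer possible"
    return "span answer"

print(pick_answer(best_span_score=8.0, no_answer_score=7.5, no_ans_boost=0.0))  # span wins
print(pick_answer(best_span_score=8.0, no_answer_score=7.5, no_ans_boost=1.0))  # boost flips the decision
```

With a boost of 0 the stronger span answer is returned; raising the boost makes the model increasingly willing to abstain.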

View File

@@ -165,9 +165,13 @@
"\n",
"Haystack currently supports Readers based on the frameworks FARM and Transformers.\n",
"With both you can either load a local model or one from Hugging Face's model hub (https://huggingface.co/models).\n",
"\n",
"**Here:** a medium-sized RoBERTa QA model using a Reader based on FARM (https://huggingface.co/deepset/roberta-base-squad2)\n",
"\n",
"**Alternatives (Reader):** TransformersReader (leveraging the `pipeline` of the Transformers package)\n",
"\n",
"**Alternatives (Models):** e.g. \"distilbert-base-uncased-distilled-squad\" (fast) or \"deepset/bert-large-uncased-whole-word-masking-squad2\" (good accuracy)\n",
"\n",
"**Hint:** You can adjust the model to return \"no answer possible\" with the `no_ans_boost` parameter. Higher values mean the model prefers \"no answer possible\".\n",
"\n",
"#### FARMReader"