From a00703256fbed3780bede0595872c60930ab837c Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Mario=20J=C3=A4ckle?= Date: Fri, 30 Apr 2021 10:38:25 +0100 Subject: [PATCH] docs(document_store): add usage information for aws elastic search (#1008) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-authored-by: Mario Jäckle --- docs/_src/usage/usage/document_store.md | 30 +++++++++++++------------ 1 file changed, 16 insertions(+), 14 deletions(-) diff --git a/docs/_src/usage/usage/document_store.md b/docs/_src/usage/usage/document_store.md index ce132cbb2..43baecdb0 100644 --- a/docs/_src/usage/usage/document_store.md +++ b/docs/_src/usage/usage/document_store.md @@ -11,10 +11,10 @@ id: "documentstoremd" # DocumentStores You can think of the DocumentStore as a "database" that: -- stores your texts and meta data -- provides them to the retriever at query time +- stores your texts and meta data +- provides them to the retriever at query time -There are different DocumentStores in Haystack to fit different use cases and tech stacks. +There are different DocumentStores in Haystack to fit different use cases and tech stacks. ## Initialisation @@ -29,7 +29,7 @@ Initialising a new DocumentStore within Haystack is straight forward. [Install](https://www.elastic.co/guide/en/elasticsearch/reference/current/install-elasticsearch.html) Elasticsearch and then [start](https://www.elastic.co/guide/en/elasticsearch/reference/current/starting-elasticsearch.html) -an instance. +an instance. If you have Docker set up, we recommend pulling the Docker image and running it. ```bash @@ -49,6 +49,8 @@ Note that we also support [Open Distro for Elasticsearch](https://opendistro.git Follow [their documentation](https://opendistro.github.io/for-elasticsearch-docs/docs/install/) to run it and connect to it using Haystack's `OpenDistroElasticsearchDocumentStore` class. 
+We also support [Amazon Elasticsearch Service](https://aws.amazon.com/elasticsearch-service/) with [signed requests](https://docs.aws.amazon.com/general/latest/gr/signature-version-4.html): +Use a library such as [aws-requests-auth](https://github.com/davidmuller/aws-requests-auth) to create an auth object and pass it as `aws4auth` to the `ElasticsearchDocumentStore` constructor. @@ -59,7 +61,7 @@ to run it and connect to it using Haystack's `OpenDistroElasticsearchDocumentSto
Follow the [official documentation](https://www.milvus.io/docs/v1.0.0/milvus_docker-cpu.md) to start a Milvus instance via Docker - + You can initialize the Haystack object that will connect to this instance as follows: ```python from haystack.document_store import MilvusDocumentStore @@ -75,7 +77,7 @@ document_store = MilvusDocumentStore()
-The `FAISSDocumentStore` requires no external setup. Start it by simply using this line. +The `FAISSDocumentStore` requires no external setup. Start it by simply using this line. ```python from haystack.document_store import FAISSDocumentStore @@ -106,7 +108,7 @@ document_store = InMemoryDocumentStore()
The `SQLDocumentStore` requires SQLite, PostgresQL or MySQL to be installed and started. -Note that SQLite already comes packaged with most operating systems. +Note that SQLite already comes packaged with most operating systems. ```python from haystack.document_store import SQLDocumentStore @@ -174,7 +176,7 @@ Having GPU acceleration will significantly speed this up. ## Choosing the Right Document Store -The Document Stores have different characteristics. You should choose one depending on the maturity of your project, the use case and technical environment: +The Document Stores have different characteristics. You should choose one depending on the maturity of your project, the use case and technical environment:
@@ -183,13 +185,13 @@ The Document Stores have different characteristics. You should choose one depend
-**Pros:** +**Pros:** - Fast & accurate sparse retrieval with many tuning options - Basic support for dense retrieval - Production-ready - Support also for Open Distro -**Cons:** +**Cons:** - Slow for dense retrieval with more than ~ 1 Mio documents
@@ -200,7 +202,7 @@ The Document Stores have different characteristics. You should choose one depend
-**Pros:** +**Pros:** - Scalable DocumentStore that excels at handling vectors (hence suited to dense retrieval methods like DPR) - Encapsulates multiple ANN libraries (e.g. FAISS and ANNOY) and provides added reliability - Runs as a separate service (e.g. a Docker container) @@ -217,7 +219,7 @@ The Document Stores have different characteristics. You should choose one depend
-**Pros:** +**Pros:** - Fast & accurate dense retrieval - Highly scalable due to approximate nearest neighbour algorithms (ANN) - Many options to tune dense retrieval via different index types (more info [here](https://github.com/facebookresearch/faiss/wiki/Guidelines-to-choose-an-index)) @@ -255,7 +257,7 @@ The Document Stores have different characteristics. You should choose one depend - No database requirements - Supports MySQL, PostgreSQL and SQLite -**Cons:** +**Cons:** - Not scalable - Not persisting your data on disk @@ -268,7 +270,7 @@ The Document Stores have different characteristics. You should choose one depend #### Our Recommendations -**Restricted environment:** Use the `InMemoryDocumentStore`, if you are just giving Haystack a quick try on a small sample and are working in a restricted environment that complicates running Elasticsearch or other databases +**Restricted environment:** Use the `InMemoryDocumentStore`, if you are just giving Haystack a quick try on a small sample and are working in a restricted environment that complicates running Elasticsearch or other databases **Allrounder:** Use the `ElasticSearchDocumentStore`, if you want to evaluate the performance of different retrieval options (dense vs. sparse) and are aiming for a smooth transition from PoC to production
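The AWS usage that this patch documents (build an auth object with aws-requests-auth, then pass it as `aws4auth`) might look roughly like the sketch below. It is a minimal illustration, not the patch author's code. The helper names `aws_connection_kwargs` and `build_aws_document_store` are hypothetical, port 443 is an assumption about the AWS endpoint, and the credentials, host and region are placeholders the caller must supply.

```python
def aws_connection_kwargs(host, auth, port=443):
    """Collect constructor arguments for an AWS-hosted Elasticsearch domain.

    Hypothetical helper. `host`, `port` and `aws4auth` are parameters of
    ElasticsearchDocumentStore; port 443 is an assumption (HTTPS endpoint).
    """
    return {"host": host, "port": port, "aws4auth": auth}


def build_aws_document_store(host, region, access_key, secret_key):
    """Create an ElasticsearchDocumentStore whose requests are signed (SigV4).

    Hypothetical helper. The imports are local so that this sketch can be
    loaded without the optional packages installed.
    """
    from aws_requests_auth.aws_auth import AWSRequestsAuth
    from haystack.document_store import ElasticsearchDocumentStore

    # The auth object signs every HTTP request sent to the domain.
    auth = AWSRequestsAuth(
        aws_access_key=access_key,
        aws_secret_access_key=secret_key,
        aws_host=host,
        aws_region=region,
        aws_service="es",  # service name used when signing Elasticsearch requests
    )
    return ElasticsearchDocumentStore(**aws_connection_kwargs(host, auth))
```

Calling `build_aws_document_store(...)` with the domain endpoint, its region and a key pair returns a document store that can be used like any other `ElasticsearchDocumentStore`. Nothing is contacted until that call is made.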