From a00703256fbed3780bede0595872c60930ab837c Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Mario=20J=C3=A4ckle?= <github@jacksbox.de>
Date: Fri, 30 Apr 2021 10:38:25 +0100
Subject: [PATCH] docs(document_store): add usage information for AWS
 Elasticsearch (#1008)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Co-authored-by: Mario Jäckle <m.jaeckle@careerpartner.eu>
---
 docs/_src/usage/usage/document_store.md | 30 +++++++++++++------------
 1 file changed, 16 insertions(+), 14 deletions(-)
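
Reviewer note: the AWS paragraph added in this patch describes creating an auth object with aws-requests-auth and passing it as `aws4auth` to `ElasticsearchDocumentStore`. A minimal sketch of that flow follows. The helper names (`aws_auth_kwargs`, `connect_to_aws_elasticsearch`) are hypothetical, and the endpoint settings (`port=443`, `scheme="https"`) and the service name `"es"` are assumptions about a typical Amazon Elasticsearch Service domain, not something this patch specifies.

```python
from typing import Any, Dict


def aws_auth_kwargs(access_key: str, secret_key: str, host: str, region: str) -> Dict[str, Any]:
    """Collect the keyword arguments for aws-requests-auth's AWSRequestsAuth.

    Hypothetical helper for illustration; "es" is assumed to be the
    service name used when signing Elasticsearch Service requests.
    """
    return {
        "aws_access_key": access_key,
        "aws_secret_access_key": secret_key,
        "aws_host": host,
        "aws_region": region,
        "aws_service": "es",
    }


def connect_to_aws_elasticsearch(access_key: str, secret_key: str, host: str, region: str):
    """Build a signed-request auth object and hand it to the DocumentStore."""
    # Imported lazily so the helper above works without these packages installed.
    from aws_requests_auth.aws_auth import AWSRequestsAuth
    from haystack.document_store import ElasticsearchDocumentStore

    auth = AWSRequestsAuth(**aws_auth_kwargs(access_key, secret_key, host, region))
    # port and scheme are assumptions for an HTTPS domain endpoint.
    return ElasticsearchDocumentStore(host=host, port=443, scheme="https", aws4auth=auth)
```

Calling `connect_to_aws_elasticsearch(...)` with real credentials and a domain host should return a store that signs each request. Keeping the argument assembly in its own function makes it easy to check without network access.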
diff --git a/docs/_src/usage/usage/document_store.md b/docs/_src/usage/usage/document_store.md
index ce132cbb2..43baecdb0 100644
--- a/docs/_src/usage/usage/document_store.md
+++ b/docs/_src/usage/usage/document_store.md
@@ -11,10 +11,10 @@ id: "documentstoremd"
 # DocumentStores
 
 You can think of the DocumentStore as a "database" that:
-- stores your texts and meta data  
-- provides them to the retriever at query time 
+- stores your texts and meta data
+- provides them to the retriever at query time
 
-There are different DocumentStores in Haystack to fit different use cases and tech stacks. 
+There are different DocumentStores in Haystack to fit different use cases and tech stacks.
 
 ## Initialisation
 
@@ -29,7 +29,7 @@ Initialising a new DocumentStore within Haystack is straight forward.
 
 [Install](https://www.elastic.co/guide/en/elasticsearch/reference/current/install-elasticsearch.html)
 Elasticsearch and then [start](https://www.elastic.co/guide/en/elasticsearch/reference/current/starting-elasticsearch.html)
-an instance. 
+an instance.
 
 If you have Docker set up, we recommend pulling the Docker image and running it.
 ```bash
@@ -49,6 +49,8 @@ Note that we also support [Open Distro for Elasticsearch](https://opendistro.git
 Follow [their documentation](https://opendistro.github.io/for-elasticsearch-docs/docs/install/)
 to run it and connect to it using Haystack's `OpenDistroElasticsearchDocumentStore` class.
 
+We also support [Amazon Elasticsearch Service](https://aws.amazon.com/elasticsearch-service/) with [signed requests](https://docs.aws.amazon.com/general/latest/gr/signature-version-4.html).
+Use a library such as [aws-requests-auth](https://github.com/davidmuller/aws-requests-auth) to create an auth object and pass it as `aws4auth` to the `ElasticsearchDocumentStore` constructor.
 
 </div>
 </div>
@@ -59,7 +61,7 @@ to run it and connect to it using Haystack's `OpenDistroElasticsearchDocumentSto
 <div class="tabcontent">
 
 Follow the [official documentation](https://www.milvus.io/docs/v1.0.0/milvus_docker-cpu.md) to start a Milvus instance via Docker
- 
+
 You can initialize the Haystack object that will connect to this instance as follows:
 ```python
 from haystack.document_store import MilvusDocumentStore
@@ -75,7 +77,7 @@ document_store = MilvusDocumentStore()
 <label class="labelouter" for="tab-1-3">FAISS</label>
 <div class="tabcontent">
 
-The `FAISSDocumentStore` requires no external setup. Start it by simply using this line. 
+The `FAISSDocumentStore` requires no external setup. Start it by simply using this line.
 ```python
 from haystack.document_store import FAISSDocumentStore
 
@@ -106,7 +108,7 @@ document_store = InMemoryDocumentStore()
 <div class="tabcontent">
 
 The `SQLDocumentStore` requires SQLite, PostgresQL or MySQL to be installed and started.
-Note that SQLite already comes packaged with most operating systems. 
+Note that SQLite already comes packaged with most operating systems.
 
 ```python
 from haystack.document_store import SQLDocumentStore
@@ -174,7 +176,7 @@ Having GPU acceleration will significantly speed this up.
 <!-- _comment: !! Make this a tab element to show how different datastores are initialized !! -->
 ## Choosing the Right Document Store
 
-The Document Stores have different characteristics. You should choose one depending on the maturity of your project, the use case and technical environment: 
+The Document Stores have different characteristics. You should choose one depending on the maturity of your project, the use case and technical environment:
 
 <div class="tabs tabsdschoose">
 
@@ -183,13 +185,13 @@ The Document Stores have different characteristics. You should choose one depend
 <label class="labelouter" for="tab-2-1">Elasticsearch</label>
 <div class="tabcontent">
 
-**Pros:** 
+**Pros:**
 - Fast & accurate sparse retrieval with many tuning options
 - Basic support for dense retrieval
 - Production-ready
 - Support also for Open Distro
 
-**Cons:** 
+**Cons:**
 - Slow for dense retrieval with more than ~ 1 Mio documents
 
 </div>
@@ -200,7 +202,7 @@ The Document Stores have different characteristics. You should choose one depend
 <label class="labelouter" for="tab-2-2">Milvus</label>
 <div class="tabcontent">
 
-**Pros:** 
+**Pros:**
 - Scalable DocumentStore that excels at handling vectors (hence suited to dense retrieval methods like DPR)
 - Encapsulates multiple ANN libraries (e.g. FAISS and ANNOY) and provides added reliability
 - Runs as a separate service (e.g. a Docker container)
@@ -217,7 +219,7 @@ The Document Stores have different characteristics. You should choose one depend
 <label class="labelouter" for="tab-2-3">FAISS</label>
 <div class="tabcontent">
 
-**Pros:** 
+**Pros:**
 - Fast & accurate dense retrieval
 - Highly scalable due to approximate nearest neighbour algorithms (ANN)
 - Many options to tune dense retrieval via different index types (more info [here](https://github.com/facebookresearch/faiss/wiki/Guidelines-to-choose-an-index))
@@ -255,7 +257,7 @@ The Document Stores have different characteristics. You should choose one depend
 - No database requirements
 - Supports MySQL, PostgreSQL and SQLite
 
-**Cons:** 
+**Cons:**
 - Not scalable
 - Not persisting your data on disk
 
@@ -268,7 +270,7 @@ The Document Stores have different characteristics. You should choose one depend
 
 #### Our Recommendations
 
-**Restricted environment:** Use the `InMemoryDocumentStore`, if you are just giving Haystack a quick try on a small sample and are working in a restricted environment that complicates running Elasticsearch or other databases  
+**Restricted environment:** Use the `InMemoryDocumentStore`, if you are just giving Haystack a quick try on a small sample and are working in a restricted environment that complicates running Elasticsearch or other databases
 
 **Allrounder:** Use the `ElasticSearchDocumentStore`, if you want to evaluate the performance of different retrieval options (dense vs. sparse) and are aiming for a smooth transition from PoC to production