mirror of https://github.com/deepset-ai/haystack.git (synced 2025-08-26 17:36:34 +00:00)
docs(document_store): add usage information for aws elastic search (#1008)
Co-authored-by: Mario Jäckle <m.jaeckle@careerpartner.eu>
parent 37a72d2f45
commit a00703256f
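The usage this commit documents, creating an auth object with aws-requests-auth and passing it as `aws4auth` to the `ElasticsearchDocumentStore` constructor, might look like the sketch below. It is not part of the commit. `endpoint_host` and `connect_to_aws_elasticsearch` are hypothetical names; the `host`, `port` and `scheme` arguments and the `haystack.document_store` import path are assumptions modelled on the other imports in this diff, and the `AWSRequestsAuth` keyword arguments follow the aws-requests-auth README.

```python
from urllib.parse import urlparse


def endpoint_host(endpoint: str) -> str:
    """Return the bare host name of an endpoint (no scheme, port, or path)."""
    # urlparse only fills in .hostname when a scheme is present, so add one if missing.
    parsed = urlparse(endpoint if "://" in endpoint else f"https://{endpoint}")
    return parsed.hostname


def connect_to_aws_elasticsearch(endpoint, region, access_key, secret_key):
    """Hypothetical helper: build a document store that signs its requests."""
    # Third-party imports stay local so endpoint_host() works without them.
    from aws_requests_auth.aws_auth import AWSRequestsAuth
    from haystack.document_store import ElasticsearchDocumentStore  # assumed path

    host = endpoint_host(endpoint)
    auth = AWSRequestsAuth(
        aws_access_key=access_key,
        aws_secret_access_key=secret_key,
        aws_host=host,
        aws_region=region,
        aws_service="es",  # service name used when signing Elasticsearch requests
    )
    # host/port/scheme are assumed constructor parameters; aws4auth is the
    # parameter named in the documentation change itself.
    return ElasticsearchDocumentStore(host=host, port=443, scheme="https", aws4auth=auth)
```

Calling `connect_to_aws_elasticsearch(...)` requires both packages to be installed and a reachable domain, since the store typically contacts the cluster when it is constructed. The auth object expects a bare host name, which is why the endpoint is normalised first.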
@@ -11,10 +11,10 @@ id: "documentstoremd"
# DocumentStores

You can think of the DocumentStore as a "database" that:
- stores your texts and meta data
- provides them to the retriever at query time

There are different DocumentStores in Haystack to fit different use cases and tech stacks.

## Initialisation

@@ -29,7 +29,7 @@ Initialising a new DocumentStore within Haystack is straightforward.

[Install](https://www.elastic.co/guide/en/elasticsearch/reference/current/install-elasticsearch.html)
Elasticsearch and then [start](https://www.elastic.co/guide/en/elasticsearch/reference/current/starting-elasticsearch.html)
an instance.

If you have Docker set up, we recommend pulling the Docker image and running it.
```bash
@@ -49,6 +49,8 @@ Note that we also support [Open Distro for Elasticsearch](https://opendistro.git
Follow [their documentation](https://opendistro.github.io/for-elasticsearch-docs/docs/install/)
to run it and connect to it using Haystack's `OpenDistroElasticsearchDocumentStore` class.

+We further support [Amazon Elasticsearch Service](https://aws.amazon.com/elasticsearch-service/) with [signed requests](https://docs.aws.amazon.com/general/latest/gr/signature-version-4.html):
+For example, use [aws-requests-auth](https://github.com/davidmuller/aws-requests-auth) to create an auth object and pass it as `aws4auth` to the `ElasticsearchDocumentStore` constructor.

</div>
</div>
@@ -59,7 +61,7 @@ to run it and connect to it using Haystack's `OpenDistroElasticsearchDocumentSto
<div class="tabcontent">

Follow the [official documentation](https://www.milvus.io/docs/v1.0.0/milvus_docker-cpu.md) to start a Milvus instance via Docker

You can initialize the Haystack object that will connect to this instance as follows:
```python
from haystack.document_store import MilvusDocumentStore
@@ -75,7 +77,7 @@ document_store = MilvusDocumentStore()
<label class="labelouter" for="tab-1-3">FAISS</label>
<div class="tabcontent">

The `FAISSDocumentStore` requires no external setup. Start it by simply using this line.
```python
from haystack.document_store import FAISSDocumentStore

@@ -106,7 +108,7 @@ document_store = InMemoryDocumentStore()
<div class="tabcontent">

The `SQLDocumentStore` requires SQLite, PostgreSQL or MySQL to be installed and started.
Note that SQLite already comes packaged with most operating systems.

```python
from haystack.document_store import SQLDocumentStore
@@ -174,7 +176,7 @@ Having GPU acceleration will significantly speed this up.
<!-- _comment: !! Make this a tab element to show how different datastores are initialized !! -->
## Choosing the Right Document Store

The Document Stores have different characteristics. You should choose one depending on the maturity of your project, the use case and technical environment:

<div class="tabs tabsdschoose">

@@ -183,13 +185,13 @@ The Document Stores have different characteristics. You should choose one depend
<label class="labelouter" for="tab-2-1">Elasticsearch</label>
<div class="tabcontent">

**Pros:**
- Fast & accurate sparse retrieval with many tuning options
- Basic support for dense retrieval
- Production-ready
- Also supports Open Distro

**Cons:**
- Slow for dense retrieval with more than ~1 million documents

</div>
@@ -200,7 +202,7 @@ The Document Stores have different characteristics. You should choose one depend
<label class="labelouter" for="tab-2-2">Milvus</label>
<div class="tabcontent">

**Pros:**
- Scalable DocumentStore that excels at handling vectors (hence suited to dense retrieval methods like DPR)
- Encapsulates multiple ANN libraries (e.g. FAISS and ANNOY) and provides added reliability
- Runs as a separate service (e.g. a Docker container)
@@ -217,7 +219,7 @@ The Document Stores have different characteristics. You should choose one depend
<label class="labelouter" for="tab-2-3">FAISS</label>
<div class="tabcontent">

**Pros:**
- Fast & accurate dense retrieval
- Highly scalable due to approximate nearest neighbour algorithms (ANN)
- Many options to tune dense retrieval via different index types (more info [here](https://github.com/facebookresearch/faiss/wiki/Guidelines-to-choose-an-index))
@@ -255,7 +257,7 @@ The Document Stores have different characteristics. You should choose one depend
- No database requirements
- Supports MySQL, PostgreSQL and SQLite

**Cons:**
- Not scalable
- Not persisting your data on disk

@@ -268,7 +270,7 @@ The Document Stores have different characteristics. You should choose one depend

#### Our Recommendations

**Restricted environment:** Use the `InMemoryDocumentStore` if you are just giving Haystack a quick try on a small sample and are working in a restricted environment that complicates running Elasticsearch or other databases.

**All-rounder:** Use the `ElasticsearchDocumentStore` if you want to evaluate the performance of different retrieval options (dense vs. sparse) and are aiming for a smooth transition from PoC to production.
