Mirror of https://github.com/deepset-ai/haystack.git (synced 2025-08-26 17:36:34 +00:00)
docs(document_store): add usage information for aws elastic search (#1008)
Co-authored-by: Mario Jäckle <m.jaeckle@careerpartner.eu>
parent 37a72d2f45
commit a00703256f
@@ -11,10 +11,10 @@ id: "documentstoremd"
# DocumentStores

You can think of the DocumentStore as a "database" that:
- stores your texts and metadata
- provides them to the retriever at query time

There are different DocumentStores in Haystack to fit different use cases and tech stacks.

## Initialisation

@@ -29,7 +29,7 @@ Initialising a new DocumentStore within Haystack is straight forward.

[Install](https://www.elastic.co/guide/en/elasticsearch/reference/current/install-elasticsearch.html)
Elasticsearch and then [start](https://www.elastic.co/guide/en/elasticsearch/reference/current/starting-elasticsearch.html)
an instance.

If you have Docker set up, we recommend pulling the Docker image and running it.
```bash
@@ -49,6 +49,8 @@ Note that we also support [Open Distro for Elasticsearch](https://opendistro.git
Follow [their documentation](https://opendistro.github.io/for-elasticsearch-docs/docs/install/)
to run it and connect to it using Haystack's `OpenDistroElasticsearchDocumentStore` class.

We also support [Amazon Elasticsearch Service](https://aws.amazon.com/elasticsearch-service/) with [signed requests](https://docs.aws.amazon.com/general/latest/gr/signature-version-4.html).
Use a library such as [aws-requests-auth](https://github.com/davidmuller/aws-requests-auth) to create an auth object and pass it as `aws4auth` to the `ElasticsearchDocumentStore` constructor.
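
As a minimal sketch of the signed-request setup described above, the snippet below builds an `AWSRequestsAuth` object from aws-requests-auth and passes it to `ElasticsearchDocumentStore` as `aws4auth`. The helper names `aws_auth_settings` and `connect_to_aws_elasticsearch`, the port `443`, the `https` scheme, and the placeholder values in the usage comment are assumptions for illustration, not part of Haystack; adapt them to your own domain.

```python
def aws_auth_settings(host, region, access_key, secret_key):
    """Collect the keyword arguments AWSRequestsAuth expects (hypothetical helper)."""
    return {
        "aws_access_key": access_key,
        "aws_secret_access_key": secret_key,
        "aws_host": host,
        "aws_region": region,
        "aws_service": "es",  # service name used when signing Elasticsearch requests
    }


def connect_to_aws_elasticsearch(host, region, access_key, secret_key):
    """Create a document store that signs its requests (hypothetical helper)."""
    # Third-party imports are local so this module loads without them installed.
    from aws_requests_auth.aws_auth import AWSRequestsAuth
    from haystack.document_store import ElasticsearchDocumentStore

    auth = AWSRequestsAuth(**aws_auth_settings(host, region, access_key, secret_key))
    # Assumption: the domain is served over HTTPS on port 443.
    return ElasticsearchDocumentStore(
        host=host, port=443, scheme="https", aws4auth=auth
    )


# Usage (placeholder values, replace with your own domain and credentials):
# document_store = connect_to_aws_elasticsearch(
#     host="my-domain.eu-central-1.es.amazonaws.com",
#     region="eu-central-1",
#     access_key="YOUR_ACCESS_KEY",
#     secret_key="YOUR_SECRET_KEY",
# )
```

In practice you would likely read the credentials from environment variables or an IAM role rather than hard-coding them.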

</div>
</div>
@@ -59,7 +61,7 @@ to run it and connect to it using Haystack's `OpenDistroElasticsearchDocumentSto
<div class="tabcontent">

Follow the [official documentation](https://www.milvus.io/docs/v1.0.0/milvus_docker-cpu.md) to start a Milvus instance via Docker.

You can initialize the Haystack object that will connect to this instance as follows:
```python
from haystack.document_store import MilvusDocumentStore
@@ -75,7 +77,7 @@ document_store = MilvusDocumentStore()
<label class="labelouter" for="tab-1-3">FAISS</label>
<div class="tabcontent">

The `FAISSDocumentStore` requires no external setup. Start it by simply using this line.
```python
from haystack.document_store import FAISSDocumentStore

@@ -106,7 +108,7 @@ document_store = InMemoryDocumentStore()
<div class="tabcontent">

The `SQLDocumentStore` requires SQLite, PostgreSQL or MySQL to be installed and started.
Note that SQLite already comes packaged with most operating systems.

```python
from haystack.document_store import SQLDocumentStore
@@ -174,7 +176,7 @@ Having GPU acceleration will significantly speed this up.
<!-- _comment: !! Make this a tab element to show how different datastores are initialized !! -->
## Choosing the Right Document Store

The Document Stores have different characteristics. You should choose one depending on the maturity of your project, the use case and technical environment:

<div class="tabs tabsdschoose">

@@ -183,13 +185,13 @@ The Document Stores have different characteristics. You should choose one depend
<label class="labelouter" for="tab-2-1">Elasticsearch</label>
<div class="tabcontent">

**Pros:**
- Fast & accurate sparse retrieval with many tuning options
- Basic support for dense retrieval
- Production-ready
- Also supports Open Distro

**Cons:**
- Slow for dense retrieval with more than roughly 1 million documents

</div>
@@ -200,7 +202,7 @@ The Document Stores have different characteristics. You should choose one depend
<label class="labelouter" for="tab-2-2">Milvus</label>
<div class="tabcontent">

**Pros:**
- Scalable DocumentStore that excels at handling vectors (hence suited to dense retrieval methods like DPR)
- Encapsulates multiple ANN libraries (e.g. FAISS and ANNOY) and provides added reliability
- Runs as a separate service (e.g. a Docker container)
@@ -217,7 +219,7 @@ The Document Stores have different characteristics. You should choose one depend
<label class="labelouter" for="tab-2-3">FAISS</label>
<div class="tabcontent">

**Pros:**
- Fast & accurate dense retrieval
- Highly scalable due to approximate nearest neighbour algorithms (ANN)
- Many options to tune dense retrieval via different index types (more info [here](https://github.com/facebookresearch/faiss/wiki/Guidelines-to-choose-an-index))
@@ -255,7 +257,7 @@ The Document Stores have different characteristics. You should choose one depend
- No database requirements
- Supports MySQL, PostgreSQL and SQLite

**Cons:**
- Not scalable
- Not persisting your data on disk

@@ -268,7 +270,7 @@ The Document Stores have different characteristics. You should choose one depend

#### Our Recommendations

**Restricted environment:** Use the `InMemoryDocumentStore` if you are just giving Haystack a quick try on a small sample and are working in a restricted environment that complicates running Elasticsearch or other databases.

**Allrounder:** Use the `ElasticsearchDocumentStore` if you want to evaluate the performance of different retrieval options (dense vs. sparse) and are aiming for a smooth transition from PoC to production.
