mirror of
https://github.com/deepset-ai/haystack.git
synced 2025-12-19 11:08:40 +00:00
1000 lines
36 KiB
Markdown
1000 lines
36 KiB
Markdown
|
|
---
|
||
|
|
title: "OpenSearch"
|
||
|
|
id: integrations-opensearch
|
||
|
|
description: "OpenSearch integration for Haystack"
|
||
|
|
slug: "/integrations-opensearch"
|
||
|
|
---
|
||
|
|
|
||
|
|
<a id="haystack_integrations.components.retrievers.opensearch.bm25_retriever"></a>
|
||
|
|
|
||
|
|
# Module haystack\_integrations.components.retrievers.opensearch.bm25\_retriever
|
||
|
|
|
||
|
|
<a id="haystack_integrations.components.retrievers.opensearch.bm25_retriever.OpenSearchBM25Retriever"></a>
|
||
|
|
|
||
|
|
## OpenSearchBM25Retriever
|
||
|
|
|
||
|
|
Fetches documents from OpenSearchDocumentStore using the keyword-based BM25 algorithm.
|
||
|
|
|
||
|
|
BM25 computes a weighted word overlap between the query string and a document to determine its similarity.
|
||
|
|
|
||
|
|
<a id="haystack_integrations.components.retrievers.opensearch.bm25_retriever.OpenSearchBM25Retriever.__init__"></a>
|
||
|
|
|
||
|
|
#### OpenSearchBM25Retriever.\_\_init\_\_
|
||
|
|
|
||
|
|
```python
|
||
|
|
def __init__(*,
|
||
|
|
document_store: OpenSearchDocumentStore,
|
||
|
|
filters: Optional[Dict[str, Any]] = None,
|
||
|
|
fuzziness: Union[int, str] = "AUTO",
|
||
|
|
top_k: int = 10,
|
||
|
|
scale_score: bool = False,
|
||
|
|
all_terms_must_match: bool = False,
|
||
|
|
filter_policy: Union[str, FilterPolicy] = FilterPolicy.REPLACE,
|
||
|
|
custom_query: Optional[Dict[str, Any]] = None,
|
||
|
|
raise_on_failure: bool = True)
|
||
|
|
```
|
||
|
|
|
||
|
|
Creates the OpenSearchBM25Retriever component.
|
||
|
|
|
||
|
|
**Arguments**:
|
||
|
|
|
||
|
|
- `document_store`: An instance of OpenSearchDocumentStore to use with the Retriever.
|
||
|
|
- `filters`: Filters to narrow down the search for documents in the Document Store.
|
||
|
|
- `fuzziness`: Determines how approximate string matching is applied in full-text queries.
|
||
|
|
This parameter sets the number of character edits (insertions, deletions, or substitutions)
|
||
|
|
required to transform one word into another. For example, the "fuzziness" between the words
|
||
|
|
"wined" and "wind" is 1 because only one edit is needed to match them.
|
||
|
|
|
||
|
|
Use "AUTO" (the default) for automatic adjustment based on term length, which is optimal for
|
||
|
|
most scenarios. For detailed guidance, refer to the
|
||
|
|
[OpenSearch fuzzy query documentation](https://opensearch.org/docs/latest/query-dsl/term/fuzzy/).
|
||
|
|
- `top_k`: Maximum number of documents to return.
|
||
|
|
- `scale_score`: If `True`, scales the score of retrieved documents to a range between 0 and 1.
|
||
|
|
This is useful when comparing documents across different indexes.
|
||
|
|
- `all_terms_must_match`: If `True`, all terms in the query string must be present in the
|
||
|
|
retrieved documents. This is useful when searching for short text where even one term
|
||
|
|
can make a difference.
|
||
|
|
- `filter_policy`: Policy to determine how filters are applied. Possible options:
|
||
|
|
- `replace`: Runtime filters replace initialization filters. Use this policy to change the filtering scope
|
||
|
|
for specific queries.
|
||
|
|
- `merge`: Runtime filters are merged with initialization filters.
|
||
|
|
- `custom_query`: The query containing a mandatory `$query` and an optional `$filters` placeholder.
|
||
|
|
**An example custom_query:**
|
||
|
|
|
||
|
|
```python
|
||
|
|
{
|
||
|
|
"query": {
|
||
|
|
"bool": {
|
||
|
|
"should": [{"multi_match": {
|
||
|
|
"query": "$query", // mandatory query placeholder
|
||
|
|
"type": "most_fields",
|
||
|
|
"fields": ["content", "title"]}}],
|
||
|
|
"filter": "$filters" // optional filter placeholder
|
||
|
|
}
|
||
|
|
}
|
||
|
|
}
|
||
|
|
```
|
||
|
|
|
||
|
|
An example `run()` method for this `custom_query`:
|
||
|
|
|
||
|
|
```python
|
||
|
|
retriever.run(
|
||
|
|
query="Why did the revenue increase?",
|
||
|
|
filters={
|
||
|
|
"operator": "AND",
|
||
|
|
"conditions": [
|
||
|
|
{"field": "meta.years", "operator": "==", "value": "2019"},
|
||
|
|
{"field": "meta.quarters", "operator": "in", "value": ["Q1", "Q2"]},
|
||
|
|
],
|
||
|
|
},
|
||
|
|
)
|
||
|
|
```
|
||
|
|
- `raise_on_failure`: Whether to raise an exception if the API call fails. Otherwise log a warning and return an empty list.
|
||
|
|
|
||
|
|
**Raises**:
|
||
|
|
|
||
|
|
- `ValueError`: If `document_store` is not an instance of OpenSearchDocumentStore.
|
||
|
|
|
||
|
|
<a id="haystack_integrations.components.retrievers.opensearch.bm25_retriever.OpenSearchBM25Retriever.to_dict"></a>
|
||
|
|
|
||
|
|
#### OpenSearchBM25Retriever.to\_dict
|
||
|
|
|
||
|
|
```python
|
||
|
|
def to_dict() -> Dict[str, Any]
|
||
|
|
```
|
||
|
|
|
||
|
|
Serializes the component to a dictionary.
|
||
|
|
|
||
|
|
**Returns**:
|
||
|
|
|
||
|
|
Dictionary with serialized data.
|
||
|
|
|
||
|
|
<a id="haystack_integrations.components.retrievers.opensearch.bm25_retriever.OpenSearchBM25Retriever.from_dict"></a>
|
||
|
|
|
||
|
|
#### OpenSearchBM25Retriever.from\_dict
|
||
|
|
|
||
|
|
```python
|
||
|
|
@classmethod
|
||
|
|
def from_dict(cls, data: Dict[str, Any]) -> "OpenSearchBM25Retriever"
|
||
|
|
```
|
||
|
|
|
||
|
|
Deserializes the component from a dictionary.
|
||
|
|
|
||
|
|
**Arguments**:
|
||
|
|
|
||
|
|
- `data`: Dictionary to deserialize from.
|
||
|
|
|
||
|
|
**Returns**:
|
||
|
|
|
||
|
|
Deserialized component.
|
||
|
|
|
||
|
|
<a id="haystack_integrations.components.retrievers.opensearch.bm25_retriever.OpenSearchBM25Retriever.run"></a>
|
||
|
|
|
||
|
|
#### OpenSearchBM25Retriever.run
|
||
|
|
|
||
|
|
```python
|
||
|
|
@component.output_types(documents=List[Document])
|
||
|
|
def run(
|
||
|
|
query: str,
|
||
|
|
filters: Optional[Dict[str, Any]] = None,
|
||
|
|
all_terms_must_match: Optional[bool] = None,
|
||
|
|
top_k: Optional[int] = None,
|
||
|
|
fuzziness: Optional[Union[int, str]] = None,
|
||
|
|
scale_score: Optional[bool] = None,
|
||
|
|
custom_query: Optional[Dict[str, Any]] = None,
|
||
|
|
document_store: Optional[OpenSearchDocumentStore] = None
|
||
|
|
) -> Dict[str, List[Document]]
|
||
|
|
```
|
||
|
|
|
||
|
|
Retrieve documents using BM25 retrieval.
|
||
|
|
|
||
|
|
**Arguments**:
|
||
|
|
|
||
|
|
- `query`: The query string.
|
||
|
|
- `filters`: Filters applied to the retrieved documents. The way runtime filters are applied depends on
|
||
|
|
the `filter_policy` specified at Retriever's initialization.
|
||
|
|
- `all_terms_must_match`: If `True`, all terms in the query string must be present in the
|
||
|
|
retrieved documents.
|
||
|
|
- `top_k`: Maximum number of documents to return.
|
||
|
|
- `fuzziness`: Fuzziness parameter for full-text queries to apply approximate string matching.
|
||
|
|
For more information, see [OpenSearch fuzzy query](https://opensearch.org/docs/latest/query-dsl/term/fuzzy/).
|
||
|
|
- `scale_score`: If `True`, scales the score of retrieved documents to a range between 0 and 1.
|
||
|
|
This is useful when comparing documents across different indexes.
|
||
|
|
- `custom_query`: A custom OpenSearch query. It must include a `$query` and may optionally
|
||
|
|
include a `$filters` placeholder.
|
||
|
|
|
||
|
|
**An example custom_query:**
|
||
|
|
|
||
|
|
```python
|
||
|
|
{
|
||
|
|
"query": {
|
||
|
|
"bool": {
|
||
|
|
"should": [{"multi_match": {
|
||
|
|
"query": "$query", // mandatory query placeholder
|
||
|
|
"type": "most_fields",
|
||
|
|
"fields": ["content", "title"]}}],
|
||
|
|
"filter": "$filters" // optional filter placeholder
|
||
|
|
}
|
||
|
|
}
|
||
|
|
}
|
||
|
|
```
|
||
|
|
|
||
|
|
**For this custom_query, a sample `run()` could be:**
|
||
|
|
|
||
|
|
```python
|
||
|
|
retriever.run(
|
||
|
|
query="Why did the revenue increase?",
|
||
|
|
filters={
|
||
|
|
"operator": "AND",
|
||
|
|
"conditions": [
|
||
|
|
{"field": "meta.years", "operator": "==", "value": "2019"},
|
||
|
|
{"field": "meta.quarters", "operator": "in", "value": ["Q1", "Q2"]},
|
||
|
|
],
|
||
|
|
},
|
||
|
|
)
|
||
|
|
```
|
||
|
|
- `document_store`: Optionally, an instance of OpenSearchDocumentStore to use with the Retriever
|
||
|
|
|
||
|
|
**Returns**:
|
||
|
|
|
||
|
|
A dictionary containing the retrieved documents with the following structure:
|
||
|
|
- documents: List of retrieved Documents.
|
||
|
|
|
||
|
|
<a id="haystack_integrations.components.retrievers.opensearch.bm25_retriever.OpenSearchBM25Retriever.run_async"></a>
|
||
|
|
|
||
|
|
#### OpenSearchBM25Retriever.run\_async
|
||
|
|
|
||
|
|
```python
|
||
|
|
@component.output_types(documents=List[Document])
|
||
|
|
async def run_async(
|
||
|
|
query: str,
|
||
|
|
filters: Optional[Dict[str, Any]] = None,
|
||
|
|
all_terms_must_match: Optional[bool] = None,
|
||
|
|
top_k: Optional[int] = None,
|
||
|
|
fuzziness: Optional[Union[int, str]] = None,
|
||
|
|
scale_score: Optional[bool] = None,
|
||
|
|
custom_query: Optional[Dict[str, Any]] = None,
|
||
|
|
document_store: Optional[OpenSearchDocumentStore] = None
|
||
|
|
) -> Dict[str, List[Document]]
|
||
|
|
```
|
||
|
|
|
||
|
|
Asynchronously retrieve documents using BM25 retrieval.
|
||
|
|
|
||
|
|
**Arguments**:
|
||
|
|
|
||
|
|
- `query`: The query string.
|
||
|
|
- `filters`: Filters applied to the retrieved documents. The way runtime filters are applied depends on
|
||
|
|
the `filter_policy` specified at Retriever's initialization.
|
||
|
|
- `all_terms_must_match`: If `True`, all terms in the query string must be present in the
|
||
|
|
retrieved documents.
|
||
|
|
- `top_k`: Maximum number of documents to return.
|
||
|
|
- `fuzziness`: Fuzziness parameter for full-text queries to apply approximate string matching.
|
||
|
|
For more information, see [OpenSearch fuzzy query](https://opensearch.org/docs/latest/query-dsl/term/fuzzy/).
|
||
|
|
- `scale_score`: If `True`, scales the score of retrieved documents to a range between 0 and 1.
|
||
|
|
This is useful when comparing documents across different indexes.
|
||
|
|
- `custom_query`: A custom OpenSearch query. It must include a `$query` and may optionally
|
||
|
|
include a `$filters` placeholder.
|
||
|
|
- `document_store`: Optionally, an instance of OpenSearchDocumentStore to use with the Retriever
|
||
|
|
|
||
|
|
**Returns**:
|
||
|
|
|
||
|
|
A dictionary containing the retrieved documents with the following structure:
|
||
|
|
- documents: List of retrieved Documents.
|
||
|
|
|
||
|
|
<a id="haystack_integrations.components.retrievers.opensearch.embedding_retriever"></a>
|
||
|
|
|
||
|
|
# Module haystack\_integrations.components.retrievers.opensearch.embedding\_retriever
|
||
|
|
|
||
|
|
<a id="haystack_integrations.components.retrievers.opensearch.embedding_retriever.OpenSearchEmbeddingRetriever"></a>
|
||
|
|
|
||
|
|
## OpenSearchEmbeddingRetriever
|
||
|
|
|
||
|
|
Retrieves documents from the OpenSearchDocumentStore using a vector similarity metric.
|
||
|
|
|
||
|
|
Must be connected to the OpenSearchDocumentStore to run.
|
||
|
|
|
||
|
|
<a id="haystack_integrations.components.retrievers.opensearch.embedding_retriever.OpenSearchEmbeddingRetriever.__init__"></a>
|
||
|
|
|
||
|
|
#### OpenSearchEmbeddingRetriever.\_\_init\_\_
|
||
|
|
|
||
|
|
```python
|
||
|
|
def __init__(*,
|
||
|
|
document_store: OpenSearchDocumentStore,
|
||
|
|
filters: Optional[Dict[str, Any]] = None,
|
||
|
|
top_k: int = 10,
|
||
|
|
filter_policy: Union[str, FilterPolicy] = FilterPolicy.REPLACE,
|
||
|
|
custom_query: Optional[Dict[str, Any]] = None,
|
||
|
|
raise_on_failure: bool = True,
|
||
|
|
efficient_filtering: bool = False)
|
||
|
|
```
|
||
|
|
|
||
|
|
Create the OpenSearchEmbeddingRetriever component.
|
||
|
|
|
||
|
|
**Arguments**:
|
||
|
|
|
||
|
|
- `document_store`: An instance of OpenSearchDocumentStore to use with the Retriever.
|
||
|
|
- `filters`: Filters applied when fetching documents from the Document Store.
|
||
|
|
Filters are applied during the approximate kNN search to ensure the Retriever returns
|
||
|
|
`top_k` matching documents.
|
||
|
|
- `top_k`: Maximum number of documents to return.
|
||
|
|
- `filter_policy`: Policy to determine how filters are applied. Possible options:
|
||
|
|
- `merge`: Runtime filters are merged with initialization filters.
|
||
|
|
- `replace`: Runtime filters replace initialization filters. Use this policy to change the filtering scope.
|
||
|
|
- `custom_query`: The custom OpenSearch query containing a mandatory `$query_embedding` and
|
||
|
|
an optional `$filters` placeholder.
|
||
|
|
|
||
|
|
**An example custom_query:**
|
||
|
|
|
||
|
|
```python
|
||
|
|
{
|
||
|
|
"query": {
|
||
|
|
"bool": {
|
||
|
|
"must": [
|
||
|
|
{
|
||
|
|
"knn": {
|
||
|
|
"embedding": {
|
||
|
|
"vector": "$query_embedding", // mandatory query placeholder
|
||
|
|
"k": 10000,
|
||
|
|
}
|
||
|
|
}
|
||
|
|
}
|
||
|
|
],
|
||
|
|
"filter": "$filters" // optional filter placeholder
|
||
|
|
}
|
||
|
|
}
|
||
|
|
}
|
||
|
|
```
|
||
|
|
|
||
|
|
For this `custom_query`, an example `run()` could be:
|
||
|
|
|
||
|
|
```python
|
||
|
|
retriever.run(
|
||
|
|
query_embedding=embedding,
|
||
|
|
filters={
|
||
|
|
"operator": "AND",
|
||
|
|
"conditions": [
|
||
|
|
{"field": "meta.years", "operator": "==", "value": "2019"},
|
||
|
|
{"field": "meta.quarters", "operator": "in", "value": ["Q1", "Q2"]},
|
||
|
|
],
|
||
|
|
},
|
||
|
|
)
|
||
|
|
```
|
||
|
|
- `raise_on_failure`: If `True`, raises an exception if the API call fails.
|
||
|
|
If `False`, logs a warning and returns an empty list.
|
||
|
|
- `efficient_filtering`: If `True`, the filter will be applied during the approximate kNN search.
|
||
|
|
This is only supported for knn engines "faiss" and "lucene" and does not work with the default "nmslib".
|
||
|
|
|
||
|
|
**Raises**:
|
||
|
|
|
||
|
|
- `ValueError`: If `document_store` is not an instance of OpenSearchDocumentStore.
|
||
|
|
|
||
|
|
<a id="haystack_integrations.components.retrievers.opensearch.embedding_retriever.OpenSearchEmbeddingRetriever.to_dict"></a>
|
||
|
|
|
||
|
|
#### OpenSearchEmbeddingRetriever.to\_dict
|
||
|
|
|
||
|
|
```python
|
||
|
|
def to_dict() -> Dict[str, Any]
|
||
|
|
```
|
||
|
|
|
||
|
|
Serializes the component to a dictionary.
|
||
|
|
|
||
|
|
**Returns**:
|
||
|
|
|
||
|
|
Dictionary with serialized data.
|
||
|
|
|
||
|
|
<a id="haystack_integrations.components.retrievers.opensearch.embedding_retriever.OpenSearchEmbeddingRetriever.from_dict"></a>
|
||
|
|
|
||
|
|
#### OpenSearchEmbeddingRetriever.from\_dict
|
||
|
|
|
||
|
|
```python
|
||
|
|
@classmethod
|
||
|
|
def from_dict(cls, data: Dict[str, Any]) -> "OpenSearchEmbeddingRetriever"
|
||
|
|
```
|
||
|
|
|
||
|
|
Deserializes the component from a dictionary.
|
||
|
|
|
||
|
|
**Arguments**:
|
||
|
|
|
||
|
|
- `data`: Dictionary to deserialize from.
|
||
|
|
|
||
|
|
**Returns**:
|
||
|
|
|
||
|
|
Deserialized component.
|
||
|
|
|
||
|
|
<a id="haystack_integrations.components.retrievers.opensearch.embedding_retriever.OpenSearchEmbeddingRetriever.run"></a>
|
||
|
|
|
||
|
|
#### OpenSearchEmbeddingRetriever.run
|
||
|
|
|
||
|
|
```python
|
||
|
|
@component.output_types(documents=List[Document])
|
||
|
|
def run(
|
||
|
|
query_embedding: List[float],
|
||
|
|
filters: Optional[Dict[str, Any]] = None,
|
||
|
|
top_k: Optional[int] = None,
|
||
|
|
custom_query: Optional[Dict[str, Any]] = None,
|
||
|
|
efficient_filtering: Optional[bool] = None,
|
||
|
|
document_store: Optional[OpenSearchDocumentStore] = None
|
||
|
|
) -> Dict[str, List[Document]]
|
||
|
|
```
|
||
|
|
|
||
|
|
Retrieve documents using a vector similarity metric.
|
||
|
|
|
||
|
|
**Arguments**:
|
||
|
|
|
||
|
|
- `query_embedding`: Embedding of the query.
|
||
|
|
- `filters`: Filters applied when fetching documents from the Document Store.
|
||
|
|
Filters are applied during the approximate kNN search to ensure the Retriever returns `top_k` matching
|
||
|
|
documents.
|
||
|
|
The way runtime filters are applied depends on the `filter_policy` selected when initializing the Retriever.
|
||
|
|
- `top_k`: Maximum number of documents to return.
|
||
|
|
- `custom_query`: A custom OpenSearch query containing a mandatory `$query_embedding` and an
|
||
|
|
optional `$filters` placeholder.
|
||
|
|
|
||
|
|
**An example custom_query:**
|
||
|
|
|
||
|
|
```python
|
||
|
|
{
|
||
|
|
"query": {
|
||
|
|
"bool": {
|
||
|
|
"must": [
|
||
|
|
{
|
||
|
|
"knn": {
|
||
|
|
"embedding": {
|
||
|
|
"vector": "$query_embedding", // mandatory query placeholder
|
||
|
|
"k": 10000,
|
||
|
|
}
|
||
|
|
}
|
||
|
|
}
|
||
|
|
],
|
||
|
|
"filter": "$filters" // optional filter placeholder
|
||
|
|
}
|
||
|
|
}
|
||
|
|
}
|
||
|
|
```
|
||
|
|
|
||
|
|
For this `custom_query`, an example `run()` could be:
|
||
|
|
|
||
|
|
```python
|
||
|
|
retriever.run(
|
||
|
|
query_embedding=embedding,
|
||
|
|
filters={
|
||
|
|
"operator": "AND",
|
||
|
|
"conditions": [
|
||
|
|
{"field": "meta.years", "operator": "==", "value": "2019"},
|
||
|
|
{"field": "meta.quarters", "operator": "in", "value": ["Q1", "Q2"]},
|
||
|
|
],
|
||
|
|
},
|
||
|
|
)
|
||
|
|
```
|
||
|
|
- `efficient_filtering`: If `True`, the filter will be applied during the approximate kNN search.
|
||
|
|
This is only supported for knn engines "faiss" and "lucene" and does not work with the default "nmslib".
|
||
|
|
- `document_store`: Optional instance of OpenSearchDocumentStore to use with the Retriever.
|
||
|
|
|
||
|
|
**Returns**:
|
||
|
|
|
||
|
|
Dictionary with key "documents" containing the retrieved Documents.
|
||
|
|
- documents: List of Document similar to `query_embedding`.
|
||
|
|
|
||
|
|
<a id="haystack_integrations.components.retrievers.opensearch.embedding_retriever.OpenSearchEmbeddingRetriever.run_async"></a>
|
||
|
|
|
||
|
|
#### OpenSearchEmbeddingRetriever.run\_async
|
||
|
|
|
||
|
|
```python
|
||
|
|
@component.output_types(documents=List[Document])
|
||
|
|
async def run_async(
|
||
|
|
query_embedding: List[float],
|
||
|
|
filters: Optional[Dict[str, Any]] = None,
|
||
|
|
top_k: Optional[int] = None,
|
||
|
|
custom_query: Optional[Dict[str, Any]] = None,
|
||
|
|
efficient_filtering: Optional[bool] = None,
|
||
|
|
document_store: Optional[OpenSearchDocumentStore] = None
|
||
|
|
) -> Dict[str, List[Document]]
|
||
|
|
```
|
||
|
|
|
||
|
|
Asynchronously retrieve documents using a vector similarity metric.
|
||
|
|
|
||
|
|
**Arguments**:
|
||
|
|
|
||
|
|
- `query_embedding`: Embedding of the query.
|
||
|
|
- `filters`: Filters applied when fetching documents from the Document Store.
|
||
|
|
Filters are applied during the approximate kNN search to ensure the Retriever
|
||
|
|
returns `top_k` matching documents.
|
||
|
|
The way runtime filters are applied depends on the `filter_policy` selected when initializing the Retriever.
|
||
|
|
- `top_k`: Maximum number of documents to return.
|
||
|
|
- `custom_query`: A custom OpenSearch query containing a mandatory `$query_embedding` and an
|
||
|
|
optional `$filters` placeholder.
|
||
|
|
|
||
|
|
**An example custom_query:**
|
||
|
|
|
||
|
|
```python
|
||
|
|
{
|
||
|
|
"query": {
|
||
|
|
"bool": {
|
||
|
|
"must": [
|
||
|
|
{
|
||
|
|
"knn": {
|
||
|
|
"embedding": {
|
||
|
|
"vector": "$query_embedding", // mandatory query placeholder
|
||
|
|
"k": 10000,
|
||
|
|
}
|
||
|
|
}
|
||
|
|
}
|
||
|
|
],
|
||
|
|
"filter": "$filters" // optional filter placeholder
|
||
|
|
}
|
||
|
|
}
|
||
|
|
}
|
||
|
|
```
|
||
|
|
|
||
|
|
For this `custom_query`, an example `run()` could be:
|
||
|
|
|
||
|
|
```python
|
||
|
|
retriever.run(
|
||
|
|
query_embedding=embedding,
|
||
|
|
filters={
|
||
|
|
"operator": "AND",
|
||
|
|
"conditions": [
|
||
|
|
{"field": "meta.years", "operator": "==", "value": "2019"},
|
||
|
|
{"field": "meta.quarters", "operator": "in", "value": ["Q1", "Q2"]},
|
||
|
|
],
|
||
|
|
},
|
||
|
|
)
|
||
|
|
```
|
||
|
|
- `efficient_filtering`: If `True`, the filter will be applied during the approximate kNN search.
|
||
|
|
This is only supported for knn engines "faiss" and "lucene" and does not work with the default "nmslib".
|
||
|
|
- `document_store`: Optional instance of OpenSearchDocumentStore to use with the Retriever.
|
||
|
|
|
||
|
|
**Returns**:
|
||
|
|
|
||
|
|
Dictionary with key "documents" containing the retrieved Documents.
|
||
|
|
- documents: List of Document similar to `query_embedding`.
|
||
|
|
|
||
|
|
<a id="haystack_integrations.components.retrievers.opensearch.open_search_hybrid_retriever"></a>
|
||
|
|
|
||
|
|
# Module haystack\_integrations.components.retrievers.opensearch.open\_search\_hybrid\_retriever
|
||
|
|
|
||
|
|
<a id="haystack_integrations.components.retrievers.opensearch.open_search_hybrid_retriever.OpenSearchHybridRetriever"></a>
|
||
|
|
|
||
|
|
## OpenSearchHybridRetriever
|
||
|
|
|
||
|
|
A hybrid retriever that combines embedding-based and keyword-based retrieval from OpenSearch.
|
||
|
|
|
||
|
|
Example usage:
|
||
|
|
|
||
|
|
Make sure you have "sentence-transformers>=3.0.0":
|
||
|
|
|
||
|
|
pip install haystack-ai datasets "sentence-transformers>=3.0.0"
|
||
|
|
|
||
|
|
|
||
|
|
And OpenSearch running. You can run OpenSearch with Docker:
|
||
|
|
|
||
|
|
docker run -d --name opensearch-nosec -p 9200:9200 -p 9600:9600 -e "discovery.type=single-node"
|
||
|
|
-e "DISABLE_SECURITY_PLUGIN=true" opensearchproject/opensearch:2.12.0
|
||
|
|
|
||
|
|
```python
|
||
|
|
from haystack import Document
|
||
|
|
from haystack.components.embedders import SentenceTransformersTextEmbedder, SentenceTransformersDocumentEmbedder
|
||
|
|
from haystack_integrations.components.retrievers.opensearch import OpenSearchHybridRetriever
|
||
|
|
from haystack_integrations.document_stores.opensearch import OpenSearchDocumentStore
|
||
|
|
|
||
|
|
# Initialize the document store
|
||
|
|
doc_store = OpenSearchDocumentStore(
|
||
|
|
hosts=["<http://localhost:9200>"],
|
||
|
|
index="document_store",
|
||
|
|
embedding_dim=384,
|
||
|
|
)
|
||
|
|
|
||
|
|
# Create some sample documents
|
||
|
|
docs = [
|
||
|
|
Document(content="Machine learning is a subset of artificial intelligence."),
|
||
|
|
Document(content="Deep learning is a subset of machine learning."),
|
||
|
|
Document(content="Natural language processing is a field of AI."),
|
||
|
|
Document(content="Reinforcement learning is a type of machine learning."),
|
||
|
|
Document(content="Supervised learning is a type of machine learning."),
|
||
|
|
]
|
||
|
|
|
||
|
|
# Embed the documents and add them to the document store
|
||
|
|
doc_embedder = SentenceTransformersDocumentEmbedder(model="sentence-transformers/all-MiniLM-L6-v2")
|
||
|
|
doc_embedder.warm_up()
|
||
|
|
docs = doc_embedder.run(docs)
|
||
|
|
doc_store.write_documents(docs['documents'])
|
||
|
|
|
||
|
|
# Initialize some haystack text embedder, in this case the SentenceTransformersTextEmbedder
|
||
|
|
embedder = SentenceTransformersTextEmbedder(model="sentence-transformers/all-MiniLM-L6-v2")
|
||
|
|
|
||
|
|
# Initialize the hybrid retriever
|
||
|
|
retriever = OpenSearchHybridRetriever(
|
||
|
|
document_store=doc_store,
|
||
|
|
embedder=embedder,
|
||
|
|
top_k_bm25=3,
|
||
|
|
top_k_embedding=3,
|
||
|
|
join_mode="reciprocal_rank_fusion"
|
||
|
|
)
|
||
|
|
|
||
|
|
# Run the retriever
|
||
|
|
results = retriever.run(query="What is reinforcement learning?", filters_bm25=None, filters_embedding=None)
|
||
|
|
|
||
|
|
>> results['documents']
|
||
|
|
{'documents': [Document(id=..., content: 'Reinforcement learning is a type of machine learning.', score: 1.0),
|
||
|
|
Document(id=..., content: 'Supervised learning is a type of machine learning.', score: 0.9760624679979518),
|
||
|
|
Document(id=..., content: 'Deep learning is a subset of machine learning.', score: 0.4919354838709677),
|
||
|
|
Document(id=..., content: 'Machine learning is a subset of artificial intelligence.', score: 0.4841269841269841)]}
|
||
|
|
```
|
||
|
|
|
||
|
|
<a id="haystack_integrations.components.retrievers.opensearch.open_search_hybrid_retriever.OpenSearchHybridRetriever.__init__"></a>
|
||
|
|
|
||
|
|
#### OpenSearchHybridRetriever.\_\_init\_\_
|
||
|
|
|
||
|
|
```python
|
||
|
|
def __init__(document_store: OpenSearchDocumentStore,
|
||
|
|
*,
|
||
|
|
embedder: TextEmbedder,
|
||
|
|
filters_bm25: Optional[Dict[str, Any]] = None,
|
||
|
|
fuzziness: Union[int, str] = "AUTO",
|
||
|
|
top_k_bm25: int = 10,
|
||
|
|
scale_score: bool = False,
|
||
|
|
all_terms_must_match: bool = False,
|
||
|
|
filter_policy_bm25: Union[str,
|
||
|
|
FilterPolicy] = FilterPolicy.REPLACE,
|
||
|
|
custom_query_bm25: Optional[Dict[str, Any]] = None,
|
||
|
|
filters_embedding: Optional[Dict[str, Any]] = None,
|
||
|
|
top_k_embedding: int = 10,
|
||
|
|
filter_policy_embedding: Union[
|
||
|
|
str, FilterPolicy] = FilterPolicy.REPLACE,
|
||
|
|
custom_query_embedding: Optional[Dict[str, Any]] = None,
|
||
|
|
join_mode: Union[str, JoinMode] = JoinMode.RECIPROCAL_RANK_FUSION,
|
||
|
|
weights: Optional[List[float]] = None,
|
||
|
|
top_k: Optional[int] = None,
|
||
|
|
sort_by_score: bool = True,
|
||
|
|
**kwargs: Any) -> None
|
||
|
|
```
|
||
|
|
|
||
|
|
Initialize the OpenSearchHybridRetriever, a super component to retrieve documents from OpenSearch using
|
||
|
|
|
||
|
|
both embedding-based and keyword-based retrieval methods.
|
||
|
|
|
||
|
|
We don't explicitly define all the init parameters of the components in the constructor, for each
|
||
|
|
of the components, since that would be around 20+ parameters. Instead, we define the most important ones
|
||
|
|
and pass the rest as kwargs. This is to keep the constructor clean and easy to read.
|
||
|
|
|
||
|
|
If you need to pass extra parameters to the components, you can do so by passing them as kwargs. It expects
|
||
|
|
a dictionary with the component name as the key and the parameters as the value. The component name should be:
|
||
|
|
|
||
|
|
- "bm25_retriever" -> OpenSearchBM25Retriever
|
||
|
|
- "embedding_retriever" -> OpenSearchEmbeddingRetriever
|
||
|
|
|
||
|
|
**Arguments**:
|
||
|
|
|
||
|
|
- `document_store`: The OpenSearchDocumentStore to use for retrieval.
|
||
|
|
- `embedder`: A TextEmbedder to use for embedding the query.
|
||
|
|
See `haystack.components.embedders.types.protocol.TextEmbedder` for more information.
|
||
|
|
- `filters_bm25`: Filters for the BM25 retriever.
|
||
|
|
- `fuzziness`: The fuzziness for the BM25 retriever.
|
||
|
|
- `top_k_bm25`: The number of results to return from the BM25 retriever.
|
||
|
|
- `scale_score`: Whether to scale the score for the BM25 retriever.
|
||
|
|
- `all_terms_must_match`: Whether all terms must match for the BM25 retriever.
|
||
|
|
- `filter_policy_bm25`: The filter policy for the BM25 retriever.
|
||
|
|
- `custom_query_bm25`: A custom query for the BM25 retriever.
|
||
|
|
- `filters_embedding`: Filters for the embedding retriever.
|
||
|
|
- `top_k_embedding`: The number of results to return from the embedding retriever.
|
||
|
|
- `filter_policy_embedding`: The filter policy for the embedding retriever.
|
||
|
|
- `custom_query_embedding`: A custom query for the embedding retriever.
|
||
|
|
- `join_mode`: The mode to use for joining the results from the BM25 and embedding retrievers.
|
||
|
|
- `weights`: The weights for the joiner.
|
||
|
|
- `top_k`: The number of results to return from the joiner.
|
||
|
|
- `sort_by_score`: Whether to sort the results by score.
|
||
|
|
- `**kwargs`: Additional keyword arguments. Use the following keys to pass extra parameters to the retrievers:
|
||
|
|
- "bm25_retriever" -> OpenSearchBM25Retriever
|
||
|
|
- "embedding_retriever" -> OpenSearchEmbeddingRetriever
|
||
|
|
|
||
|
|
<a id="haystack_integrations.components.retrievers.opensearch.open_search_hybrid_retriever.OpenSearchHybridRetriever.to_dict"></a>
|
||
|
|
|
||
|
|
#### OpenSearchHybridRetriever.to\_dict
|
||
|
|
|
||
|
|
```python
|
||
|
|
def to_dict()
|
||
|
|
```
|
||
|
|
|
||
|
|
Serialize OpenSearchHybridRetriever to a dictionary.
|
||
|
|
|
||
|
|
**Returns**:
|
||
|
|
|
||
|
|
Dictionary with serialized data.
|
||
|
|
|
||
|
|
<a id="haystack_integrations.document_stores.opensearch.document_store"></a>
|
||
|
|
|
||
|
|
# Module haystack\_integrations.document\_stores.opensearch.document\_store
|
||
|
|
|
||
|
|
<a id="haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore"></a>
|
||
|
|
|
||
|
|
## OpenSearchDocumentStore
|
||
|
|
|
||
|
|
An instance of an OpenSearch database you can use to store all types of data.
|
||
|
|
|
||
|
|
This document store is a thin wrapper around the OpenSearch client.
|
||
|
|
It allows you to store and retrieve documents from an OpenSearch index.
|
||
|
|
|
||
|
|
Usage example:
|
||
|
|
```python
|
||
|
|
from haystack_integrations.document_stores.opensearch import OpenSearchDocumentStore
|
||
|
|
from haystack import Document
|
||
|
|
|
||
|
|
document_store = OpenSearchDocumentStore(hosts="localhost:9200")
|
||
|
|
|
||
|
|
document_store.write_documents(
|
||
|
|
[
|
||
|
|
Document(content="My first document", id="1"),
|
||
|
|
Document(content="My second document", id="2"),
|
||
|
|
]
|
||
|
|
)
|
||
|
|
|
||
|
|
print(document_store.count_documents())
|
||
|
|
# 2
|
||
|
|
|
||
|
|
print(document_store.filter_documents())
|
||
|
|
# [Document(id='1', content='My first document', ...), Document(id='2', content='My second document', ...)]
|
||
|
|
```
|
||
|
|
|
||
|
|
<a id="haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore.__init__"></a>
|
||
|
|
|
||
|
|
#### OpenSearchDocumentStore.\_\_init\_\_
|
||
|
|
|
||
|
|
```python
|
||
|
|
def __init__(
|
||
|
|
*,
|
||
|
|
hosts: Optional[Hosts] = None,
|
||
|
|
index: str = "default",
|
||
|
|
max_chunk_bytes: int = DEFAULT_MAX_CHUNK_BYTES,
|
||
|
|
embedding_dim: int = 768,
|
||
|
|
return_embedding: bool = False,
|
||
|
|
method: Optional[Dict[str, Any]] = None,
|
||
|
|
mappings: Optional[Dict[str, Any]] = None,
|
||
|
|
settings: Optional[Dict[str, Any]] = DEFAULT_SETTINGS,
|
||
|
|
create_index: bool = True,
|
||
|
|
http_auth: Any = (
|
||
|
|
Secret.from_env_var("OPENSEARCH_USERNAME",
|
||
|
|
strict=False), # noqa: B008
|
||
|
|
Secret.from_env_var("OPENSEARCH_PASSWORD",
|
||
|
|
strict=False), # noqa: B008
|
||
|
|
),
|
||
|
|
use_ssl: Optional[bool] = None,
|
||
|
|
verify_certs: Optional[bool] = None,
|
||
|
|
timeout: Optional[int] = None,
|
||
|
|
**kwargs: Any) -> None
|
||
|
|
```
|
||
|
|
|
||
|
|
Creates a new OpenSearchDocumentStore instance.
|
||
|
|
|
||
|
|
The ``embeddings_dim``, ``method``, ``mappings``, and ``settings`` arguments are only used if the index does not
|
||
|
|
exist and needs to be created. If the index already exists, its current configurations will be used.
|
||
|
|
|
||
|
|
For more information on connection parameters, see the [official OpenSearch documentation](https://opensearch.org/docs/latest/clients/python-low-level/`connecting`-to-opensearch)
|
||
|
|
|
||
|
|
**Arguments**:
|
||
|
|
|
||
|
|
- `hosts`: List of hosts running the OpenSearch client. Defaults to None
|
||
|
|
- `index`: Name of index in OpenSearch, if it doesn't exist it will be created. Defaults to "default"
|
||
|
|
- `max_chunk_bytes`: Maximum size of the requests in bytes. Defaults to 100MB
|
||
|
|
- `embedding_dim`: Dimension of the embeddings. Defaults to 768
|
||
|
|
- `return_embedding`: Whether to return the embedding of the retrieved Documents. This parameter also applies to the
|
||
|
|
`filter_documents` and `filter_documents_async` methods.
|
||
|
|
- `method`: The method definition of the underlying configuration of the approximate k-NN algorithm. Please
|
||
|
|
see the [official OpenSearch docs](https://opensearch.org/docs/latest/search-plugins/knn/knn-index/`method`-definitions)
|
||
|
|
for more information. Defaults to None
|
||
|
|
- `mappings`: The mapping of how the documents are stored and indexed. Please see the [official OpenSearch docs](https://opensearch.org/docs/latest/field-types/)
|
||
|
|
for more information. If None, it uses the embedding_dim and method arguments to create default mappings.
|
||
|
|
Defaults to None
|
||
|
|
- `settings`: The settings of the index to be created. Please see the [official OpenSearch docs](https://opensearch.org/docs/latest/search-plugins/knn/knn-index/`index`-settings)
|
||
|
|
for more information. Defaults to `{"index.knn": True}`.
|
||
|
|
- `create_index`: Whether to create the index if it doesn't exist. Defaults to True
|
||
|
|
- `http_auth`: http_auth param passed to the underlying connection class.
|
||
|
|
For basic authentication with default connection class `Urllib3HttpConnection` this can be
|
||
|
|
- a tuple of (username, password)
|
||
|
|
- a list of [username, password]
|
||
|
|
- a string of "username:password"
|
||
|
|
If not provided, will read values from OPENSEARCH_USERNAME and OPENSEARCH_PASSWORD environment variables.
|
||
|
|
For AWS authentication with `Urllib3HttpConnection` pass an instance of `AWSAuth`.
|
||
|
|
Defaults to None
|
||
|
|
- `use_ssl`: Whether to use SSL. Defaults to None
|
||
|
|
- `verify_certs`: Whether to verify certificates. Defaults to None
|
||
|
|
- `timeout`: Timeout in seconds. Defaults to None
|
||
|
|
- `**kwargs`: Optional arguments that ``OpenSearch`` takes. For the full list of supported kwargs,
|
||
|
|
see the [official OpenSearch reference](https://opensearch-project.github.io/opensearch-py/api-ref/clients/opensearch_client.html)
|
||
|
|
|
||
|
|
<a id="haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore.create_index"></a>
|
||
|
|
|
||
|
|
#### OpenSearchDocumentStore.create\_index
|
||
|
|
|
||
|
|
```python
|
||
|
|
def create_index(index: Optional[str] = None,
|
||
|
|
mappings: Optional[Dict[str, Any]] = None,
|
||
|
|
settings: Optional[Dict[str, Any]] = None) -> None
|
||
|
|
```
|
||
|
|
|
||
|
|
Creates an index in OpenSearch.
|
||
|
|
|
||
|
|
Note that this method ignores the `create_index` argument from the constructor.
|
||
|
|
|
||
|
|
**Arguments**:
|
||
|
|
|
||
|
|
- `index`: Name of the index to create. If None, the index name from the constructor is used.
|
||
|
|
- `mappings`: The mapping of how the documents are stored and indexed. Please see the [official OpenSearch docs](https://opensearch.org/docs/latest/field-types/)
|
||
|
|
for more information. If None, the mappings from the constructor are used.
|
||
|
|
- `settings`: The settings of the index to be created. Please see the [official OpenSearch docs](https://opensearch.org/docs/latest/search-plugins/knn/knn-index/`index`-settings)
|
||
|
|
for more information. If None, the settings from the constructor are used.
|
||
|
|
|
||
|
|
<a id="haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore.to_dict"></a>
|
||
|
|
|
||
|
|
#### OpenSearchDocumentStore.to\_dict
|
||
|
|
|
||
|
|
```python
|
||
|
|
def to_dict() -> Dict[str, Any]
|
||
|
|
```
|
||
|
|
|
||
|
|
Serializes the component to a dictionary.
|
||
|
|
|
||
|
|
**Returns**:
|
||
|
|
|
||
|
|
Dictionary with serialized data.
|
||
|
|
|
||
|
|
<a id="haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore.from_dict"></a>
|
||
|
|
|
||
|
|
#### OpenSearchDocumentStore.from\_dict
|
||
|
|
|
||
|
|
```python
|
||
|
|
@classmethod
|
||
|
|
def from_dict(cls, data: Dict[str, Any]) -> "OpenSearchDocumentStore"
|
||
|
|
```
|
||
|
|
|
||
|
|
Deserializes the component from a dictionary.
|
||
|
|
|
||
|
|
**Arguments**:
|
||
|
|
|
||
|
|
- `data`: Dictionary to deserialize from.
|
||
|
|
|
||
|
|
**Returns**:
|
||
|
|
|
||
|
|
Deserialized component.
|
||
|
|
|
||
|
|
<a id="haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore.count_documents"></a>
|
||
|
|
|
||
|
|
#### OpenSearchDocumentStore.count\_documents
|
||
|
|
|
||
|
|
```python
|
||
|
|
def count_documents() -> int
|
||
|
|
```
|
||
|
|
|
||
|
|
Returns how many documents are present in the document store.
|
||
|
|
|
||
|
|
<a id="haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore.count_documents_async"></a>
|
||
|
|
|
||
|
|
#### OpenSearchDocumentStore.count\_documents\_async
|
||
|
|
|
||
|
|
```python
|
||
|
|
async def count_documents_async() -> int
|
||
|
|
```
|
||
|
|
|
||
|
|
Asynchronously returns the total number of documents in the document store.
|
||
|
|
|
||
|
|
<a id="haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore.filter_documents"></a>
|
||
|
|
|
||
|
|
#### OpenSearchDocumentStore.filter\_documents
|
||
|
|
|
||
|
|
```python
|
||
|
|
def filter_documents(
|
||
|
|
filters: Optional[Dict[str, Any]] = None) -> List[Document]
|
||
|
|
```
|
||
|
|
|
||
|
|
Returns the documents that match the filters provided.
|
||
|
|
|
||
|
|
For a detailed specification of the filters,
|
||
|
|
refer to the [documentation](https://docs.haystack.deepset.ai/docs/metadata-filtering)
|
||
|
|
|
||
|
|
**Arguments**:
|
||
|
|
|
||
|
|
- `filters`: The filters to apply to the document list.
|
||
|
|
|
||
|
|
**Returns**:
|
||
|
|
|
||
|
|
A list of Documents that match the given filters.
|
||
|
|
|
||
|
|
<a id="haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore.filter_documents_async"></a>
|
||
|
|
|
||
|
|
#### OpenSearchDocumentStore.filter\_documents\_async
|
||
|
|
|
||
|
|
```python
|
||
|
|
async def filter_documents_async(
|
||
|
|
filters: Optional[Dict[str, Any]] = None) -> List[Document]
|
||
|
|
```
|
||
|
|
|
||
|
|
Asynchronously returns the documents that match the filters provided.
|
||
|
|
|
||
|
|
For a detailed specification of the filters,
|
||
|
|
refer to the [documentation](https://docs.haystack.deepset.ai/docs/metadata-filtering)
|
||
|
|
|
||
|
|
**Arguments**:
|
||
|
|
|
||
|
|
- `filters`: The filters to apply to the document list.
|
||
|
|
|
||
|
|
**Returns**:
|
||
|
|
|
||
|
|
A list of Documents that match the given filters.
|
||
|
|
|
||
|
|
<a id="haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore.write_documents"></a>
|
||
|
|
|
||
|
|
#### OpenSearchDocumentStore.write\_documents
|
||
|
|
|
||
|
|
```python
|
||
|
|
def write_documents(documents: List[Document],
|
||
|
|
policy: DuplicatePolicy = DuplicatePolicy.NONE) -> int
|
||
|
|
```
|
||
|
|
|
||
|
|
Writes documents to the document store.
|
||
|
|
|
||
|
|
**Arguments**:
|
||
|
|
|
||
|
|
- `documents`: A list of Documents to write to the document store.
|
||
|
|
- `policy`: The duplicate policy to use when writing documents.
|
||
|
|
|
||
|
|
**Raises**:
|
||
|
|
|
||
|
|
- `DuplicateDocumentError`: If a document with the same id already exists in the document store
|
||
|
|
and the policy is set to `DuplicatePolicy.FAIL` (or not specified).
|
||
|
|
|
||
|
|
**Returns**:
|
||
|
|
|
||
|
|
The number of documents written to the document store.
|
||
|
|
|
||
|
|
<a id="haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore.write_documents_async"></a>
|
||
|
|
|
||
|
|
#### OpenSearchDocumentStore.write\_documents\_async
|
||
|
|
|
||
|
|
```python
|
||
|
|
async def write_documents_async(
|
||
|
|
documents: List[Document],
|
||
|
|
policy: DuplicatePolicy = DuplicatePolicy.NONE) -> int
|
||
|
|
```
|
||
|
|
|
||
|
|
Asynchronously writes documents to the document store.
|
||
|
|
|
||
|
|
**Arguments**:
|
||
|
|
|
||
|
|
- `documents`: A list of Documents to write to the document store.
|
||
|
|
- `policy`: The duplicate policy to use when writing documents.
|
||
|
|
|
||
|
|
**Returns**:
|
||
|
|
|
||
|
|
The number of documents written to the document store.
|
||
|
|
|
||
|
|
<a id="haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore.delete_documents"></a>
|
||
|
|
|
||
|
|
#### OpenSearchDocumentStore.delete\_documents
|
||
|
|
|
||
|
|
```python
|
||
|
|
def delete_documents(document_ids: List[str]) -> None
|
||
|
|
```
|
||
|
|
|
||
|
|
Deletes documents that match the provided `document_ids` from the document store.
|
||
|
|
|
||
|
|
**Arguments**:
|
||
|
|
|
||
|
|
- `document_ids`: the document ids to delete
|
||
|
|
|
||
|
|
<a id="haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore.delete_documents_async"></a>
|
||
|
|
|
||
|
|
#### OpenSearchDocumentStore.delete\_documents\_async
|
||
|
|
|
||
|
|
```python
|
||
|
|
async def delete_documents_async(document_ids: List[str]) -> None
|
||
|
|
```
|
||
|
|
|
||
|
|
Asynchronously deletes documents that match the provided `document_ids` from the document store.
|
||
|
|
|
||
|
|
**Arguments**:
|
||
|
|
|
||
|
|
- `document_ids`: the document ids to delete
|
||
|
|
|
||
|
|
<a id="haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore.delete_all_documents"></a>
|
||
|
|
|
||
|
|
#### OpenSearchDocumentStore.delete\_all\_documents
|
||
|
|
|
||
|
|
```python
|
||
|
|
def delete_all_documents(recreate_index: bool = False) -> None
|
||
|
|
```
|
||
|
|
|
||
|
|
Deletes all documents in the document store.
|
||
|
|
|
||
|
|
**Arguments**:
|
||
|
|
|
||
|
|
- `recreate_index`: If True, the index will be deleted and recreated with the original mappings and
|
||
|
|
settings. If False, all documents will be deleted using the `delete_by_query` API.
|
||
|
|
|
||
|
|
<a id="haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore.delete_all_documents_async"></a>
|
||
|
|
|
||
|
|
#### OpenSearchDocumentStore.delete\_all\_documents\_async
|
||
|
|
|
||
|
|
```python
|
||
|
|
async def delete_all_documents_async(recreate_index: bool = False) -> None
|
||
|
|
```
|
||
|
|
|
||
|
|
Asynchronously deletes all documents in the document store.
|
||
|
|
|
||
|
|
**Arguments**:
|
||
|
|
|
||
|
|
- `recreate_index`: If True, the index will be deleted and recreated with the original mappings and
|
||
|
|
settings. If False, all documents will be deleted using the `delete_by_query` API.
|
||
|
|
|
||
|
|
<a id="haystack_integrations.document_stores.opensearch.filters"></a>
|
||
|
|
|
||
|
|
# Module haystack\_integrations.document\_stores.opensearch.filters
|
||
|
|
|
||
|
|
<a id="haystack_integrations.document_stores.opensearch.filters.normalize_filters"></a>
|
||
|
|
|
||
|
|
#### normalize\_filters
|
||
|
|
|
||
|
|
```python
|
||
|
|
def normalize_filters(filters: Dict[str, Any]) -> Dict[str, Any]
|
||
|
|
```
|
||
|
|
|
||
|
|
Converts Haystack filters in OpenSearch compatible filters.
|