mirror of
https://github.com/deepset-ai/haystack.git
synced 2025-12-24 13:38:53 +00:00
ElasticsearchRetriever to BM25Retriever (#2423)
* change class names to bm25 * Update Documentation & Code Style * Update Documentation & Code Style * Update Documentation & Code Style * Add back all_terms_must_match * fix syntax * Update Documentation & Code Style * Update Documentation & Code Style * Creating a wrapper for old ES retriever with deprecated wrapper * Update Documentation & Code Style * New method for deprecating old ESRetriever * New attempt for deprecating the ESRetriever * Reverting to the simplest solution - warning logged * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
This commit is contained in:
parent
34bca2ba83
commit
d49e92e21c
@ -47,7 +47,7 @@ With this document_classifier, you can directly get predictions via predict()
|
||||
**Usage example at query time:**
|
||||
```python
|
||||
| ...
|
||||
| retriever = ElasticsearchRetriever(document_store=document_store)
|
||||
| retriever = BM25Retriever(document_store=document_store)
|
||||
| document_classifier = TransformersDocumentClassifier(model_name_or_path="bhadresh-savani/distilbert-base-uncased-emotion")
|
||||
| p = Pipeline()
|
||||
| p.add_node(component=retriever, name="Retriever", inputs=["Query"])
|
||||
|
||||
@ -434,7 +434,7 @@ A DocumentStore using Elasticsearch to store and query the documents for our sea
|
||||
- `aws4auth`: Authentication for usage with aws elasticsearch (can be generated with the requests-aws4auth package)
|
||||
- `index`: Name of index in elasticsearch to use for storing the documents that we want to search. If not existing yet, we will create one.
|
||||
- `label_index`: Name of index in elasticsearch to use for storing labels. If not existing yet, we will create one.
|
||||
- `search_fields`: Name of fields used by ElasticsearchRetriever to find matches in the docs to our incoming query (using elastic's multi_match query), e.g. ["title", "full_text"]
|
||||
- `search_fields`: Name of fields used by BM25Retriever to find matches in the docs to our incoming query (using elastic's multi_match query), e.g. ["title", "full_text"]
|
||||
- `content_field`: Name of field that might contain the answer and will therefore be passed to the Reader Model (e.g. "full_text").
|
||||
If no Reader is used (e.g. in FAQ-Style QA) the plain content of this field will just be returned.
|
||||
- `name_field`: Name of field that contains the title of the the doc
|
||||
@ -1250,7 +1250,7 @@ the KNN plugin that can scale to a large number of documents.
|
||||
- `aws4auth`: Authentication for usage with aws elasticsearch (can be generated with the requests-aws4auth package)
|
||||
- `index`: Name of index in elasticsearch to use for storing the documents that we want to search. If not existing yet, we will create one.
|
||||
- `label_index`: Name of index in elasticsearch to use for storing labels. If not existing yet, we will create one.
|
||||
- `search_fields`: Name of fields used by ElasticsearchRetriever to find matches in the docs to our incoming query (using elastic's multi_match query), e.g. ["title", "full_text"]
|
||||
- `search_fields`: Name of fields used by BM25Retriever to find matches in the docs to our incoming query (using elastic's multi_match query), e.g. ["title", "full_text"]
|
||||
- `content_field`: Name of field that might contain the answer and will therefore be passed to the Reader Model (e.g. "full_text").
|
||||
If no Reader is used (e.g. in FAQ-Style QA) the plain content of this field will just be returned.
|
||||
- `name_field`: Name of field that contains the title of the the doc
|
||||
|
||||
@ -104,7 +104,7 @@ Here's a sample configuration:
|
||||
| },
|
||||
| {
|
||||
| "name": "MyESRetriever",
|
||||
| "type": "ElasticsearchRetriever",
|
||||
| "type": "BM25Retriever",
|
||||
| "params": {
|
||||
| "document_store": "MyDocumentStore", # params can reference other components defined in the YAML
|
||||
| "custom_query": None,
|
||||
@ -161,7 +161,7 @@ Here's a sample configuration:
|
||||
| no_ans_boost: -10
|
||||
| model_name_or_path: deepset/roberta-base-squad2
|
||||
| - name: MyESRetriever
|
||||
| type: ElasticsearchRetriever
|
||||
| type: BM25Retriever
|
||||
| params:
|
||||
| document_store: MyDocumentStore # params can reference other components defined in the YAML
|
||||
| custom_query: null
|
||||
@ -374,8 +374,8 @@ Add a new node to the pipeline.
|
||||
method to process incoming data from predecessor node.
|
||||
- `name`: The name for the node. It must not contain any dots.
|
||||
- `inputs`: A list of inputs to the node. If the predecessor node has a single outgoing edge, just the name
|
||||
of node is sufficient. For instance, a 'ElasticsearchRetriever' node would always output a single
|
||||
edge with a list of documents. It can be represented as ["ElasticsearchRetriever"].
|
||||
of node is sufficient. For instance, a 'BM25Retriever' node would always output a single
|
||||
edge with a list of documents. It can be represented as ["BM25Retriever"].
|
||||
|
||||
In cases when the predecessor node has multiple outputs, e.g., a "QueryClassifier", the output
|
||||
must be specified explicitly as "QueryClassifier.output_2".
|
||||
@ -672,7 +672,7 @@ Here's a sample configuration:
|
||||
| no_ans_boost: -10
|
||||
| model_name_or_path: deepset/roberta-base-squad2
|
||||
| - name: MyESRetriever
|
||||
| type: ElasticsearchRetriever
|
||||
| type: BM25Retriever
|
||||
| params:
|
||||
| document_store: MyDocumentStore # params can reference other components defined in the YAML
|
||||
| custom_query: null
|
||||
@ -730,7 +730,7 @@ Here's a sample configuration:
|
||||
| },
|
||||
| {
|
||||
| "name": "MyESRetriever",
|
||||
| "type": "ElasticsearchRetriever",
|
||||
| "type": "BM25Retriever",
|
||||
| "params": {
|
||||
| "document_store": "MyDocumentStore", # params can reference other components defined in the YAML
|
||||
| "custom_query": None,
|
||||
@ -882,7 +882,7 @@ Here's a sample configuration:
|
||||
| no_ans_boost: -10
|
||||
| model_name_or_path: deepset/roberta-base-squad2
|
||||
| - name: MyESRetriever
|
||||
| type: ElasticsearchRetriever
|
||||
| type: BM25Retriever
|
||||
| params:
|
||||
| document_store: MyDocumentStore # params can reference other components defined in the YAML
|
||||
| custom_query: null
|
||||
@ -1017,8 +1017,8 @@ Add a new node to the pipeline.
|
||||
method to process incoming data from predecessor node.
|
||||
- `name`: The name for the node. It must not contain any dots.
|
||||
- `inputs`: A list of inputs to the node. If the predecessor node has a single outgoing edge, just the name
|
||||
of node is sufficient. For instance, a 'ElasticsearchRetriever' node would always output a single
|
||||
edge with a list of documents. It can be represented as ["ElasticsearchRetriever"].
|
||||
of node is sufficient. For instance, a 'BM25Retriever' node would always output a single
|
||||
edge with a list of documents. It can be represented as ["BM25Retriever"].
|
||||
|
||||
In cases when the predecessor node has multiple outputs, e.g., a "QueryClassifier", the output
|
||||
must be specified explicitly as "QueryClassifier.output_2".
|
||||
@ -1107,7 +1107,7 @@ Here's a sample configuration:
|
||||
| no_ans_boost: -10
|
||||
| model_name_or_path: deepset/roberta-base-squad2
|
||||
| - name: MyESRetriever
|
||||
| type: ElasticsearchRetriever
|
||||
| type: BM25Retriever
|
||||
| params:
|
||||
| document_store: MyDocumentStore # params can reference other components defined in the YAML
|
||||
| custom_query: null
|
||||
|
||||
@ -81,7 +81,7 @@ https://www.sbert.net/docs/pretrained-models/ce-msmarco.html#usage-with-transfor
|
||||
|
||||
Usage example:
|
||||
...
|
||||
retriever = ElasticsearchRetriever(document_store=document_store)
|
||||
retriever = BM25Retriever(document_store=document_store)
|
||||
ranker = SentenceTransformersRanker(model_name_or_path="cross-encoder/ms-marco-MiniLM-L-12-v2")
|
||||
p = Pipeline()
|
||||
p.add_node(component=retriever, name="ESRetriever", inputs=["Query"])
|
||||
|
||||
@ -94,15 +94,15 @@ contains the keys "predictions" and "metrics".
|
||||
|
||||
# Module sparse
|
||||
|
||||
<a id="sparse.ElasticsearchRetriever"></a>
|
||||
<a id="sparse.BM25Retriever"></a>
|
||||
|
||||
## ElasticsearchRetriever
|
||||
## BM25Retriever
|
||||
|
||||
```python
|
||||
class ElasticsearchRetriever(BaseRetriever)
|
||||
class BM25Retriever(BaseRetriever)
|
||||
```
|
||||
|
||||
<a id="sparse.ElasticsearchRetriever.__init__"></a>
|
||||
<a id="sparse.BM25Retriever.__init__"></a>
|
||||
|
||||
#### \_\_init\_\_
|
||||
|
||||
@ -183,7 +183,7 @@ Defaults to False.
|
||||
```
|
||||
- `top_k`: How many documents to return per query.
|
||||
|
||||
<a id="sparse.ElasticsearchRetriever.retrieve"></a>
|
||||
<a id="sparse.BM25Retriever.retrieve"></a>
|
||||
|
||||
#### retrieve
|
||||
|
||||
@ -209,7 +209,7 @@ Check out https://www.elastic.co/guide/en/elasticsearch/reference/current/http-c
|
||||
## ElasticsearchFilterOnlyRetriever
|
||||
|
||||
```python
|
||||
class ElasticsearchFilterOnlyRetriever(ElasticsearchRetriever)
|
||||
class ElasticsearchFilterOnlyRetriever(BM25Retriever)
|
||||
```
|
||||
|
||||
Naive "Retriever" that returns all documents that match the given filters. No impact of query at all.
|
||||
|
||||
@ -150,16 +150,16 @@ They use some simple but fast algorithm.
|
||||
|
||||
**Alternatives:**
|
||||
|
||||
- Customize the `ElasticsearchRetriever`with custom queries (e.g. boosting) and filters
|
||||
- Customize the `BM25Retriever`with custom queries (e.g. boosting) and filters
|
||||
- Use `TfidfRetriever` in combination with a SQL or InMemory Document store for simple prototyping and debugging
|
||||
- Use `EmbeddingRetriever` to find candidate documents based on the similarity of embeddings (e.g. created via Sentence-BERT)
|
||||
- Use `DensePassageRetriever` to use different embedding models for passage and query (see Tutorial 6)
|
||||
|
||||
|
||||
```python
|
||||
from haystack.nodes import ElasticsearchRetriever
|
||||
from haystack.nodes import BM25Retriever
|
||||
|
||||
retriever = ElasticsearchRetriever(document_store=document_store)
|
||||
retriever = BM25Retriever(document_store=document_store)
|
||||
```
|
||||
|
||||
|
||||
|
||||
@ -91,7 +91,7 @@ got_docs = convert_files_to_docs(dir_path=doc_dir, clean_func=clean_wiki_text, s
|
||||
```
|
||||
|
||||
Here we initialize the core components that we will be gluing together using the `Pipeline` class.
|
||||
We have a `DocumentStore`, an `ElasticsearchRetriever` and a `FARMReader`.
|
||||
We have a `DocumentStore`, an `BM25Retriever` and a `FARMReader`.
|
||||
These can be combined to create a classic Retriever-Reader pipeline that is designed
|
||||
to perform Open Domain Question Answering.
|
||||
|
||||
@ -100,7 +100,7 @@ to perform Open Domain Question Answering.
|
||||
from haystack import Pipeline
|
||||
from haystack.utils import launch_es
|
||||
from haystack.document_stores import ElasticsearchDocumentStore
|
||||
from haystack.nodes import ElasticsearchRetriever, EmbeddingRetriever, FARMReader
|
||||
from haystack.nodes import BM25Retriever, EmbeddingRetriever, FARMReader
|
||||
|
||||
|
||||
# Initialize DocumentStore and index documents
|
||||
@ -110,7 +110,7 @@ document_store.delete_documents()
|
||||
document_store.write_documents(got_docs)
|
||||
|
||||
# Initialize Sparse retriever
|
||||
es_retriever = ElasticsearchRetriever(document_store=document_store)
|
||||
es_retriever = BM25Retriever(document_store=document_store)
|
||||
|
||||
# Initialize dense retriever
|
||||
embedding_retriever = EmbeddingRetriever(
|
||||
@ -220,7 +220,7 @@ p_extractive.draw("pipeline_extractive.png")
|
||||
|
||||
Pipelines offer a very simple way to ensemble together different components.
|
||||
In this example, we are going to combine the power of an `EmbeddingRetriever`
|
||||
with the keyword based `ElasticsearchRetriever`.
|
||||
with the keyword based `BM25Retriever`.
|
||||
See our [documentation](https://haystack.deepset.ai/docs/latest/retrievermd) to understand why
|
||||
we might want to combine a dense and sparse retriever.
|
||||
|
||||
@ -373,7 +373,7 @@ components: # define all the building-blocks for Pipeline
|
||||
no_ans_boost: -10
|
||||
model_name_or_path: deepset/roberta-base-squad2
|
||||
- name: MyESRetriever
|
||||
type: ElasticsearchRetriever
|
||||
type: BM25Retriever
|
||||
params:
|
||||
document_store: MyDocumentStore # params can reference other components defined in the YAML
|
||||
custom_query: null
|
||||
|
||||
@ -38,7 +38,7 @@ Make sure you enable the GPU runtime to experience decent speed in this tutorial
|
||||
|
||||
from pprint import pprint
|
||||
from tqdm import tqdm
|
||||
from haystack.nodes import QuestionGenerator, ElasticsearchRetriever, FARMReader
|
||||
from haystack.nodes import QuestionGenerator, BM25Retriever, FARMReader
|
||||
from haystack.document_stores import ElasticsearchDocumentStore
|
||||
from haystack.pipelines import (
|
||||
QuestionGenerationPipeline,
|
||||
@ -112,7 +112,7 @@ This pipeline takes a query as input. It retrieves relevant documents and then g
|
||||
|
||||
|
||||
```python
|
||||
retriever = ElasticsearchRetriever(document_store=document_store)
|
||||
retriever = BM25Retriever(document_store=document_store)
|
||||
rqg_pipeline = RetrieverQuestionGenerationPipeline(retriever, question_generator)
|
||||
|
||||
print(f"\n * Generating questions for documents matching the query 'Arya Stark'\n")
|
||||
|
||||
@ -102,7 +102,7 @@ from haystack.utils import (
|
||||
from haystack.pipelines import Pipeline
|
||||
from haystack.document_stores import ElasticsearchDocumentStore
|
||||
from haystack.nodes import (
|
||||
ElasticsearchRetriever,
|
||||
BM25Retriever,
|
||||
EmbeddingRetriever,
|
||||
FARMReader,
|
||||
TransformersQueryClassifier,
|
||||
@ -124,7 +124,7 @@ document_store.delete_documents()
|
||||
document_store.write_documents(got_docs)
|
||||
|
||||
# Initialize Sparse retriever
|
||||
es_retriever = ElasticsearchRetriever(document_store=document_store)
|
||||
es_retriever = BM25Retriever(document_store=document_store)
|
||||
|
||||
# Initialize dense retriever
|
||||
embedding_retriever = EmbeddingRetriever(
|
||||
|
||||
@ -10,7 +10,7 @@ id: "tutorial15md"
|
||||
# Open-Domain QA on Tables
|
||||
[](https://colab.research.google.com/github/deepset-ai/haystack/blob/master/tutorials/Tutorial15_TableQA.ipynb)
|
||||
|
||||
This tutorial shows you how to perform question-answering on tables using the `TableTextRetriever` or `ElasticsearchRetriever` as retriever node and the `TableReader` as reader node.
|
||||
This tutorial shows you how to perform question-answering on tables using the `TableTextRetriever` or `BM25Retriever` as retriever node and the `TableReader` as reader node.
|
||||
|
||||
### Prepare environment
|
||||
|
||||
@ -142,7 +142,7 @@ of texts and tables using dense embeddings. It is an extension of the `DensePass
|
||||
|
||||
**Alternatives:**
|
||||
|
||||
- `ElasticsearchRetriever` that uses BM25 algorithm
|
||||
- `BM25Retriever` that uses BM25 algorithm
|
||||
|
||||
|
||||
|
||||
@ -165,9 +165,9 @@ document_store.update_embeddings(retriever=retriever)
|
||||
|
||||
|
||||
```python
|
||||
## Alternative: ElasticsearchRetriever
|
||||
# from haystack.nodes.retriever import ElasticsearchRetriever
|
||||
# retriever = ElasticsearchRetriever(document_store=document_store)
|
||||
## Alternative: BM25Retriever
|
||||
# from haystack.nodes.retriever import BM25Retriever
|
||||
# retriever = BM25Retriever(document_store=document_store)
|
||||
```
|
||||
|
||||
|
||||
|
||||
@ -40,7 +40,7 @@ This tutorial will show you how to integrate a classification model into your pr
|
||||
```python
|
||||
# Here are the imports we need
|
||||
from haystack.document_stores.elasticsearch import ElasticsearchDocumentStore
|
||||
from haystack.nodes import PreProcessor, TransformersDocumentClassifier, FARMReader, ElasticsearchRetriever
|
||||
from haystack.nodes import PreProcessor, TransformersDocumentClassifier, FARMReader, BM25Retriever
|
||||
from haystack.schema import Document
|
||||
from haystack.utils import convert_files_to_docs, fetch_archive_from_http, print_answers
|
||||
```
|
||||
@ -163,7 +163,7 @@ All we have to do to filter for one of our classes is to set a filter on "classi
|
||||
# Initialize QA-Pipeline
|
||||
from haystack.pipelines import ExtractiveQAPipeline
|
||||
|
||||
retriever = ElasticsearchRetriever(document_store=document_store)
|
||||
retriever = BM25Retriever(document_store=document_store)
|
||||
reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2", use_gpu=True)
|
||||
pipe = ExtractiveQAPipeline(reader, retriever)
|
||||
```
|
||||
|
||||
@ -130,9 +130,9 @@ document_store.add_eval_data(
|
||||
|
||||
```python
|
||||
# Initialize Retriever
|
||||
from haystack.nodes import ElasticsearchRetriever
|
||||
from haystack.nodes import BM25Retriever
|
||||
|
||||
retriever = ElasticsearchRetriever(document_store=document_store)
|
||||
retriever = BM25Retriever(document_store=document_store)
|
||||
|
||||
# Alternative: Evaluate dense retrievers (EmbeddingRetriever or DensePassageRetriever)
|
||||
# The EmbeddingRetriever uses a single transformer based encoder model for query and document.
|
||||
|
||||
@ -155,7 +155,7 @@ document_store.write_documents(docs)
|
||||
|
||||
**Alternatives:**
|
||||
|
||||
- The `ElasticsearchRetriever`with custom queries (e.g. boosting) and filters
|
||||
- The `BM25Retriever`with custom queries (e.g. boosting) and filters
|
||||
- Use `EmbeddingRetriever` to find candidate documents based on the similarity of embeddings (e.g. created via Sentence-BERT)
|
||||
- Use `TfidfRetriever` in combination with a SQL or InMemory Document store for simple prototyping and debugging
|
||||
|
||||
|
||||
@ -82,7 +82,7 @@ class ElasticsearchDocumentStore(KeywordDocumentStore):
|
||||
:param aws4auth: Authentication for usage with aws elasticsearch (can be generated with the requests-aws4auth package)
|
||||
:param index: Name of index in elasticsearch to use for storing the documents that we want to search. If not existing yet, we will create one.
|
||||
:param label_index: Name of index in elasticsearch to use for storing labels. If not existing yet, we will create one.
|
||||
:param search_fields: Name of fields used by ElasticsearchRetriever to find matches in the docs to our incoming query (using elastic's multi_match query), e.g. ["title", "full_text"]
|
||||
:param search_fields: Name of fields used by BM25Retriever to find matches in the docs to our incoming query (using elastic's multi_match query), e.g. ["title", "full_text"]
|
||||
:param content_field: Name of field that might contain the answer and will therefore be passed to the Reader Model (e.g. "full_text").
|
||||
If no Reader is used (e.g. in FAQ-Style QA) the plain content of this field will just be returned.
|
||||
:param name_field: Name of field that contains the title of the the doc
|
||||
@ -1644,7 +1644,7 @@ class OpenSearchDocumentStore(ElasticsearchDocumentStore):
|
||||
:param aws4auth: Authentication for usage with aws elasticsearch (can be generated with the requests-aws4auth package)
|
||||
:param index: Name of index in elasticsearch to use for storing the documents that we want to search. If not existing yet, we will create one.
|
||||
:param label_index: Name of index in elasticsearch to use for storing labels. If not existing yet, we will create one.
|
||||
:param search_fields: Name of fields used by ElasticsearchRetriever to find matches in the docs to our incoming query (using elastic's multi_match query), e.g. ["title", "full_text"]
|
||||
:param search_fields: Name of fields used by BM25Retriever to find matches in the docs to our incoming query (using elastic's multi_match query), e.g. ["title", "full_text"]
|
||||
:param content_field: Name of field that might contain the answer and will therefore be passed to the Reader Model (e.g. "full_text").
|
||||
If no Reader is used (e.g. in FAQ-Style QA) the plain content of this field will just be returned.
|
||||
:param name_field: Name of field that contains the title of the the doc
|
||||
|
||||
@ -50,6 +50,9 @@
|
||||
{
|
||||
"$ref": "#/definitions/AzureConverterComponent"
|
||||
},
|
||||
{
|
||||
"$ref": "#/definitions/BM25RetrieverComponent"
|
||||
},
|
||||
{
|
||||
"$ref": "#/definitions/CrawlerComponent"
|
||||
},
|
||||
@ -65,9 +68,6 @@
|
||||
{
|
||||
"$ref": "#/definitions/ElasticsearchFilterOnlyRetrieverComponent"
|
||||
},
|
||||
{
|
||||
"$ref": "#/definitions/ElasticsearchRetrieverComponent"
|
||||
},
|
||||
{
|
||||
"$ref": "#/definitions/EmbeddingRetrieverComponent"
|
||||
},
|
||||
@ -1226,6 +1226,56 @@
|
||||
],
|
||||
"additionalProperties": false
|
||||
},
|
||||
"BM25RetrieverComponent": {
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"name": {
|
||||
"title": "Name",
|
||||
"description": "Custom name for the component. Helpful for visualization and debugging.",
|
||||
"type": "string"
|
||||
},
|
||||
"type": {
|
||||
"title": "Type",
|
||||
"description": "Haystack Class name for the component.",
|
||||
"type": "string",
|
||||
"const": "BM25Retriever"
|
||||
},
|
||||
"params": {
|
||||
"title": "Parameters",
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"document_store": {
|
||||
"title": "Document Store",
|
||||
"type": "string"
|
||||
},
|
||||
"top_k": {
|
||||
"title": "Top K",
|
||||
"default": 10,
|
||||
"type": "integer"
|
||||
},
|
||||
"all_terms_must_match": {
|
||||
"title": "All Terms Must Match",
|
||||
"default": false,
|
||||
"type": "boolean"
|
||||
},
|
||||
"custom_query": {
|
||||
"title": "Custom Query",
|
||||
"type": "string"
|
||||
}
|
||||
},
|
||||
"required": [
|
||||
"document_store"
|
||||
],
|
||||
"additionalProperties": false,
|
||||
"description": "Each parameter can reference other components defined in the same YAML file."
|
||||
}
|
||||
},
|
||||
"required": [
|
||||
"type",
|
||||
"name"
|
||||
],
|
||||
"additionalProperties": false
|
||||
},
|
||||
"CrawlerComponent": {
|
||||
"type": "object",
|
||||
"properties": {
|
||||
@ -1546,51 +1596,6 @@
|
||||
],
|
||||
"additionalProperties": false
|
||||
},
|
||||
"ElasticsearchRetrieverComponent": {
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"name": {
|
||||
"title": "Name",
|
||||
"description": "Custom name for the component. Helpful for visualization and debugging.",
|
||||
"type": "string"
|
||||
},
|
||||
"type": {
|
||||
"title": "Type",
|
||||
"description": "Haystack Class name for the component.",
|
||||
"type": "string",
|
||||
"const": "ElasticsearchRetriever"
|
||||
},
|
||||
"params": {
|
||||
"title": "Parameters",
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"document_store": {
|
||||
"title": "Document Store",
|
||||
"type": "string"
|
||||
},
|
||||
"top_k": {
|
||||
"title": "Top K",
|
||||
"default": 10,
|
||||
"type": "integer"
|
||||
},
|
||||
"custom_query": {
|
||||
"title": "Custom Query",
|
||||
"type": "string"
|
||||
}
|
||||
},
|
||||
"required": [
|
||||
"document_store"
|
||||
],
|
||||
"additionalProperties": false,
|
||||
"description": "Each parameter can reference other components defined in the same YAML file."
|
||||
}
|
||||
},
|
||||
"required": [
|
||||
"type",
|
||||
"name"
|
||||
],
|
||||
"additionalProperties": false
|
||||
},
|
||||
"EmbeddingRetrieverComponent": {
|
||||
"type": "object",
|
||||
"properties": {
|
||||
|
||||
@ -53,6 +53,9 @@
|
||||
{
|
||||
"$ref": "#/definitions/AzureConverterComponent"
|
||||
},
|
||||
{
|
||||
"$ref": "#/definitions/BM25RetrieverComponent"
|
||||
},
|
||||
{
|
||||
"$ref": "#/definitions/CrawlerComponent"
|
||||
},
|
||||
@ -1700,6 +1703,56 @@
|
||||
],
|
||||
"additionalProperties": false
|
||||
},
|
||||
"BM25RetrieverComponent": {
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"name": {
|
||||
"title": "Name",
|
||||
"description": "Custom name for the component. Helpful for visualization and debugging.",
|
||||
"type": "string"
|
||||
},
|
||||
"type": {
|
||||
"title": "Type",
|
||||
"description": "Haystack Class name for the component.",
|
||||
"type": "string",
|
||||
"const": "BM25Retriever"
|
||||
},
|
||||
"params": {
|
||||
"title": "Parameters",
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"document_store": {
|
||||
"title": "Document Store",
|
||||
"type": "string"
|
||||
},
|
||||
"top_k": {
|
||||
"title": "Top K",
|
||||
"default": 10,
|
||||
"type": "integer"
|
||||
},
|
||||
"all_terms_must_match": {
|
||||
"title": "All Terms Must Match",
|
||||
"default": false,
|
||||
"type": "boolean"
|
||||
},
|
||||
"custom_query": {
|
||||
"title": "Custom Query",
|
||||
"type": "string"
|
||||
}
|
||||
},
|
||||
"required": [
|
||||
"document_store"
|
||||
],
|
||||
"additionalProperties": false,
|
||||
"description": "Each parameter can reference other components defined in the same YAML file."
|
||||
}
|
||||
},
|
||||
"required": [
|
||||
"type",
|
||||
"name"
|
||||
],
|
||||
"additionalProperties": false
|
||||
},
|
||||
"CrawlerComponent": {
|
||||
"type": "object",
|
||||
"properties": {
|
||||
|
||||
@ -30,6 +30,7 @@ from haystack.nodes.retriever import (
|
||||
BaseRetriever,
|
||||
DensePassageRetriever,
|
||||
EmbeddingRetriever,
|
||||
BM25Retriever,
|
||||
ElasticsearchRetriever,
|
||||
ElasticsearchFilterOnlyRetriever,
|
||||
TfidfRetriever,
|
||||
|
||||
@ -28,7 +28,7 @@ class TransformersDocumentClassifier(BaseDocumentClassifier):
|
||||
**Usage example at query time:**
|
||||
```python
|
||||
| ...
|
||||
| retriever = ElasticsearchRetriever(document_store=document_store)
|
||||
| retriever = BM25Retriever(document_store=document_store)
|
||||
| document_classifier = TransformersDocumentClassifier(model_name_or_path="bhadresh-savani/distilbert-base-uncased-emotion")
|
||||
| p = Pipeline()
|
||||
| p.add_node(component=retriever, name="Retriever", inputs=["Query"])
|
||||
|
||||
@ -28,7 +28,7 @@ class SentenceTransformersRanker(BaseRanker):
|
||||
|
||||
Usage example:
|
||||
...
|
||||
retriever = ElasticsearchRetriever(document_store=document_store)
|
||||
retriever = BM25Retriever(document_store=document_store)
|
||||
ranker = SentenceTransformersRanker(model_name_or_path="cross-encoder/ms-marco-MiniLM-L-12-v2")
|
||||
p = Pipeline()
|
||||
p.add_node(component=retriever, name="ESRetriever", inputs=["Query"])
|
||||
|
||||
@ -1,4 +1,9 @@
|
||||
from haystack.nodes.retriever.base import BaseRetriever
|
||||
from haystack.nodes.retriever.dense import DensePassageRetriever, EmbeddingRetriever, TableTextRetriever
|
||||
from haystack.nodes.retriever.sparse import ElasticsearchRetriever, ElasticsearchFilterOnlyRetriever, TfidfRetriever
|
||||
from haystack.nodes.retriever.sparse import (
|
||||
BM25Retriever,
|
||||
ElasticsearchRetriever,
|
||||
ElasticsearchFilterOnlyRetriever,
|
||||
TfidfRetriever,
|
||||
)
|
||||
from haystack.nodes.retriever.text2sparql import Text2SparqlRetriever
|
||||
|
||||
@ -15,7 +15,7 @@ from haystack.nodes.retriever import BaseRetriever
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
class ElasticsearchRetriever(BaseRetriever):
|
||||
class BM25Retriever(BaseRetriever):
|
||||
def __init__(
|
||||
self,
|
||||
document_store: KeywordDocumentStore,
|
||||
@ -139,7 +139,19 @@ class ElasticsearchRetriever(BaseRetriever):
|
||||
return documents
|
||||
|
||||
|
||||
class ElasticsearchFilterOnlyRetriever(ElasticsearchRetriever):
|
||||
class ElasticsearchRetriever(BM25Retriever):
|
||||
def __init__(
|
||||
self,
|
||||
document_store: KeywordDocumentStore,
|
||||
top_k: int = 10,
|
||||
all_terms_must_match: bool = False,
|
||||
custom_query: Optional[str] = None,
|
||||
):
|
||||
logger.warn("This class is now deprecated. Please use the BM25Retriever instead")
|
||||
super().__init__(document_store, top_k, all_terms_must_match, custom_query)
|
||||
|
||||
|
||||
class ElasticsearchFilterOnlyRetriever(BM25Retriever):
|
||||
"""
|
||||
Naive "Retriever" that returns all documents that match the given filters. No impact of query at all.
|
||||
Helpful for benchmarking, testing and if you want to do QA on small documents without an "active" retriever.
|
||||
|
||||
@ -167,7 +167,7 @@ class BasePipeline(ABC):
|
||||
| },
|
||||
| {
|
||||
| "name": "MyESRetriever",
|
||||
| "type": "ElasticsearchRetriever",
|
||||
| "type": "BM25Retriever",
|
||||
| "params": {
|
||||
| "document_store": "MyDocumentStore", # params can reference other components defined in the YAML
|
||||
| "custom_query": None,
|
||||
@ -217,7 +217,7 @@ class BasePipeline(ABC):
|
||||
| no_ans_boost: -10
|
||||
| model_name_or_path: deepset/roberta-base-squad2
|
||||
| - name: MyESRetriever
|
||||
| type: ElasticsearchRetriever
|
||||
| type: BM25Retriever
|
||||
| params:
|
||||
| document_store: MyDocumentStore # params can reference other components defined in the YAML
|
||||
| custom_query: null
|
||||
@ -492,8 +492,8 @@ class Pipeline(BasePipeline):
|
||||
method to process incoming data from predecessor node.
|
||||
:param name: The name for the node. It must not contain any dots.
|
||||
:param inputs: A list of inputs to the node. If the predecessor node has a single outgoing edge, just the name
|
||||
of node is sufficient. For instance, a 'ElasticsearchRetriever' node would always output a single
|
||||
edge with a list of documents. It can be represented as ["ElasticsearchRetriever"].
|
||||
of node is sufficient. For instance, a 'BM25Retriever' node would always output a single
|
||||
edge with a list of documents. It can be represented as ["BM25Retriever"].
|
||||
|
||||
In cases when the predecessor node has multiple outputs, e.g., a "QueryClassifier", the output
|
||||
must be specified explicitly as "QueryClassifier.output_2".
|
||||
@ -1302,7 +1302,7 @@ class Pipeline(BasePipeline):
|
||||
| no_ans_boost: -10
|
||||
| model_name_or_path: deepset/roberta-base-squad2
|
||||
| - name: MyESRetriever
|
||||
| type: ElasticsearchRetriever
|
||||
| type: BM25Retriever
|
||||
| params:
|
||||
| document_store: MyDocumentStore # params can reference other components defined in the YAML
|
||||
| custom_query: null
|
||||
@ -1366,7 +1366,7 @@ class Pipeline(BasePipeline):
|
||||
| },
|
||||
| {
|
||||
| "name": "MyESRetriever",
|
||||
| "type": "ElasticsearchRetriever",
|
||||
| "type": "BM25Retriever",
|
||||
| "params": {
|
||||
| "document_store": "MyDocumentStore", # params can reference other components defined in the YAML
|
||||
| "custom_query": None,
|
||||
@ -1682,7 +1682,7 @@ class RayPipeline(Pipeline):
|
||||
| no_ans_boost: -10
|
||||
| model_name_or_path: deepset/roberta-base-squad2
|
||||
| - name: MyESRetriever
|
||||
| type: ElasticsearchRetriever
|
||||
| type: BM25Retriever
|
||||
| params:
|
||||
| document_store: MyDocumentStore # params can reference other components defined in the YAML
|
||||
| custom_query: null
|
||||
@ -1800,8 +1800,8 @@ class RayPipeline(Pipeline):
|
||||
from Python: https://docs.ray.io/en/master/serve/package-ref.html#servehandle-api.
|
||||
:param name: The name for the node. It must not contain any dots.
|
||||
:param inputs: A list of inputs to the node. If the predecessor node has a single outgoing edge, just the name
|
||||
of node is sufficient. For instance, a 'ElasticsearchRetriever' node would always output a single
|
||||
edge with a list of documents. It can be represented as ["ElasticsearchRetriever"].
|
||||
of node is sufficient. For instance, a 'BM25Retriever' node would always output a single
|
||||
edge with a list of documents. It can be represented as ["BM25Retriever"].
|
||||
|
||||
In cases when the predecessor node has multiple outputs, e.g., a "QueryClassifier", the output
|
||||
must be specified explicitly as "QueryClassifier.output_2".
|
||||
|
||||
@ -38,8 +38,8 @@ class BaseStandardPipeline(ABC):
|
||||
method to process incoming data from predecessor node.
|
||||
:param name: The name for the node. It must not contain any dots.
|
||||
:param inputs: A list of inputs to the node. If the predecessor node has a single outgoing edge, just the name
|
||||
of node is sufficient. For instance, a 'ElasticsearchRetriever' node would always output a single
|
||||
edge with a list of documents. It can be represented as ["ElasticsearchRetriever"].
|
||||
of node is sufficient. For instance, a 'BM25Retriever' node would always output a single
|
||||
edge with a list of documents. It can be represented as ["BM25Retriever"].
|
||||
|
||||
In cases when the predecessor node has multiple outputs, e.g., a "QueryClassifier", the output
|
||||
must be specified explicitly as "QueryClassifier.output_2".
|
||||
@ -100,7 +100,7 @@ class BaseStandardPipeline(ABC):
|
||||
| no_ans_boost: -10
|
||||
| model_name_or_path: deepset/roberta-base-squad2
|
||||
| - name: MyESRetriever
|
||||
| type: ElasticsearchRetriever
|
||||
| type: BM25Retriever
|
||||
| params:
|
||||
| document_store: MyDocumentStore # params can reference other components defined in the YAML
|
||||
| custom_query: null
|
||||
|
||||
@ -70,7 +70,7 @@ from elasticsearch import Elasticsearch
|
||||
from haystack.document_stores.base import BaseDocumentStore
|
||||
from haystack.document_stores.elasticsearch import ElasticsearchDocumentStore # keep it here !
|
||||
from haystack.document_stores.faiss import FAISSDocumentStore # keep it here !
|
||||
from haystack.nodes.retriever.sparse import ElasticsearchRetriever # keep it here ! # pylint: disable=unused-import
|
||||
from haystack.nodes.retriever.sparse import BM25Retriever # keep it here ! # pylint: disable=unused-import
|
||||
from haystack.nodes.retriever.dense import DensePassageRetriever # keep it here ! # pylint: disable=unused-import
|
||||
from haystack.nodes.preprocessor import PreProcessor
|
||||
from haystack.nodes.retriever.base import BaseRetriever
|
||||
@ -117,10 +117,8 @@ class HaystackDocumentStore:
|
||||
|
||||
class HaystackRetriever:
|
||||
def __init__(self, document_store: BaseDocumentStore, retriever_type: str, **kwargs):
|
||||
if retriever_type not in ["ElasticsearchRetriever", "DensePassageRetriever", "EmbeddingRetriever"]:
|
||||
raise Exception(
|
||||
"Use one of these types: ElasticsearchRetriever", "DensePassageRetriever", "EmbeddingRetriever"
|
||||
)
|
||||
if retriever_type not in ["BM25Retriever", "DensePassageRetriever", "EmbeddingRetriever"]:
|
||||
raise Exception("Use one of these types: BM25Retriever", "DensePassageRetriever", "EmbeddingRetriever")
|
||||
self._retriever_type = retriever_type
|
||||
self._document_store = document_store
|
||||
self._kwargs = kwargs
|
||||
@ -252,7 +250,7 @@ def main(
|
||||
dpr_output_filename: Path,
|
||||
preprocessor,
|
||||
document_store_type_config: Tuple[str, Dict] = ("ElasticsearchDocumentStore", {}),
|
||||
retriever_type_config: Tuple[str, Dict] = ("ElasticsearchRetriever", {}),
|
||||
retriever_type_config: Tuple[str, Dict] = ("BM25Retriever", {}),
|
||||
num_hard_negative_ctxs: int = 30,
|
||||
split_dataset: bool = False,
|
||||
):
|
||||
@ -348,7 +346,7 @@ if __name__ == "__main__":
|
||||
preprocessor=preprocessor,
|
||||
document_store_type_config=("ElasticsearchDocumentStore", store_dpr_config),
|
||||
# retriever_type_config=("DensePassageRetriever", retriever_dpr_config), # dpr
|
||||
retriever_type_config=("ElasticsearchRetriever", retriever_bm25_config), # bm25
|
||||
retriever_type_config=("BM25Retriever", retriever_bm25_config), # bm25
|
||||
num_hard_negative_ctxs=num_hard_negative_ctxs,
|
||||
split_dataset=split_dataset,
|
||||
)
|
||||
|
||||
@ -8,7 +8,7 @@ components: # define all the building-blocks for Pipeline
|
||||
params:
|
||||
host: localhost
|
||||
- name: Retriever
|
||||
type: ElasticsearchRetriever
|
||||
type: BM25Retriever
|
||||
params:
|
||||
document_store: DocumentStore # params can reference other components defined in the YAML
|
||||
top_k: 5
|
||||
|
||||
@ -4,7 +4,7 @@ from haystack.document_stores.memory import InMemoryDocumentStore
|
||||
from haystack.document_stores.elasticsearch import Elasticsearch, ElasticsearchDocumentStore, OpenSearchDocumentStore
|
||||
from haystack.document_stores.faiss import FAISSDocumentStore
|
||||
from haystack.document_stores.milvus import MilvusDocumentStore
|
||||
from haystack.nodes.retriever.sparse import ElasticsearchRetriever, TfidfRetriever
|
||||
from haystack.nodes.retriever.sparse import BM25Retriever, TfidfRetriever
|
||||
from haystack.nodes.retriever.dense import DensePassageRetriever, EmbeddingRetriever
|
||||
from haystack.nodes.reader.farm import FARMReader
|
||||
from haystack.nodes.reader.transformers import TransformersReader
|
||||
@ -104,7 +104,7 @@ def get_document_store(document_store_type, similarity="dot_product", index="doc
|
||||
|
||||
def get_retriever(retriever_name, doc_store, devices):
|
||||
if retriever_name == "elastic":
|
||||
return ElasticsearchRetriever(doc_store)
|
||||
return BM25Retriever(doc_store)
|
||||
if retriever_name == "tfidf":
|
||||
return TfidfRetriever(doc_store)
|
||||
if retriever_name == "dpr":
|
||||
|
||||
@ -54,7 +54,7 @@ from haystack.nodes.answer_generator.transformers import Seq2SeqGenerator
|
||||
from haystack.nodes.answer_generator.transformers import RAGenerator
|
||||
from haystack.nodes.ranker import SentenceTransformersRanker
|
||||
from haystack.nodes.document_classifier.transformers import TransformersDocumentClassifier
|
||||
from haystack.nodes.retriever.sparse import ElasticsearchFilterOnlyRetriever, ElasticsearchRetriever, TfidfRetriever
|
||||
from haystack.nodes.retriever.sparse import ElasticsearchFilterOnlyRetriever, BM25Retriever, TfidfRetriever
|
||||
from haystack.nodes.retriever.dense import DensePassageRetriever, EmbeddingRetriever, TableTextRetriever
|
||||
from haystack.nodes.reader.farm import FARMReader
|
||||
from haystack.nodes.reader.transformers import TransformersReader
|
||||
@ -622,7 +622,7 @@ def get_retriever(retriever_type, document_store):
|
||||
embed_title=True,
|
||||
)
|
||||
elif retriever_type == "elasticsearch":
|
||||
retriever = ElasticsearchRetriever(document_store=document_store)
|
||||
retriever = BM25Retriever(document_store=document_store)
|
||||
elif retriever_type == "es_filter_only":
|
||||
retriever = ElasticsearchFilterOnlyRetriever(document_store=document_store)
|
||||
elif retriever_type == "table_text_retriever":
|
||||
|
||||
@ -11,7 +11,7 @@
|
||||
},
|
||||
{
|
||||
"name": "Retriever",
|
||||
"type": "ElasticsearchRetriever",
|
||||
"type": "BM25Retriever",
|
||||
"params": {
|
||||
"document_store": "DocumentStore",
|
||||
"top_k": 5
|
||||
|
||||
@ -8,7 +8,7 @@ components:
|
||||
model_name_or_path: deepset/roberta-base-squad2
|
||||
num_processes: 0
|
||||
- name: ESRetriever
|
||||
type: ElasticsearchRetriever
|
||||
type: BM25Retriever
|
||||
params:
|
||||
document_store: DocumentStore
|
||||
- name: DocumentStore
|
||||
|
||||
@ -8,7 +8,7 @@ components:
|
||||
model_name_or_path: deepset/roberta-base-squad2
|
||||
num_processes: 0
|
||||
- name: ESRetriever
|
||||
type: ElasticsearchRetriever
|
||||
type: BM25Retriever
|
||||
params:
|
||||
document_store: DocumentStore
|
||||
- name: DocumentStore
|
||||
|
||||
@ -7,7 +7,7 @@ from haystack.nodes.preprocessor import PreProcessor
|
||||
from haystack.nodes.evaluator import EvalAnswers, EvalDocuments
|
||||
from haystack.nodes.query_classifier.transformers import TransformersQueryClassifier
|
||||
from haystack.nodes.retriever.dense import DensePassageRetriever
|
||||
from haystack.nodes.retriever.sparse import ElasticsearchRetriever
|
||||
from haystack.nodes.retriever.sparse import BM25Retriever
|
||||
from haystack.pipelines.base import Pipeline
|
||||
from haystack.pipelines import ExtractiveQAPipeline, GenerativeQAPipeline, SearchSummarizationPipeline
|
||||
from haystack.pipelines.standard_pipelines import (
|
||||
@ -950,7 +950,7 @@ def test_question_generation_eval(retriever_with_docs, question_generator):
|
||||
@pytest.mark.parametrize("document_store_with_docs", ["elasticsearch"], indirect=True)
|
||||
@pytest.mark.parametrize("reader", ["farm"], indirect=True)
|
||||
def test_qa_multi_retriever_pipeline_eval(document_store_with_docs, reader):
|
||||
es_retriever = ElasticsearchRetriever(document_store=document_store_with_docs)
|
||||
es_retriever = BM25Retriever(document_store=document_store_with_docs)
|
||||
dpr_retriever = DensePassageRetriever(document_store_with_docs)
|
||||
document_store_with_docs.update_embeddings(retriever=dpr_retriever)
|
||||
|
||||
@ -1014,7 +1014,7 @@ def test_qa_multi_retriever_pipeline_eval(document_store_with_docs, reader):
|
||||
@pytest.mark.parametrize("document_store_with_docs", ["elasticsearch"], indirect=True)
|
||||
@pytest.mark.parametrize("reader", ["farm"], indirect=True)
|
||||
def test_multi_retriever_pipeline_eval(document_store_with_docs, reader):
|
||||
es_retriever = ElasticsearchRetriever(document_store=document_store_with_docs)
|
||||
es_retriever = BM25Retriever(document_store=document_store_with_docs)
|
||||
dpr_retriever = DensePassageRetriever(document_store_with_docs)
|
||||
document_store_with_docs.update_embeddings(retriever=dpr_retriever)
|
||||
|
||||
@ -1073,7 +1073,7 @@ def test_multi_retriever_pipeline_eval(document_store_with_docs, reader):
|
||||
@pytest.mark.parametrize("document_store_with_docs", ["elasticsearch"], indirect=True)
|
||||
@pytest.mark.parametrize("reader", ["farm"], indirect=True)
|
||||
def test_multi_retriever_pipeline_with_asymmetric_qa_eval(document_store_with_docs, reader):
|
||||
es_retriever = ElasticsearchRetriever(document_store=document_store_with_docs)
|
||||
es_retriever = BM25Retriever(document_store=document_store_with_docs)
|
||||
dpr_retriever = DensePassageRetriever(document_store_with_docs)
|
||||
document_store_with_docs.update_embeddings(retriever=dpr_retriever)
|
||||
|
||||
|
||||
@ -1,6 +1,6 @@
|
||||
import pytest
|
||||
|
||||
from haystack.nodes.retriever.sparse import ElasticsearchRetriever
|
||||
from haystack.nodes.retriever.sparse import BM25Retriever
|
||||
from haystack.nodes.reader import FARMReader
|
||||
from haystack.pipelines import Pipeline
|
||||
|
||||
@ -10,7 +10,7 @@ from haystack.nodes.extractor import EntityExtractor, simplify_ner_for_qa
|
||||
@pytest.mark.parametrize("document_store_with_docs", ["elasticsearch"], indirect=True)
|
||||
def test_extractor(document_store_with_docs):
|
||||
|
||||
es_retriever = ElasticsearchRetriever(document_store=document_store_with_docs)
|
||||
es_retriever = BM25Retriever(document_store=document_store_with_docs)
|
||||
ner = EntityExtractor()
|
||||
reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2", num_processes=0)
|
||||
|
||||
@ -30,7 +30,7 @@ def test_extractor(document_store_with_docs):
|
||||
@pytest.mark.parametrize("document_store_with_docs", ["elasticsearch"], indirect=True)
|
||||
def test_extractor_output_simplifier(document_store_with_docs):
|
||||
|
||||
es_retriever = ElasticsearchRetriever(document_store=document_store_with_docs)
|
||||
es_retriever = BM25Retriever(document_store=document_store_with_docs)
|
||||
ner = EntityExtractor()
|
||||
reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2", num_processes=0)
|
||||
|
||||
|
||||
@ -22,7 +22,7 @@ from haystack.document_stores.elasticsearch import ElasticsearchDocumentStore
|
||||
from haystack.nodes.other.join_docs import JoinDocuments
|
||||
from haystack.nodes.base import BaseComponent
|
||||
from haystack.nodes.retriever.base import BaseRetriever
|
||||
from haystack.nodes.retriever.sparse import ElasticsearchRetriever
|
||||
from haystack.nodes.retriever.sparse import BM25Retriever
|
||||
from haystack.pipelines import Pipeline, DocumentSearchPipeline, RootNode
|
||||
from haystack.pipelines.config import validate_config_strings
|
||||
from haystack.pipelines.utils import generate_code
|
||||
@ -409,7 +409,7 @@ def test_generate_code_simple_pipeline():
|
||||
"components": [
|
||||
{
|
||||
"name": "retri",
|
||||
"type": "ElasticsearchRetriever",
|
||||
"type": "BM25Retriever",
|
||||
"params": {"document_store": "ElasticsearchDocumentStore", "top_k": 20},
|
||||
},
|
||||
{
|
||||
@ -424,7 +424,7 @@ def test_generate_code_simple_pipeline():
|
||||
code = generate_code(pipeline_config=config, pipeline_variable_name="p", generate_imports=False)
|
||||
assert code == (
|
||||
'elasticsearch_document_store = ElasticsearchDocumentStore(index="my-index")\n'
|
||||
"retri = ElasticsearchRetriever(document_store=elasticsearch_document_store, top_k=20)\n"
|
||||
"retri = BM25Retriever(document_store=elasticsearch_document_store, top_k=20)\n"
|
||||
"\n"
|
||||
"p = Pipeline()\n"
|
||||
'p.add_node(component=retri, name="retri", inputs=["Query"])'
|
||||
@ -436,7 +436,7 @@ def test_generate_code_imports():
|
||||
"version": "master",
|
||||
"components": [
|
||||
{"name": "DocumentStore", "type": "ElasticsearchDocumentStore"},
|
||||
{"name": "retri", "type": "ElasticsearchRetriever", "params": {"document_store": "DocumentStore"}},
|
||||
{"name": "retri", "type": "BM25Retriever", "params": {"document_store": "DocumentStore"}},
|
||||
{"name": "retri2", "type": "TfidfRetriever", "params": {"document_store": "DocumentStore"}},
|
||||
],
|
||||
"pipelines": [
|
||||
@ -450,11 +450,11 @@ def test_generate_code_imports():
|
||||
code = generate_code(pipeline_config=pipeline_config, pipeline_variable_name="p", generate_imports=True)
|
||||
assert code == (
|
||||
"from haystack.document_stores import ElasticsearchDocumentStore\n"
|
||||
"from haystack.nodes import ElasticsearchRetriever, TfidfRetriever\n"
|
||||
"from haystack.nodes import BM25Retriever, TfidfRetriever\n"
|
||||
"from haystack.pipelines import Pipeline\n"
|
||||
"\n"
|
||||
"document_store = ElasticsearchDocumentStore()\n"
|
||||
"retri = ElasticsearchRetriever(document_store=document_store)\n"
|
||||
"retri = BM25Retriever(document_store=document_store)\n"
|
||||
"retri_2 = TfidfRetriever(document_store=document_store)\n"
|
||||
"\n"
|
||||
"p = Pipeline()\n"
|
||||
@ -468,7 +468,7 @@ def test_generate_code_imports_no_pipeline_cls():
|
||||
"version": "master",
|
||||
"components": [
|
||||
{"name": "DocumentStore", "type": "ElasticsearchDocumentStore"},
|
||||
{"name": "retri", "type": "ElasticsearchRetriever", "params": {"document_store": "DocumentStore"}},
|
||||
{"name": "retri", "type": "BM25Retriever", "params": {"document_store": "DocumentStore"}},
|
||||
],
|
||||
"pipelines": [{"name": "Query", "nodes": [{"name": "retri", "inputs": ["Query"]}]}],
|
||||
}
|
||||
@ -481,10 +481,10 @@ def test_generate_code_imports_no_pipeline_cls():
|
||||
)
|
||||
assert code == (
|
||||
"from haystack.document_stores import ElasticsearchDocumentStore\n"
|
||||
"from haystack.nodes import ElasticsearchRetriever\n"
|
||||
"from haystack.nodes import BM25Retriever\n"
|
||||
"\n"
|
||||
"document_store = ElasticsearchDocumentStore()\n"
|
||||
"retri = ElasticsearchRetriever(document_store=document_store)\n"
|
||||
"retri = BM25Retriever(document_store=document_store)\n"
|
||||
"\n"
|
||||
"p = Pipeline()\n"
|
||||
'p.add_node(component=retri, name="retri", inputs=["Query"])'
|
||||
@ -496,7 +496,7 @@ def test_generate_code_comment():
|
||||
"version": "master",
|
||||
"components": [
|
||||
{"name": "DocumentStore", "type": "ElasticsearchDocumentStore"},
|
||||
{"name": "retri", "type": "ElasticsearchRetriever", "params": {"document_store": "DocumentStore"}},
|
||||
{"name": "retri", "type": "BM25Retriever", "params": {"document_store": "DocumentStore"}},
|
||||
],
|
||||
"pipelines": [{"name": "Query", "nodes": [{"name": "retri", "inputs": ["Query"]}]}],
|
||||
}
|
||||
@ -507,11 +507,11 @@ def test_generate_code_comment():
|
||||
"# This is my comment\n"
|
||||
"# ...and here is a new line\n"
|
||||
"from haystack.document_stores import ElasticsearchDocumentStore\n"
|
||||
"from haystack.nodes import ElasticsearchRetriever\n"
|
||||
"from haystack.nodes import BM25Retriever\n"
|
||||
"from haystack.pipelines import Pipeline\n"
|
||||
"\n"
|
||||
"document_store = ElasticsearchDocumentStore()\n"
|
||||
"retri = ElasticsearchRetriever(document_store=document_store)\n"
|
||||
"retri = BM25Retriever(document_store=document_store)\n"
|
||||
"\n"
|
||||
"p = Pipeline()\n"
|
||||
'p.add_node(component=retri, name="retri", inputs=["Query"])'
|
||||
@ -536,7 +536,7 @@ def test_generate_code_is_component_order_invariant():
|
||||
doc_store = {"name": "ElasticsearchDocumentStore", "type": "ElasticsearchDocumentStore"}
|
||||
es_retriever = {
|
||||
"name": "EsRetriever",
|
||||
"type": "ElasticsearchRetriever",
|
||||
"type": "BM25Retriever",
|
||||
"params": {"document_store": "ElasticsearchDocumentStore"},
|
||||
}
|
||||
emb_retriever = {
|
||||
@ -557,7 +557,7 @@ def test_generate_code_is_component_order_invariant():
|
||||
|
||||
expected_code = (
|
||||
"elasticsearch_document_store = ElasticsearchDocumentStore()\n"
|
||||
"es_retriever = ElasticsearchRetriever(document_store=elasticsearch_document_store)\n"
|
||||
"es_retriever = BM25Retriever(document_store=elasticsearch_document_store)\n"
|
||||
'embedding_retriever = EmbeddingRetriever(document_store=elasticsearch_document_store, embedding_model="sentence-transformers/all-MiniLM-L6-v2")\n'
|
||||
"join_results = JoinDocuments()\n"
|
||||
"\n"
|
||||
@ -692,7 +692,7 @@ def test_load_from_deepset_cloud_query():
|
||||
)
|
||||
retriever = query_pipeline.get_node("Retriever")
|
||||
document_store = retriever.document_store
|
||||
assert isinstance(retriever, ElasticsearchRetriever)
|
||||
assert isinstance(retriever, BM25Retriever)
|
||||
assert isinstance(document_store, DeepsetCloudDocumentStore)
|
||||
assert document_store == query_pipeline.get_document_store()
|
||||
|
||||
@ -920,7 +920,7 @@ def test_save_nonexisting_pipeline_to_deepset_cloud():
|
||||
)
|
||||
|
||||
es_document_store = ElasticsearchDocumentStore()
|
||||
es_retriever = ElasticsearchRetriever(document_store=es_document_store)
|
||||
es_retriever = BM25Retriever(document_store=es_document_store)
|
||||
file_converter = TextConverter()
|
||||
preprocessor = PreProcessor()
|
||||
|
||||
|
||||
@ -4,7 +4,7 @@ import json
|
||||
import pytest
|
||||
|
||||
from haystack.pipelines import Pipeline, RootNode
|
||||
from haystack.nodes import FARMReader, ElasticsearchRetriever
|
||||
from haystack.nodes import FARMReader, BM25Retriever
|
||||
|
||||
from .conftest import SAMPLES_PATH, MockRetriever as BaseMockRetriever, MockReader
|
||||
|
||||
@ -26,7 +26,7 @@ class MockRetriever(BaseMockRetriever):
|
||||
def test_node_names_validation(document_store_with_docs, tmp_path):
|
||||
pipeline = Pipeline()
|
||||
pipeline.add_node(
|
||||
component=ElasticsearchRetriever(document_store=document_store_with_docs), name="Retriever", inputs=["Query"]
|
||||
component=BM25Retriever(document_store=document_store_with_docs), name="Retriever", inputs=["Query"]
|
||||
)
|
||||
pipeline.add_node(
|
||||
component=FARMReader(model_name_or_path="deepset/minilm-uncased-squad2", num_processes=0),
|
||||
@ -56,7 +56,7 @@ def test_node_names_validation(document_store_with_docs, tmp_path):
|
||||
@pytest.mark.parametrize("document_store_with_docs", ["elasticsearch"], indirect=True)
|
||||
def test_debug_attributes_global(document_store_with_docs, tmp_path):
|
||||
|
||||
es_retriever = ElasticsearchRetriever(document_store=document_store_with_docs)
|
||||
es_retriever = BM25Retriever(document_store=document_store_with_docs)
|
||||
reader = FARMReader(model_name_or_path="deepset/minilm-uncased-squad2", num_processes=0)
|
||||
|
||||
pipeline = Pipeline()
|
||||
@ -86,7 +86,7 @@ def test_debug_attributes_global(document_store_with_docs, tmp_path):
|
||||
@pytest.mark.parametrize("document_store_with_docs", ["elasticsearch"], indirect=True)
|
||||
def test_debug_attributes_per_node(document_store_with_docs, tmp_path):
|
||||
|
||||
es_retriever = ElasticsearchRetriever(document_store=document_store_with_docs)
|
||||
es_retriever = BM25Retriever(document_store=document_store_with_docs)
|
||||
reader = FARMReader(model_name_or_path="deepset/minilm-uncased-squad2", num_processes=0)
|
||||
|
||||
pipeline = Pipeline()
|
||||
@ -112,7 +112,7 @@ def test_debug_attributes_per_node(document_store_with_docs, tmp_path):
|
||||
@pytest.mark.parametrize("document_store_with_docs", ["elasticsearch"], indirect=True)
|
||||
def test_global_debug_attributes_override_node_ones(document_store_with_docs, tmp_path):
|
||||
|
||||
es_retriever = ElasticsearchRetriever(document_store=document_store_with_docs)
|
||||
es_retriever = BM25Retriever(document_store=document_store_with_docs)
|
||||
reader = FARMReader(model_name_or_path="deepset/minilm-uncased-squad2", num_processes=0)
|
||||
|
||||
pipeline = Pipeline()
|
||||
|
||||
@ -14,7 +14,7 @@ from haystack.document_stores.elasticsearch import ElasticsearchDocumentStore
|
||||
from haystack.document_stores.faiss import FAISSDocumentStore
|
||||
from haystack.document_stores import MilvusDocumentStore
|
||||
from haystack.nodes.retriever.dense import DensePassageRetriever, EmbeddingRetriever, TableTextRetriever
|
||||
from haystack.nodes.retriever.sparse import ElasticsearchRetriever, ElasticsearchFilterOnlyRetriever, TfidfRetriever
|
||||
from haystack.nodes.retriever.sparse import BM25Retriever, ElasticsearchFilterOnlyRetriever, TfidfRetriever
|
||||
from transformers import DPRContextEncoderTokenizerFast, DPRQuestionEncoderTokenizerFast
|
||||
|
||||
from .conftest import SAMPLES_PATH
|
||||
@ -70,7 +70,7 @@ def docs():
|
||||
indirect=True,
|
||||
)
|
||||
def test_retrieval(retriever_with_docs, document_store_with_docs):
|
||||
if not isinstance(retriever_with_docs, (ElasticsearchRetriever, ElasticsearchFilterOnlyRetriever, TfidfRetriever)):
|
||||
if not isinstance(retriever_with_docs, (BM25Retriever, ElasticsearchFilterOnlyRetriever, TfidfRetriever)):
|
||||
document_store_with_docs.update_embeddings(retriever_with_docs)
|
||||
|
||||
# test without filters
|
||||
@ -121,7 +121,7 @@ def test_elasticsearch_custom_query():
|
||||
document_store.write_documents(documents)
|
||||
|
||||
# test custom "terms" query
|
||||
retriever = ElasticsearchRetriever(
|
||||
retriever = BM25Retriever(
|
||||
document_store=document_store,
|
||||
custom_query="""
|
||||
{
|
||||
@ -136,7 +136,7 @@ def test_elasticsearch_custom_query():
|
||||
assert len(results) == 4
|
||||
|
||||
# test custom "term" query
|
||||
retriever = ElasticsearchRetriever(
|
||||
retriever = BM25Retriever(
|
||||
document_store=document_store,
|
||||
custom_query="""
|
||||
{
|
||||
@ -399,7 +399,7 @@ def test_elasticsearch_highlight():
|
||||
document_store.write_documents(documents)
|
||||
|
||||
# Enabled highlighting on "title"&"content" field only using custom query
|
||||
retriever_1 = ElasticsearchRetriever(
|
||||
retriever_1 = BM25Retriever(
|
||||
document_store=document_store,
|
||||
custom_query="""{
|
||||
"size": 20,
|
||||
@ -441,7 +441,7 @@ def test_elasticsearch_highlight():
|
||||
assert results[0].meta["highlighted"]["content"] == ["The **green**", "**tea** plant", "range of **healthy**"]
|
||||
|
||||
# Enabled highlighting on "title" field only using custom query
|
||||
retriever_2 = ElasticsearchRetriever(
|
||||
retriever_2 = BM25Retriever(
|
||||
document_store=document_store,
|
||||
custom_query="""{
|
||||
"size": 20,
|
||||
|
||||
@ -9,7 +9,7 @@ from haystack.document_stores.elasticsearch import ElasticsearchDocumentStore
|
||||
from haystack.pipelines import Pipeline, FAQPipeline, DocumentSearchPipeline, RootNode, MostSimilarDocumentsPipeline
|
||||
from haystack.nodes import (
|
||||
DensePassageRetriever,
|
||||
ElasticsearchRetriever,
|
||||
BM25Retriever,
|
||||
SklearnQueryClassifier,
|
||||
TransformersQueryClassifier,
|
||||
JoinDocuments,
|
||||
@ -111,7 +111,7 @@ def test_most_similar_documents_pipeline(retriever, document_store):
|
||||
@pytest.mark.elasticsearch
|
||||
@pytest.mark.parametrize("document_store_dot_product_with_docs", ["elasticsearch"], indirect=True)
|
||||
def test_join_merge_no_weights(document_store_dot_product_with_docs):
|
||||
es = ElasticsearchRetriever(document_store=document_store_dot_product_with_docs)
|
||||
es = BM25Retriever(document_store=document_store_dot_product_with_docs)
|
||||
dpr = DensePassageRetriever(
|
||||
document_store=document_store_dot_product_with_docs,
|
||||
query_embedding_model="facebook/dpr-question_encoder-single-nq-base",
|
||||
@ -134,7 +134,7 @@ def test_join_merge_no_weights(document_store_dot_product_with_docs):
|
||||
@pytest.mark.elasticsearch
|
||||
@pytest.mark.parametrize("document_store_dot_product_with_docs", ["elasticsearch"], indirect=True)
|
||||
def test_join_merge_with_weights(document_store_dot_product_with_docs):
|
||||
es = ElasticsearchRetriever(document_store=document_store_dot_product_with_docs)
|
||||
es = BM25Retriever(document_store=document_store_dot_product_with_docs)
|
||||
dpr = DensePassageRetriever(
|
||||
document_store=document_store_dot_product_with_docs,
|
||||
query_embedding_model="facebook/dpr-question_encoder-single-nq-base",
|
||||
@ -158,7 +158,7 @@ def test_join_merge_with_weights(document_store_dot_product_with_docs):
|
||||
@pytest.mark.elasticsearch
|
||||
@pytest.mark.parametrize("document_store_dot_product_with_docs", ["elasticsearch"], indirect=True)
|
||||
def test_join_concatenate(document_store_dot_product_with_docs):
|
||||
es = ElasticsearchRetriever(document_store=document_store_dot_product_with_docs)
|
||||
es = BM25Retriever(document_store=document_store_dot_product_with_docs)
|
||||
dpr = DensePassageRetriever(
|
||||
document_store=document_store_dot_product_with_docs,
|
||||
query_embedding_model="facebook/dpr-question_encoder-single-nq-base",
|
||||
@ -181,7 +181,7 @@ def test_join_concatenate(document_store_dot_product_with_docs):
|
||||
@pytest.mark.elasticsearch
|
||||
@pytest.mark.parametrize("document_store_dot_product_with_docs", ["elasticsearch"], indirect=True)
|
||||
def test_join_concatenate_with_topk(document_store_dot_product_with_docs):
|
||||
es = ElasticsearchRetriever(document_store=document_store_dot_product_with_docs)
|
||||
es = BM25Retriever(document_store=document_store_dot_product_with_docs)
|
||||
dpr = DensePassageRetriever(
|
||||
document_store=document_store_dot_product_with_docs,
|
||||
query_embedding_model="facebook/dpr-question_encoder-single-nq-base",
|
||||
@ -207,7 +207,7 @@ def test_join_concatenate_with_topk(document_store_dot_product_with_docs):
|
||||
@pytest.mark.parametrize("document_store_dot_product_with_docs", ["elasticsearch"], indirect=True)
|
||||
@pytest.mark.parametrize("reader", ["farm"], indirect=True)
|
||||
def test_join_with_reader(document_store_dot_product_with_docs, reader):
|
||||
es = ElasticsearchRetriever(document_store=document_store_dot_product_with_docs)
|
||||
es = BM25Retriever(document_store=document_store_dot_product_with_docs)
|
||||
dpr = DensePassageRetriever(
|
||||
document_store=document_store_dot_product_with_docs,
|
||||
query_embedding_model="facebook/dpr-question_encoder-single-nq-base",
|
||||
@ -232,7 +232,7 @@ def test_join_with_reader(document_store_dot_product_with_docs, reader):
|
||||
@pytest.mark.elasticsearch
|
||||
@pytest.mark.parametrize("document_store_dot_product_with_docs", ["elasticsearch"], indirect=True)
|
||||
def test_join_with_rrf(document_store_dot_product_with_docs):
|
||||
es = ElasticsearchRetriever(document_store=document_store_dot_product_with_docs)
|
||||
es = BM25Retriever(document_store=document_store_dot_product_with_docs)
|
||||
dpr = DensePassageRetriever(
|
||||
document_store=document_store_dot_product_with_docs,
|
||||
query_embedding_model="facebook/dpr-question_encoder-single-nq-base",
|
||||
|
||||
@ -191,7 +191,7 @@
|
||||
},
|
||||
"source": [
|
||||
"Here we initialize the core components that we will be gluing together using the `Pipeline` class.\n",
|
||||
"We have a `DocumentStore`, an `ElasticsearchRetriever` and a `FARMReader`.\n",
|
||||
"We have a `DocumentStore`, an `BM25Retriever` and a `FARMReader`.\n",
|
||||
"These can be combined to create a classic Retriever-Reader pipeline that is designed\n",
|
||||
"to perform Open Domain Question Answering."
|
||||
]
|
||||
@ -210,7 +210,7 @@
|
||||
"from haystack import Pipeline\n",
|
||||
"from haystack.utils import launch_es\n",
|
||||
"from haystack.document_stores import ElasticsearchDocumentStore\n",
|
||||
"from haystack.nodes import ElasticsearchRetriever, EmbeddingRetriever, FARMReader\n",
|
||||
"from haystack.nodes import BM25Retriever, EmbeddingRetriever, FARMReader\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"# Initialize DocumentStore and index documents\n",
|
||||
@ -220,7 +220,7 @@
|
||||
"document_store.write_documents(got_docs)\n",
|
||||
"\n",
|
||||
"# Initialize Sparse retriever\n",
|
||||
"es_retriever = ElasticsearchRetriever(document_store=document_store)\n",
|
||||
"es_retriever = BM25Retriever(document_store=document_store)\n",
|
||||
"\n",
|
||||
"# Initialize dense retriever\n",
|
||||
"embedding_retriever = EmbeddingRetriever(\n",
|
||||
@ -434,7 +434,7 @@
|
||||
"source": [
|
||||
"Pipelines offer a very simple way to ensemble together different components.\n",
|
||||
"In this example, we are going to combine the power of an `EmbeddingRetriever`\n",
|
||||
"with the keyword based `ElasticsearchRetriever`.\n",
|
||||
"with the keyword based `BM25Retriever`.\n",
|
||||
"See our [documentation](https://haystack.deepset.ai/docs/latest/retrievermd) to understand why\n",
|
||||
"we might want to combine a dense and sparse retriever.\n",
|
||||
"\n",
|
||||
@ -683,7 +683,7 @@
|
||||
" no_ans_boost: -10\n",
|
||||
" model_name_or_path: deepset/roberta-base-squad2\n",
|
||||
"- name: MyESRetriever\n",
|
||||
" type: ElasticsearchRetriever\n",
|
||||
" type: BM25Retriever\n",
|
||||
" params:\n",
|
||||
" document_store: MyDocumentStore # params can reference other components defined in the YAML\n",
|
||||
" custom_query: null\n",
|
||||
|
||||
@ -9,14 +9,7 @@ from haystack.utils import (
|
||||
from pprint import pprint
|
||||
from haystack import Pipeline
|
||||
from haystack.document_stores import ElasticsearchDocumentStore
|
||||
from haystack.nodes import (
|
||||
ElasticsearchRetriever,
|
||||
EmbeddingRetriever,
|
||||
FARMReader,
|
||||
RAGenerator,
|
||||
BaseComponent,
|
||||
JoinDocuments,
|
||||
)
|
||||
from haystack.nodes import BM25Retriever, EmbeddingRetriever, FARMReader, RAGenerator, BaseComponent, JoinDocuments
|
||||
from haystack.pipelines import ExtractiveQAPipeline, DocumentSearchPipeline, GenerativeQAPipeline
|
||||
|
||||
|
||||
@ -36,7 +29,7 @@ def tutorial11_pipelines():
|
||||
document_store.write_documents(got_docs)
|
||||
|
||||
# Initialize Sparse retriever
|
||||
es_retriever = ElasticsearchRetriever(document_store=document_store)
|
||||
es_retriever = BM25Retriever(document_store=document_store)
|
||||
|
||||
# Initialize dense retriever
|
||||
embedding_retriever = EmbeddingRetriever(
|
||||
|
||||
@ -69,7 +69,7 @@
|
||||
"\n",
|
||||
"from pprint import pprint\n",
|
||||
"from tqdm import tqdm\n",
|
||||
"from haystack.nodes import QuestionGenerator, ElasticsearchRetriever, FARMReader\n",
|
||||
"from haystack.nodes import QuestionGenerator, BM25Retriever, FARMReader\n",
|
||||
"from haystack.document_stores import ElasticsearchDocumentStore\n",
|
||||
"from haystack.pipelines import (\n",
|
||||
" QuestionGenerationPipeline,\n",
|
||||
@ -228,7 +228,7 @@
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"retriever = ElasticsearchRetriever(document_store=document_store)\n",
|
||||
"retriever = BM25Retriever(document_store=document_store)\n",
|
||||
"rqg_pipeline = RetrieverQuestionGenerationPipeline(retriever, question_generator)\n",
|
||||
"\n",
|
||||
"print(f\"\\n * Generating questions for documents matching the query 'Arya Stark'\\n\")\n",
|
||||
|
||||
@ -1,5 +1,5 @@
|
||||
from tqdm import tqdm
|
||||
from haystack.nodes import QuestionGenerator, ElasticsearchRetriever, FARMReader, TransformersTranslator
|
||||
from haystack.nodes import QuestionGenerator, BM25Retriever, FARMReader, TransformersTranslator
|
||||
from haystack.document_stores import ElasticsearchDocumentStore
|
||||
from haystack.pipelines import (
|
||||
QuestionGenerationPipeline,
|
||||
@ -56,7 +56,7 @@ def tutorial13_question_generation():
|
||||
print("\RetrieverQuestionGenerationPipeline")
|
||||
print("==================================")
|
||||
|
||||
retriever = ElasticsearchRetriever(document_store=document_store)
|
||||
retriever = BM25Retriever(document_store=document_store)
|
||||
rqg_pipeline = RetrieverQuestionGenerationPipeline(retriever, question_generator)
|
||||
|
||||
print(f"\n * Generating questions for documents matching the query 'Arya Stark'\n")
|
||||
|
||||
@ -382,7 +382,7 @@
|
||||
"from haystack.pipelines import Pipeline\n",
|
||||
"from haystack.document_stores import ElasticsearchDocumentStore\n",
|
||||
"from haystack.nodes import (\n",
|
||||
" ElasticsearchRetriever,\n",
|
||||
" BM25Retriever,\n",
|
||||
" EmbeddingRetriever,\n",
|
||||
" FARMReader,\n",
|
||||
" TransformersQueryClassifier,\n",
|
||||
@ -404,7 +404,7 @@
|
||||
"document_store.write_documents(got_docs)\n",
|
||||
"\n",
|
||||
"# Initialize Sparse retriever\n",
|
||||
"es_retriever = ElasticsearchRetriever(document_store=document_store)\n",
|
||||
"es_retriever = BM25Retriever(document_store=document_store)\n",
|
||||
"\n",
|
||||
"# Initialize dense retriever\n",
|
||||
"embedding_retriever = EmbeddingRetriever(\n",
|
||||
|
||||
@ -9,7 +9,7 @@ from haystack.utils import (
|
||||
from haystack.pipelines import Pipeline
|
||||
from haystack.document_stores import ElasticsearchDocumentStore
|
||||
from haystack.nodes import (
|
||||
ElasticsearchRetriever,
|
||||
BM25Retriever,
|
||||
EmbeddingRetriever,
|
||||
FARMReader,
|
||||
TransformersQueryClassifier,
|
||||
@ -34,7 +34,7 @@ def tutorial14_query_classifier():
|
||||
document_store.write_documents(got_docs)
|
||||
|
||||
# Initialize Sparse retriever
|
||||
es_retriever = ElasticsearchRetriever(document_store=document_store)
|
||||
es_retriever = BM25Retriever(document_store=document_store)
|
||||
|
||||
# Initialize dense retriever
|
||||
embedding_retriever = EmbeddingRetriever(
|
||||
|
||||
@ -9,7 +9,7 @@
|
||||
"# Open-Domain QA on Tables\n",
|
||||
"[](https://colab.research.google.com/github/deepset-ai/haystack/blob/master/tutorials/Tutorial15_TableQA.ipynb)\n",
|
||||
"\n",
|
||||
"This tutorial shows you how to perform question-answering on tables using the `TableTextRetriever` or `ElasticsearchRetriever` as retriever node and the `TableReader` as reader node."
|
||||
"This tutorial shows you how to perform question-answering on tables using the `TableTextRetriever` or `BM25Retriever` as retriever node and the `TableReader` as reader node."
|
||||
]
|
||||
},
|
||||
{
|
||||
@ -243,7 +243,7 @@
|
||||
"\n",
|
||||
"**Alternatives:**\n",
|
||||
"\n",
|
||||
"- `ElasticsearchRetriever` that uses BM25 algorithm\n"
|
||||
"- `BM25Retriever` that uses BM25 algorithm\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
@ -284,9 +284,9 @@
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"## Alternative: ElasticsearchRetriever\n",
|
||||
"# from haystack.nodes.retriever import ElasticsearchRetriever\n",
|
||||
"# retriever = ElasticsearchRetriever(document_store=document_store)"
|
||||
"## Alternative: BM25Retriever\n",
|
||||
"# from haystack.nodes.retriever import BM25Retriever\n",
|
||||
"# retriever = BM25Retriever(document_store=document_store)"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
@ -67,9 +67,9 @@ def tutorial15_tableqa():
|
||||
# Add table embeddings to the tables in DocumentStore
|
||||
document_store.update_embeddings(retriever=retriever)
|
||||
|
||||
## Alternative: ElasticsearchRetriever
|
||||
# from haystack.nodes.retriever import ElasticsearchRetriever
|
||||
# retriever = ElasticsearchRetriever(document_store=document_store)
|
||||
## Alternative: BM25Retriever
|
||||
# from haystack.nodes.retriever import BM25Retriever
|
||||
# retriever = BM25Retriever(document_store=document_store)
|
||||
|
||||
# Try the Retriever
|
||||
from haystack.utils import print_documents
|
||||
|
||||
@ -68,7 +68,7 @@
|
||||
"source": [
|
||||
"# Here are the imports we need\n",
|
||||
"from haystack.document_stores.elasticsearch import ElasticsearchDocumentStore\n",
|
||||
"from haystack.nodes import PreProcessor, TransformersDocumentClassifier, FARMReader, ElasticsearchRetriever\n",
|
||||
"from haystack.nodes import PreProcessor, TransformersDocumentClassifier, FARMReader, BM25Retriever\n",
|
||||
"from haystack.schema import Document\n",
|
||||
"from haystack.utils import convert_files_to_docs, fetch_archive_from_http, print_answers"
|
||||
]
|
||||
@ -335,7 +335,7 @@
|
||||
"# Initialize QA-Pipeline\n",
|
||||
"from haystack.pipelines import ExtractiveQAPipeline\n",
|
||||
"\n",
|
||||
"retriever = ElasticsearchRetriever(document_store=document_store)\n",
|
||||
"retriever = BM25Retriever(document_store=document_store)\n",
|
||||
"reader = FARMReader(model_name_or_path=\"deepset/roberta-base-squad2\", use_gpu=True)\n",
|
||||
"pipe = ExtractiveQAPipeline(reader, retriever)"
|
||||
]
|
||||
|
||||
@ -19,7 +19,7 @@
|
||||
|
||||
# Here are the imports we need
|
||||
from haystack.document_stores.elasticsearch import ElasticsearchDocumentStore
|
||||
from haystack.nodes import PreProcessor, TransformersDocumentClassifier, FARMReader, ElasticsearchRetriever
|
||||
from haystack.nodes import PreProcessor, TransformersDocumentClassifier, FARMReader, BM25Retriever
|
||||
from haystack.schema import Document
|
||||
from haystack.utils import convert_files_to_docs, fetch_archive_from_http, print_answers, launch_es
|
||||
|
||||
@ -98,7 +98,7 @@ def tutorial16_document_classifier_at_index_time():
|
||||
# Initialize QA-Pipeline
|
||||
from haystack.pipelines import ExtractiveQAPipeline
|
||||
|
||||
retriever = ElasticsearchRetriever(document_store=document_store)
|
||||
retriever = BM25Retriever(document_store=document_store)
|
||||
reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2", use_gpu=True)
|
||||
pipe = ExtractiveQAPipeline(reader, retriever)
|
||||
|
||||
|
||||
@ -213,7 +213,7 @@
|
||||
"\n",
|
||||
"**Alternatives:**\n",
|
||||
"\n",
|
||||
"- Customize the `ElasticsearchRetriever`with custom queries (e.g. boosting) and filters\n",
|
||||
"- Customize the `BM25Retriever`with custom queries (e.g. boosting) and filters\n",
|
||||
"- Use `TfidfRetriever` in combination with a SQL or InMemory Document store for simple prototyping and debugging\n",
|
||||
"- Use `EmbeddingRetriever` to find candidate documents based on the similarity of embeddings (e.g. created via Sentence-BERT)\n",
|
||||
"- Use `DensePassageRetriever` to use different embedding models for passage and query (see Tutorial 6)"
|
||||
@ -225,9 +225,9 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from haystack.nodes import ElasticsearchRetriever\n",
|
||||
"from haystack.nodes import BM25Retriever\n",
|
||||
"\n",
|
||||
"retriever = ElasticsearchRetriever(document_store=document_store)"
|
||||
"retriever = BM25Retriever(document_store=document_store)"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
@ -11,8 +11,8 @@
|
||||
|
||||
import logging
|
||||
from haystack.document_stores import ElasticsearchDocumentStore
|
||||
from haystack.utils import clean_wiki_text, convert_files_to_docs, fetch_archive_from_http, print_answers, launch_es
|
||||
from haystack.nodes import FARMReader, TransformersReader, ElasticsearchRetriever
|
||||
from haystack.utils import clean_wiki_text, convert_files_to_dicts, fetch_archive_from_http, print_answers, launch_es
|
||||
from haystack.nodes import FARMReader, TransformersReader, BM25Retriever
|
||||
|
||||
|
||||
def tutorial1_basic_qa_pipeline():
|
||||
@ -75,12 +75,12 @@ def tutorial1_basic_qa_pipeline():
|
||||
# They use some simple but fast algorithm.
|
||||
# **Here:** We use Elasticsearch's default BM25 algorithm
|
||||
# **Alternatives:**
|
||||
# - Customize the `ElasticsearchRetriever`with custom queries (e.g. boosting) and filters
|
||||
# - Customize the `BM25Retriever`with custom queries (e.g. boosting) and filters
|
||||
# - Use `EmbeddingRetriever` to find candidate documents based on the similarity of
|
||||
# embeddings (e.g. created via Sentence-BERT)
|
||||
# - Use `TfidfRetriever` in combination with a SQL or InMemory Document store for simple prototyping and debugging
|
||||
|
||||
retriever = ElasticsearchRetriever(document_store=document_store)
|
||||
retriever = BM25Retriever(document_store=document_store)
|
||||
|
||||
# Alternative: An in-memory TfidfRetriever based on Pandas dataframes for building quick-prototypes
|
||||
# with SQLite document store.
|
||||
|
||||
@ -248,9 +248,9 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Initialize Retriever\n",
|
||||
"from haystack.nodes import ElasticsearchRetriever\n",
|
||||
"from haystack.nodes import BM25Retriever\n",
|
||||
"\n",
|
||||
"retriever = ElasticsearchRetriever(document_store=document_store)\n",
|
||||
"retriever = BM25Retriever(document_store=document_store)\n",
|
||||
"\n",
|
||||
"# Alternative: Evaluate dense retrievers (EmbeddingRetriever or DensePassageRetriever)\n",
|
||||
"# The EmbeddingRetriever uses a single transformer based encoder model for query and document.\n",
|
||||
|
||||
@ -1,5 +1,5 @@
|
||||
from haystack.document_stores import ElasticsearchDocumentStore
|
||||
from haystack.nodes import ElasticsearchRetriever, DensePassageRetriever, EmbeddingRetriever, FARMReader, PreProcessor
|
||||
from haystack.nodes import BM25Retriever, DensePassageRetriever, EmbeddingRetriever, FARMReader, PreProcessor
|
||||
from haystack.utils import fetch_archive_from_http, launch_es
|
||||
from haystack.pipelines import ExtractiveQAPipeline, DocumentSearchPipeline
|
||||
from haystack.schema import Answer, Document, EvaluationResult, Label, MultiLabel, Span
|
||||
@ -62,9 +62,9 @@ def tutorial5_evaluation():
|
||||
)
|
||||
|
||||
# Initialize Retriever
|
||||
from haystack.nodes import ElasticsearchRetriever
|
||||
from haystack.nodes import BM25Retriever
|
||||
|
||||
retriever = ElasticsearchRetriever(document_store=document_store)
|
||||
retriever = BM25Retriever(document_store=document_store)
|
||||
|
||||
# Alternative: Evaluate dense retrievers (EmbeddingRetriever or DensePassageRetriever)
|
||||
# The EmbeddingRetriever uses a single transformer based encoder model for query and document.
|
||||
|
||||
@ -270,7 +270,7 @@
|
||||
"\n",
|
||||
"**Alternatives:**\n",
|
||||
"\n",
|
||||
"- The `ElasticsearchRetriever`with custom queries (e.g. boosting) and filters\n",
|
||||
"- The `BM25Retriever`with custom queries (e.g. boosting) and filters\n",
|
||||
"- Use `EmbeddingRetriever` to find candidate documents based on the similarity of embeddings (e.g. created via Sentence-BERT)\n",
|
||||
"- Use `TfidfRetriever` in combination with a SQL or InMemory Document store for simple prototyping and debugging"
|
||||
]
|
||||
|
||||
Loading…
x
Reference in New Issue
Block a user