cleaning the api docs (#616)
parent e192387e65 · commit 3dee284f20
@@ -1,269 +1,8 @@

<a name="memory"></a>
# memory

<a name="memory.InMemoryDocumentStore"></a>
## InMemoryDocumentStore

```python
class InMemoryDocumentStore(BaseDocumentStore)
```

In-memory document store

<a name="memory.InMemoryDocumentStore.write_documents"></a>
#### write\_documents

```python
| write_documents(documents: Union[List[dict], List[Document]], index: Optional[str] = None)
```

Indexes documents for later queries.

**Arguments**:

- `documents`: a list of Python dictionaries or a list of Haystack Document objects.
For documents as dictionaries, the format is {"text": "<the-actual-text>"}.
Optionally: Include meta data via {"text": "<the-actual-text>",
"meta": {"name": "<some-document-name>, "author": "somebody", ...}}
It can be used for filtering and is accessible in the responses of the Finder.
- `index`: write documents to a custom namespace. For instance, documents for evaluation can be indexed in a
separate index than the documents for search.

**Returns**:

None

<a name="memory.InMemoryDocumentStore.update_embeddings"></a>
#### update\_embeddings

```python
| update_embeddings(retriever: BaseRetriever, index: Optional[str] = None)
```

Updates the embeddings in the document store using the encoding model specified in the retriever.
This can be useful if you want to add or change the embeddings for your documents (e.g. after changing the retriever config).

**Arguments**:

- `retriever`: Retriever
- `index`: Index name to update

**Returns**:

None
<a name="memory.InMemoryDocumentStore.add_eval_data"></a>
|
||||
#### add\_eval\_data
|
||||
|
||||
```python
|
||||
| add_eval_data(filename: str, doc_index: Optional[str] = None, label_index: Optional[str] = None)
|
||||
```
|
||||
|
||||
Adds a SQuAD-formatted file to the DocumentStore in order to be able to perform evaluation on it.
|
||||
|
||||
**Arguments**:
|
||||
|
||||
- `filename`: Name of the file containing evaluation data
|
||||
:type filename: str
|
||||
- `doc_index`: Elasticsearch index where evaluation documents should be stored
|
||||
:type doc_index: str
|
||||
- `label_index`: Elasticsearch index where labeled questions should be stored
|
||||
:type label_index: str
|
||||
|
||||
<a name="memory.InMemoryDocumentStore.delete_all_documents"></a>
|
||||
#### delete\_all\_documents
|
||||
|
||||
```python
|
||||
| delete_all_documents(index: Optional[str] = None)
|
||||
```
|
||||
|
||||
Delete all documents in a index.
|
||||
|
||||
**Arguments**:
|
||||
|
||||
- `index`: index name
|
||||
|
||||
**Returns**:
|
||||
|
||||
None
|
||||
|
||||
<a name="faiss"></a>
|
||||
# faiss
|
||||
|
||||
<a name="faiss.FAISSDocumentStore"></a>
|
||||
## FAISSDocumentStore
|
||||
|
||||
```python
|
||||
class FAISSDocumentStore(SQLDocumentStore)
|
||||
```
|
||||
|
||||
Document store for very large scale embedding based dense retrievers like the DPR.
|
||||
|
||||
It implements the FAISS library(https://github.com/facebookresearch/faiss)
|
||||
to perform similarity search on vectors.
|
||||
|
||||
The document text and meta-data (for filtering) are stored using the SQLDocumentStore, while
|
||||
the vector embeddings are indexed in a FAISS Index.
|
||||
|
||||
<a name="faiss.FAISSDocumentStore.__init__"></a>
|
||||
#### \_\_init\_\_
|
||||
|
||||
```python
|
||||
| __init__(sql_url: str = "sqlite:///", index_buffer_size: int = 10_000, vector_dim: int = 768, faiss_index_factory_str: str = "Flat", faiss_index: Optional[faiss.swigfaiss.Index] = None, return_embedding: Optional[bool] = True, **kwargs, ,)
|
||||
```
|
||||
|
||||
**Arguments**:
|
||||
|
||||
- `sql_url`: SQL connection URL for database. It defaults to local file based SQLite DB. For large scale
|
||||
deployment, Postgres is recommended.
|
||||
- `index_buffer_size`: When working with large datasets, the ingestion process(FAISS + SQL) can be buffered in
|
||||
smaller chunks to reduce memory footprint.
|
||||
- `vector_dim`: the embedding vector size.
|
||||
- `faiss_index_factory_str`: Create a new FAISS index of the specified type.
|
||||
The type is determined from the given string following the conventions
|
||||
of the original FAISS index factory.
|
||||
Recommended options:
|
||||
- "Flat" (default): Best accuracy (= exact). Becomes slow and RAM intense for > 1 Mio docs.
|
||||
- "HNSW": Graph-based heuristic. If not further specified,
|
||||
we use a RAM intense, but more accurate config:
|
||||
HNSW256, efConstruction=256 and efSearch=256
|
||||
- "IVFx,Flat": Inverted Index. Replace x with the number of centroids aka nlist.
|
||||
Rule of thumb: nlist = 10 * sqrt (num_docs) is a good starting point.
|
||||
For more details see:
|
||||
- Overview of indices https://github.com/facebookresearch/faiss/wiki/Faiss-indexes
|
||||
- Guideline for choosing an index https://github.com/facebookresearch/faiss/wiki/Guidelines-to-choose-an-index
|
||||
- FAISS Index factory https://github.com/facebookresearch/faiss/wiki/The-index-factory
|
||||
Benchmarks: XXX
|
||||
- `faiss_index`: Pass an existing FAISS Index, i.e. an empty one that you configured manually
|
||||
or one with docs that you used in Haystack before and want to load again.
|
||||
- `return_embedding`: To return document embedding
|
||||
|
||||
<a name="faiss.FAISSDocumentStore.write_documents"></a>
|
||||
#### write\_documents
|
||||
|
||||
```python
|
||||
| write_documents(documents: Union[List[dict], List[Document]], index: Optional[str] = None)
|
||||
```
|
||||
|
||||
Add new documents to the DocumentStore.
|
||||
|
||||
**Arguments**:
|
||||
|
||||
- `documents`: List of `Dicts` or List of `Documents`. If they already contain the embeddings, we'll index
|
||||
them right away in FAISS. If not, you can later call update_embeddings() to create & index them.
|
||||
- `index`: (SQL) index name for storing the docs and metadata
|
||||
|
||||
**Returns**:
|
||||
|
||||
|
||||
|
||||
<a name="faiss.FAISSDocumentStore.update_embeddings"></a>
|
||||
#### update\_embeddings
|
||||
|
||||
```python
|
||||
| update_embeddings(retriever: BaseRetriever, index: Optional[str] = None)
|
||||
```
|
||||
|
||||
Updates the embeddings in the the document store using the encoding model specified in the retriever.
|
||||
This can be useful if want to add or change the embeddings for your documents (e.g. after changing the retriever config).
|
||||
|
||||
**Arguments**:
|
||||
|
||||
- `retriever`: Retriever to use to get embeddings for text
|
||||
- `index`: (SQL) index name for storing the docs and metadata
|
||||
|
||||
**Returns**:
|
||||
|
||||
None
|
||||
|
||||
<a name="faiss.FAISSDocumentStore.train_index"></a>
|
||||
#### train\_index
|
||||
|
||||
```python
|
||||
| train_index(documents: Optional[Union[List[dict], List[Document]]], embeddings: Optional[np.array] = None)
|
||||
```
|
||||
|
||||
Some FAISS indices (e.g. IVF) require initial "training" on a sample of vectors before you can add your final vectors.
|
||||
The train vectors should come from the same distribution as your final ones.
|
||||
You can pass either documents (incl. embeddings) or just the plain embeddings that the index shall be trained on.
|
||||
|
||||
**Arguments**:
|
||||
|
||||
- `documents`: Documents (incl. the embeddings)
|
||||
- `embeddings`: Plain embeddings
|
||||
|
||||
**Returns**:
|
||||
|
||||
None
|
||||
|
||||
<a name="faiss.FAISSDocumentStore.query_by_embedding"></a>
|
||||
#### query\_by\_embedding
|
||||
|
||||
```python
|
||||
| query_by_embedding(query_emb: np.array, filters: Optional[dict] = None, top_k: int = 10, index: Optional[str] = None, return_embedding: Optional[bool] = None) -> List[Document]
|
||||
```
|
||||
|
||||
Find the document that is most similar to the provided `query_emb` by using a vector similarity metric.
|
||||
|
||||
**Arguments**:
|
||||
|
||||
- `query_emb`: Embedding of the query (e.g. gathered from DPR)
|
||||
- `filters`: Optional filters to narrow down the search space.
|
||||
Example: {"name": ["some", "more"], "category": ["only_one"]}
|
||||
- `top_k`: How many documents to return
|
||||
- `index`: (SQL) index name for storing the docs and metadata
|
||||
- `return_embedding`: To return document embedding
|
||||
|
||||
**Returns**:
|
||||
|
||||
|
||||
|
||||
<a name="faiss.FAISSDocumentStore.save"></a>
|
||||
#### save
|
||||
|
||||
```python
|
||||
| save(file_path: Union[str, Path])
|
||||
```
|
||||
|
||||
Save FAISS Index to the specified file.
|
||||
|
||||
**Arguments**:
|
||||
|
||||
- `file_path`: Path to save to.
|
||||
|
||||
**Returns**:
|
||||
|
||||
None
|
||||
|
||||
<a name="faiss.FAISSDocumentStore.load"></a>
|
||||
#### load
|
||||
|
||||
```python
|
||||
| @classmethod
|
||||
| load(cls, faiss_file_path: Union[str, Path], sql_url: str, index_buffer_size: int = 10_000)
|
||||
```
|
||||
|
||||
Load a saved FAISS index from a file and connect to the SQL database.
|
||||
Note: In order to have a correct mapping from FAISS to SQL,
|
||||
make sure to use the same SQL DB that you used when calling `save()`.
|
||||
|
||||
**Arguments**:
|
||||
|
||||
- `faiss_file_path`: Stored FAISS index file. Can be created via calling `save()`
|
||||
- `sql_url`: Connection string to the SQL database that contains your docs and metadata.
|
||||
- `index_buffer_size`: When working with large datasets, the ingestion process(FAISS + SQL) can be buffered in
|
||||
smaller chunks to reduce memory footprint.
|
||||
|
||||
**Returns**:
|
||||
|
||||
|
||||
|
||||
<a name="elasticsearch"></a>
|
||||
# elasticsearch
|
||||
# Module elasticsearch
|
||||
|
||||
<a name="elasticsearch.ElasticsearchDocumentStore"></a>
|
||||
## ElasticsearchDocumentStore
|
||||
## ElasticsearchDocumentStore Objects
|
||||
|
||||
```python
|
||||
class ElasticsearchDocumentStore(BaseDocumentStore)
|
||||
@ -391,29 +130,139 @@ Adds a SQuAD-formatted file to the DocumentStore in order to be able to perform
|
||||
#### delete\_all\_documents

```python
| delete_all_documents(index: str)
| delete_all_documents(index: str, filters: Optional[Dict[str, List[str]]] = None)
```

Delete all documents in an index.
Delete documents in an index. All documents are deleted if no filters are passed.

**Arguments**:

- `index`: index name
- `index`: Index name to delete the document from.
- `filters`: Optional filters to narrow down the documents to be deleted.

**Returns**:

None
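For illustration, a hedged usage sketch of the new `filters` parameter (assumes an existing `ElasticsearchDocumentStore` instance named `document_store`; the index and meta field names are placeholders):

```python
# Delete only documents whose meta field "category" is "eval" in the "document" index;
# calling the method without filters would delete every document in that index.
document_store.delete_all_documents(index="document", filters={"category": ["eval"]})
```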
<a name="memory"></a>
|
||||
# Module memory
|
||||
|
||||
<a name="memory.InMemoryDocumentStore"></a>
|
||||
## InMemoryDocumentStore Objects
|
||||
|
||||
```python
|
||||
class InMemoryDocumentStore(BaseDocumentStore)
|
||||
```
|
||||
|
||||
In-memory document store
|
||||
|
||||
<a name="memory.InMemoryDocumentStore.write_documents"></a>
|
||||
#### write\_documents
|
||||
|
||||
```python
|
||||
| write_documents(documents: Union[List[dict], List[Document]], index: Optional[str] = None)
|
||||
```
|
||||
|
||||
Indexes documents for later queries.
|
||||
|
||||
|
||||
**Arguments**:
|
||||
|
||||
- `documents`: a list of Python dictionaries or a list of Haystack Document objects.
|
||||
For documents as dictionaries, the format is {"text": "<the-actual-text>"}.
|
||||
Optionally: Include meta data via {"text": "<the-actual-text>",
|
||||
"meta": {"name": "<some-document-name>, "author": "somebody", ...}}
|
||||
It can be used for filtering and is accessible in the responses of the Finder.
|
||||
- `index`: write documents to a custom namespace. For instance, documents for evaluation can be indexed in a
|
||||
separate index than the documents for search.
|
||||
|
||||
**Returns**:
|
||||
|
||||
None
|
||||
|
||||
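A hedged sketch of the dictionary format described above (the import path may differ slightly between Haystack versions):

```python
from haystack.document_store.memory import InMemoryDocumentStore  # assumed import path

document_store = InMemoryDocumentStore()
# Each dict needs a "text" field; "meta" is optional and can later be used for filtering.
document_store.write_documents([
    {"text": "Haystack is a framework for building search systems.",
     "meta": {"name": "intro.txt", "author": "somebody"}},
])
```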
<a name="memory.InMemoryDocumentStore.update_embeddings"></a>
|
||||
#### update\_embeddings
|
||||
|
||||
```python
|
||||
| update_embeddings(retriever: BaseRetriever, index: Optional[str] = None)
|
||||
```
|
||||
|
||||
Updates the embeddings in the the document store using the encoding model specified in the retriever.
|
||||
This can be useful if want to add or change the embeddings for your documents (e.g. after changing the retriever config).
|
||||
|
||||
**Arguments**:
|
||||
|
||||
- `retriever`: Retriever
|
||||
- `index`: Index name to update
|
||||
|
||||
**Returns**:
|
||||
|
||||
None
|
||||
|
||||
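A minimal sketch of recomputing embeddings after changing the retriever (assumes `document_store` from the previous example and an already-initialized dense retriever named `retriever`):

```python
# Re-encode all documents in the default index with the retriever's current model
document_store.update_embeddings(retriever=retriever)

# Or restrict the update to a custom index that was used when writing the documents
document_store.update_embeddings(retriever=retriever, index="eval_document")
```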
<a name="memory.InMemoryDocumentStore.add_eval_data"></a>
|
||||
#### add\_eval\_data
|
||||
|
||||
```python
|
||||
| add_eval_data(filename: str, doc_index: Optional[str] = None, label_index: Optional[str] = None)
|
||||
```
|
||||
|
||||
Adds a SQuAD-formatted file to the DocumentStore in order to be able to perform evaluation on it.
|
||||
|
||||
**Arguments**:
|
||||
|
||||
- `filename`: Name of the file containing evaluation data
|
||||
:type filename: str
|
||||
- `doc_index`: Elasticsearch index where evaluation documents should be stored
|
||||
:type doc_index: str
|
||||
- `label_index`: Elasticsearch index where labeled questions should be stored
|
||||
:type label_index: str
|
||||
|
||||
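A hedged usage sketch (the file path and index names below are placeholders, not defaults taken from the code):

```python
# Load SQuAD-style annotations so that evaluation can run against this store
document_store.add_eval_data(
    filename="data/squad_dev.json",
    doc_index="eval_document",
    label_index="label",
)
```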
<a name="memory.InMemoryDocumentStore.delete_all_documents"></a>
|
||||
#### delete\_all\_documents
|
||||
|
||||
```python
|
||||
| delete_all_documents(index: Optional[str] = None, filters: Optional[Dict[str, List[str]]] = None)
|
||||
```
|
||||
|
||||
Delete documents in an index. All documents are deleted if no filters are passed.
|
||||
|
||||
**Arguments**:
|
||||
|
||||
- `index`: Index name to delete the document from.
|
||||
- `filters`: Optional filters to narrow down the documents to be deleted.
|
||||
|
||||
**Returns**:
|
||||
|
||||
None
|
||||
|
||||
<a name="sql"></a>
|
||||
# sql
|
||||
# Module sql
|
||||
|
||||
<a name="sql.SQLDocumentStore"></a>
|
||||
## SQLDocumentStore
|
||||
## SQLDocumentStore Objects
|
||||
|
||||
```python
|
||||
class SQLDocumentStore(BaseDocumentStore)
|
||||
```
|
||||
|
||||
<a name="sql.SQLDocumentStore.__init__"></a>
|
||||
#### \_\_init\_\_
|
||||
|
||||
```python
|
||||
| __init__(url: str = "sqlite://", index: str = "document", label_index: str = "label", update_existing_documents: bool = False)
|
||||
```
|
||||
|
||||
**Arguments**:
|
||||
|
||||
- `url`: URL for SQL database as expected by SQLAlchemy. More info here: https://docs.sqlalchemy.org/en/13/core/engines.html#database-urls
|
||||
- `index`: The documents are scoped to an index attribute that can be used when writing, querying, or deleting documents.
|
||||
This parameter sets the default value for document index.
|
||||
- `label_index`: The default value of index attribute for the labels.
|
||||
- `update_existing_documents`: Whether to update any existing documents with the same ID when adding
|
||||
documents. When set as True, any document with an existing ID gets updated.
|
||||
If set to False, an error is raised if the document ID of the document being
|
||||
added already exists. Using this parameter coud cause performance degradation for document insertion.
|
||||
|
||||
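A hedged construction sketch (the SQLite file name and Postgres connection string are placeholders; the import path may differ by Haystack version):

```python
from haystack.document_store.sql import SQLDocumentStore  # assumed import path

# Local file-based SQLite store; overwrite documents that are written again with the same ID
document_store = SQLDocumentStore(url="sqlite:///haystack.db", update_existing_documents=True)

# For larger deployments, point the same class at Postgres instead
# document_store = SQLDocumentStore(url="postgresql://user:password@localhost:5432/haystack")
```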
<a name="sql.SQLDocumentStore.write_documents"></a>
|
||||
#### write\_documents
|
||||
|
||||
@ -473,24 +322,25 @@ Adds a SQuAD-formatted file to the DocumentStore in order to be able to perform
|
||||
#### delete\_all\_documents
|
||||
|
||||
```python
|
||||
| delete_all_documents(index=None)
|
||||
| delete_all_documents(index: Optional[str] = None, filters: Optional[Dict[str, List[str]]] = None)
|
||||
```
|
||||
|
||||
Delete all documents in a index.
|
||||
Delete documents in an index. All documents are deleted if no filters are passed.
|
||||
|
||||
**Arguments**:
|
||||
|
||||
- `index`: index name
|
||||
- `index`: Index name to delete the document from.
|
||||
- `filters`: Optional filters to narrow down the documents to be deleted.
|
||||
|
||||
**Returns**:
|
||||
|
||||
None
|
||||
|
||||
<a name="base"></a>
|
||||
# base
|
||||
# Module base
|
||||
|
||||
<a name="base.BaseDocumentStore"></a>
|
||||
## BaseDocumentStore
|
||||
## BaseDocumentStore Objects
|
||||
|
||||
```python
|
||||
class BaseDocumentStore(ABC)
|
||||
@ -522,3 +372,179 @@ If None, the DocumentStore's default index (self.index) will be used.
|
||||
|
||||
None
|
||||
|
||||
<a name="faiss"></a>
|
||||
# Module faiss
|
||||
|
||||
<a name="faiss.FAISSDocumentStore"></a>
|
||||
## FAISSDocumentStore Objects
|
||||
|
||||
```python
|
||||
class FAISSDocumentStore(SQLDocumentStore)
|
||||
```
|
||||
|
||||
Document store for very large scale embedding based dense retrievers like the DPR.
|
||||
|
||||
It implements the FAISS library(https://github.com/facebookresearch/faiss)
|
||||
to perform similarity search on vectors.
|
||||
|
||||
The document text and meta-data (for filtering) are stored using the SQLDocumentStore, while
|
||||
the vector embeddings are indexed in a FAISS Index.
|
||||
|
||||
<a name="faiss.FAISSDocumentStore.__init__"></a>
|
||||
#### \_\_init\_\_
|
||||
|
||||
```python
|
||||
| __init__(sql_url: str = "sqlite:///", index_buffer_size: int = 10_000, vector_dim: int = 768, faiss_index_factory_str: str = "Flat", faiss_index: Optional[faiss.swigfaiss.Index] = None, return_embedding: Optional[bool] = True, update_existing_documents: bool = False, index: str = "document", **kwargs, ,)
|
||||
```
|
||||
|
||||
**Arguments**:
|
||||
|
||||
- `sql_url`: SQL connection URL for database. It defaults to local file based SQLite DB. For large scale
|
||||
deployment, Postgres is recommended.
|
||||
- `index_buffer_size`: When working with large datasets, the ingestion process(FAISS + SQL) can be buffered in
|
||||
smaller chunks to reduce memory footprint.
|
||||
- `vector_dim`: the embedding vector size.
|
||||
- `faiss_index_factory_str`: Create a new FAISS index of the specified type.
|
||||
The type is determined from the given string following the conventions
|
||||
of the original FAISS index factory.
|
||||
Recommended options:
|
||||
- "Flat" (default): Best accuracy (= exact). Becomes slow and RAM intense for > 1 Mio docs.
|
||||
- "HNSW": Graph-based heuristic. If not further specified,
|
||||
we use a RAM intense, but more accurate config:
|
||||
HNSW256, efConstruction=256 and efSearch=256
|
||||
- "IVFx,Flat": Inverted Index. Replace x with the number of centroids aka nlist.
|
||||
Rule of thumb: nlist = 10 * sqrt (num_docs) is a good starting point.
|
||||
For more details see:
|
||||
- Overview of indices https://github.com/facebookresearch/faiss/wiki/Faiss-indexes
|
||||
- Guideline for choosing an index https://github.com/facebookresearch/faiss/wiki/Guidelines-to-choose-an-index
|
||||
- FAISS Index factory https://github.com/facebookresearch/faiss/wiki/The-index-factory
|
||||
Benchmarks: XXX
|
||||
- `faiss_index`: Pass an existing FAISS Index, i.e. an empty one that you configured manually
|
||||
or one with docs that you used in Haystack before and want to load again.
|
||||
- `return_embedding`: To return document embedding
|
||||
- `update_existing_documents`: Whether to update any existing documents with the same ID when adding
|
||||
documents. When set as True, any document with an existing ID gets updated.
|
||||
If set to False, an error is raised if the document ID of the document being
|
||||
added already exists.
|
||||
- `index`: Name of index in document store to use.
|
||||
|
||||
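A hedged construction sketch for the index types recommended above (file paths are placeholders; "Flat" is exact but slow for large corpora, while HNSW trades some accuracy for speed):

```python
from haystack.document_store.faiss import FAISSDocumentStore  # assumed import path

document_store = FAISSDocumentStore(
    sql_url="sqlite:///faiss_doc_store.db",
    vector_dim=768,                    # must match the embedding size of your retriever
    faiss_index_factory_str="HNSW",    # or "Flat" (default), or e.g. "IVF256,Flat"
)
```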
<a name="faiss.FAISSDocumentStore.write_documents"></a>
|
||||
#### write\_documents
|
||||
|
||||
```python
|
||||
| write_documents(documents: Union[List[dict], List[Document]], index: Optional[str] = None)
|
||||
```
|
||||
|
||||
Add new documents to the DocumentStore.
|
||||
|
||||
**Arguments**:
|
||||
|
||||
- `documents`: List of `Dicts` or List of `Documents`. If they already contain the embeddings, we'll index
|
||||
them right away in FAISS. If not, you can later call update_embeddings() to create & index them.
|
||||
- `index`: (SQL) index name for storing the docs and metadata
|
||||
|
||||
**Returns**:
|
||||
|
||||
|
||||
|
||||
<a name="faiss.FAISSDocumentStore.update_embeddings"></a>
|
||||
#### update\_embeddings
|
||||
|
||||
```python
|
||||
| update_embeddings(retriever: BaseRetriever, index: Optional[str] = None)
|
||||
```
|
||||
|
||||
Updates the embeddings in the the document store using the encoding model specified in the retriever.
|
||||
This can be useful if want to add or change the embeddings for your documents (e.g. after changing the retriever config).
|
||||
|
||||
**Arguments**:
|
||||
|
||||
- `retriever`: Retriever to use to get embeddings for text
|
||||
- `index`: (SQL) index name for storing the docs and metadata
|
||||
|
||||
**Returns**:
|
||||
|
||||
None
|
||||
|
||||
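The typical FAISS flow described above, as a hedged sketch (assumes `document_store` from the construction example and a dense retriever, e.g. a DensePassageRetriever, named `retriever` built on top of this store):

```python
# Write documents without embeddings first, then encode and index them in FAISS
document_store.write_documents([{"text": "FAISS indexes dense vectors for similarity search."}])
document_store.update_embeddings(retriever=retriever)
```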
<a name="faiss.FAISSDocumentStore.train_index"></a>
|
||||
#### train\_index
|
||||
|
||||
```python
|
||||
| train_index(documents: Optional[Union[List[dict], List[Document]]], embeddings: Optional[np.array] = None)
|
||||
```
|
||||
|
||||
Some FAISS indices (e.g. IVF) require initial "training" on a sample of vectors before you can add your final vectors.
|
||||
The train vectors should come from the same distribution as your final ones.
|
||||
You can pass either documents (incl. embeddings) or just the plain embeddings that the index shall be trained on.
|
||||
|
||||
**Arguments**:
|
||||
|
||||
- `documents`: Documents (incl. the embeddings)
|
||||
- `embeddings`: Plain embeddings
|
||||
|
||||
**Returns**:
|
||||
|
||||
None
|
||||
|
||||
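A hedged sketch for an IVF-style index (the random vectors stand in for real training data, which should come from the same distribution as your document embeddings):

```python
import numpy as np

# e.g. for a store created with faiss_index_factory_str="IVF256,Flat" and vector_dim=768
train_vectors = np.random.rand(10_000, 768).astype("float32")
document_store.train_index(documents=None, embeddings=train_vectors)
```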
<a name="faiss.FAISSDocumentStore.query_by_embedding"></a>
|
||||
#### query\_by\_embedding
|
||||
|
||||
```python
|
||||
| query_by_embedding(query_emb: np.array, filters: Optional[dict] = None, top_k: int = 10, index: Optional[str] = None, return_embedding: Optional[bool] = None) -> List[Document]
|
||||
```
|
||||
|
||||
Find the document that is most similar to the provided `query_emb` by using a vector similarity metric.
|
||||
|
||||
**Arguments**:
|
||||
|
||||
- `query_emb`: Embedding of the query (e.g. gathered from DPR)
|
||||
- `filters`: Optional filters to narrow down the search space.
|
||||
Example: {"name": ["some", "more"], "category": ["only_one"]}
|
||||
- `top_k`: How many documents to return
|
||||
- `index`: (SQL) index name for storing the docs and metadata
|
||||
- `return_embedding`: To return document embedding
|
||||
|
||||
**Returns**:
|
||||
|
||||
|
||||
|
||||
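A minimal query sketch (assumes `query_emb` is a numpy vector with the same dimensionality as `vector_dim`, produced by the same retriever that created the stored embeddings):

```python
similar_docs = document_store.query_by_embedding(query_emb=query_emb, top_k=5)
for doc in similar_docs:
    print(doc.text[:80])
```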
<a name="faiss.FAISSDocumentStore.save"></a>
|
||||
#### save
|
||||
|
||||
```python
|
||||
| save(file_path: Union[str, Path])
|
||||
```
|
||||
|
||||
Save FAISS Index to the specified file.
|
||||
|
||||
**Arguments**:
|
||||
|
||||
- `file_path`: Path to save to.
|
||||
|
||||
**Returns**:
|
||||
|
||||
None
|
||||
|
||||
<a name="faiss.FAISSDocumentStore.load"></a>
|
||||
#### load
|
||||
|
||||
```python
|
||||
| @classmethod
|
||||
| load(cls, faiss_file_path: Union[str, Path], sql_url: str, index_buffer_size: int = 10_000)
|
||||
```
|
||||
|
||||
Load a saved FAISS index from a file and connect to the SQL database.
|
||||
Note: In order to have a correct mapping from FAISS to SQL,
|
||||
make sure to use the same SQL DB that you used when calling `save()`.
|
||||
|
||||
**Arguments**:
|
||||
|
||||
- `faiss_file_path`: Stored FAISS index file. Can be created via calling `save()`
|
||||
- `sql_url`: Connection string to the SQL database that contains your docs and metadata.
|
||||
- `index_buffer_size`: When working with large datasets, the ingestion process(FAISS + SQL) can be buffered in
|
||||
smaller chunks to reduce memory footprint.
|
||||
|
||||
**Returns**:
|
||||
|
||||
|
||||
|
||||
|
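A hedged save/load round trip (file name and SQL URL are placeholders; the SQL DB must be the same one that was used before saving):

```python
document_store.save("faiss_index.bin")

# Later, or in another process: restore the FAISS index and reconnect to the matching SQL DB
document_store = FAISSDocumentStore.load(
    faiss_file_path="faiss_index.bin",
    sql_url="sqlite:///faiss_doc_store.db",
)
```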
@@ -1,38 +1,8 @@

<a name="pdf"></a>
# pdf

<a name="pdf.PDFToTextConverter"></a>
## PDFToTextConverter

```python
class PDFToTextConverter(BaseConverter)
```

<a name="pdf.PDFToTextConverter.__init__"></a>
#### \_\_init\_\_

```python
| __init__(remove_numeric_tables: Optional[bool] = False, valid_languages: Optional[List[str]] = None)
```

**Arguments**:

- `remove_numeric_tables`: This option uses heuristics to remove numeric rows from the tables.
The tabular structures in documents might be noise for the reader model if it
does not have table parsing capability for finding answers. However, tables
may also have long strings that could be possible candidates for searching answers.
The rows containing strings are thus retained in this option.
- `valid_languages`: validate languages from a list of languages specified in the ISO 639-1
(https://en.wikipedia.org/wiki/ISO_639-1) format.
This option can be used to add a test for encoding errors. If the extracted text is
not one of the valid languages, then it is most likely an encoding error resulting
in garbled text.
<a name="txt"></a>
|
||||
# txt
|
||||
# Module txt
|
||||
|
||||
<a name="txt.TextConverter"></a>
|
||||
## TextConverter
|
||||
## TextConverter Objects
|
||||
|
||||
```python
|
||||
class TextConverter(BaseConverter)
|
||||
@ -77,11 +47,36 @@ Reads text from a txt file and executes optional preprocessing steps.
|
||||
|
||||
Dict of format {"text": "The text from file", "meta": meta}}
|
||||
|
||||
<a name="docx"></a>
|
||||
# Module docx
|
||||
|
||||
<a name="docx.DocxToTextConverter"></a>
|
||||
## DocxToTextConverter Objects
|
||||
|
||||
```python
|
||||
class DocxToTextConverter(BaseConverter)
|
||||
```
|
||||
|
||||
<a name="docx.DocxToTextConverter.convert"></a>
|
||||
#### convert
|
||||
|
||||
```python
|
||||
| convert(file_path: Path, meta: Optional[Dict[str, str]] = None) -> Dict[str, Any]
|
||||
```
|
||||
|
||||
Extract text from a .docx file.
|
||||
Note: As docx doesn't contain "page" information, we actually extract and return a list of paragraphs here.
|
||||
For compliance with other converters we nevertheless opted for keeping the methods name.
|
||||
|
||||
**Arguments**:
|
||||
|
||||
- `file_path`: Path to the .docx file you want to convert
|
||||
|
||||
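A hedged usage sketch (the file name is a placeholder and the import path may differ by Haystack version; as noted above, the extracted content is paragraph-based rather than page-based):

```python
from pathlib import Path
from haystack.file_converter.docx import DocxToTextConverter  # assumed import path

converter = DocxToTextConverter()
result = converter.convert(file_path=Path("report.docx"), meta={"name": "report.docx"})
```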
<a name="tika"></a>
|
||||
# tika
|
||||
# Module tika
|
||||
|
||||
<a name="tika.TikaConverter"></a>
|
||||
## TikaConverter
|
||||
## TikaConverter Objects
|
||||
|
||||
```python
|
||||
class TikaConverter(BaseConverter)
|
||||
@ -123,36 +118,11 @@ in garbled text.
|
||||
|
||||
a list of pages and the extracted meta data of the file.
|
||||
|
||||
<a name="docx"></a>
|
||||
# docx
|
||||
|
||||
<a name="docx.DocxToTextConverter"></a>
|
||||
## DocxToTextConverter
|
||||
|
||||
```python
|
||||
class DocxToTextConverter(BaseConverter)
|
||||
```
|
||||
|
||||
<a name="docx.DocxToTextConverter.convert"></a>
|
||||
#### convert
|
||||
|
||||
```python
|
||||
| convert(file_path: Path, meta: Optional[Dict[str, str]] = None) -> Dict[str, Any]
|
||||
```
|
||||
|
||||
Extract text from a .docx file.
|
||||
Note: As docx doesn't contain "page" information, we actually extract and return a list of paragraphs here.
|
||||
For compliance with other converters we nevertheless opted for keeping the methods name.
|
||||
|
||||
**Arguments**:
|
||||
|
||||
- `file_path`: Path to the .docx file you want to convert
|
||||
|
||||
<a name="base"></a>
|
||||
# base
|
||||
# Module base
|
||||
|
||||
<a name="base.BaseConverter"></a>
|
||||
## BaseConverter
|
||||
## BaseConverter Objects
|
||||
|
||||
```python
|
||||
class BaseConverter()
|
||||
@ -207,3 +177,33 @@ supplied meta data like author, url, external IDs can be supplied as a dictionar
|
||||
|
||||
Validate if the language of the text is one of valid languages.
|
||||
|
||||
<a name="pdf"></a>
|
||||
# Module pdf
|
||||
|
||||
<a name="pdf.PDFToTextConverter"></a>
|
||||
## PDFToTextConverter Objects
|
||||
|
||||
```python
|
||||
class PDFToTextConverter(BaseConverter)
|
||||
```
|
||||
|
||||
<a name="pdf.PDFToTextConverter.__init__"></a>
|
||||
#### \_\_init\_\_
|
||||
|
||||
```python
|
||||
| __init__(remove_numeric_tables: Optional[bool] = False, valid_languages: Optional[List[str]] = None)
|
||||
```
|
||||
|
||||
**Arguments**:
|
||||
|
||||
- `remove_numeric_tables`: This option uses heuristics to remove numeric rows from the tables.
|
||||
The tabular structures in documents might be noise for the reader model if it
|
||||
does not have table parsing capability for finding answers. However, tables
|
||||
may also have long strings that could possible candidate for searching answers.
|
||||
The rows containing strings are thus retained in this option.
|
||||
- `valid_languages`: validate languages from a list of languages specified in the ISO 639-1
|
||||
(https://en.wikipedia.org/wiki/ISO_639-1) format.
|
||||
This option can be used to add test for encoding errors. If the extracted text is
|
||||
not one of the valid languages, then it might likely be encoding error resulting
|
||||
in garbled text.
|
||||
|
||||
|
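A hedged usage sketch for the PDF converter (the file name is a placeholder, the import path may differ by version, and a working `pdftotext` installation is assumed to be available on the system):

```python
from pathlib import Path
from haystack.file_converter.pdf import PDFToTextConverter  # assumed import path

# Drop mostly-numeric table rows and flag probable encoding problems for non-English output
converter = PDFToTextConverter(remove_numeric_tables=True, valid_languages=["en"])
document = converter.convert(file_path=Path("sample.pdf"))
```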
docs/_src/api/api/generator.md (new file, 137 lines)

@@ -0,0 +1,137 @@
<a name="transformers"></a>
# Module transformers

<a name="transformers.RAGenerator"></a>
## RAGenerator Objects

```python
class RAGenerator(BaseGenerator)
```

Implementation of Facebook's Retrieval-Augmented Generator (https://arxiv.org/abs/2005.11401) based on
HuggingFace's transformers (https://huggingface.co/transformers/model_doc/rag.html).

Instead of "finding" the answer within a document, these models **generate** the answer.
In that sense, RAG follows a similar approach as GPT-3 but it comes with two huge advantages
for real-world applications:
a) it has a manageable model size
b) the answer generation is conditioned on retrieved documents,
i.e. the model can easily adjust to domain documents even after training has finished
(in contrast: GPT-3 relies on the web data seen during training)

**Example**

```python
> question = "who got the first nobel prize in physics?"

> # Retrieve related documents from retriever
> retrieved_docs = retriever.retrieve(query=question)

> # Now generate answer from question and retrieved documents
> generator.predict(
>    question=question,
>    documents=retrieved_docs,
>    top_k=1
> )
{'question': 'who got the first nobel prize in physics',
 'answers':
     [{'question': 'who got the first nobel prize in physics',
       'answer': ' albert einstein',
       'meta': { 'doc_ids': [...],
                 'doc_scores': [80.42758 ...],
                 'doc_probabilities': [40.71379089355469, ...
                 'texts': ['Albert Einstein was a ...]
                 'titles': ['"Albert Einstein"', ...]
}}]}
```

<a name="transformers.RAGenerator.__init__"></a>
#### \_\_init\_\_

```python
| __init__(model_name_or_path: str = "facebook/rag-token-nq", retriever: Optional[DensePassageRetriever] = None, generator_type: RAGeneratorType = RAGeneratorType.TOKEN, top_k_answers: int = 2, max_length: int = 200, min_length: int = 2, num_beams: int = 2, embed_title: bool = True, prefix: Optional[str] = None, use_gpu: bool = True)
```

Load a RAG model from Transformers along with passage_embedding_model.
See https://huggingface.co/transformers/model_doc/rag.html for more details

**Arguments**:

- `model_name_or_path`: Directory of a saved model or the name of a public model e.g.
'facebook/rag-token-nq', 'facebook/rag-sequence-nq'.
See https://huggingface.co/models for full list of available models.
- `retriever`: `DensePassageRetriever` used to embed passages
- `generator_type`: Which RAG generator implementation to use (RAG-TOKEN or RAG-SEQUENCE)
- `top_k_answers`: Number of independently generated texts to return
- `max_length`: Maximum length of generated text
- `min_length`: Minimum length of generated text
- `num_beams`: Number of beams for beam search. 1 means no beam search.
- `embed_title`: Whether to embed the title of the passage while generating its embedding
- `prefix`: The prefix used by the generator's tokenizer.
- `use_gpu`: Whether to use GPU (if available)
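A hedged initialization sketch (the import path is assumed from this module's name; `retriever` is a DensePassageRetriever built on your document store, as described above):

```python
from haystack.generator.transformers import RAGenerator, RAGeneratorType  # assumed import path

generator = RAGenerator(
    model_name_or_path="facebook/rag-token-nq",
    retriever=retriever,
    generator_type=RAGeneratorType.TOKEN,
    top_k_answers=1,
)
```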
<a name="transformers.RAGenerator.predict"></a>
|
||||
#### predict
|
||||
|
||||
```python
|
||||
| predict(question: str, documents: List[Document], top_k: Optional[int] = None) -> Dict
|
||||
```
|
||||
|
||||
Generate the answer to the input question. The generation will be conditioned on the supplied documents.
|
||||
These document can for example be retrieved via the Retriever.
|
||||
|
||||
**Arguments**:
|
||||
|
||||
- `question`: Question
|
||||
- `documents`: Related documents (e.g. coming from a retriever) that the answer shall be conditioned on.
|
||||
- `top_k`: Number of returned answers
|
||||
|
||||
**Returns**:
|
||||
|
||||
Generated answers plus additional infos in a dict like this:
|
||||
|
||||
```python
|
||||
> {'question': 'who got the first nobel prize in physics',
|
||||
> 'answers':
|
||||
> [{'question': 'who got the first nobel prize in physics',
|
||||
> 'answer': ' albert einstein',
|
||||
> 'meta': { 'doc_ids': [...],
|
||||
> 'doc_scores': [80.42758 ...],
|
||||
> 'doc_probabilities': [40.71379089355469, ...
|
||||
> 'texts': ['Albert Einstein was a ...]
|
||||
> 'titles': ['"Albert Einstein"', ...]
|
||||
> }}]}
|
||||
```
|
||||
|
||||
<a name="base"></a>
|
||||
# Module base
|
||||
|
||||
<a name="base.BaseGenerator"></a>
|
||||
## BaseGenerator Objects
|
||||
|
||||
```python
|
||||
class BaseGenerator(ABC)
|
||||
```
|
||||
|
||||
Abstract class for Generators
|
||||
|
||||
<a name="base.BaseGenerator.predict"></a>
|
||||
#### predict
|
||||
|
||||
```python
|
||||
| @abstractmethod
|
||||
| predict(question: str, documents: List[Document], top_k: Optional[int]) -> Dict
|
||||
```
|
||||
|
||||
Abstract method to generate answers.
|
||||
|
||||
**Arguments**:
|
||||
|
||||
- `question`: Question
|
||||
- `documents`: Related documents (e.g. coming from a retriever) that the answer shall be conditioned on.
|
||||
- `top_k`: Number of returned answers
|
||||
|
||||
**Returns**:
|
||||
|
||||
Generated answers plus additional infos in a dict
|
||||
|
@@ -1,5 +1,44 @@

<a name="preprocessor"></a>
# Module preprocessor

<a name="preprocessor.PreProcessor"></a>
## PreProcessor Objects

```python
class PreProcessor(BasePreProcessor)
```

<a name="preprocessor.PreProcessor.__init__"></a>
#### \_\_init\_\_

```python
| __init__(clean_whitespace: Optional[bool] = True, clean_header_footer: Optional[bool] = False, clean_empty_lines: Optional[bool] = True, split_by: Optional[str] = "word", split_length: Optional[int] = 1000, split_stride: Optional[int] = None, split_respect_sentence_boundary: Optional[bool] = True)
```

**Arguments**:

- `clean_header_footer`: Use heuristic to remove footers and headers across different pages by searching
for the longest common string. This heuristic uses exact matches and therefore
works well for footers like "Copyright 2019 by XXX", but won't detect "Page 3 of 4"
or similar.
- `clean_whitespace`: Strip whitespaces before or after each line in the text.
- `clean_empty_lines`: Remove more than two empty lines in the text.
- `split_by`: Unit for splitting the document. Can be "word", "sentence", or "passage". Set to None to disable splitting.
- `split_length`: Max. number of the above split unit (e.g. words) that are allowed in one document. For instance, if n -> 10 & split_by ->
"sentence", then each output document will have 10 sentences.
- `split_stride`: Length of striding window over the splits. For example, if split_by -> `word`,
split_length -> 5 & split_stride -> 2, then the splits would be like:
[w1 w2 w3 w4 w5, w4 w5 w6 w7 w8, w7 w8 w9 w10 w11].
Set the value to None to disable striding behaviour.
- `split_respect_sentence_boundary`: Whether to split in partial sentences if split_by -> `word`. If set
to True, the individual split will always have complete sentences &
the number of words will be <= split_length.
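A hedged construction sketch of the splitting behaviour described above (the import path is assumed; the resulting processor is typically applied to converted documents before they are written to a DocumentStore):

```python
from haystack.preprocessor.preprocessor import PreProcessor  # assumed import path

processor = PreProcessor(
    clean_whitespace=True,
    clean_empty_lines=True,
    split_by="word",
    split_length=200,     # at most 200 words per resulting document
    split_stride=50,      # consecutive splits overlap by 50 words
    split_respect_sentence_boundary=True,
)
```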
<a name="cleaning"></a>
|
||||
# Module cleaning
|
||||
|
||||
<a name="utils"></a>
|
||||
# utils
|
||||
# Module utils
|
||||
|
||||
<a name="utils.eval_data_from_file"></a>
|
||||
#### eval\_data\_from\_file
|
||||
@ -84,45 +123,6 @@ Fetch an archive (zip or tar.gz) from a url via http and extract content to an o
|
||||
|
||||
bool if anything got fetched
|
||||
|
||||
<a name="preprocessor"></a>
|
||||
# preprocessor
|
||||
|
||||
<a name="preprocessor.PreProcessor"></a>
|
||||
## PreProcessor
|
||||
|
||||
```python
|
||||
class PreProcessor(BasePreProcessor)
|
||||
```
|
||||
|
||||
<a name="preprocessor.PreProcessor.__init__"></a>
|
||||
#### \_\_init\_\_
|
||||
|
||||
```python
|
||||
| __init__(clean_whitespace: Optional[bool] = True, clean_header_footer: Optional[bool] = False, clean_empty_lines: Optional[bool] = True, split_by: Optional[str] = "word", split_length: Optional[int] = 1000, split_stride: Optional[int] = None, split_respect_sentence_boundary: Optional[bool] = True)
|
||||
```
|
||||
|
||||
**Arguments**:
|
||||
|
||||
- `clean_header_footer`: Use heuristic to remove footers and headers across different pages by searching
|
||||
for the longest common string. This heuristic uses exact matches and therefore
|
||||
works well for footers like "Copyright 2019 by XXX", but won't detect "Page 3 of 4"
|
||||
or similar.
|
||||
- `clean_whitespace`: Strip whitespaces before or after each line in the text.
|
||||
- `clean_empty_lines`: Remove more than two empty lines in the text.
|
||||
- `split_by`: Unit for splitting the document. Can be "word", "sentence", or "passage". Set to None to disable splitting.
|
||||
- `split_length`: Max. number of the above split unit (e.g. words) that are allowed in one document. For instance, if n -> 10 & split_by ->
|
||||
"sentence", then each output document will have 10 sentences.
|
||||
- `split_stride`: Length of striding window over the splits. For example, if split_by -> `word`,
|
||||
split_length -> 5 & split_stride -> 2, then the splits would be like:
|
||||
[w1 w2 w3 w4 w5, w4 w5 w6 w7 w8, w7 w8 w10 w11 w12].
|
||||
Set the value to None to disable striding behaviour.
|
||||
- `split_respect_sentence_boundary`: Whether to split in partial sentences if split_by -> `word`. If set
|
||||
to True, the individual split will always have complete sentences &
|
||||
the number of words will be <= split_length.
|
||||
|
||||
<a name="base"></a>
|
||||
# base
|
||||
|
||||
<a name="cleaning"></a>
|
||||
# cleaning
|
||||
# Module base
|
||||
|
||||
|
@@ -10,5 +10,8 @@ processor:
- skip_empty_modules: true
renderer:
  type: markdown
  descriptive_class_title: false
  descriptive_class_title: true
  descriptive_module_title: true
  add_method_class_prefix: false
  add_member_class_prefix: false
  filename: document_store.md

@@ -10,5 +10,8 @@ processor:
- skip_empty_modules: true
renderer:
  type: markdown
  descriptive_class_title: false
  descriptive_class_title: true
  descriptive_module_title: true
  add_method_class_prefix: false
  add_member_class_prefix: false
  filename: file_converter.md

@@ -10,5 +10,8 @@ processor:
- skip_empty_modules: true
renderer:
  type: markdown
  descriptive_class_title: false
  descriptive_class_title: true
  descriptive_module_title: true
  add_method_class_prefix: false
  add_member_class_prefix: false
  filename: generator.md

@@ -10,5 +10,8 @@ processor:
- skip_empty_modules: true
renderer:
  type: markdown
  descriptive_class_title: false
  descriptive_class_title: true
  descriptive_module_title: true
  add_method_class_prefix: false
  add_member_class_prefix: false
  filename: preprocessor.md

@@ -10,5 +10,8 @@ processor:
- skip_empty_modules: true
renderer:
  type: markdown
  descriptive_class_title: false
  descriptive_class_title: true
  descriptive_module_title: true
  add_method_class_prefix: false
  add_member_class_prefix: false
  filename: reader.md

@@ -10,5 +10,8 @@ processor:
- skip_empty_modules: true
renderer:
  type: markdown
  descriptive_class_title: false
  descriptive_class_title: true
  descriptive_module_title: true
  add_method_class_prefix: false
  add_member_class_prefix: false
  filename: retriever.md
@@ -1,8 +1,8 @@

<a name="farm"></a>
# farm
# Module farm

<a name="farm.FARMReader"></a>
## FARMReader
## FARMReader Objects

```python
class FARMReader(BaseReader)

@@ -279,10 +279,10 @@ float32 could still be more performant.

- `opset_version`: ONNX opset version

<a name="transformers"></a>
# transformers
# Module transformers

<a name="transformers.TransformersReader"></a>
## TransformersReader
## TransformersReader Objects

```python
class TransformersReader(BaseReader)

@@ -368,5 +368,5 @@ Example:

Dict containing question and answers

<a name="base"></a>
# base
# Module base
@@ -1,8 +1,8 @@

<a name="sparse"></a>
# sparse
# Module sparse

<a name="sparse.ElasticsearchRetriever"></a>
## ElasticsearchRetriever
## ElasticsearchRetriever Objects

```python
class ElasticsearchRetriever(BaseRetriever)

@@ -52,7 +52,7 @@ self.retrieve(query="Why did the revenue increase?",

```

<a name="sparse.ElasticsearchFilterOnlyRetriever"></a>
## ElasticsearchFilterOnlyRetriever
## ElasticsearchFilterOnlyRetriever Objects

```python
class ElasticsearchFilterOnlyRetriever(ElasticsearchRetriever)

@@ -62,7 +62,7 @@ Naive "Retriever" that returns all documents that match the given filters. No im

Helpful for benchmarking, testing and if you want to do QA on small documents without an "active" retriever.

<a name="sparse.TfidfRetriever"></a>
## TfidfRetriever
## TfidfRetriever Objects

```python
class TfidfRetriever(BaseRetriever)

@@ -76,10 +76,10 @@ computations when text is passed on to a Reader for QA.

It uses sklearn's TfidfVectorizer to compute a tf-idf matrix.

<a name="dense"></a>
# dense
# Module dense

<a name="dense.DensePassageRetriever"></a>
## DensePassageRetriever
## DensePassageRetriever Objects

```python
class DensePassageRetriever(BaseRetriever)

@@ -201,7 +201,7 @@ train a DensePassageRetrieval model

- `passage_encoder_save_dir`: directory inside save_dir where passage_encoder model files are saved

<a name="dense.EmbeddingRetriever"></a>
## EmbeddingRetriever
## EmbeddingRetriever Objects

```python
class EmbeddingRetriever(BaseRetriever)

@@ -286,10 +286,10 @@ Create embeddings for a list of passages. For this Retriever type: The same as c

Embeddings, one per input passage

<a name="base"></a>
# base
# Module base

<a name="base.BaseRetriever"></a>
## BaseRetriever
## BaseRetriever Objects

```python
class BaseRetriever(ABC)

@@ -330,7 +330,10 @@ position in the ranking of documents the correct document is.

- "mrr": Mean of reciprocal rank. Rewards retrievers that give relevant documents a higher rank.
Only considers the highest ranked relevant document.
- "map": Mean of average precision for each question. Rewards retrievers that give relevant
documents a higher rank. Considers all retrieved relevant documents. (only with ``open_domain=False``)
documents a higher rank. Considers all retrieved relevant documents. If ``open_domain=True``,
average precision is normalized by the number of retrieved relevant documents per query.
If ``open_domain=False``, average precision is normalized by the number of all relevant documents
per query.
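To make the two normalizations concrete, a small worked illustration (plain Python arithmetic, not Haystack code): suppose one query has 3 relevant documents in total and the retriever returns relevant documents at ranks 1 and 3.

```python
# precision at each rank where a relevant document was retrieved
precisions = [1 / 1, 2 / 3]

ap_open_domain = sum(precisions) / 2    # normalize by retrieved relevant docs -> ~0.83
ap_closed_domain = sum(precisions) / 3  # normalize by all relevant docs       -> ~0.56
```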
**Arguments**: