update api markdown files and add markdown file for ranker (#1198)

* update api markdown files and add markdown file for ranker

* added docstrings for weaviate

* The new version of pydoc-markdown does not render arguments correctly, so we used pydoc-markdown==3.11.0
Markus Paff 2021-06-15 17:50:08 +02:00 committed by GitHub
parent 215c45eb8a
commit 6cd49105e7
6 changed files with 480 additions and 7 deletions


@@ -1466,3 +1466,254 @@ List[np.array]: List of vectors.
Return the count of embeddings in the document store.
<a name="weaviate"></a>
# Module weaviate
<a name="weaviate.WeaviateDocumentStore"></a>
## WeaviateDocumentStore Objects
```python
class WeaviateDocumentStore(BaseDocumentStore)
```
Weaviate is a cloud-native, modular, real-time vector search engine built to scale your machine learning models.
(See https://www.semi.technology/developers/weaviate/current/index.html#what-is-weaviate)
Some of the key differences in contrast to FAISS & Milvus:
1. Stores everything in one place: documents, metadata and vectors - so less network overhead when scaling this up
2. Allows combination of vector search and scalar filtering, i.e. you can filter for a certain tag and do dense retrieval on that subset
3. Offers fewer ANN algorithms; as of now, only HNSW is supported.
The Weaviate Python client is used to connect to the server; more details are here:
https://weaviate-python-client.readthedocs.io/en/docs/weaviate.html
Usage:
1. Start a Weaviate server (see https://www.semi.technology/developers/weaviate/current/getting-started/installation.html)
2. Init a WeaviateDocumentStore in Haystack
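A minimal sketch of step 2, assuming a Weaviate instance is already running locally on the default port (the import path below may differ between Haystack versions):
```python
from haystack.document_store.weaviate import WeaviateDocumentStore  # import path may differ per Haystack version

# Connect to a locally running Weaviate instance (default host/port shown)
document_store = WeaviateDocumentStore(
    host="http://localhost",
    port=8080,
    index="Document",
    embedding_dim=768,
    similarity="dot_product",
)
```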
<a name="weaviate.WeaviateDocumentStore.__init__"></a>
#### \_\_init\_\_
```python
| __init__(host: Union[str, List[str]] = "http://localhost", port: Union[int, List[int]] = 8080, timeout_config: tuple = (5, 15), username: str = None, password: str = None, index: str = "Document", embedding_dim: int = 768, text_field: str = "text", name_field: str = "name", faq_question_field="question", similarity: str = "dot_product", index_type: str = "hnsw", custom_schema: Optional[dict] = None, return_embedding: bool = False, embedding_field: str = "embedding", progress_bar: bool = True, duplicate_documents: str = 'overwrite', **kwargs)
```
**Arguments**:
- `host`: Weaviate server connection URL for storing and processing documents and vectors.
For more details, refer to https://www.semi.technology/developers/weaviate/current/getting-started/installation.html
- `port`: port of Weaviate instance
- `timeout_config`: Weaviate timeout config as a tuple of (retries, timeout seconds).
- `username`: username (standard authentication via http_auth)
- `password`: password (standard authentication via http_auth)
- `index`: Index name for document text, embedding and metadata (in Weaviate terminology, this is a "Class" in Weaviate schema).
- `embedding_dim`: The embedding vector size. Default: 768.
- `text_field`: Name of field that might contain the answer and will therefore be passed to the Reader Model (e.g. "full_text").
If no Reader is used (e.g. in FAQ-Style QA) the plain content of this field will just be returned.
- `name_field`: Name of field that contains the title of the doc
- `faq_question_field`: Name of field containing the question in case of FAQ-Style QA
- `similarity`: The similarity function used to compare document vectors. 'dot_product' is the default.
- `index_type`: Index type of any vector object defined in the Weaviate schema. The vector index type is pluggable.
Currently, only HNSW is supported.
See: https://www.semi.technology/developers/weaviate/current/more-resources/performance.html
- `custom_schema`: Allows creating a custom schema in Weaviate. For more details,
see https://www.semi.technology/developers/weaviate/current/data-schema/schema-configuration.html
- `module_name`: Vectorization module to convert data into vectors. Default is "text2vec-transformers".
For more details, see https://www.semi.technology/developers/weaviate/current/modules/
- `return_embedding`: Whether to return document embeddings.
- `embedding_field`: Name of field containing an embedding vector.
- `progress_bar`: Whether to show a tqdm progress bar or not.
Can be helpful to disable in production deployments to keep the logs clean.
- `duplicate_documents`: Handle duplicate documents based on parameter options.
Parameter options: ('skip', 'overwrite', 'fail')
skip: Ignore duplicate documents.
overwrite: Update any existing documents with the same ID when adding documents.
fail: Raise an error if the document ID of the document being added already exists.
<a name="weaviate.WeaviateDocumentStore.get_document_by_id"></a>
#### get\_document\_by\_id
```python
| get_document_by_id(id: str, index: Optional[str] = None) -> Optional[Document]
```
Fetch a document by specifying its text id string
<a name="weaviate.WeaviateDocumentStore.get_documents_by_id"></a>
#### get\_documents\_by\_id
```python
| get_documents_by_id(ids: List[str], index: Optional[str] = None, batch_size: int = 10_000) -> List[Document]
```
Fetch documents by specifying a list of text id strings
<a name="weaviate.WeaviateDocumentStore.write_documents"></a>
#### write\_documents
```python
| write_documents(documents: Union[List[dict], List[Document]], index: Optional[str] = None, batch_size: int = 10_000, duplicate_documents: Optional[str] = None)
```
Add new documents to the DocumentStore.
**Arguments**:
- `documents`: List of `Dicts` or List of `Documents`. Passing an Embedding/Vector is mandatory in case Weaviate is not
configured with a module. If a module is configured, the embedding is automatically generated by Weaviate.
- `index`: index name for storing the docs and metadata
- `batch_size`: When working with a large number of documents, batching can help reduce memory footprint.
- `duplicate_documents`: Handle duplicate documents based on parameter options.
Parameter options: ('skip', 'overwrite', 'fail')
skip: Ignore duplicate documents.
overwrite: Update any existing documents with the same ID when adding documents.
fail: Raise an error if the document ID of the document being added already exists.
**Raises**:
- `DuplicateDocumentError`: Exception triggered on duplicate documents
**Returns**:
None
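As an illustration, a hedged sketch of writing dictionaries that carry their own embeddings (assuming the `document_store` from the initialization sketch above and a Weaviate instance without a vectorization module):
```python
import numpy as np

# Each dict becomes a Document; the embedding must be supplied here because
# this instance is assumed to have no Weaviate vectorization module configured.
docs = [
    {"text": "Weaviate is a cloud-native vector search engine.",
     "name": "weaviate_intro",
     "embedding": np.random.rand(768).astype(np.float32)},
    {"text": "Haystack lets you build search pipelines.",
     "name": "haystack_intro",
     "embedding": np.random.rand(768).astype(np.float32)},
]
document_store.write_documents(docs, duplicate_documents="overwrite")
```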
<a name="weaviate.WeaviateDocumentStore.update_document_meta"></a>
#### update\_document\_meta
```python
| update_document_meta(id: str, meta: Dict[str, str])
```
Update the metadata dictionary of a document by specifying its string id
<a name="weaviate.WeaviateDocumentStore.get_document_count"></a>
#### get\_document\_count
```python
| get_document_count(filters: Optional[Dict[str, List[str]]] = None, index: Optional[str] = None) -> int
```
Return the number of documents in the document store.
<a name="weaviate.WeaviateDocumentStore.get_all_documents"></a>
#### get\_all\_documents
```python
| get_all_documents(index: Optional[str] = None, filters: Optional[Dict[str, List[str]]] = None, return_embedding: Optional[bool] = None, batch_size: int = 10_000) -> List[Document]
```
Get documents from the document store.
**Arguments**:
- `index`: Name of the index to get the documents from. If None, the
DocumentStore's default index (self.index) will be used.
- `filters`: Optional filters to narrow down the documents to return.
Example: {"name": ["some", "more"], "category": ["only_one"]}
- `return_embedding`: Whether to return the document embeddings.
- `batch_size`: When working with a large number of documents, batching can help reduce memory footprint.
<a name="weaviate.WeaviateDocumentStore.get_all_documents_generator"></a>
#### get\_all\_documents\_generator
```python
| get_all_documents_generator(index: Optional[str] = None, filters: Optional[Dict[str, List[str]]] = None, return_embedding: Optional[bool] = None, batch_size: int = 10_000) -> Generator[Document, None, None]
```
Get documents from the document store. Under-the-hood, documents are fetched in batches from the
document store and yielded as individual documents. This method can be used to iteratively process
a large number of documents without having to load all documents in memory.
**Arguments**:
- `index`: Name of the index to get the documents from. If None, the
DocumentStore's default index (self.index) will be used.
- `filters`: Optional filters to narrow down the documents to return.
Example: {"name": ["some", "more"], "category": ["only_one"]}
- `return_embedding`: Whether to return the document embeddings.
- `batch_size`: When working with a large number of documents, batching can help reduce memory footprint.
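A short sketch of streaming documents in batches (the filter values are purely illustrative):
```python
# Iterate over all documents matching the filter, fetched in batches of 1,000,
# without loading the whole index into memory.
for doc in document_store.get_all_documents_generator(
    filters={"name": ["weaviate_intro"]},  # illustrative filter
    return_embedding=False,
    batch_size=1_000,
):
    print(doc.id, doc.text[:60])
```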
<a name="weaviate.WeaviateDocumentStore.query"></a>
#### query
```python
| query(query: Optional[str] = None, filters: Optional[Dict[str, List[str]]] = None, top_k: int = 10, custom_query: Optional[str] = None, index: Optional[str] = None) -> List[Document]
```
Scan through documents in DocumentStore and return a small number of documents
that are most relevant to the query as defined by Weaviate semantic search.
**Arguments**:
- `query`: The query
- `filters`: A dictionary where the keys specify a metadata field and the value is a list of accepted values for that field
- `top_k`: How many documents to return per query.
- `custom_query`: Custom query that will be executed using the query.raw method; for more details, refer to
https://www.semi.technology/developers/weaviate/current/graphql-references/filters.html
- `index`: The name of the index in the DocumentStore from which to retrieve documents
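A hedged sketch of a filtered semantic query (this assumes the Weaviate instance is configured with a vectorization module, so a plain-text query can be embedded server-side):
```python
# Semantic search restricted to documents whose "name" matches the filter
results = document_store.query(
    query="What is a vector search engine?",
    filters={"name": ["weaviate_intro"]},  # illustrative filter
    top_k=5,
)
for doc in results:
    print(doc.text)
```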
<a name="weaviate.WeaviateDocumentStore.query_by_embedding"></a>
#### query\_by\_embedding
```python
| query_by_embedding(query_emb: np.ndarray, filters: Optional[dict] = None, top_k: int = 10, index: Optional[str] = None, return_embedding: Optional[bool] = None) -> List[Document]
```
Find the documents that are most similar to the provided `query_emb` by using a vector similarity metric.
**Arguments**:
- `query_emb`: Embedding of the query (e.g. gathered from DPR)
- `filters`: Optional filters to narrow down the search space.
Example: {"name": ["some", "more"], "category": ["only_one"]}
- `top_k`: How many documents to return
- `index`: index name for storing the docs and metadata
- `return_embedding`: Whether to return the document embedding.
**Returns**:
List of Documents that are most similar to `query_emb`
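Sketch of a direct embedding query; in practice `query_emb` would come from a retriever such as DPR, and a random vector is used here only to show the call:
```python
import numpy as np

# Stand-in for an embedding produced by a retriever (e.g. DPR)
query_emb = np.random.rand(768).astype(np.float32)

similar_docs = document_store.query_by_embedding(
    query_emb=query_emb,
    top_k=3,
    return_embedding=False,
)
```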
<a name="weaviate.WeaviateDocumentStore.update_embeddings"></a>
#### update\_embeddings
```python
| update_embeddings(retriever, index: Optional[str] = None, filters: Optional[Dict[str, List[str]]] = None, update_existing_embeddings: bool = True, batch_size: int = 10_000)
```
Updates the embeddings in the document store using the encoding model specified in the retriever.
This can be useful if you want to change the embeddings for your documents (e.g. after changing the retriever config).
**Arguments**:
- `retriever`: Retriever to use to update the embeddings.
- `index`: Index name to update
- `update_existing_embeddings`: Weaviate mandates an embedding while creating the document itself.
This option must always be true for Weaviate, and it will update the embeddings for all documents.
- `filters`: Optional filters to narrow down the documents for which embeddings are to be updated.
Example: {"name": ["some", "more"], "category": ["only_one"]}
- `batch_size`: When working with a large number of documents, batching can help reduce memory footprint.
**Returns**:
None
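Typical usage together with a dense retriever, as a sketch (the model names are the public DPR defaults; the import path may differ between Haystack versions):
```python
from haystack.retriever.dense import DensePassageRetriever  # import path may differ per Haystack version

retriever = DensePassageRetriever(
    document_store=document_store,
    query_embedding_model="facebook/dpr-question_encoder-single-nq-base",
    passage_embedding_model="facebook/dpr-ctx_encoder-single-nq-base",
)

# Re-embed all documents with the retriever's passage encoder
document_store.update_embeddings(retriever, update_existing_embeddings=True)
```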
<a name="weaviate.WeaviateDocumentStore.delete_all_documents"></a>
#### delete\_all\_documents
```python
| delete_all_documents(index: Optional[str] = None, filters: Optional[Dict[str, List[str]]] = None)
```
Delete documents in an index. All documents are deleted if no filters are passed.
**Arguments**:
- `index`: Index name to delete the document from.
- `filters`: Optional filters to narrow down the documents to be deleted.
**Returns**:
None


@@ -15,4 +15,4 @@ pydoc-markdown pydoc-markdown-pipelines.yml
pydoc-markdown pydoc-markdown-knowledge-graph.yml
pydoc-markdown pydoc-markdown-graph-retriever.yml
pydoc-markdown pydoc-markdown-evaluation.yml
pydoc-markdown pydoc-markdown-ranker.yml


@@ -1,7 +1,7 @@
loaders:
  - type: python
    search_path: [../../../../haystack/document_store]
    modules: ['base', 'elasticsearch', 'memory', 'sql', 'faiss', 'milvus']
    modules: ['base', 'elasticsearch', 'memory', 'sql', 'faiss', 'milvus', 'weaviate']
    ignore_when_discovered: ['__init__']
processor:
  - type: filter


@@ -0,0 +1,19 @@
loaders:
  - type: python
    search_path: [../../../../haystack/ranker]
    modules: ['base', 'farm']
    ignore_when_discovered: ['__init__']
processor:
  - type: filter
    expression: not name.startswith('_') and default()
  - documented_only: true
  - do_not_filter_modules: false
  - skip_empty_modules: true
renderer:
  type: markdown
  descriptive_class_title: true
  descriptive_module_title: true
  add_method_class_prefix: false
  add_member_class_prefix: false
  filename: ranker.md

docs/_src/api/api/ranker.md (new file)

@@ -0,0 +1,206 @@
<a name="base"></a>
# Module base
<a name="base.BaseRanker"></a>
## BaseRanker Objects
```python
class BaseRanker(BaseComponent)
```
<a name="base.BaseRanker.timing"></a>
#### timing
```python
| timing(fn, attr_name)
```
Wrapper method used to time functions.
<a name="base.BaseRanker.eval"></a>
#### eval
```python
| eval(label_index: str = "label", doc_index: str = "eval_document", label_origin: str = "gold_label", top_k: int = 10, open_domain: bool = False, return_preds: bool = False) -> dict
```
Performs evaluation of the Ranker.
The Ranker is evaluated in the same way as a Retriever: based on whether it finds the correct document given the query string and at which
position in the ranking of documents the correct document is.
Returns a dict containing the following metrics:
- "recall": Proportion of questions for which the correct document is among the retrieved documents
- "mrr": Mean of reciprocal rank. Rewards retrievers that give relevant documents a higher rank.
Only considers the highest ranked relevant document.
- "map": Mean of average precision for each question. Rewards retrievers that give relevant
documents a higher rank. Considers all retrieved relevant documents. If ``open_domain=True``,
average precision is normalized by the number of retrieved relevant documents per query.
If ``open_domain=False``, average precision is normalized by the number of all relevant documents
per query.
**Arguments**:
- `label_index`: Index/Table in DocumentStore where labeled questions are stored
- `doc_index`: Index/Table in DocumentStore where documents that are used for evaluation are stored
- `top_k`: How many documents to return per query
- `open_domain`: If ``True``, retrieval will be evaluated by checking if the answer string to a question is
contained in the retrieved docs (common approach in open-domain QA).
If ``False``, retrieval uses a stricter evaluation that checks if the retrieved document ids
are within ids explicitly stated in the labels.
- `return_preds`: Whether to add predictions in the returned dictionary. If True, the returned dictionary
contains the keys "predictions" and "metrics".
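The metrics themselves are standard ranking measures; the toy computation below only illustrates their definitions and is not Haystack's implementation:
```python
# For three queries: the 1-based rank of the first relevant document,
# or None if no relevant document was retrieved.
first_relevant_ranks = [1, 3, None]

recall = sum(r is not None for r in first_relevant_ranks) / len(first_relevant_ranks)
mrr = sum(1.0 / r for r in first_relevant_ranks if r is not None) / len(first_relevant_ranks)
print(f"recall={recall:.2f}, mrr={mrr:.2f}")  # recall=0.67, mrr=0.44
```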
<a name="farm"></a>
# Module farm
<a name="farm.FARMRanker"></a>
## FARMRanker Objects
```python
class FARMRanker(BaseRanker)
```
Transformer-based model for document re-ranking using the TextPairClassifier of the FARM framework (https://github.com/deepset-ai/FARM).
While the underlying model can vary (BERT, RoBERTa, DistilBERT, ...), the interface remains the same.
With a FARMRanker, you can:
- directly get predictions via predict()
- fine-tune the model on TextPair data via train()
<a name="farm.FARMRanker.__init__"></a>
#### \_\_init\_\_
```python
| __init__(model_name_or_path: Union[str, Path], model_version: Optional[str] = None, batch_size: int = 50, use_gpu: bool = True, top_k: int = 10, num_processes: Optional[int] = None, max_seq_len: int = 256, progress_bar: bool = True)
```
**Arguments**:
- `model_name_or_path`: Directory of a saved model or the name of a public model e.g. 'bert-base-cased',
'deepset/bert-base-cased-squad2', 'distilbert-base-uncased-distilled-squad'.
See https://huggingface.co/models for full list of available models.
- `model_version`: The version of model to use from the HuggingFace model hub. Can be tag name, branch name, or commit hash.
- `batch_size`: Number of samples the model receives in one batch for inference.
Memory consumption is much lower in inference mode. Recommendation: Increase the batch size
to a value so only a single batch is used.
- `use_gpu`: Whether to use GPU (if available)
- `top_k`: The maximum number of documents to return
- `num_processes`: The number of processes for `multiprocessing.Pool`. Set to value of 0 to disable
multiprocessing. Set to None to let Inferencer determine optimum number. If you
want to debug the Language Model, you might need to disable multiprocessing!
- `max_seq_len`: Max sequence length of one input text for the model
- `progress_bar`: Whether to show a tqdm progress bar or not.
Can be helpful to disable in production deployments to keep the logs clean.
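A minimal construction sketch (the model name is a placeholder, not a real checkpoint; use any local directory or model-hub model suitable for text-pair classification):
```python
from haystack.ranker import FARMRanker  # import path may differ per Haystack version

# "my-org/my-text-pair-model" is a placeholder model name
ranker = FARMRanker(
    model_name_or_path="my-org/my-text-pair-model",
    batch_size=50,
    top_k=10,
    use_gpu=True,
)
```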
<a name="farm.FARMRanker.train"></a>
#### train
```python
| train(data_dir: str, train_filename: str, dev_filename: Optional[str] = None, test_filename: Optional[str] = None, use_gpu: Optional[bool] = None, batch_size: int = 10, n_epochs: int = 2, learning_rate: float = 1e-5, max_seq_len: Optional[int] = None, warmup_proportion: float = 0.2, dev_split: float = 0, evaluate_every: int = 300, save_dir: Optional[str] = None, num_processes: Optional[int] = None, use_amp: str = None)
```
Fine-tune a model on a TextPairClassification dataset. Options:
- Take a plain language model (e.g. `bert-base-cased`) and train it for TextPairClassification
- Take a TextPairClassification model and fine-tune it for your domain
**Arguments**:
- `data_dir`: Path to directory containing your training data in SQuAD style
- `train_filename`: Filename of training data
- `dev_filename`: Filename of dev / eval data
- `test_filename`: Filename of test data
- `dev_split`: Instead of specifying a dev_filename, you can also specify a ratio (e.g. 0.1) here
that gets split off from training data for eval.
- `use_gpu`: Whether to use GPU (if available)
- `batch_size`: Number of samples the model receives in one batch for training
- `n_epochs`: Number of iterations on the whole training data set
- `learning_rate`: Learning rate of the optimizer
- `max_seq_len`: Maximum text length (in tokens). Everything longer gets cut down.
- `warmup_proportion`: Proportion of training steps until maximum learning rate is reached.
Until that point LR is increasing linearly. After that it's decreasing again linearly.
Options for different schedules are available in FARM.
- `evaluate_every`: Evaluate the model every X steps on the hold-out eval dataset
- `save_dir`: Path to store the final model
- `num_processes`: The number of processes for `multiprocessing.Pool` during preprocessing.
Set to value of 1 to disable multiprocessing. When set to 1, you cannot split away a dev set from train set.
Set to None to use all CPU cores minus one.
- `use_amp`: Optimization level of NVIDIA's automatic mixed precision (AMP). The higher the level, the faster the model.
Available options:
None (Don't use AMP)
"O0" (Normal FP32 training)
"O1" (Mixed Precision => Recommended)
"O2" (Almost FP16)
"O3" (Pure FP16).
See details on: https://nvidia.github.io/apex/amp.html
**Returns**:
None
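A fine-tuning sketch (directory and file names are placeholders; adapt them to your dataset):
```python
# Fine-tune the ranker on a TextPairClassification dataset; all paths are placeholders
ranker.train(
    data_dir="data/reranking",
    train_filename="train.json",
    dev_filename="dev.json",
    n_epochs=2,
    batch_size=10,
    learning_rate=1e-5,
    save_dir="saved_models/my_ranker",
)
```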
<a name="farm.FARMRanker.update_parameters"></a>
#### update\_parameters
```python
| update_parameters(max_seq_len: Optional[int] = None)
```
Hot update parameters of a loaded Ranker. It may not be safe when processing concurrent requests.
<a name="farm.FARMRanker.save"></a>
#### save
```python
| save(directory: Path)
```
Saves the Ranker model so that it can be reused at a later point in time.
**Arguments**:
- `directory`: Directory where the Ranker model should be saved
<a name="farm.FARMRanker.predict_batch"></a>
#### predict\_batch
```python
| predict_batch(query_doc_list: List[dict], top_k: int = None, batch_size: int = None)
```
Use the loaded Ranker model to rank, for each query in a list of queries, that query's supplied list of Documents.
Returns a list of dictionaries, each containing a query and its list of Documents sorted by (descending) similarity to that query.
**Arguments**:
- `query_doc_list`: List of dictionaries containing queries with their retrieved documents
- `top_k`: The maximum number of documents to return for each query
- `batch_size`: Number of samples the model receives in one batch for inference
**Returns**:
List of dictionaries containing query and ranked list of Document
<a name="farm.FARMRanker.predict"></a>
#### predict
```python
| predict(query: str, documents: List[Document], top_k: Optional[int] = None)
```
Use the loaded Ranker model to re-rank the supplied list of Documents.
Returns a list of Documents sorted by (descending) TextPairClassification similarity to the query.
**Arguments**:
- `query`: Query string
- `documents`: List of Document to be re-ranked
- `top_k`: The maximum number of documents to return
**Returns**:
List of Document
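A re-ranking sketch on top of candidate documents, e.g. as returned by a retriever (the `Document` import path may differ between Haystack versions):
```python
from haystack import Document  # import path may differ per Haystack version

candidate_docs = [
    Document(text="Weaviate is a cloud-native vector search engine."),
    Document(text="Paris is the capital of France."),
]

# Keep only the document most similar to the query
reranked = ranker.predict(
    query="What is a vector search engine?",
    documents=candidate_docs,
    top_k=1,
)
print(reranked[0].text)
```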


@@ -238,7 +238,7 @@ Karpukhin, Vladimir, et al. (2020): "Dense Passage Retrieval for Open-Domain Que
#### \_\_init\_\_
```python
| __init__(document_store: BaseDocumentStore, query_embedding_model: Union[Path, str] = "facebook/dpr-question_encoder-single-nq-base", passage_embedding_model: Union[Path, str] = "facebook/dpr-ctx_encoder-single-nq-base", single_model_path: Optional[Union[Path, str]] = None, model_version: Optional[str] = None, max_seq_len_query: int = 64, max_seq_len_passage: int = 256, top_k: int = 10, use_gpu: bool = True, batch_size: int = 16, embed_title: bool = True, use_fast_tokenizers: bool = True, infer_tokenizer_classes: bool = False, similarity_function: str = "dot_product", progress_bar: bool = True)
| __init__(document_store: BaseDocumentStore, query_embedding_model: Union[Path, str] = "facebook/dpr-question_encoder-single-nq-base", passage_embedding_model: Union[Path, str] = "facebook/dpr-ctx_encoder-single-nq-base", model_version: Optional[str] = None, max_seq_len_query: int = 64, max_seq_len_passage: int = 256, top_k: int = 10, use_gpu: bool = True, batch_size: int = 16, embed_title: bool = True, use_fast_tokenizers: bool = True, infer_tokenizer_classes: bool = False, similarity_function: str = "dot_product", progress_bar: bool = True)
```
Init the Retriever incl. the two encoder models from a local or remote model checkpoint.
@@ -266,9 +266,6 @@ The checkpoint format matches huggingface transformers' model format
- `passage_embedding_model`: Local path or remote name of passage encoder checkpoint. The format equals the
one used by hugging-face transformers' modelhub models
Currently available remote names: ``"facebook/dpr-ctx_encoder-single-nq-base"``
- `single_model_path`: Local path or remote name of a query and passage embedder in one single model. Those
models are typically trained within FARM.
Currently available remote names: TODO add FARM DPR model to HF modelhub
- `model_version`: The version of model to use from the HuggingFace model hub. Can be tag name, branch name, or commit hash.
- `max_seq_len_query`: Longest length of each query sequence. Maximum number of tokens for the query text. Longer ones will be cut down.
- `max_seq_len_passage`: Longest length of each passage/context sequence. Maximum number of tokens for the passage text. Longer ones will be cut down.
@@ -407,7 +404,7 @@ None
```python
| @classmethod
| load(cls, load_dir: Union[Path, str], document_store: BaseDocumentStore, max_seq_len_query: int = 64, max_seq_len_passage: int = 256, use_gpu: bool = True, batch_size: int = 16, embed_title: bool = True, use_fast_tokenizers: bool = True, similarity_function: str = "dot_product", query_encoder_dir: str = "query_encoder", passage_encoder_dir: str = "passage_encoder")
| load(cls, load_dir: Union[Path, str], document_store: BaseDocumentStore, max_seq_len_query: int = 64, max_seq_len_passage: int = 256, use_gpu: bool = True, batch_size: int = 16, embed_title: bool = True, use_fast_tokenizers: bool = True, similarity_function: str = "dot_product", query_encoder_dir: str = "query_encoder", passage_encoder_dir: str = "passage_encoder", infer_tokenizer_classes: bool = False)
```
Load DensePassageRetriever from the specified directory.
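A usage sketch for the updated signature (the directory is a placeholder for encoders previously saved with the retriever, and `document_store` is assumed to exist, e.g. from the Weaviate sketch above):
```python
# Load query and passage encoders previously saved by the retriever
retriever = DensePassageRetriever.load(
    load_dir="saved_models/dpr",        # placeholder directory
    document_store=document_store,
    infer_tokenizer_classes=True,       # new parameter shown in the signature above
)
```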