Regen api docs (#1015)

This commit is contained in:
Branden Chan 2021-04-30 12:35:13 +02:00 committed by GitHub
parent 99990e7249
commit 869b493b61
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
5 changed files with 215 additions and 11 deletions

View File

@@ -93,7 +93,7 @@ class ElasticsearchDocumentStore(BaseDocumentStore)
#### \_\_init\_\_
```python
| __init__(host: Union[str, List[str]] = "localhost", port: Union[int, List[int]] = 9200, username: str = "", password: str = "", api_key_id: Optional[str] = None, api_key: Optional[str] = None, index: str = "document", label_index: str = "label", search_fields: Union[str, list] = "text", text_field: str = "text", name_field: str = "name", embedding_field: str = "embedding", embedding_dim: int = 768, custom_mapping: Optional[dict] = None, excluded_meta_data: Optional[list] = None, faq_question_field: Optional[str] = None, analyzer: str = "standard", scheme: str = "http", ca_certs: Optional[str] = None, verify_certs: bool = True, create_index: bool = True, update_existing_documents: bool = False, refresh_type: str = "wait_for", similarity="dot_product", timeout=30, return_embedding: bool = False)
| __init__(host: Union[str, List[str]] = "localhost", port: Union[int, List[int]] = 9200, username: str = "", password: str = "", api_key_id: Optional[str] = None, api_key: Optional[str] = None, aws4auth=None, index: str = "document", label_index: str = "label", search_fields: Union[str, list] = "text", text_field: str = "text", name_field: str = "name", embedding_field: str = "embedding", embedding_dim: int = 768, custom_mapping: Optional[dict] = None, excluded_meta_data: Optional[list] = None, faq_question_field: Optional[str] = None, analyzer: str = "standard", scheme: str = "http", ca_certs: Optional[str] = None, verify_certs: bool = True, create_index: bool = True, update_existing_documents: bool = False, refresh_type: str = "wait_for", similarity="dot_product", timeout=30, return_embedding: bool = False)
```
A DocumentStore using Elasticsearch to store and query the documents for our search.
@@ -110,6 +110,7 @@ A DocumentStore using Elasticsearch to store and query the documents for our search
- `password`: password (standard authentication via http_auth)
- `api_key_id`: ID of the API key (alternative authentication mode to the above http_auth)
- `api_key`: Secret value of the API key (alternative authentication mode to the above http_auth)
- `aws4auth`: Authentication for usage with AWS Elasticsearch (can be generated with the requests-aws4auth package); see the sketch after this argument list
- `index`: Name of the index in Elasticsearch to use for storing the documents that we want to search. If it does not exist yet, it will be created.
- `label_index`: Name of the index in Elasticsearch to use for storing labels. If it does not exist yet, it will be created.
- `search_fields`: Name of fields used by ElasticsearchRetriever to find matches in the docs to our incoming query (using elastic's multi_match query), e.g. ["title", "full_text"]
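For illustration, a minimal sketch of connecting to an Amazon Elasticsearch Service domain with `aws4auth`. The credentials, region, and endpoint are placeholders, and the import path assumes the package layout at the time of this commit.
```python
# Sketch only: placeholder credentials, region, and endpoint.
from requests_aws4auth import AWS4Auth
from haystack.document_store.elasticsearch import ElasticsearchDocumentStore  # assumed import path

aws_auth = AWS4Auth("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "eu-central-1", "es")

document_store = ElasticsearchDocumentStore(
    host="my-domain.eu-central-1.es.amazonaws.com",  # placeholder AWS endpoint
    port=443,
    scheme="https",
    aws4auth=aws_auth,
)
```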
@@ -250,6 +251,15 @@ Return the number of documents in the document store.
Return the number of labels in the document store
<a name="elasticsearch.ElasticsearchDocumentStore.get_embedding_count"></a>
#### get\_embedding\_count
```python
| get_embedding_count(index: Optional[str] = None, filters: Optional[Dict[str, List[str]]] = None) -> int
```
Return the count of embeddings in the document store.
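A short usage sketch, assuming `document_store` is an initialized `ElasticsearchDocumentStore` and `genre` is a hypothetical metadata field of your documents:
```python
# Count all embeddings, then only those of documents whose "genre" is "news".
total = document_store.get_embedding_count()
news_only = document_store.get_embedding_count(index="document", filters={"genre": ["news"]})
print(f"{news_only} of {total} embeddings belong to news documents")
```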
<a name="elasticsearch.ElasticsearchDocumentStore.get_all_documents"></a>
#### get\_all\_documents
@@ -541,6 +551,15 @@ None
Return the number of documents in the document store.
<a name="memory.InMemoryDocumentStore.get_embedding_count"></a>
#### get\_embedding\_count
```python
| get_embedding_count(filters: Optional[Dict[str, List[str]]] = None, index: Optional[str] = None) -> int
```
Return the count of embeddings in the document store.
<a name="memory.InMemoryDocumentStore.get_label_count"></a>
#### get\_label\_count
@@ -653,13 +672,6 @@ Fetch documents by specifying a list of text id strings
Fetch documents by specifying a list of text vector id strings
**Arguments**:
- `vector_ids`: List of vector_id strings.
- `index`: Name of the index to get the documents from. If None, the
DocumentStore's default index (self.index) will be used.
- `batch_size`: When working with large number of documents, batching can help reduce memory footprint.
<a name="sql.SQLDocumentStore.get_all_documents_generator"></a>
#### get\_all\_documents\_generator
@@ -813,7 +825,7 @@ the vector embeddings are indexed in a FAISS Index.
#### \_\_init\_\_
```python
| __init__(sql_url: str = "sqlite:///", vector_dim: int = 768, faiss_index_factory_str: str = "Flat", faiss_index: Optional[faiss.swigfaiss.Index] = None, return_embedding: bool = False, update_existing_documents: bool = False, index: str = "document", similarity: str = "dot_product", embedding_field: str = "embedding", progress_bar: bool = True, **kwargs)
| __init__(sql_url: str = "sqlite:///", vector_dim: int = 768, faiss_index_factory_str: str = "Flat", faiss_index: Optional["faiss.swigfaiss.Index"] = None, return_embedding: bool = False, update_existing_documents: bool = False, index: str = "document", similarity: str = "dot_product", embedding_field: str = "embedding", progress_bar: bool = True, **kwargs)
```
**Arguments**:
@@ -916,6 +928,15 @@ a large number of documents without having to load all documents in memory.
- `return_embedding`: Whether to return the document embeddings.
- `batch_size`: When working with large number of documents, batching can help reduce memory footprint.
<a name="faiss.FAISSDocumentStore.get_embedding_count"></a>
#### get\_embedding\_count
```python
| get_embedding_count(index: Optional[str] = None, filters: Optional[Dict[str, List[str]]] = None) -> int
```
Return the count of embeddings in the document store.
<a name="faiss.FAISSDocumentStore.train_index"></a>
#### train\_index
@@ -1257,3 +1278,12 @@ Helper function to dump all vectors stored in Milvus server.
List[np.array]: List of vectors.
<a name="milvus.MilvusDocumentStore.get_embedding_count"></a>
#### get\_embedding\_count
```python
| get_embedding_count(index: Optional[str] = None, filters: Optional[Dict[str, List[str]]] = None) -> int
```
Return the count of embeddings in the document store.

View File

@@ -14,6 +14,35 @@ class Text2SparqlRetriever(BaseGraphRetriever)
Graph retriever that uses a pre-trained BART model to translate natural language questions given in text form to queries in SPARQL format.
The generated SPARQL query is executed on a knowledge graph.
<a name="text_to_sparql.Text2SparqlRetriever.__init__"></a>
#### \_\_init\_\_
```python
| __init__(knowledge_graph, model_name_or_path, top_k: int = 1)
```
Init the Retriever by providing a knowledge graph and a pre-trained BART model
**Arguments**:
- `knowledge_graph`: An instance of BaseKnowledgeGraph on which to execute SPARQL queries.
- `model_name_or_path`: Name of or path to a pre-trained BartForConditionalGeneration model.
- `top_k`: How many SPARQL queries to generate per text query.
<a name="text_to_sparql.Text2SparqlRetriever.retrieve"></a>
#### retrieve
```python
| retrieve(query: str, top_k: Optional[int] = None)
```
Translate a text query to SPARQL and execute it on the knowledge graph to retrieve a list of answers
**Arguments**:
- `query`: Text query that shall be translated to SPARQL and then executed on the knowledge graph
- `top_k`: How many SPARQL queries to generate per text query.
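A minimal sketch of initializing the retriever and running a query; the model path and repository name are placeholders, and the import paths are assumptions based on the module names above.
```python
from haystack.knowledge_graph.graphdb import GraphDBKnowledgeGraph      # assumed import path
from haystack.graph_retriever.text_to_sparql import Text2SparqlRetriever  # assumed import path

kg = GraphDBKnowledgeGraph(index="my_repository")  # placeholder repository name
retriever = Text2SparqlRetriever(
    knowledge_graph=kg,
    model_name_or_path="path/to/bart_for_sparql",  # placeholder model directory
    top_k=1,
)
answers = retriever.retrieve(query="Who is the father of Harry Potter?")
```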
<a name="text_to_sparql.Text2SparqlRetriever.format_result"></a>
#### format\_result
@@ -23,3 +52,7 @@ The generated SPARQL query is executed on a knowledge graph.
Generate formatted dictionary output with text answer and additional info
**Arguments**:
- `result`: The result of a SPARQL query as retrieved from the knowledge graph

View File

@@ -13,3 +13,143 @@ class GraphDBKnowledgeGraph(BaseKnowledgeGraph)
Knowledge graph store that runs on a GraphDB instance
<a name="graphdb.GraphDBKnowledgeGraph.__init__"></a>
#### \_\_init\_\_
```python
| __init__(host: str = "localhost", port: int = 7200, username: str = "", password: str = "", index: Optional[str] = None, prefixes: str = "")
```
Init the knowledge graph by defining the settings to connect with a GraphDB instance
**Arguments**:
- `host`: address of server where the GraphDB instance is running
- `port`: port where the GraphDB instance is running
- `username`: username to login to the GraphDB instance (if any)
- `password`: password to login to the GraphDB instance (if any)
- `index`: name of the index (also called repository) stored in the GraphDB instance
- `prefixes`: definitions of namespaces with a new line after each namespace, e.g., PREFIX hp: <https://deepset.ai/harry_potter/>
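A minimal sketch of connecting to a local GraphDB instance; the repository name is a placeholder, the import path is assumed, and the prefix string follows the format described above.
```python
from haystack.knowledge_graph.graphdb import GraphDBKnowledgeGraph  # assumed import path

prefixes = """PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX hp: <https://deepset.ai/harry_potter/>
"""

kg = GraphDBKnowledgeGraph(
    host="localhost",
    port=7200,
    index="my_repository",  # placeholder repository name
    prefixes=prefixes,
)
```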
<a name="graphdb.GraphDBKnowledgeGraph.create_index"></a>
#### create\_index
```python
| create_index(config_path: Path)
```
Create a new index (also called repository) stored in the GraphDB instance
**Arguments**:
- `config_path`: path to a .ttl file with configuration settings, details: https://graphdb.ontotext.com/documentation/free/configuring-a-repository.html#configure-a-repository-programmatically
<a name="graphdb.GraphDBKnowledgeGraph.delete_index"></a>
#### delete\_index
```python
| delete_index()
```
Delete the index that GraphDBKnowledgeGraph is connected to. This method deletes all data stored in the index.
<a name="graphdb.GraphDBKnowledgeGraph.import_from_ttl_file"></a>
#### import\_from\_ttl\_file
```python
| import_from_ttl_file(index: str, path: Path)
```
Load an existing knowledge graph represented in the form of triples of subject, predicate, and object from a .ttl file into an index of GraphDB
**Arguments**:
- `index`: name of the index (also called repository) in the GraphDB instance where the imported triples shall be stored
- `path`: path to a .ttl file containing a knowledge graph
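Continuing the sketch above, creating a repository from a .ttl configuration file and loading triples into it might look as follows; both file paths are placeholders.
```python
from pathlib import Path

kg.create_index(config_path=Path("repo-config.ttl"))                              # placeholder config file
kg.import_from_ttl_file(index="my_repository", path=Path("knowledge_graph.ttl"))  # placeholder data file
```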
<a name="graphdb.GraphDBKnowledgeGraph.get_all_triples"></a>
#### get\_all\_triples
```python
| get_all_triples(index: Optional[str] = None)
```
Query the given index in the GraphDB instance for all its stored triples. Duplicates are not filtered.
**Arguments**:
- `index`: name of the index (also called repository) in the GraphDB instance
**Returns**:
all triples stored in the index
<a name="graphdb.GraphDBKnowledgeGraph.get_all_subjects"></a>
#### get\_all\_subjects
```python
| get_all_subjects(index: Optional[str] = None)
```
Query the given index in the GraphDB instance for all its stored subjects. Duplicates are not filtered.
**Arguments**:
- `index`: name of the index (also called repository) in the GraphDB instance
**Returns**:
all subjects stored in the index
<a name="graphdb.GraphDBKnowledgeGraph.get_all_predicates"></a>
#### get\_all\_predicates
```python
| get_all_predicates(index: Optional[str] = None)
```
Query the given index in the GraphDB instance for all its stored predicates. Duplicates are not filtered.
**Arguments**:
- `index`: name of the index (also called repository) in the GraphDB instance
**Returns**:
all predicates stored in the index
<a name="graphdb.GraphDBKnowledgeGraph.get_all_objects"></a>
#### get\_all\_objects
```python
| get_all_objects(index: Optional[str] = None)
```
Query the given index in the GraphDB instance for all its stored objects. Duplicates are not filtered.
**Arguments**:
- `index`: name of the index (also called repository) in the GraphDB instance
**Returns**:
all objects stored in the index
<a name="graphdb.GraphDBKnowledgeGraph.query"></a>
#### query
```python
| query(sparql_query: str, index: Optional[str] = None)
```
Execute a SPARQL query on the given index in the GraphDB instance
**Arguments**:
- `sparql_query`: SPARQL query that shall be executed
- `index`: name of the index (also called repository) in the GraphDB instance
**Returns**:
query result
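A sketch of the inspection and query calls above, reusing the repository from the earlier sketches; the `hp:` identifiers are placeholders.
```python
# Inspect what is stored in the repository (duplicates are not filtered).
triples = kg.get_all_triples(index="my_repository")
subjects = kg.get_all_subjects(index="my_repository")

# Run a hand-written SPARQL query; hp:hasFather and hp:harry_potter are placeholder identifiers.
sparql = """
PREFIX hp: <https://deepset.ai/harry_potter/>
SELECT ?child WHERE { ?child hp:hasFather hp:harry_potter . }
"""
result = kg.query(sparql_query=sparql, index="my_repository")
```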

View File

@@ -5,7 +5,7 @@
## Pipeline Objects
```python
class Pipeline(ABC)
class Pipeline()
```
Pipeline brings together building blocks to build a complex search pipeline with Haystack & user-defined components.
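For orientation, a minimal sketch of composing and running a pipeline; `retriever` and `reader` stand for already-initialized Haystack components, and the import path and `run` parameters reflect the pre-1.0 API as best understood here.
```python
from haystack.pipeline import Pipeline  # assumed pre-1.0 import path

pipe = Pipeline()
pipe.add_node(component=retriever, name="Retriever", inputs=["Query"])
pipe.add_node(component=reader, name="Reader", inputs=["Retriever"])

prediction = pipe.run(query="Who is the father of Arya Stark?", top_k_retriever=10, top_k_reader=5)
```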

View File

@@ -344,7 +344,7 @@ Embeddings of documents / passages shape (batch_size, embedding_dim)
#### train
```python
| train(data_dir: str, train_filename: str, dev_filename: str = None, test_filename: str = None, max_processes: int = 128, dev_split: float = 0, batch_size: int = 2, embed_title: bool = True, num_hard_negatives: int = 1, num_positives: int = 1, n_epochs: int = 3, evaluate_every: int = 1000, n_gpu: int = 1, learning_rate: float = 1e-5, epsilon: float = 1e-08, weight_decay: float = 0.0, num_warmup_steps: int = 100, grad_acc_steps: int = 1, optimizer_name: str = "TransformersAdamW", optimizer_correct_bias: bool = True, save_dir: str = "../saved_models/dpr", query_encoder_save_dir: str = "query_encoder", passage_encoder_save_dir: str = "passage_encoder")
| train(data_dir: str, train_filename: str, dev_filename: str = None, test_filename: str = None, max_sample: int = None, max_processes: int = 128, dev_split: float = 0, batch_size: int = 2, embed_title: bool = True, num_hard_negatives: int = 1, num_positives: int = 1, n_epochs: int = 3, evaluate_every: int = 1000, n_gpu: int = 1, learning_rate: float = 1e-5, epsilon: float = 1e-08, weight_decay: float = 0.0, num_warmup_steps: int = 100, grad_acc_steps: int = 1, optimizer_name: str = "TransformersAdamW", optimizer_correct_bias: bool = True, save_dir: str = "../saved_models/dpr", query_encoder_save_dir: str = "query_encoder", passage_encoder_save_dir: str = "passage_encoder")
```
Train a DensePassageRetrieval model.
@@ -355,6 +355,7 @@ train a DensePassageRetrieval model
- `train_filename`: training filename
- `dev_filename`: development set filename, file to be used by model in eval step of training
- `test_filename`: test set filename, file to be used by model in test step after training
- `max_sample`: maximum number of input samples to convert; useful for debugging with a smaller dataset (see the sketch after this argument list).
- `max_processes`: the maximum number of processes to spawn in the multiprocessing.Pool used in DataSilo.
It can be set to 1 to disable the use of multiprocessing or to make debugging easier.
- `dev_split`: The proportion of the train set that will be sliced off for the dev set. Only works if dev_filename is set to None
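A training sketch showing where `max_sample` fits, assuming `retriever` is an initialized DensePassageRetriever and the data files exist; all paths and file names are placeholders.
```python
retriever.train(
    data_dir="data/dpr_training",   # placeholder directory
    train_filename="train.json",
    dev_filename="dev.json",
    test_filename="test.json",
    max_sample=100,                 # convert only 100 samples for a quick debug run
    n_epochs=1,
    batch_size=2,
    save_dir="../saved_models/dpr",
)
```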