Regen api docs (#1015)

This commit is contained in:
Branden Chan 2021-04-30 12:35:13 +02:00 committed by GitHub
parent 99990e7249
commit 869b493b61
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
5 changed files with 215 additions and 11 deletions

View File

@@ -93,7 +93,7 @@ class ElasticsearchDocumentStore(BaseDocumentStore)
#### \_\_init\_\_
```python
| __init__(host: Union[str, List[str]] = "localhost", port: Union[int, List[int]] = 9200, username: str = "", password: str = "", api_key_id: Optional[str] = None, api_key: Optional[str] = None, index: str = "document", label_index: str = "label", search_fields: Union[str, list] = "text", text_field: str = "text", name_field: str = "name", embedding_field: str = "embedding", embedding_dim: int = 768, custom_mapping: Optional[dict] = None, excluded_meta_data: Optional[list] = None, faq_question_field: Optional[str] = None, analyzer: str = "standard", scheme: str = "http", ca_certs: Optional[str] = None, verify_certs: bool = True, create_index: bool = True, update_existing_documents: bool = False, refresh_type: str = "wait_for", similarity="dot_product", timeout=30, return_embedding: bool = False)
| __init__(host: Union[str, List[str]] = "localhost", port: Union[int, List[int]] = 9200, username: str = "", password: str = "", api_key_id: Optional[str] = None, api_key: Optional[str] = None, aws4auth=None, index: str = "document", label_index: str = "label", search_fields: Union[str, list] = "text", text_field: str = "text", name_field: str = "name", embedding_field: str = "embedding", embedding_dim: int = 768, custom_mapping: Optional[dict] = None, excluded_meta_data: Optional[list] = None, faq_question_field: Optional[str] = None, analyzer: str = "standard", scheme: str = "http", ca_certs: Optional[str] = None, verify_certs: bool = True, create_index: bool = True, update_existing_documents: bool = False, refresh_type: str = "wait_for", similarity="dot_product", timeout=30, return_embedding: bool = False)
```
A DocumentStore using Elasticsearch to store and query the documents for our search.
@@ -110,6 +110,7 @@ A DocumentStore using Elasticsearch to store and query the documents for our search
- `password`: password (standard authentication via http_auth)
- `api_key_id`: ID of the API key (alternative authentication mode to the above http_auth)
- `api_key`: Secret value of the API key (alternative authentication mode to the above http_auth)
- `aws4auth`: Authentication for usage with AWS Elasticsearch (can be generated with the requests-aws4auth package); see the sketch after this argument list
- `index`: Name of the index in Elasticsearch to use for storing the documents that we want to search. If it does not exist yet, it will be created.
- `label_index`: Name of the index in Elasticsearch to use for storing labels. If it does not exist yet, it will be created.
- `search_fields`: Name of fields used by ElasticsearchRetriever to find matches in the docs to our incoming query (using elastic's multi_match query), e.g. ["title", "full_text"]
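For illustration, a minimal sketch of connecting to an Amazon Elasticsearch Service domain with `aws4auth`. The credentials, region, and endpoint are placeholders, and the import path assumes the package layout at the time of this commit.
```python
# Sketch only: placeholder credentials, region, and endpoint.
from requests_aws4auth import AWS4Auth
from haystack.document_store.elasticsearch import ElasticsearchDocumentStore  # assumed import path

aws_auth = AWS4Auth("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "eu-central-1", "es")

document_store = ElasticsearchDocumentStore(
    host="my-domain.eu-central-1.es.amazonaws.com",  # placeholder AWS endpoint
    port=443,
    scheme="https",
    aws4auth=aws_auth,
)
```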
@@ -250,6 +251,15 @@ Return the number of documents in the document store.
Return the number of labels in the document store
<a name="elasticsearch.ElasticsearchDocumentStore.get_embedding_count"></a>
#### get\_embedding\_count
```python
| get_embedding_count(index: Optional[str] = None, filters: Optional[Dict[str, List[str]]] = None) -> int
```
Return the count of embeddings in the document store.
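A short usage sketch, assuming `document_store` is an initialized `ElasticsearchDocumentStore` and `genre` is a hypothetical metadata field of your documents:
```python
# Count all embeddings, then only those of documents whose "genre" is "news".
total = document_store.get_embedding_count()
news_only = document_store.get_embedding_count(index="document", filters={"genre": ["news"]})
print(f"{news_only} of {total} embeddings belong to news documents")
```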
<a name="elasticsearch.ElasticsearchDocumentStore.get_all_documents"></a>
#### get\_all\_documents
@@ -541,6 +551,15 @@ None
Return the number of documents in the document store.
<a name="memory.InMemoryDocumentStore.get_embedding_count"></a>
#### get\_embedding\_count
```python
| get_embedding_count(filters: Optional[Dict[str, List[str]]] = None, index: Optional[str] = None) -> int
```
Return the count of embeddings in the document store.
<a name="memory.InMemoryDocumentStore.get_label_count"></a>
#### get\_label\_count
@@ -653,13 +672,6 @@ Fetch documents by specifying a list of text id strings
Fetch documents by specifying a list of text vector id strings
**Arguments**:
- `vector_ids`: List of vector_id strings.
- `index`: Name of the index to get the documents from. If None, the
DocumentStore's default index (self.index) will be used.
- `batch_size`: When working with large number of documents, batching can help reduce memory footprint.
<a name="sql.SQLDocumentStore.get_all_documents_generator"></a>
#### get\_all\_documents\_generator
@@ -813,7 +825,7 @@ the vector embeddings are indexed in a FAISS Index.
#### \_\_init\_\_
```python
| __init__(sql_url: str = "sqlite:///", vector_dim: int = 768, faiss_index_factory_str: str = "Flat", faiss_index: Optional[faiss.swigfaiss.Index] = None, return_embedding: bool = False, update_existing_documents: bool = False, index: str = "document", similarity: str = "dot_product", embedding_field: str = "embedding", progress_bar: bool = True, **kwargs)
| __init__(sql_url: str = "sqlite:///", vector_dim: int = 768, faiss_index_factory_str: str = "Flat", faiss_index: Optional["faiss.swigfaiss.Index"] = None, return_embedding: bool = False, update_existing_documents: bool = False, index: str = "document", similarity: str = "dot_product", embedding_field: str = "embedding", progress_bar: bool = True, **kwargs)
```
**Arguments**:
@@ -916,6 +928,15 @@ a large number of documents without having to load all documents in memory.
- `return_embedding`: Whether to return the document embeddings.
- `batch_size`: When working with large number of documents, batching can help reduce memory footprint.
<a name="faiss.FAISSDocumentStore.get_embedding_count"></a>
#### get\_embedding\_count
```python
| get_embedding_count(index: Optional[str] = None, filters: Optional[Dict[str, List[str]]] = None) -> int
```
Return the count of embeddings in the document store.
<a name="faiss.FAISSDocumentStore.train_index"></a>
#### train\_index
@@ -1257,3 +1278,12 @@ Helper function to dump all vectors stored in Milvus server.
List[np.array]: List of vectors.
<a name="milvus.MilvusDocumentStore.get_embedding_count"></a>
#### get\_embedding\_count
```python
| get_embedding_count(index: Optional[str] = None, filters: Optional[Dict[str, List[str]]] = None) -> int
```
Return the count of embeddings in the document store.

View File

@@ -14,6 +14,35 @@ class Text2SparqlRetriever(BaseGraphRetriever)
Graph retriever that uses a pre-trained BART model to translate natural language questions given in text form to queries in SPARQL format.
The generated SPARQL query is executed on a knowledge graph.
<a name="text_to_sparql.Text2SparqlRetriever.__init__"></a>
#### \_\_init\_\_
```python
| __init__(knowledge_graph, model_name_or_path, top_k: int = 1)
```
Init the Retriever by providing a knowledge graph and a pre-trained BART model
**Arguments**:
- `knowledge_graph`: An instance of BaseKnowledgeGraph on which to execute SPARQL queries.
- `model_name_or_path`: Name of or path to a pre-trained BartForConditionalGeneration model.
- `top_k`: How many SPARQL queries to generate per text query.
<a name="text_to_sparql.Text2SparqlRetriever.retrieve"></a>
#### retrieve
```python
| retrieve(query: str, top_k: Optional[int] = None)
```
Translate a text query to SPARQL and execute it on the knowledge graph to retrieve a list of answers
**Arguments**:
- `query`: Text query that shall be translated to SPARQL and then executed on the knowledge graph
- `top_k`: How many SPARQL queries to generate per text query.
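A minimal sketch of initializing the retriever and running a query; the model path and repository name are placeholders, and the import paths are assumptions based on the module names above.
```python
from haystack.knowledge_graph.graphdb import GraphDBKnowledgeGraph      # assumed import path
from haystack.graph_retriever.text_to_sparql import Text2SparqlRetriever  # assumed import path

kg = GraphDBKnowledgeGraph(index="my_repository")  # placeholder repository name
retriever = Text2SparqlRetriever(
    knowledge_graph=kg,
    model_name_or_path="path/to/bart_for_sparql",  # placeholder model directory
    top_k=1,
)
answers = retriever.retrieve(query="Who is the father of Harry Potter?")
```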
<a name="text_to_sparql.Text2SparqlRetriever.format_result"></a>
#### format\_result
@@ -23,3 +52,7 @@ The generated SPARQL query is executed on a knowledge graph.
Generate formatted dictionary output with text answer and additional info
**Arguments**:
- `result`: The result of a SPARQL query as retrieved from the knowledge graph

View File

@@ -13,3 +13,143 @@ class GraphDBKnowledgeGraph(BaseKnowledgeGraph)
Knowledge graph store that runs on a GraphDB instance
<a name="graphdb.GraphDBKnowledgeGraph.__init__"></a>
#### \_\_init\_\_
```python
| __init__(host: str = "localhost", port: int = 7200, username: str = "", password: str = "", index: Optional[str] = None, prefixes: str = "")
```
Init the knowledge graph by defining the settings to connect with a GraphDB instance
**Arguments**:
- `host`: address of server where the GraphDB instance is running
- `port`: port where the GraphDB instance is running
- `username`: username to login to the GraphDB instance (if any)
- `password`: password to login to the GraphDB instance (if any)
- `index`: name of the index (also called repository) stored in the GraphDB instance
- `prefixes`: definitions of namespaces with a new line after each namespace, e.g., PREFIX hp: <https://deepset.ai/harry_potter/>
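A minimal sketch of connecting to a local GraphDB instance; the repository name is a placeholder, the import path is assumed, and the prefix string follows the format described above.
```python
from haystack.knowledge_graph.graphdb import GraphDBKnowledgeGraph  # assumed import path

prefixes = """PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX hp: <https://deepset.ai/harry_potter/>
"""

kg = GraphDBKnowledgeGraph(
    host="localhost",
    port=7200,
    index="my_repository",  # placeholder repository name
    prefixes=prefixes,
)
```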
<a name="graphdb.GraphDBKnowledgeGraph.create_index"></a>
#### create\_index
```python
| create_index(config_path: Path)
```
Create a new index (also called repository) stored in the GraphDB instance
**Arguments**:
- `config_path`: path to a .ttl file with configuration settings, details: https://graphdb.ontotext.com/documentation/free/configuring-a-repository.html#configure-a-repository-programmatically
<a name="graphdb.GraphDBKnowledgeGraph.delete_index"></a>
#### delete\_index
```python
| delete_index()
```
Delete the index that GraphDBKnowledgeGraph is connected to. This method deletes all data stored in the index.
<a name="graphdb.GraphDBKnowledgeGraph.import_from_ttl_file"></a>
#### import\_from\_ttl\_file
```python
| import_from_ttl_file(index: str, path: Path)
```
Load an existing knowledge graph represented in the form of triples of subject, predicate, and object from a .ttl file into an index of GraphDB
**Arguments**:
- `index`: name of the index (also called repository) in the GraphDB instance where the imported triples shall be stored
- `path`: path to a .ttl file containing a knowledge graph
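Continuing the sketch above, creating a repository from a .ttl configuration file and loading triples into it might look as follows; both file paths are placeholders.
```python
from pathlib import Path

kg.create_index(config_path=Path("repo-config.ttl"))                              # placeholder config file
kg.import_from_ttl_file(index="my_repository", path=Path("knowledge_graph.ttl"))  # placeholder data file
```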
<a name="graphdb.GraphDBKnowledgeGraph.get_all_triples"></a>
#### get\_all\_triples
```python
| get_all_triples(index: Optional[str] = None)
```
Query the given index in the GraphDB instance for all its stored triples. Duplicates are not filtered.
**Arguments**:
- `index`: name of the index (also called repository) in the GraphDB instance
**Returns**:
all triples stored in the index
<a name="graphdb.GraphDBKnowledgeGraph.get_all_subjects"></a>
#### get\_all\_subjects
```python
| get_all_subjects(index: Optional[str] = None)
```
Query the given index in the GraphDB instance for all its stored subjects. Duplicates are not filtered.
**Arguments**:
- `index`: name of the index (also called repository) in the GraphDB instance
**Returns**:
all subjects stored in the index
<a name="graphdb.GraphDBKnowledgeGraph.get_all_predicates"></a>
#### get\_all\_predicates
```python
| get_all_predicates(index: Optional[str] = None)
```
Query the given index in the GraphDB instance for all its stored predicates. Duplicates are not filtered.
**Arguments**:
- `index`: name of the index (also called repository) in the GraphDB instance
**Returns**:
all predicates stored in the index
<a name="graphdb.GraphDBKnowledgeGraph.get_all_objects"></a>
#### get\_all\_objects
```python
| get_all_objects(index: Optional[str] = None)
```
Query the given index in the GraphDB instance for all its stored objects. Duplicates are not filtered.
**Arguments**:
- `index`: name of the index (also called repository) in the GraphDB instance
**Returns**:
all objects stored in the index
<a name="graphdb.GraphDBKnowledgeGraph.query"></a>
#### query
```python
| query(sparql_query: str, index: Optional[str] = None)
```
Execute a SPARQL query on the given index in the GraphDB instance
**Arguments**:
- `sparql_query`: SPARQL query that shall be executed
- `index`: name of the index (also called repository) in the GraphDB instance
**Returns**:
query result
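A sketch of the inspection and query calls above, reusing the repository from the earlier sketches; the `hp:` identifiers are placeholders.
```python
# Inspect what is stored in the repository (duplicates are not filtered).
triples = kg.get_all_triples(index="my_repository")
subjects = kg.get_all_subjects(index="my_repository")

# Run a hand-written SPARQL query; hp:hasFather and hp:harry_potter are placeholder identifiers.
sparql = """
PREFIX hp: <https://deepset.ai/harry_potter/>
SELECT ?child WHERE { ?child hp:hasFather hp:harry_potter . }
"""
result = kg.query(sparql_query=sparql, index="my_repository")
```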

View File

@@ -5,7 +5,7 @@
## Pipeline Objects
```python
class Pipeline(ABC)
class Pipeline()
```
Pipeline brings together building blocks to build a complex search pipeline with Haystack & user-defined components.
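For orientation, a minimal sketch of composing and running a pipeline; `retriever` and `reader` stand for already-initialized Haystack components, and the import path and `run` parameters reflect the pre-1.0 API as best understood here.
```python
from haystack.pipeline import Pipeline  # assumed pre-1.0 import path

pipe = Pipeline()
pipe.add_node(component=retriever, name="Retriever", inputs=["Query"])
pipe.add_node(component=reader, name="Reader", inputs=["Retriever"])

prediction = pipe.run(query="Who is the father of Arya Stark?", top_k_retriever=10, top_k_reader=5)
```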

View File

@@ -344,7 +344,7 @@ Embeddings of documents / passages shape (batch_size, embedding_dim)
#### train
```python
| train(data_dir: str, train_filename: str, dev_filename: str = None, test_filename: str = None, max_processes: int = 128, dev_split: float = 0, batch_size: int = 2, embed_title: bool = True, num_hard_negatives: int = 1, num_positives: int = 1, n_epochs: int = 3, evaluate_every: int = 1000, n_gpu: int = 1, learning_rate: float = 1e-5, epsilon: float = 1e-08, weight_decay: float = 0.0, num_warmup_steps: int = 100, grad_acc_steps: int = 1, optimizer_name: str = "TransformersAdamW", optimizer_correct_bias: bool = True, save_dir: str = "../saved_models/dpr", query_encoder_save_dir: str = "query_encoder", passage_encoder_save_dir: str = "passage_encoder")
| train(data_dir: str, train_filename: str, dev_filename: str = None, test_filename: str = None, max_sample: int = None, max_processes: int = 128, dev_split: float = 0, batch_size: int = 2, embed_title: bool = True, num_hard_negatives: int = 1, num_positives: int = 1, n_epochs: int = 3, evaluate_every: int = 1000, n_gpu: int = 1, learning_rate: float = 1e-5, epsilon: float = 1e-08, weight_decay: float = 0.0, num_warmup_steps: int = 100, grad_acc_steps: int = 1, optimizer_name: str = "TransformersAdamW", optimizer_correct_bias: bool = True, save_dir: str = "../saved_models/dpr", query_encoder_save_dir: str = "query_encoder", passage_encoder_save_dir: str = "passage_encoder")
```
Train a DensePassageRetrieval model.
@@ -355,6 +355,7 @@ train a DensePassageRetrieval model
- `train_filename`: training filename
- `dev_filename`: development set filename, file to be used by model in eval step of training
- `test_filename`: test set filename, file to be used by model in test step after training
- `max_sample`: maximum number of input samples to convert; useful for debugging with a smaller dataset (see the sketch after this argument list).
- `max_processes`: the maximum number of processes to spawn in the multiprocessing.Pool used in DataSilo.
It can be set to 1 to disable the use of multiprocessing or to make debugging easier.
- `dev_split`: The proportion of the train set that will be sliced off for the dev set. Only works if dev_filename is set to None
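A training sketch showing where `max_sample` fits, assuming `retriever` is an initialized DensePassageRetriever and the data files exist; all paths and file names are placeholders.
```python
retriever.train(
    data_dir="data/dpr_training",   # placeholder directory
    train_filename="train.json",
    dev_filename="dev.json",
    test_filename="test.json",
    max_sample=100,                 # convert only 100 samples for a quick debug run
    n_epochs=1,
    batch_size=2,
    save_dir="../saved_models/dpr",
)
```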