Update API Reference Pages for v1.0 (#1729)

* Create new API pages and update existing ones

* Create query classifier page

* Remove Objects suffix
Branden Chan 2021-11-11 12:44:29 +01:00 committed by GitHub
parent 158460504b
commit 81f82b1b95
33 changed files with 645 additions and 90 deletions

View File

@ -1,53 +1,95 @@
<a name="entity"></a>
# Module entity
<a name="crawler"></a>
# Module crawler
<a name="entity.EntityExtractor"></a>
## EntityExtractor Objects
<a name="crawler.Crawler"></a>
## Crawler
```python
class EntityExtractor(BaseComponent)
class Crawler(BaseComponent)
```
This node is used to extract entities out of documents.
The most common use case for this would be as a named entity extractor.
The default model used is dslim/bert-base-NER.
This node can be placed in a querying pipeline to perform entity extraction on retrieved documents only,
or it can be placed in an indexing pipeline so that all documents in the document store have extracted entities.
The entities extracted by this Node will populate Document.entities
Crawl texts from a website so that we can use them later in Haystack as a corpus for search / question answering etc.
<a name="entity.EntityExtractor.run"></a>
**Example:**
```python
| from haystack.nodes.connector import Crawler
|
| crawler = Crawler(output_dir="crawled_files")
| # crawl Haystack docs, i.e. all pages that include haystack.deepset.ai/overview/
| docs = crawler.crawl(urls=["https://haystack.deepset.ai/overview/get-started"],
| filter_urls= ["haystack\.deepset\.ai\/overview\/"])
```
<a name="crawler.Crawler.__init__"></a>
#### \_\_init\_\_
```python
| __init__(output_dir: str, urls: Optional[List[str]] = None, crawler_depth: int = 1, filter_urls: Optional[List] = None, overwrite_existing_files=True)
```
Init object with basic params for crawling (can be overwritten later).
**Arguments**:
- `output_dir`: Path for the directory to store files
- `urls`: List of http(s) address(es) (can also be supplied later when calling crawl())
- `crawler_depth`: How many sublinks to follow from the initial list of URLs. Current options:
0: Only initial list of urls
1: Follow links found on the initial URLs (but no further)
- `filter_urls`: Optional list of regular expressions that the crawled URLs must comply with.
All URLs not matching at least one of the regular expressions will be dropped.
- `overwrite_existing_files`: Whether to overwrite existing files in output_dir with new content
<a name="crawler.Crawler.crawl"></a>
#### crawl
```python
| crawl(output_dir: Union[str, Path, None] = None, urls: Optional[List[str]] = None, crawler_depth: Optional[int] = None, filter_urls: Optional[List] = None, overwrite_existing_files: Optional[bool] = None) -> List[Path]
```
Crawl URL(s), extract the text from the HTML, create a Haystack Document object out of it, and save it (one JSON
file per URL, including text and basic meta data).
You can optionally use `filter_urls` to only crawl URLs that match a certain pattern.
All parameters are optional here and only meant to overwrite instance attributes at runtime.
If no parameters are provided to this method, the instance attributes that were passed during __init__ will be used.
**Arguments**:
- `output_dir`: Path for the directory to store files
- `urls`: List of http addresses or single http address
- `crawler_depth`: How many sublinks to follow from the initial list of URLs. Current options:
0: Only initial list of urls
1: Follow links found on the initial URLs (but no further)
- `filter_urls`: Optional list of regular expressions that the crawled URLs must comply with.
All URLs not matching at least one of the regular expressions will be dropped.
- `overwrite_existing_files`: Whether to overwrite existing files in output_dir with new content
**Returns**:
List of paths where the crawled webpages got stored
<a name="crawler.Crawler.run"></a>
#### run
```python
| run(documents: Optional[Union[List[Document], List[dict]]] = None) -> Tuple[Dict, str]
| run(output_dir: Union[str, Path, None] = None, urls: Optional[List[str]] = None, crawler_depth: Optional[int] = None, filter_urls: Optional[List] = None, overwrite_existing_files: Optional[bool] = None, return_documents: Optional[bool] = False) -> Tuple[Dict, str]
```
This is the method called when this node is used in a pipeline
Method to be executed when the Crawler is used as a Node within a Haystack pipeline.
<a name="entity.EntityExtractor.extract"></a>
#### extract
**Arguments**:
```python
| extract(text)
```
- `output_dir`: Path for the directory to store files
- `urls`: List of http addresses or single http address
- `crawler_depth`: How many sublinks to follow from the initial list of URLs. Current options:
0: Only initial list of urls
1: Follow links found on the initial URLs (but no further)
- `filter_urls`: Optional list of regular expressions that the crawled URLs must comply with.
All URLs not matching at least one of the regular expressions will be dropped.
- `overwrite_existing_files`: Whether to overwrite existing files in output_dir with new content
- `return_documents`: Whether to return the content of the JSON files
This function can be called to perform entity extraction when using the node in isolation.
**Returns**:
<a name="entity.simplify_ner_for_qa"></a>
#### simplify\_ner\_for\_qa
```python
simplify_ner_for_qa(output)
```
Returns a simplified version of the output dictionary
with the following structure:
[
{
answer: { ... }
entities: [ { ... }, {} ]
}
]
The entities included are only the ones that overlap with
the answer itself.
Tuple({"paths": List of filepaths, ...}, Name of output edge)

View File

@ -2,7 +2,7 @@
# Module base
<a name="base.BaseKnowledgeGraph"></a>
## BaseKnowledgeGraph Objects
## BaseKnowledgeGraph
```python
class BaseKnowledgeGraph(BaseComponent)
@ -11,7 +11,7 @@ class BaseKnowledgeGraph(BaseComponent)
Base class for implementing Knowledge Graphs.
<a name="base.BaseDocumentStore"></a>
## BaseDocumentStore Objects
## BaseDocumentStore
```python
class BaseDocumentStore(BaseComponent)
@ -150,7 +150,7 @@ Batch elements of an iterable into fixed-length chunks or blocks.
# Module elasticsearch
<a name="elasticsearch.ElasticsearchDocumentStore"></a>
## ElasticsearchDocumentStore Objects
## ElasticsearchDocumentStore
```python
class ElasticsearchDocumentStore(BaseDocumentStore)
@ -530,7 +530,7 @@ Delete labels in an index. All labels are deleted if no filters are passed.
None
<a name="elasticsearch.OpenSearchDocumentStore"></a>
## OpenSearchDocumentStore Objects
## OpenSearchDocumentStore
```python
class OpenSearchDocumentStore(ElasticsearchDocumentStore)
@ -564,7 +564,7 @@ Find the document that is most similar to the provided `query_emb` by using a ve
<a name="elasticsearch.OpenDistroElasticsearchDocumentStore"></a>
## OpenDistroElasticsearchDocumentStore Objects
## OpenDistroElasticsearchDocumentStore
```python
class OpenDistroElasticsearchDocumentStore(OpenSearchDocumentStore)
@ -576,7 +576,7 @@ A DocumentStore which has an Open Distro for Elasticsearch service behind it.
# Module memory
<a name="memory.InMemoryDocumentStore"></a>
## InMemoryDocumentStore Objects
## InMemoryDocumentStore
```python
class InMemoryDocumentStore(BaseDocumentStore)
@ -857,7 +857,7 @@ None
# Module sql
<a name="sql.SQLDocumentStore"></a>
## SQLDocumentStore Objects
## SQLDocumentStore
```python
class SQLDocumentStore(BaseDocumentStore)
@ -1099,7 +1099,7 @@ None
# Module faiss
<a name="faiss.FAISSDocumentStore"></a>
## FAISSDocumentStore Objects
## FAISSDocumentStore
```python
class FAISSDocumentStore(SQLDocumentStore)
@ -1368,7 +1368,7 @@ Note: In order to have a correct mapping from FAISS to SQL,
# Module milvus
<a name="milvus.MilvusDocumentStore"></a>
## MilvusDocumentStore Objects
## MilvusDocumentStore
```python
class MilvusDocumentStore(SQLDocumentStore)
@ -1660,7 +1660,7 @@ Return the count of embeddings in the document store.
# Module weaviate
<a name="weaviate.WeaviateDocumentStore"></a>
## WeaviateDocumentStore Objects
## WeaviateDocumentStore
```python
class WeaviateDocumentStore(BaseDocumentStore)
@ -1947,7 +1947,7 @@ None
# Module graphdb
<a name="graphdb.GraphDBKnowledgeGraph"></a>
## GraphDBKnowledgeGraph Objects
## GraphDBKnowledgeGraph
```python
class GraphDBKnowledgeGraph(BaseKnowledgeGraph)

View File

@ -0,0 +1,53 @@
<a name="entity"></a>
# Module entity
<a name="entity.EntityExtractor"></a>
## EntityExtractor
```python
class EntityExtractor(BaseComponent)
```
This node is used to extract entities out of documents.
The most common use case for this would be as a named entity extractor.
The default model used is dslim/bert-base-NER.
This node can be placed in a querying pipeline to perform entity extraction on retrieved documents only,
or it can be placed in an indexing pipeline so that all documents in the document store have extracted entities.
The entities extracted by this Node will populate Document.entities
<a name="entity.EntityExtractor.run"></a>
#### run
```python
| run(documents: Optional[Union[List[Document], List[dict]]] = None) -> Tuple[Dict, str]
```
This is the method called when this node is used in a pipeline
<a name="entity.EntityExtractor.extract"></a>
#### extract
```python
| extract(text)
```
This function can be called to perform entity extraction when using the node in isolation.
<a name="entity.simplify_ner_for_qa"></a>
#### simplify\_ner\_for\_qa
```python
simplify_ner_for_qa(output)
```
Returns a simplified version of the output dictionary
with the following structure:
[
{
answer: { ... }
entities: [ { ... }, {} ]
}
]
The entities included are only the ones that overlap with
the answer itself.
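A minimal sketch of using the node in isolation; the `haystack.nodes` import path and the example text are assumptions:
```python
from haystack.nodes import EntityExtractor  # import path assumed for v1.0

extractor = EntityExtractor()  # defaults to dslim/bert-base-NER

# Run the node in isolation on a raw string instead of within a pipeline.
entities = extractor.extract("Angela Merkel was the Chancellor of Germany.")
print(entities)
```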

View File

@ -2,7 +2,7 @@
# Module file\_type
<a name="file_type.FileTypeClassifier"></a>
## FileTypeClassifier Objects
## FileTypeClassifier
```python
class FileTypeClassifier(BaseComponent)

View File

@ -2,6 +2,7 @@
# Purpose : Automate the generation of docstrings
pydoc-markdown pydoc-markdown-primitives.yml
pydoc-markdown pydoc-markdown-document-store.yml
pydoc-markdown pydoc-markdown-file-converters.yml
pydoc-markdown pydoc-markdown-file-classifier.yml
@ -18,5 +19,6 @@ pydoc-markdown pydoc-markdown-pipelines.yml
pydoc-markdown pydoc-markdown-evaluation.yml
pydoc-markdown pydoc-markdown-ranker.yml
pydoc-markdown pydoc-markdown-question-generator.yml
pydoc-markdown pydoc-markdown-query-classifier.yml
pydoc-markdown pydoc-markdown-document-classifier.yml

View File

@ -2,7 +2,7 @@
# Module base
<a name="base.BaseGenerator"></a>
## BaseGenerator Objects
## BaseGenerator
```python
class BaseGenerator(BaseComponent)
@ -34,7 +34,7 @@ Generated answers plus additional infos in a dict
# Module transformers
<a name="transformers.RAGenerator"></a>
## RAGenerator Objects
## RAGenerator
```python
class RAGenerator(BaseGenerator)
@ -140,7 +140,7 @@ Generated answers plus additional infos in a dict like this:
```
<a name="transformers.Seq2SeqGenerator"></a>
## Seq2SeqGenerator Objects
## Seq2SeqGenerator
```python
class Seq2SeqGenerator(BaseGenerator)

View File

@ -0,0 +1,47 @@
<a name="docs2answers"></a>
# Module docs2answers
<a name="docs2answers.Docs2Answers"></a>
## Docs2Answers
```python
class Docs2Answers(BaseComponent)
```
This Node is used to convert retrieved documents into the predicted answers format.
It is useful for situations where you are calling a Retriever-only pipeline via the REST API.
This ensures that your output is in a compatible format.
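A minimal sketch of appending the node to a retriever-only pipeline; the import paths and the `retriever` instance are assumptions:
```python
from haystack import Pipeline            # import path assumed for v1.0
from haystack.nodes import Docs2Answers  # import path assumed for v1.0

pipe = Pipeline()
pipe.add_node(component=retriever, name="Retriever", inputs=["Query"])  # `retriever` is a placeholder
pipe.add_node(component=Docs2Answers(), name="Docs2Answers", inputs=["Retriever"])
result = pipe.run(query="Who is the father of Arya Stark?")  # output is in the answers format
```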
<a name="join_docs"></a>
# Module join\_docs
<a name="join_docs.JoinDocuments"></a>
## JoinDocuments
```python
class JoinDocuments(BaseComponent)
```
A node to join documents outputted by multiple retriever nodes.
The node allows multiple join modes:
* concatenate: combine the documents from multiple nodes. Any duplicate documents are discarded.
* merge: merge scores of documents from multiple nodes. Optionally, each input score can be given a different
`weight` & a `top_k` limit can be set. This mode can also be used for "reranking" retrieved documents.
<a name="join_docs.JoinDocuments.__init__"></a>
#### \_\_init\_\_
```python
| __init__(join_mode: str = "concatenate", weights: Optional[List[float]] = None, top_k_join: Optional[int] = None)
```
**Arguments**:
- `join_mode`: `concatenate` to combine documents from multiple retrievers or `merge` to aggregate scores of
individual documents.
- `weights`: A node-wise list (the length of the list must equal the number of input nodes) of weights for
adjusting document scores when using the `merge` join_mode. By default, equal weight is given
to each retriever score. This param is not compatible with the `concatenate` join_mode.
- `top_k_join`: Limit documents to top_k based on the resulting scores of the join.
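A minimal sketch of joining the output of two retrievers with the parameters above; the retriever instances and import paths are assumptions:
```python
from haystack import Pipeline
from haystack.nodes import JoinDocuments

join = JoinDocuments(join_mode="merge", weights=[0.4, 0.6], top_k_join=10)

pipe = Pipeline()
pipe.add_node(component=es_retriever, name="ESRetriever", inputs=["Query"])    # placeholder sparse retriever
pipe.add_node(component=dpr_retriever, name="DPRRetriever", inputs=["Query"])  # placeholder dense retriever
pipe.add_node(component=join, name="Join", inputs=["ESRetriever", "DPRRetriever"])
result = pipe.run(query="Who is the father of Arya Stark?")
```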

View File

@ -2,7 +2,7 @@
# Module base
<a name="base.BasePreProcessor"></a>
## BasePreProcessor Objects
## BasePreProcessor
```python
class BasePreProcessor(BaseComponent)
@ -21,7 +21,7 @@ Perform document cleaning and splitting. Takes a single document as input and re
# Module preprocessor
<a name="preprocessor.PreProcessor"></a>
## PreProcessor Objects
## PreProcessor
```python
class PreProcessor(BasePreProcessor)

View File

@ -0,0 +1,232 @@
<a name="schema"></a>
# Module schema
<a name="schema.Document"></a>
## Document
```python
@dataclass
class Document()
```
<a name="schema.Document.__init__"></a>
#### \_\_init\_\_
```python
| __init__(content: Union[str, pd.DataFrame], content_type: Literal["text", "table", "image"] = "text", id: Optional[str] = None, score: Optional[float] = None, meta: Dict[str, Any] = None, embedding: Optional[np.ndarray] = None, id_hash_keys: Optional[List[str]] = None)
```
One of the core data classes in Haystack. It's used to represent documents / passages in a standardized way within Haystack.
Documents are stored in DocumentStores, are returned by Retrievers, are the input for Readers and are used in
many other places that manipulate or interact with document-level data.
Note: There can be multiple Documents originating from one file (e.g. PDF), if you split the text
into smaller passages. We'll have one Document per passage in this case.
Each document has a unique ID. This can be supplied by the user or generated automatically.
It's particularly helpful for handling duplicates and for referencing documents in other objects (e.g. Labels).
There's an easy option to convert from/to dicts via `from_dict()` and `to_dict()`.
**Arguments**:
- `content`: Content of the document. For most cases, this will be text, but it can be a table or image.
- `content_type`: One of "image", "table" or "image". Haystack components can use this to adjust their
handling of Documents and check compatibility.
- `id`: Unique ID for the document. If not supplied by the user, we'll generate one automatically by
creating a hash from the supplied text. This behaviour can be further adjusted by `id_hash_keys`.
- `score`: The relevance score of the Document determined by a model (e.g. Retriever or Re-Ranker).
In the range of [0,1], where 1 means extremely relevant.
- `meta`: Meta fields for a document like name, url, or author in the form of a custom dict (any keys and values allowed).
- `embedding`: Vector encoding of the text
- `id_hash_keys`: Generate the document id from a custom list of strings.
If you want to ensure you don't have duplicate documents in your DocumentStore but texts are
not unique, you can provide custom strings here that will be used (e.g. ["filename_xy", "text_of_doc"]).
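A minimal sketch of constructing a Document with the arguments above; the import path is an assumption:
```python
from haystack import Document  # import path assumed for v1.0

doc = Document(
    content="Berlin is the capital of Germany.",
    content_type="text",
    meta={"name": "capitals.txt"},
)
print(doc.id)  # generated from a hash of the content if no id is supplied
```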
<a name="schema.Document.to_dict"></a>
#### to\_dict
```python
| to_dict(field_map={}) -> Dict
```
Convert Document to dict. An optional field_map can be supplied to change the names of the keys in the
resulting dict. This way you can work with standardized Document objects in Haystack, but adjust the format that
they are serialized / stored in other places (e.g. elasticsearch)
Example:
| doc = Document(content="some text", content_type="text")
| doc.to_dict(field_map={"custom_content_field": "content"})
| >>> {"custom_content_field": "some text", content_type": "text"}
**Arguments**:
- `field_map`: Dict with keys being the custom target keys and values being the standard Document attributes
**Returns**:
dict with content of the Document
<a name="schema.Document.from_dict"></a>
#### from\_dict
```python
| @classmethod
| from_dict(cls, dict, field_map={})
```
Create Document from dict. An optional field_map can be supplied to adjust for custom names of the keys in the
input dict. This way you can work with standardized Document objects in Haystack, but adjust the format that
they are serialized / stored in other places (e.g. elasticsearch)
Example:
| my_dict = {"custom_content_field": "some text", content_type": "text"}
| Document.from_dict(my_dict, field_map={"custom_content_field": "content"})
**Arguments**:
- `field_map`: Dict with keys being the custom target keys and values being the standard Document attributes
**Returns**:
A Document object created from the dict
<a name="schema.Document.__lt__"></a>
#### \_\_lt\_\_
```python
| __lt__(other)
```
Enable sorting of Documents by score
<a name="schema.Span"></a>
## Span
```python
@dataclass
class Span()
```
<a name="schema.Span.end"></a>
#### end
Defining a sequence of characters (Text span) or cells (Table span) via start and end index.
For extractive QA: Character where answer starts/ends
For TableQA: Cell where the answer starts/ends (counted from top left to bottom right of table)
**Arguments**:
- `start`: Position where the span starts
- `end`: Position where the span ends
<a name="schema.Answer"></a>
## Answer
```python
@dataclass
class Answer()
```
<a name="schema.Answer.meta"></a>
#### meta
The fundamental object in Haystack to represent any type of Answer (e.g. extractive QA, generative QA or TableQA).
For example, it's used within some Nodes like the Reader, but also in the REST API.
**Arguments**:
- `answer`: The answer string. If there's no possible answer (aka "no_answer" or "is_impossible") this will be an empty string.
- `type`: One of ("generative", "extractive", "other"): Whether this answer comes from an extractive model
(i.e. we can locate an exact answer string in one of the documents) or from a generative model
(i.e. no pointer to a specific document, no offsets ...).
- `score`: The relevance score of the Answer determined by a model (e.g. Reader or Generator).
In the range of [0,1], where 1 means extremely relevant.
- `context`: The related content that was used to create the answer (i.e. a text passage, part of a table, image ...)
- `offsets_in_document`: List of `Span` objects with start and end positions of the answer **in the
document** (as stored in the document store).
For extractive QA: Character where answer starts => `Answer.offsets_in_document[0].start`
For TableQA: Cell where the answer starts (counted from top left to bottom right of table) => `Answer.offsets_in_document[0].start`
(Note that in TableQA there can be multiple cell ranges that are relevant for the answer, thus there can be multiple `Spans` here)
- `offsets_in_context`: List of `Span` objects with start and end positions of the answer **in the
context** (i.e. the surrounding text/table of a certain window size).
For extractive QA: Character where answer starts => `Answer.offsets_in_context[0].start`
For TableQA: Cell where the answer starts (counted from top left to bottom right of table) => `Answer.offsets_in_context[0].start`
(Note that in TableQA there can be multiple cell ranges that are relevant for the answer, thus there can be multiple `Spans` here)
- `document_id`: ID of the document that the answer was located in (if any)
- `meta`: Dict that can be used to associate any kind of custom meta data with the answer.
In extractive QA, this will carry the meta data of the document where the answer was found.
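A minimal sketch of constructing an Answer with character offsets; the import path and field values are assumptions:
```python
from haystack.schema import Answer, Span  # import path assumed for v1.0

ans = Answer(
    answer="Berlin",
    type="extractive",
    score=0.97,
    context="Berlin is the capital of Germany.",
    offsets_in_context=[Span(start=0, end=6)],
    document_id="doc-42",
)
```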
<a name="schema.Answer.__lt__"></a>
#### \_\_lt\_\_
```python
| __lt__(other)
```
Enable sorting of Answers by score
<a name="schema.Label"></a>
## Label
```python
@dataclass
class Label()
```
<a name="schema.Label.__init__"></a>
#### \_\_init\_\_
```python
| __init__(query: str, document: Document, is_correct_answer: bool, is_correct_document: bool, origin: Literal["user-feedback", "gold-label"], answer: Optional[Answer], id: Optional[str] = None, no_answer: Optional[bool] = None, pipeline_id: Optional[str] = None, created_at: Optional[str] = None, updated_at: Optional[str] = None, meta: Optional[dict] = None)
```
Object used to represent label/feedback in a standardized way within Haystack.
This includes labels from datasets like SQuAD, annotations from labeling tools,
or user feedback from the Haystack REST API.
**Arguments**:
- `query`: the question (or query) for finding answers.
- `document`:
- `answer`: the answer object.
- `is_correct_answer`: whether the sample is positive or negative.
- `is_correct_document`: in case of a negative sample (is_correct_answer is False), there could be two cases:
incorrect answer but correct document & incorrect document. This flag denotes if
the returned document was correct.
- `origin`: the source for the labels. It can be used later for filtering.
- `id`: Unique ID used within the DocumentStore. If not supplied, a uuid will be generated automatically.
- `no_answer`: whether the question is unanswerable.
- `pipeline_id`: pipeline identifier (any str) that was involved in generating this label (in case of user feedback).
- `created_at`: Timestamp of creation with format yyyy-MM-dd HH:mm:ss.
Generate in Python via time.strftime("%Y-%m-%d %H:%M:%S").
- `updated_at`: Timestamp of update with format yyyy-MM-dd HH:mm:ss.
Generate in Python via time.strftime("%Y-%m-%d %H:%M:%S")
- `meta`: Meta fields like "annotator_name" in the form of a custom dict (any keys and values allowed).
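A minimal sketch of constructing a Label from the arguments above; the Document and Answer values are illustrative:
```python
from haystack.schema import Answer, Document, Label  # import path assumed for v1.0

label = Label(
    query="What is the capital of Germany?",
    document=Document(content="Berlin is the capital of Germany."),
    answer=Answer(answer="Berlin"),
    is_correct_answer=True,
    is_correct_document=True,
    origin="gold-label",
)
```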
<a name="schema.MultiLabel"></a>
## MultiLabel
```python
@dataclass
class MultiLabel()
```
<a name="schema.MultiLabel.__init__"></a>
#### \_\_init\_\_
```python
| __init__(labels: List[Label], drop_negative_labels=False, drop_no_answers=False)
```
There are often multiple `Labels` associated with a single query. For example, there can be multiple annotated
answers for one question, or multiple documents may contain the information you want for a query.
This class is "syntactic sugar" that simplifies the work with such a list of related Labels.
It stores the original labels in MultiLabel.labels and provides additional aggregated attributes that are
automatically created at init time. For example, MultiLabel.no_answer lets you easily check whether any of the
underlying Labels provided a text answer and therefore whether there is indeed a possible answer.
**Arguments**:
- `labels`: A list of labels that belong to a similar query and shall be "grouped" together
- `drop_negative_labels`: Whether to drop negative labels from that group (e.g. thumbs down feedback from UI)
- `drop_no_answers`: Whether to drop labels that specify the answer is impossible
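A minimal sketch of grouping Labels, assuming `label_1` and `label_2` are Label objects for the same query (as in the Label sketch above):
```python
from haystack.schema import MultiLabel  # import path assumed for v1.0

multi = MultiLabel(labels=[label_1, label_2], drop_negative_labels=True)
print(multi.labels)     # the original Label objects
print(multi.no_answer)  # aggregated attribute created at init time
```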

View File

@ -11,7 +11,7 @@ processor:
- skip_empty_modules: true
renderer:
type: markdown
descriptive_class_title: true
descriptive_class_title: false
descriptive_module_title: true
add_method_class_prefix: false
add_member_class_prefix: false

View File

@ -11,7 +11,7 @@ processor:
- skip_empty_modules: true
renderer:
type: markdown
descriptive_class_title: true
descriptive_class_title: false
descriptive_module_title: true
add_method_class_prefix: false
add_member_class_prefix: false

View File

@ -11,7 +11,7 @@ processor:
- skip_empty_modules: true
renderer:
type: markdown
descriptive_class_title: true
descriptive_class_title: false
descriptive_module_title: true
add_method_class_prefix: false
add_member_class_prefix: false

View File

@ -11,7 +11,7 @@ processor:
- skip_empty_modules: true
renderer:
type: markdown
descriptive_class_title: true
descriptive_class_title: false
descriptive_module_title: true
add_method_class_prefix: false
add_member_class_prefix: false

View File

@ -11,7 +11,7 @@ processor:
- skip_empty_modules: true
renderer:
type: markdown
descriptive_class_title: true
descriptive_class_title: false
descriptive_module_title: true
add_method_class_prefix: false
add_member_class_prefix: false

View File

@ -11,8 +11,8 @@ processor:
- skip_empty_modules: true
renderer:
type: markdown
descriptive_class_title: true
descriptive_class_title: false
descriptive_module_title: true
add_method_class_prefix: false
add_member_class_prefix: false
filename: crawler.md
filename: extractor.md

View File

@ -11,7 +11,7 @@ processor:
- skip_empty_modules: true
renderer:
type: markdown
descriptive_class_title: true
descriptive_class_title: false
descriptive_module_title: true
add_method_class_prefix: false
add_member_class_prefix: false

View File

@ -11,7 +11,7 @@ processor:
- skip_empty_modules: true
renderer:
type: markdown
descriptive_class_title: true
descriptive_class_title: false
descriptive_module_title: true
add_method_class_prefix: false
add_member_class_prefix: false

View File

@ -11,8 +11,8 @@ processor:
- skip_empty_modules: true
renderer:
type: markdown
descriptive_class_title: true
descriptive_class_title: false
descriptive_module_title: true
add_method_class_prefix: false
add_member_class_prefix: false
filename: ranker.md
filename: other.md

View File

@ -11,7 +11,7 @@ processor:
- skip_empty_modules: true
renderer:
type: markdown
descriptive_class_title: true
descriptive_class_title: false
descriptive_module_title: true
add_method_class_prefix: false
add_member_class_prefix: false

View File

@ -11,7 +11,7 @@ processor:
- skip_empty_modules: true
renderer:
type: markdown
descriptive_class_title: true
descriptive_class_title: false
descriptive_module_title: true
add_method_class_prefix: false
add_member_class_prefix: false

View File

@ -0,0 +1,18 @@
loaders:
- type: python
search_path: [../../../../haystack/]
modules: ['schema']
ignore_when_discovered: ['__init__']
processor:
- type: filter
expression: not name.startswith('_') and default()
- documented_only: true
- do_not_filter_modules: false
- skip_empty_modules: true
renderer:
type: markdown
descriptive_class_title: false
descriptive_module_title: true
add_method_class_prefix: false
add_member_class_prefix: false
filename: primitives.md

View File

@ -0,0 +1,18 @@
loaders:
- type: python
search_path: [../../../../haystack/nodes/query_classifier]
modules: ['base', 'sklearn', 'transformers']
ignore_when_discovered: ['__init__']
processor:
- type: filter
expression: not name.startswith('_') and default()
- documented_only: true
- do_not_filter_modules: false
- skip_empty_modules: true
renderer:
type: markdown
descriptive_class_title: false
descriptive_module_title: true
add_method_class_prefix: false
add_member_class_prefix: false
filename: query_classifier.md

View File

@ -11,7 +11,7 @@ processor:
- skip_empty_modules: true
renderer:
type: markdown
descriptive_class_title: true
descriptive_class_title: false
descriptive_module_title: true
add_method_class_prefix: false
add_member_class_prefix: false

View File

@ -11,7 +11,7 @@ processor:
- skip_empty_modules: true
renderer:
type: markdown
descriptive_class_title: true
descriptive_class_title: false
descriptive_module_title: true
add_method_class_prefix: false
add_member_class_prefix: false

View File

@ -11,7 +11,7 @@ processor:
- skip_empty_modules: true
renderer:
type: markdown
descriptive_class_title: true
descriptive_class_title: false
descriptive_module_title: true
add_method_class_prefix: false
add_member_class_prefix: false

View File

@ -11,7 +11,7 @@ processor:
- skip_empty_modules: true
renderer:
type: markdown
descriptive_class_title: true
descriptive_class_title: false
descriptive_module_title: true
add_method_class_prefix: false
add_member_class_prefix: false

View File

@ -11,7 +11,7 @@ processor:
- skip_empty_modules: true
renderer:
type: markdown
descriptive_class_title: true
descriptive_class_title: false
descriptive_module_title: true
add_method_class_prefix: false
add_member_class_prefix: false

View File

@ -11,7 +11,7 @@ processor:
- skip_empty_modules: true
renderer:
type: markdown
descriptive_class_title: true
descriptive_class_title: false
descriptive_module_title: true
add_method_class_prefix: false
add_member_class_prefix: false

View File

@ -0,0 +1,143 @@
<a name="base"></a>
# Module base
<a name="base.BaseQueryClassifier"></a>
## BaseQueryClassifier Objects
```python
class BaseQueryClassifier(BaseComponent)
```
Abstract class for Query Classifiers
<a name="sklearn"></a>
# Module sklearn
<a name="sklearn.SklearnQueryClassifier"></a>
## SklearnQueryClassifier Objects
```python
class SklearnQueryClassifier(BaseQueryClassifier)
```
A node to classify an incoming query into one of two categories using a lightweight sklearn model. Depending on the result, the query flows to a different branch in your pipeline
and the further processing can be customized. You can define this by connecting the further pipeline to either `output_1` or `output_2` from this node.
**Example**:
```python
|pipe = Pipeline()
|pipe.add_node(component=SklearnQueryClassifier(), name="QueryClassifier", inputs=["Query"])
|pipe.add_node(component=elastic_retriever, name="ElasticRetriever", inputs=["QueryClassifier.output_2"])
|pipe.add_node(component=dpr_retriever, name="DPRRetriever", inputs=["QueryClassifier.output_1"])
|# Keyword queries will use the ElasticRetriever
|pipe.run("kubernetes aws")
|# Semantic queries (questions, statements, sentences ...) will leverage the DPR retriever
|pipe.run("How to manage kubernetes on aws")
```
Models:
Pass your own `Sklearn` binary classification model or use one of the following pretrained ones:
1) Keywords vs. Questions/Statements (Default)
query_classifier can be found [here](https://ext-models-haystack.s3.eu-central-1.amazonaws.com/gradboost_query_classifier/model.pickle)
query_vectorizer can be found [here](https://ext-models-haystack.s3.eu-central-1.amazonaws.com/gradboost_query_classifier/vectorizer.pickle)
output_1 => question/statement
output_2 => keyword query
[Readme](https://ext-models-haystack.s3.eu-central-1.amazonaws.com/gradboost_query_classifier/readme.txt)
2) Questions vs. Statements
query_classifier can be found [here](https://ext-models-haystack.s3.eu-central-1.amazonaws.com/gradboost_query_classifier_statements/model.pickle)
query_vectorizer can be found [here](https://ext-models-haystack.s3.eu-central-1.amazonaws.com/gradboost_query_classifier_statements/vectorizer.pickle)
output_1 => question
output_2 => statement
[Readme](https://ext-models-haystack.s3.eu-central-1.amazonaws.com/gradboost_query_classifier_statements/readme.txt)
See also the [tutorial](https://haystack.deepset.ai/tutorials/pipelines) on pipelines.
<a name="sklearn.SklearnQueryClassifier.__init__"></a>
#### \_\_init\_\_
```python
| __init__(model_name_or_path: Union[
| str, Any
| ] = "https://ext-models-haystack.s3.eu-central-1.amazonaws.com/gradboost_query_classifier/model.pickle", vectorizer_name_or_path: Union[
| str, Any
| ] = "https://ext-models-haystack.s3.eu-central-1.amazonaws.com/gradboost_query_classifier/vectorizer.pickle")
```
**Arguments**:
- `model_name_or_path`: Gradient boosting based binary classifier to classify between keyword vs statement/question
queries or statement vs question queries.
- `vectorizer_name_or_path`: An ngram-based Tfidf vectorizer for extracting features from the query.
<a name="transformers"></a>
# Module transformers
<a name="transformers.TransformersQueryClassifier"></a>
## TransformersQueryClassifier Objects
```python
class TransformersQueryClassifier(BaseQueryClassifier)
```
A node to classify an incoming query into one of two categories using a (small) BERT transformer model.
Depending on the result, the query flows to a different branch in your pipeline and the further processing
can be customized. You can define this by connecting the further pipeline to either `output_1` or `output_2`
from this node.
**Example**:
```python
|pipe = Pipeline()
|pipe.add_node(component=TransformersQueryClassifier(), name="QueryClassifier", inputs=["Query"])
|pipe.add_node(component=elastic_retriever, name="ElasticRetriever", inputs=["QueryClassifier.output_2"])
|pipe.add_node(component=dpr_retriever, name="DPRRetriever", inputs=["QueryClassifier.output_1"])
|# Keyword queries will use the ElasticRetriever
|pipe.run("kubernetes aws")
|# Semantic queries (questions, statements, sentences ...) will leverage the DPR retriever
|pipe.run("How to manage kubernetes on aws")
```
Models:
Pass your own `Transformer` binary classification model from file/huggingface or use one of the following
pretrained ones hosted on Huggingface:
1) Keywords vs. Questions/Statements (Default)
model_name_or_path="shahrukhx01/bert-mini-finetune-question-detection"
output_1 => question/statement
output_2 => keyword query
[Readme](https://ext-models-haystack.s3.eu-central-1.amazonaws.com/gradboost_query_classifier/readme.txt)
2) Questions vs. Statements
`model_name_or_path`="shahrukhx01/question-vs-statement-classifier"
output_1 => question
output_2 => statement
[Readme](https://ext-models-haystack.s3.eu-central-1.amazonaws.com/gradboost_query_classifier_statements/readme.txt)
See also the [tutorial](https://haystack.deepset.ai/tutorials/pipelines) on pipelines.
<a name="transformers.TransformersQueryClassifier.__init__"></a>
#### \_\_init\_\_
```python
| __init__(model_name_or_path: Union[Path, str] = "shahrukhx01/bert-mini-finetune-question-detection", use_gpu: bool = True)
```
**Arguments**:
- `model_name_or_path`: Transformer-based, fine-tuned mini BERT model for query classification
- `use_gpu`: Whether to use GPU (if available).
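A minimal sketch of selecting the questions-vs-statements model listed above instead of the default keyword detector:
```python
from haystack.nodes import TransformersQueryClassifier  # import path assumed for v1.0

classifier = TransformersQueryClassifier(
    model_name_or_path="shahrukhx01/question-vs-statement-classifier",
    use_gpu=False,
)
# output_1 => question, output_2 => statement
```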

View File

@ -2,7 +2,7 @@
# Module base
<a name="base.BaseReader"></a>
## BaseReader Objects
## BaseReader
```python
class BaseReader(BaseComponent)
@ -30,7 +30,7 @@ Wrapper method used to time functions.
# Module farm
<a name="farm.FARMReader"></a>
## FARMReader Objects
## FARMReader
```python
class FARMReader(BaseReader)
@ -361,7 +361,7 @@ Usage:
# Module transformers
<a name="transformers.TransformersReader"></a>
## TransformersReader Objects
## TransformersReader
```python
class TransformersReader(BaseReader)
@ -450,7 +450,7 @@ Dict containing query and answers
# Module table
<a name="table.TableReader"></a>
## TableReader Objects
## TableReader
```python
class TableReader(BaseReader)

View File

@ -2,7 +2,7 @@
# Module base
<a name="base.BaseGraphRetriever"></a>
## BaseGraphRetriever Objects
## BaseGraphRetriever
```python
class BaseGraphRetriever(BaseComponent)
@ -11,7 +11,7 @@ class BaseGraphRetriever(BaseComponent)
Base class for knowledge graph retrievers.
<a name="base.BaseRetriever"></a>
## BaseRetriever Objects
## BaseRetriever
```python
class BaseRetriever(BaseComponent)
@ -84,7 +84,7 @@ position in the ranking of documents the correct document is.
# Module sparse
<a name="sparse.ElasticsearchRetriever"></a>
## ElasticsearchRetriever Objects
## ElasticsearchRetriever
```python
class ElasticsearchRetriever(BaseRetriever)
@ -152,7 +152,7 @@ that are most relevant to the query.
- `index`: The name of the index in the DocumentStore from which to retrieve documents
<a name="sparse.ElasticsearchFilterOnlyRetriever"></a>
## ElasticsearchFilterOnlyRetriever Objects
## ElasticsearchFilterOnlyRetriever
```python
class ElasticsearchFilterOnlyRetriever(ElasticsearchRetriever)
@ -179,7 +179,7 @@ that are most relevant to the query.
- `index`: The name of the index in the DocumentStore from which to retrieve documents
<a name="sparse.TfidfRetriever"></a>
## TfidfRetriever Objects
## TfidfRetriever
```python
class TfidfRetriever(BaseRetriever)
@ -235,7 +235,7 @@ Performing training on this class according to the TF-IDF algorithm.
# Module dense
<a name="dense.DensePassageRetriever"></a>
## DensePassageRetriever Objects
## DensePassageRetriever
```python
class DensePassageRetriever(BaseRetriever)
@ -426,7 +426,7 @@ None
Load DensePassageRetriever from the specified directory.
<a name="dense.TableTextRetriever"></a>
## TableTextRetriever Objects
## TableTextRetriever
```python
class TableTextRetriever(BaseRetriever)
@ -595,7 +595,7 @@ None
Load TableTextRetriever from the specified directory.
<a name="dense.EmbeddingRetriever"></a>
## EmbeddingRetriever Objects
## EmbeddingRetriever
```python
class EmbeddingRetriever(BaseRetriever)
@ -688,7 +688,7 @@ Embeddings, one per input document
# Module text2sparql
<a name="text2sparql.Text2SparqlRetriever"></a>
## Text2SparqlRetriever Objects
## Text2SparqlRetriever
```python
class Text2SparqlRetriever(BaseGraphRetriever)

View File

@ -2,7 +2,7 @@
# Module base
<a name="base.BaseSummarizer"></a>
## BaseSummarizer Objects
## BaseSummarizer
```python
class BaseSummarizer(BaseComponent)
@ -37,7 +37,7 @@ List of Documents, where Document.text contains the summarization and Document.m
# Module transformers
<a name="transformers.TransformersSummarizer"></a>
## TransformersSummarizer Objects
## TransformersSummarizer
```python
class TransformersSummarizer(BaseSummarizer)

View File

@ -2,7 +2,7 @@
# Module base
<a name="base.BaseTranslator"></a>
## BaseTranslator Objects
## BaseTranslator
```python
class BaseTranslator(BaseComponent)
@ -33,7 +33,7 @@ Method that gets executed when this class is used as a Node in a Haystack Pipeli
# Module transformers
<a name="transformers.TransformersTranslator"></a>
## TransformersTranslator Objects
## TransformersTranslator
```python
class TransformersTranslator(BaseTranslator)