<a id="docs2answers"></a>

# Module docs2answers

<a id="docs2answers.Docs2Answers"></a>

## Docs2Answers

```python
class Docs2Answers(BaseComponent)
```

This node converts retrieved documents into the predicted-answers format.

It is useful when you call a Retriever-only pipeline through the REST API,
because it ensures that your output is in a compatible format.

**Arguments**:

- `progress_bar`: Whether to show a progress bar
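
The conversion described above can be pictured with a minimal, self-contained sketch. It does not use the library: the `Doc` class, the `docs_to_answers` helper, and the field names in the returned dictionary are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document (not the library class)."""
    content: str
    score: Optional[float] = None
    meta: Dict[str, Any] = field(default_factory=dict)


def docs_to_answers(query: str, docs: List[Doc]) -> Dict[str, Any]:
    """Wrap each document in an answer-shaped dict so the output looks like a reader result."""
    answers = [
        # Assumption: the answer text is left empty and the document text becomes the context.
        {"answer": "", "context": d.content, "score": d.score, "meta": d.meta}
        for d in docs
    ]
    return {"query": query, "answers": answers}


result = docs_to_answers("What is Haystack?", [Doc("Haystack is a framework.", 0.9)])
```

The point of the sketch is only the shape change: a list of documents goes in, and a list of answer-like records comes out.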

<a id="join_docs"></a>

# Module join\_docs

<a id="join_docs.JoinDocuments"></a>

## JoinDocuments

```python
class JoinDocuments(JoinNode)
```

A node that joins documents output by multiple retriever nodes.

The node supports multiple join modes:
* concatenate: combine the documents from multiple nodes. Any duplicate documents are discarded.
               The score is determined only by the last node that outputs the document.
* merge: merge the scores of documents from multiple nodes. Optionally, each input score can be given a different
         `weight`, and a `top_k` limit can be set. This mode can also be used for "reranking" retrieved documents.
* reciprocal_rank_fusion: combine the documents based on their rank in multiple nodes.
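
To make the difference between `concatenate` and `reciprocal_rank_fusion` concrete, here is a small sketch in plain Python. It is not the library implementation: the helper names, the `(doc_id, score)` input shape, the constant `k = 60`, and ranks starting at 1 are assumptions made for this example.

```python
from typing import Dict, List, Tuple

# Each input is a ranked list of (doc_id, score) pairs from one hypothetical retriever.
Ranked = List[Tuple[str, float]]


def concatenate(inputs: List[Ranked]) -> Dict[str, float]:
    """Union of all documents; a later input overwrites the score of a duplicate."""
    joined: Dict[str, float] = {}
    for ranked in inputs:
        for doc_id, score in ranked:
            joined[doc_id] = score
    return joined


def reciprocal_rank_fusion(inputs: List[Ranked], k: int = 60) -> Dict[str, float]:
    """Score each document by summing 1 / (k + rank) over the inputs that return it."""
    fused: Dict[str, float] = {}
    for ranked in inputs:
        for rank, (doc_id, _score) in enumerate(ranked, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return fused


bm25 = [("a", 12.0), ("b", 7.5)]
dense = [("b", 0.91), ("c", 0.40)]

concat_scores = concatenate([bm25, dense])          # "b" keeps the score from `dense`
rrf_scores = reciprocal_rank_fusion([bm25, dense])  # "b" ranks first: it appears in both lists
```

Rank fusion ignores the raw scores entirely, which is why it can combine retrievers whose scores are on different scales.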

<a id="join_docs.JoinDocuments.__init__"></a>

#### JoinDocuments.\_\_init\_\_

```python
def __init__(join_mode: str = "concatenate",
             weights: Optional[List[float]] = None,
             top_k_join: Optional[int] = None,
             sort_by_score: bool = True)
```

**Arguments**:

- `join_mode`: `concatenate` to combine documents from multiple retrievers, `merge` to aggregate the scores of
individual documents, or `reciprocal_rank_fusion` to apply rank-based scoring.
- `weights`: A node-wise list (its length must equal the number of input nodes) of weights for
adjusting document scores when using the `merge` join_mode. By default, equal weight is given
to each retriever score. This parameter is not compatible with the `concatenate` join_mode.
- `top_k_join`: Limit the documents to the top_k based on the resulting scores of the join.
- `sort_by_score`: Whether to sort the incoming documents by their score. Set this to True if all your
Documents come with `score` values. Set it to False if any of the Documents come
from sources where the `score` is set to `None`, like `TfidfRetriever` on Elasticsearch.
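
The interaction of `weights` and `top_k_join` in `merge` mode can be sketched as a weighted sum followed by a cut-off. This is an illustration under stated assumptions, not the library code: `merge_scores` is a hypothetical helper, inputs are plain `doc_id -> score` dictionaries, and the default of `1 / n` per input is one possible reading of "equal weight".

```python
from typing import Dict, List, Optional, Tuple


def merge_scores(
    inputs: List[Dict[str, float]],
    weights: Optional[List[float]] = None,
    top_k_join: Optional[int] = None,
) -> List[Tuple[str, float]]:
    """Sum weight * score per document across inputs, then keep the best top_k_join."""
    if weights is None:
        # Assumption: equal weights that sum to 1.
        weights = [1.0 / len(inputs)] * len(inputs)
    if len(weights) != len(inputs):
        raise ValueError("weights must have one entry per input node")

    merged: Dict[str, float] = {}
    for scores, weight in zip(inputs, weights):
        for doc_id, score in scores.items():
            merged[doc_id] = merged.get(doc_id, 0.0) + weight * score

    ranked = sorted(merged.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k_join] if top_k_join is not None else ranked


top = merge_scores(
    [{"a": 0.8, "b": 0.4}, {"b": 0.9, "c": 0.5}],
    weights=[0.25, 0.75],
    top_k_join=2,
)
```

With these inputs, document `b` comes first because it receives a contribution from both inputs, and `a` is cut off by `top_k_join=2`.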

<a id="join_answers"></a>

# Module join\_answers

<a id="join_answers.JoinAnswers"></a>

## JoinAnswers

```python
class JoinAnswers(JoinNode)
```

A node to join `Answer`s produced by multiple `Reader` nodes.

<a id="join_answers.JoinAnswers.__init__"></a>

#### JoinAnswers.\_\_init\_\_

```python
def __init__(join_mode: str = "concatenate",
             weights: Optional[List[float]] = None,
             top_k_join: Optional[int] = None,
             sort_by_score: bool = True)
```

**Arguments**:

- `join_mode`: `"concatenate"` to combine `Answer`s from multiple `Reader`s, or `"merge"` to aggregate the scores
of individual `Answer`s.
- `weights`: A node-wise list (length of list must be equal to the number of input nodes) of weights for
adjusting `Answer` scores when using the `"merge"` join_mode. By default, equal weight is assigned to each
`Reader` score. This parameter is not compatible with the `"concatenate"` join_mode.
- `top_k_join`: Limit `Answer`s to the top_k based on the resulting scores of the join.
- `sort_by_score`: Whether to sort the incoming answers by their score. Set this to True if your Answers
come from a Reader or TableReader. Set it to False if any Answers come from a Generator, since a Generator assigns
None as the score of each Answer.
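
The reason `sort_by_score` exists can be shown with a short sketch: Python cannot order `None` against a float, so sorting a mixed list fails. The `join_answers` helper and the `(text, score)` tuples below are hypothetical and only mimic the idea; how the library itself handles missing scores is not shown here.

```python
from typing import List, Optional, Tuple

# Hypothetical answer record: (answer text, score or None).
Answer = Tuple[str, Optional[float]]


def join_answers(
    inputs: List[List[Answer]],
    sort_by_score: bool = True,
    top_k_join: Optional[int] = None,
) -> List[Answer]:
    """Concatenate answers from all inputs, optionally sorting by score."""
    joined = [answer for answers in inputs for answer in answers]
    if sort_by_score:
        # In this sketch, sorting raises TypeError as soon as a None score is compared.
        joined.sort(key=lambda answer: answer[1], reverse=True)
    return joined[:top_k_join] if top_k_join is not None else joined


reader_answers = [("Berlin", 0.7), ("Paris", 0.9)]
generator_answers = [("It is Paris.", None)]

# Safe: input order is preserved and the None score is never compared.
unsorted = join_answers([reader_answers, generator_answers], sort_by_score=False)

# Scores are all present, so sorting works.
sorted_only_reader = join_answers([reader_answers], sort_by_score=True)
```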

<a id="route_documents"></a>

# Module route\_documents

<a id="route_documents.RouteDocuments"></a>

## RouteDocuments

```python
class RouteDocuments(BaseComponent)
```

A node to split a list of `Document`s by `content_type` or by the values of a metadata field and route them to
different nodes.

<a id="route_documents.RouteDocuments.__init__"></a>

#### RouteDocuments.\_\_init\_\_

```python
def __init__(split_by: str = "content_type",
             metadata_values: Optional[List[str]] = None)
```

**Arguments**:

- `split_by`: Field to split the documents by, either `"content_type"` or a metadata field name.
If this parameter is set to `"content_type"`, the list of `Document`s will be split into a list containing
only `Document`s of type `"text"` (will be routed to `"output_1"`) and a list containing only `Document`s of
type `"table"` (will be routed to `"output_2"`).
If this parameter is set to a metadata field name, you need to specify the parameter `metadata_values` as
well.
- `metadata_values`: If the parameter `split_by` is set to a metadata field name, you need to provide a list
of values to group the `Document`s by. `Document`s whose metadata field is equal to the first value of the
provided list will be routed to `"output_1"`, `Document`s whose metadata field is equal to the second
value of the provided list will be routed to `"output_2"`, etc.
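
The routing rule described by `split_by` and `metadata_values` can be sketched as follows. This is a simplified illustration, not the library code: documents are plain dictionaries, `route_documents` is a hypothetical helper, and dropping documents whose value matches no entry is an assumption of this sketch.

```python
from typing import Any, Dict, List, Optional


def route_documents(
    docs: List[Dict[str, Any]],
    split_by: str = "content_type",
    metadata_values: Optional[List[str]] = None,
) -> Dict[str, List[Dict[str, Any]]]:
    """Group documents into output_1, output_2, ... by content type or a metadata value."""
    if split_by == "content_type":
        values = ["text", "table"]  # "text" -> output_1, "table" -> output_2
    elif metadata_values:
        values = list(metadata_values)
    else:
        raise ValueError("metadata_values is required when splitting by a metadata field")

    outputs: Dict[str, List[Dict[str, Any]]] = {
        f"output_{i}": [] for i in range(1, len(values) + 1)
    }
    for doc in docs:
        if split_by == "content_type":
            value = doc.get("content_type")
        else:
            value = doc.get("meta", {}).get(split_by)
        if value in values:
            # The position of the value in the list decides the output edge.
            outputs[f"output_{values.index(value) + 1}"].append(doc)
    return outputs


docs = [
    {"content": "some text", "content_type": "text", "meta": {"lang": "en"}},
    {"content": "a table", "content_type": "table", "meta": {"lang": "de"}},
]

by_type = route_documents(docs)
by_lang = route_documents(docs, split_by="lang", metadata_values=["de", "en"])
```

Note that the order of `metadata_values` matters: in the second call, German documents go to `output_1` because `"de"` is listed first.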