haystack/docs/v0.10.0/_src/api/api/pipelines.md
Markus Paff d3fd888a76
Release Docs 0.10.0 (#1460)
* updated tutorials and docstrings and new version

* update to correct directory structure
2021-09-23 16:22:14 +02:00

<a name="pipeline"></a>
# Module pipeline
<a name="pipeline.BasePipeline"></a>
## BasePipeline Objects
```python
class BasePipeline()
```
<a name="pipeline.BasePipeline.load_from_yaml"></a>
#### load\_from\_yaml
```python
| @classmethod
| load_from_yaml(cls, path: Path, pipeline_name: Optional[str] = None, overwrite_with_env_variables: bool = True)
```
Load Pipeline from a YAML file defining the individual components and how they're tied together to form
a Pipeline. A single YAML can declare multiple Pipelines, in which case an explicit `pipeline_name` must
be passed.
Here's a sample configuration:
```yaml
| version: '0.8'
|
| components:    # define all the building-blocks for Pipeline
| - name: MyReader       # custom-name for the component; helpful for visualization & debugging
|   type: FARMReader    # Haystack Class name for the component
|   params:
|     no_ans_boost: -10
|     model_name_or_path: deepset/roberta-base-squad2
| - name: MyESRetriever
|   type: ElasticsearchRetriever
|   params:
|     document_store: MyDocumentStore    # params can reference other components defined in the YAML
|     custom_query: null
| - name: MyDocumentStore
|   type: ElasticsearchDocumentStore
|   params:
|     index: haystack_test
|
| pipelines:    # multiple Pipelines can be defined using the components from above
| - name: my_query_pipeline    # a simple extractive-qa Pipeline
|   nodes:
|   - name: MyESRetriever
|     inputs: [Query]
|   - name: MyReader
|     inputs: [MyESRetriever]
```
**Arguments**:
- `path`: path of the YAML file.
- `pipeline_name`: if the YAML contains multiple pipelines, the pipeline_name to load must be set.
- `overwrite_with_env_variables`: Overwrite the YAML configuration with environment variables. For example,
to change index name param for an ElasticsearchDocumentStore, an env
variable 'MYDOCSTORE_PARAMS_INDEX=documents-2021' can be set. Note that an
`_` sign must be used to specify nested hierarchical properties.
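The naming rule for these environment variables can be illustrated with a small sketch. This is not Haystack's implementation: the helper `apply_env_overrides` and its matching logic are hypothetical, and it assumes a component named `MyDocStore` so that the variable `MYDOCSTORE_PARAMS_INDEX` from the description above applies to it.

```python
from copy import deepcopy
from typing import Mapping


def apply_env_overrides(config: dict, environ: Mapping[str, str]) -> dict:
    """Return a copy of `config` with component params overridden from `environ`.

    Hypothetical helper: a variable named <COMPONENT>_PARAMS_<PARAM> (upper case)
    overrides params[<param>] of the component called <component>.
    """
    result = deepcopy(config)
    for component in result.get("components", []):
        prefix = f"{component['name'].upper()}_PARAMS_"
        for key, value in environ.items():
            if key.startswith(prefix):
                param = key[len(prefix):].lower()
                component.setdefault("params", {})[param] = value
    return result


# Parsed YAML stand-in; the component name "MyDocStore" is an assumption.
config = {
    "components": [
        {
            "name": "MyDocStore",
            "type": "ElasticsearchDocumentStore",
            "params": {"index": "haystack_test"},
        },
    ]
}
overridden = apply_env_overrides(config, {"MYDOCSTORE_PARAMS_INDEX": "documents-2021"})
print(overridden["components"][0]["params"]["index"])  # documents-2021
```

In real use the mapping would be `os.environ`; a plain dict is passed here so the sketch stays free of side effects.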
<a name="pipeline.Pipeline"></a>
## Pipeline Objects
```python
class Pipeline(BasePipeline)
```
Pipeline brings together building blocks to build a complex search pipeline with Haystack and user-defined components.
Under the hood, a pipeline is represented as a directed acyclic graph of component nodes. It enables custom query
flows with options to branch queries (e.g., extractive QA vs. keyword match query), merge candidate documents for a
Reader from multiple Retrievers, or re-rank candidate documents.
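The graph idea can be sketched without Haystack. The `ToyPipeline` class below is a hypothetical stand-in, not the real `Pipeline`: components are plain callables, and each node receives the list of outputs of its input nodes. It shows two retrievers fed by the same query whose results are merged by a third node.

```python
from typing import Callable, Dict, List


class ToyPipeline:
    """Toy stand-in for a DAG pipeline; not the Haystack class."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Callable[[list], object]] = {}
        self.inputs: Dict[str, List[str]] = {}

    def add_node(self, component: Callable[[list], object], name: str, inputs: List[str]) -> None:
        # Every input must be the root "Query" or a node that was added earlier.
        for parent in inputs:
            if parent != "Query" and parent not in self.nodes:
                raise ValueError(f"unknown input node: {parent}")
        self.nodes[name] = component
        self.inputs[name] = inputs

    def run(self, query: str):
        results = {"Query": query}
        output = query
        # Nodes only reference earlier nodes, so insertion order is a valid topological order.
        for name, component in self.nodes.items():
            output = component([results[parent] for parent in self.inputs[name]])
            results[name] = output
        return output  # result of the node that was added last


pipe = ToyPipeline()
pipe.add_node(lambda ins: [f"kw:{ins[0]}"], name="KeywordRetriever", inputs=["Query"])
pipe.add_node(lambda ins: [f"dense:{ins[0]}"], name="DenseRetriever", inputs=["Query"])
pipe.add_node(lambda ins: ins[0] + ins[1], name="Join", inputs=["KeywordRetriever", "DenseRetriever"])
print(pipe.run("berlin"))  # ['kw:berlin', 'dense:berlin']
```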
<a name="pipeline.Pipeline.add_node"></a>
#### add\_node
```python
| add_node(component, name: str, inputs: List[str])
```
Add a new node to the pipeline.
**Arguments**:
- `component`: The object to be called when the data is passed to the node. It can be a Haystack component
(like Retriever, Reader, or Generator) or a user-defined object that implements a run()
method to process incoming data from the predecessor node.
- `name`: The name for the node. It must not contain any dots.
- `inputs`: A list of inputs to the node. If the predecessor node has a single outgoing edge, the name
of the node is sufficient. For instance, an 'ElasticsearchRetriever' node always outputs a single
edge with a list of documents. It can be represented as ["ElasticsearchRetriever"].
When the predecessor node has multiple outputs, e.g., a "QueryClassifier", the output
must be specified explicitly as "QueryClassifier.output_2".
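The `inputs` notation suggests why node names must not contain dots: the dot presumably separates the node name from the output edge. The sketch below shows one way such a string could be split. The function `split_input` is hypothetical, and using `output_1` as the label for a single outgoing edge is an assumption of this sketch.

```python
from typing import Tuple


def split_input(input_spec: str) -> Tuple[str, str]:
    """Split an input such as "QueryClassifier.output_2" into (node, edge).

    A bare node name is treated as that node's single outgoing edge,
    labelled "output_1" here (an assumed default for this sketch).
    """
    node_name, _, edge = input_spec.partition(".")
    return node_name, edge or "output_1"


print(split_input("ElasticsearchRetriever"))    # ('ElasticsearchRetriever', 'output_1')
print(split_input("QueryClassifier.output_2"))  # ('QueryClassifier', 'output_2')
```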
<a name="pipeline.Pipeline.get_node"></a>
#### get\_node
```python
| get_node(name: str) -> Optional[BaseComponent]
```
Get a node from the Pipeline.
**Arguments**:
- `name`: The name of the node.
<a name="pipeline.Pipeline.set_node"></a>
#### set\_node
```python
| set_node(name: str, component)
```
Set the component for a node in the Pipeline.
**Arguments**:
- `name`: The name of the node.
- `component`: The component object to be set at the node.
<a name="pipeline.Pipeline.get_nodes_by_class"></a>
#### get\_nodes\_by\_class
```python
| get_nodes_by_class(class_type) -> List[Any]
```
Gets all nodes in the pipeline that are instances of a certain class (including subclasses).
This is helpful, for example, if you loaded a pipeline and then want to interact directly with the document store.
Example:
| from haystack.document_store.base import BaseDocumentStore
| INDEXING_PIPELINE = Pipeline.load_from_yaml(Path(PIPELINE_YAML_PATH), pipeline_name=INDEXING_PIPELINE_NAME)
| res = INDEXING_PIPELINE.get_nodes_by_class(class_type=BaseDocumentStore)
**Returns**:
List of components that are instances of the requested class.
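The "including subclasses" behaviour corresponds to an `isinstance` check. A minimal sketch with placeholder classes follows; `BaseStore`, `InMemoryStore`, `Retriever`, and `nodes_by_class` are hypothetical names, not Haystack objects.

```python
from typing import Any, Dict, List


class BaseStore:
    """Placeholder base class for this sketch."""


class InMemoryStore(BaseStore):
    """Placeholder subclass."""


class Retriever:
    """Placeholder for an unrelated component."""


def nodes_by_class(components: Dict[str, Any], class_type: type) -> List[Any]:
    # isinstance() also matches instances of subclasses of class_type.
    return [component for component in components.values() if isinstance(component, class_type)]


components = {"Store": InMemoryStore(), "Retriever": Retriever()}
stores = nodes_by_class(components, BaseStore)
print([type(store).__name__ for store in stores])  # ['InMemoryStore']
```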
<a name="pipeline.Pipeline.get_document_store"></a>
#### get\_document\_store
```python
| get_document_store() -> Optional[BaseDocumentStore]
```
Return the document store object used in the current pipeline.
**Returns**:
Instance of DocumentStore or None
<a name="pipeline.Pipeline.draw"></a>
#### draw
```python
| draw(path: Path = Path("pipeline.png"))
```
Create a Graphviz visualization of the pipeline.
**Arguments**:
- `path`: the path to save the image.
<a name="pipeline.Pipeline.load_from_yaml"></a>
#### load\_from\_yaml
```python
| @classmethod
| load_from_yaml(cls, path: Path, pipeline_name: Optional[str] = None, overwrite_with_env_variables: bool = True)
```
Load Pipeline from a YAML file defining the individual components and how they're tied together to form
a Pipeline. A single YAML can declare multiple Pipelines, in which case an explicit `pipeline_name` must
be passed.
Here's a sample configuration:
```yaml
| version: '0.8'
|
| components:    # define all the building-blocks for Pipeline
| - name: MyReader       # custom-name for the component; helpful for visualization & debugging
|   type: FARMReader    # Haystack Class name for the component
|   params:
|     no_ans_boost: -10
|     model_name_or_path: deepset/roberta-base-squad2
| - name: MyESRetriever
|   type: ElasticsearchRetriever
|   params:
|     document_store: MyDocumentStore    # params can reference other components defined in the YAML
|     custom_query: null
| - name: MyDocumentStore
|   type: ElasticsearchDocumentStore
|   params:
|     index: haystack_test
|
| pipelines:    # multiple Pipelines can be defined using the components from above
| - name: my_query_pipeline    # a simple extractive-qa Pipeline
|   nodes:
|   - name: MyESRetriever
|     inputs: [Query]
|   - name: MyReader
|     inputs: [MyESRetriever]
```
**Arguments**:
- `path`: path of the YAML file.
- `pipeline_name`: if the YAML contains multiple pipelines, the pipeline_name to load must be set.
- `overwrite_with_env_variables`: Overwrite the YAML configuration with environment variables. For example,
to change index name param for an ElasticsearchDocumentStore, an env
variable 'MYDOCSTORE_PARAMS_INDEX=documents-2021' can be set. Note that an
`_` sign must be used to specify nested hierarchical properties.
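In the sample configuration, the retriever's `document_store` param refers to another component by name. The following sketch shows how such references could be resolved so that a dependency is built before the component that needs it. It is an illustration under assumptions: `build_component` is a hypothetical helper, components are represented as plain dicts instead of real Haystack objects, and the config is already parsed.

```python
from typing import Any, Dict


def build_component(name: str, definitions: Dict[str, dict], built: Dict[str, Any]) -> dict:
    """Build component `name` (as a plain dict stand-in) after its dependencies."""
    if name in built:
        return built[name]
    definition = definitions[name]
    params = {}
    for key, value in definition.get("params", {}).items():
        # Assumption: a string equal to another component's name is a reference.
        if isinstance(value, str) and value in definitions:
            value = build_component(value, definitions, built)
        params[key] = value
    built[name] = {"type": definition["type"], "params": params}
    return built[name]


# Parsed form of the `components` section shown above (abridged).
definitions = {
    "MyReader": {"type": "FARMReader", "params": {"no_ans_boost": -10}},
    "MyESRetriever": {"type": "ElasticsearchRetriever", "params": {"document_store": "MyDocumentStore"}},
    "MyDocumentStore": {"type": "ElasticsearchDocumentStore", "params": {"index": "haystack_test"}},
}
built: Dict[str, Any] = {}
retriever = build_component("MyESRetriever", definitions, built)
print(retriever["params"]["document_store"]["type"])  # ElasticsearchDocumentStore
```

The sketch does not guard against circular references between components.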
<a name="pipeline.Pipeline.save_to_yaml"></a>
#### save\_to\_yaml
```python
| save_to_yaml(path: Path, return_defaults: bool = False)
```
Save a YAML configuration for the Pipeline that can be used with `Pipeline.load_from_yaml()`.
**Arguments**:
- `path`: path of the output YAML file.
- `return_defaults`: whether to output parameters that have the default values.
<a name="pipeline.BaseStandardPipeline"></a>
## BaseStandardPipeline Objects
```python
class BaseStandardPipeline(ABC)
```
<a name="pipeline.BaseStandardPipeline.add_node"></a>
#### add\_node
```python
| add_node(component, name: str, inputs: List[str])
```
Add a new node to the pipeline.
**Arguments**:
- `component`: The object to be called when the data is passed to the node. It can be a Haystack component
(like Retriever, Reader, or Generator) or a user-defined object that implements a run()
method to process incoming data from the predecessor node.
- `name`: The name for the node. It must not contain any dots.
- `inputs`: A list of inputs to the node. If the predecessor node has a single outgoing edge, the name
of the node is sufficient. For instance, an 'ElasticsearchRetriever' node always outputs a single
edge with a list of documents. It can be represented as ["ElasticsearchRetriever"].
When the predecessor node has multiple outputs, e.g., a "QueryClassifier", the output
must be specified explicitly as "QueryClassifier.output_2".
<a name="pipeline.BaseStandardPipeline.get_node"></a>
#### get\_node
```python
| get_node(name: str)
```
Get a node from the Pipeline.
**Arguments**:
- `name`: The name of the node.
<a name="pipeline.BaseStandardPipeline.set_node"></a>
#### set\_node
```python
| set_node(name: str, component)
```
Set the component for a node in the Pipeline.
**Arguments**:
- `name`: The name of the node.
- `component`: The component object to be set at the node.
<a name="pipeline.BaseStandardPipeline.draw"></a>
#### draw
```python
| draw(path: Path = Path("pipeline.png"))
```
Create a Graphviz visualization of the pipeline.
**Arguments**:
- `path`: the path to save the image.
<a name="pipeline.ExtractiveQAPipeline"></a>
## ExtractiveQAPipeline Objects
```python
class ExtractiveQAPipeline(BaseStandardPipeline)
```
<a name="pipeline.ExtractiveQAPipeline.__init__"></a>
#### \_\_init\_\_
```python
| __init__(reader: BaseReader, retriever: BaseRetriever)
```
Initialize a Pipeline for Extractive Question Answering.
**Arguments**:
- `reader`: Reader instance
- `retriever`: Retriever instance
<a name="pipeline.ExtractiveQAPipeline.run"></a>
#### run
```python
| run(query: str, params: Optional[dict] = None)
```
**Arguments**:
- `query`: the query string.
- `params`: params for the `retriever` and `reader`. For instance,
params={"retriever": {"top_k": 10}, "reader": {"top_k": 5}}
<a name="pipeline.DocumentSearchPipeline"></a>
## DocumentSearchPipeline Objects
```python
class DocumentSearchPipeline(BaseStandardPipeline)
```
<a name="pipeline.DocumentSearchPipeline.__init__"></a>
#### \_\_init\_\_
```python
| __init__(retriever: BaseRetriever)
```
Initialize a Pipeline for semantic document search.
**Arguments**:
- `retriever`: Retriever instance
<a name="pipeline.DocumentSearchPipeline.run"></a>
#### run
```python
| run(query: str, params: Optional[dict] = None)
```
**Arguments**:
- `query`: the query string.
- `params`: params for the `retriever`. For instance, params={"retriever": {"top_k": 10}}
<a name="pipeline.GenerativeQAPipeline"></a>
## GenerativeQAPipeline Objects
```python
class GenerativeQAPipeline(BaseStandardPipeline)
```
<a name="pipeline.GenerativeQAPipeline.__init__"></a>
#### \_\_init\_\_
```python
| __init__(generator: BaseGenerator, retriever: BaseRetriever)
```
Initialize a Pipeline for Generative Question Answering.
**Arguments**:
- `generator`: Generator instance
- `retriever`: Retriever instance
<a name="pipeline.GenerativeQAPipeline.run"></a>
#### run
```python
| run(query: str, params: Optional[dict] = None)
```
**Arguments**:
- `query`: the query string.
- `params`: params for the `retriever` and `generator`. For instance,
params={"retriever": {"top_k": 10}, "generator": {"top_k": 5}}
<a name="pipeline.SearchSummarizationPipeline"></a>
## SearchSummarizationPipeline Objects
```python
class SearchSummarizationPipeline(BaseStandardPipeline)
```
<a name="pipeline.SearchSummarizationPipeline.__init__"></a>
#### \_\_init\_\_
```python
| __init__(summarizer: BaseSummarizer, retriever: BaseRetriever, return_in_answer_format: bool = False)
```
Initialize a Pipeline that retrieves documents for a query and then summarizes those documents.
**Arguments**:
- `summarizer`: Summarizer instance
- `retriever`: Retriever instance
- `return_in_answer_format`: Whether the results should be returned as documents (False) or in the answer
format used in other QA pipelines (True). With the latter, you can use this
pipeline as a "drop-in replacement" for other QA pipelines.
<a name="pipeline.SearchSummarizationPipeline.run"></a>
#### run
```python
| run(query: str, params: Optional[dict] = None)
```
**Arguments**:
- `query`: the query string.
- `params`: params for the `retriever` and `summarizer`. For instance,
params={"retriever": {"top_k": 10}, "summarizer": {"generate_single_summary": True}}
<a name="pipeline.FAQPipeline"></a>
## FAQPipeline Objects
```python
class FAQPipeline(BaseStandardPipeline)
```
<a name="pipeline.FAQPipeline.__init__"></a>
#### \_\_init\_\_
```python
| __init__(retriever: BaseRetriever)
```
Initialize a Pipeline for finding similar FAQs using semantic document search.
**Arguments**:
- `retriever`: Retriever instance
<a name="pipeline.FAQPipeline.run"></a>
#### run
```python
| run(query: str, params: Optional[dict] = None)
```
**Arguments**:
- `query`: the query string.
- `params`: params for the `retriever`. For instance, params={"retriever": {"top_k": 10}}
<a name="pipeline.TranslationWrapperPipeline"></a>
## TranslationWrapperPipeline Objects
```python
class TranslationWrapperPipeline(BaseStandardPipeline)
```
Takes an existing search pipeline and adds one "input translation" node after the Query and one
"output translation" node just before returning the results.
<a name="pipeline.TranslationWrapperPipeline.__init__"></a>
#### \_\_init\_\_
```python
| __init__(input_translator: BaseTranslator, output_translator: BaseTranslator, pipeline: BaseStandardPipeline)
```
Wrap a given `pipeline` with the `input_translator` and `output_translator`.
**Arguments**:
- `input_translator`: A Translator node that translates the input query from language A to B.
- `output_translator`: A Translator node that translates the pipeline results from language B to A.
- `pipeline`: The pipeline object (e.g. ExtractiveQAPipeline) you want to "wrap".
Note that pipelines with split or merge nodes are currently not supported.
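The wrapping pattern can be sketched with plain functions. Everything below is a toy under stated assumptions: `wrap_with_translation` is a hypothetical helper, the "translators" are dictionary lookups, and the "pipeline" is a function that returns a list of strings.

```python
from typing import Callable, List


def wrap_with_translation(
    search: Callable[[str], List[str]],
    translate_in: Callable[[str], str],
    translate_out: Callable[[str], str],
) -> Callable[[str], List[str]]:
    """Translate the query before the search and every result after it."""

    def wrapped(query: str) -> List[str]:
        return [translate_out(result) for result in search(translate_in(query))]

    return wrapped


# Toy German -> English query translation and English -> German result translation.
de_to_en = {"Hauptstadt": "capital"}
en_to_de = {"Berlin is the capital": "Berlin ist die Hauptstadt"}
corpus = {"capital": ["Berlin is the capital"]}

search_de = wrap_with_translation(
    search=lambda query: corpus.get(query, []),
    translate_in=lambda text: de_to_en.get(text, text),
    translate_out=lambda text: en_to_de.get(text, text),
)
print(search_de("Hauptstadt"))  # ['Berlin ist die Hauptstadt']
```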
<a name="pipeline.QuestionGenerationPipeline"></a>
## QuestionGenerationPipeline Objects
```python
class QuestionGenerationPipeline(BaseStandardPipeline)
```
A simple pipeline that takes documents as input and generates
questions that it thinks can be answered by the documents.
<a name="pipeline.RetrieverQuestionGenerationPipeline"></a>
## RetrieverQuestionGenerationPipeline Objects
```python
class RetrieverQuestionGenerationPipeline(BaseStandardPipeline)
```
A simple pipeline that takes a query as input, performs retrieval, and then generates
questions that it thinks can be answered by the retrieved documents.
<a name="pipeline.QuestionAnswerGenerationPipeline"></a>
## QuestionAnswerGenerationPipeline Objects
```python
class QuestionAnswerGenerationPipeline(BaseStandardPipeline)
```
This is a pipeline that takes a document as input, generates questions that the model thinks can be answered by
this document, and then performs question answering on these questions using that single document.
<a name="pipeline.RootNode"></a>
## RootNode Objects
```python
class RootNode(BaseComponent)
```
RootNode feeds inputs together with corresponding params to a Pipeline.
<a name="pipeline.SklearnQueryClassifier"></a>
## SklearnQueryClassifier Objects
```python
class SklearnQueryClassifier(BaseComponent)
```
A node to classify an incoming query into one of two categories using a lightweight sklearn model. Depending on the result, the query flows to a different branch in your pipeline
and the further processing can be customized. You can define this by connecting the further pipeline to either `output_1` or `output_2` from this node.
**Example**:
```python
|pipe = Pipeline()
|pipe.add_node(component=SklearnQueryClassifier(), name="QueryClassifier", inputs=["Query"])
|pipe.add_node(component=elastic_retriever, name="ElasticRetriever", inputs=["QueryClassifier.output_2"])
|pipe.add_node(component=dpr_retriever, name="DPRRetriever", inputs=["QueryClassifier.output_1"])
|# Keyword queries will use the ElasticRetriever
|pipe.run("kubernetes aws")
|# Semantic queries (questions, statements, sentences ...) will leverage the DPR retriever
|pipe.run("How to manage kubernetes on aws")
```
Models:

Pass your own `Sklearn` binary classification model or use one of the following pretrained ones:

1) Keywords vs. Questions/Statements (Default)
   - query_classifier can be found [here](https://ext-models-haystack.s3.eu-central-1.amazonaws.com/gradboost_query_classifier/model.pickle)
   - query_vectorizer can be found [here](https://ext-models-haystack.s3.eu-central-1.amazonaws.com/gradboost_query_classifier/vectorizer.pickle)
   - output_1 => question/statement
   - output_2 => keyword query
   - [Readme](https://ext-models-haystack.s3.eu-central-1.amazonaws.com/gradboost_query_classifier/readme.txt)
2) Questions vs. Statements
   - query_classifier can be found [here](https://ext-models-haystack.s3.eu-central-1.amazonaws.com/gradboost_query_classifier_statements/model.pickle)
   - query_vectorizer can be found [here](https://ext-models-haystack.s3.eu-central-1.amazonaws.com/gradboost_query_classifier_statements/vectorizer.pickle)
   - output_1 => question
   - output_2 => statement
   - [Readme](https://ext-models-haystack.s3.eu-central-1.amazonaws.com/gradboost_query_classifier_statements/readme.txt)

See also the [tutorial](https://haystack.deepset.ai/tutorials/pipelines) on pipelines.
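The routing idea behind the default model (questions and statements to `output_1`, keyword queries to `output_2`) can be sketched with a rule of thumb. The function `route` below is a hypothetical stand-in that uses a crude heuristic; it is not the pretrained gradient boosting classifier and will misclassify many real queries, such as statements that do not start with a question word.

```python
QUESTION_WORDS = {"how", "what", "when", "where", "which", "who", "why"}


def route(query: str) -> str:
    """Return the output edge a query would take in this toy classifier."""
    words = query.lower().split()
    starts_with_question_word = bool(words) and words[0] in QUESTION_WORDS
    is_question = query.strip().endswith("?") or starts_with_question_word
    # output_1 => question/statement, output_2 => keyword query
    return "output_1" if is_question else "output_2"


print(route("kubernetes aws"))                   # output_2
print(route("How to manage kubernetes on aws"))  # output_1
```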
<a name="pipeline.SklearnQueryClassifier.__init__"></a>
#### \_\_init\_\_
```python
| __init__(model_name_or_path: Union[
| str, Any
| ] = "https://ext-models-haystack.s3.eu-central-1.amazonaws.com/gradboost_query_classifier/model.pickle", vectorizer_name_or_path: Union[
| str, Any
| ] = "https://ext-models-haystack.s3.eu-central-1.amazonaws.com/gradboost_query_classifier/vectorizer.pickle")
```
**Arguments**:
- `model_name_or_path`: Gradient boosting based binary classifier to distinguish keyword queries from
statement/question queries, or statement queries from question queries.
- `vectorizer_name_or_path`: An n-gram based TF-IDF vectorizer for extracting features from the query.
<a name="pipeline.TransformersQueryClassifier"></a>
## TransformersQueryClassifier Objects
```python
class TransformersQueryClassifier(BaseComponent)
```
A node to classify an incoming query into one of two categories using a (small) BERT transformer model. Depending on the result, the query flows to a different branch in your pipeline
and the further processing can be customized. You can define this by connecting the further pipeline to either `output_1` or `output_2` from this node.
**Example**:
```python
|pipe = Pipeline()
|pipe.add_node(component=TransformersQueryClassifier(), name="QueryClassifier", inputs=["Query"])
|pipe.add_node(component=elastic_retriever, name="ElasticRetriever", inputs=["QueryClassifier.output_2"])
|pipe.add_node(component=dpr_retriever, name="DPRRetriever", inputs=["QueryClassifier.output_1"])
|# Keyword queries will use the ElasticRetriever
|pipe.run("kubernetes aws")
|# Semantic queries (questions, statements, sentences ...) will leverage the DPR retriever
|pipe.run("How to manage kubernetes on aws")
```
Models:

Pass your own `Transformer` binary classification model from file/huggingface or use one of the following pretrained ones hosted on Huggingface:

1) Keywords vs. Questions/Statements (Default)
   - `model_name_or_path="shahrukhx01/bert-mini-finetune-question-detection"`
   - output_1 => question/statement
   - output_2 => keyword query
   - [Readme](https://ext-models-haystack.s3.eu-central-1.amazonaws.com/gradboost_query_classifier/readme.txt)
2) Questions vs. Statements
   - `model_name_or_path="shahrukhx01/question-vs-statement-classifier"`
   - output_1 => question
   - output_2 => statement
   - [Readme](https://ext-models-haystack.s3.eu-central-1.amazonaws.com/gradboost_query_classifier_statements/readme.txt)

See also the [tutorial](https://haystack.deepset.ai/tutorials/pipelines) on pipelines.
<a name="pipeline.TransformersQueryClassifier.__init__"></a>
#### \_\_init\_\_
```python
| __init__(model_name_or_path: Union[
| Path, str
| ] = "shahrukhx01/bert-mini-finetune-question-detection")
```
**Arguments**:
- `model_name_or_path`: Transformer-based, fine-tuned mini BERT model for query classification.
<a name="pipeline.JoinDocuments"></a>
## JoinDocuments Objects
```python
class JoinDocuments(BaseComponent)
```
A node to join documents output by multiple retriever nodes.
The node allows multiple join modes:
* concatenate: combine the documents from multiple nodes. Any duplicate documents are discarded.
* merge: merge scores of documents from multiple nodes. Optionally, each input score can be given a different
`weight` and a `top_k` limit can be set. This mode can also be used for "reranking" retrieved documents.
<a name="pipeline.JoinDocuments.__init__"></a>
#### \_\_init\_\_
```python
| __init__(join_mode: str = "concatenate", weights: Optional[List[float]] = None, top_k_join: Optional[int] = None)
```
**Arguments**:
- `join_mode`: `concatenate` to combine documents from multiple retrievers or `merge` to aggregate scores of
individual documents.
- `weights`: A node-wise list of weights (its length must equal the number of input nodes) for
adjusting document scores when using the `merge` join_mode. By default, equal weight is given
to each retriever score. This param is not compatible with the `concatenate` join_mode.
- `top_k_join`: Limit documents to top_k based on the resulting scores of the join.
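The two join modes can be sketched on plain `(document id, score)` pairs. This is an illustration, not the `JoinDocuments` implementation: the functions `concatenate` and `merge` are hypothetical, and the default of `1 / number_of_inputs` per input is this sketch's reading of "equal weight".

```python
from typing import Dict, List, Optional, Tuple

Doc = Tuple[str, float]  # (document id, score)


def concatenate(inputs: List[List[Doc]]) -> List[Doc]:
    """Combine documents from all inputs, keeping the first copy of each id."""
    seen: Dict[str, float] = {}
    for docs in inputs:
        for doc_id, score in docs:
            seen.setdefault(doc_id, score)
    return list(seen.items())


def merge(
    inputs: List[List[Doc]],
    weights: Optional[List[float]] = None,
    top_k_join: Optional[int] = None,
) -> List[Doc]:
    """Sum the weighted scores per document id and rank by the result."""
    if weights is None:
        weights = [1.0 / len(inputs)] * len(inputs)  # assumed equal-weight default
    if len(weights) != len(inputs):
        raise ValueError("weights must have one entry per input node")
    totals: Dict[str, float] = {}
    for docs, weight in zip(inputs, weights):
        for doc_id, score in docs:
            totals[doc_id] = totals.get(doc_id, 0.0) + weight * score
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k_join]  # a None limit keeps every document


keyword_docs = [("a", 1.0), ("b", 0.5)]
dense_docs = [("b", 1.0), ("c", 0.5)]
print(concatenate([keyword_docs, dense_docs]))          # [('a', 1.0), ('b', 0.5), ('c', 0.5)]
print(merge([keyword_docs, dense_docs], top_k_join=2))  # [('b', 0.75), ('a', 0.5)]
```

Document "b" ranks first after the merge because both inputs returned it, so its weighted scores add up.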
<a name="pipeline.RayPipeline"></a>
## RayPipeline Objects
```python
class RayPipeline(Pipeline)
```
Ray (https://ray.io) is a framework for distributed computing.
Ray allows distributing a Pipeline's components across a cluster of machines. The individual components of a
Pipeline can be independently scaled. For instance, an extractive QA Pipeline deployment can have three replicas
of the Reader and a single replica for the Retriever. It enables efficient resource utilization by horizontally
scaling Components.
To set the number of replicas, add `replicas` in the YAML config for the node in a pipeline:
```yaml
| components:
|   ...
|
| pipelines:
| - name: ray_query_pipeline
|   type: RayPipeline
|   nodes:
|   - name: ESRetriever
|     replicas: 2  # number of replicas to create on the Ray cluster
|     inputs: [ Query ]
```
A RayPipeline can only be created with a YAML Pipeline config.
>>> from haystack.pipeline import RayPipeline
>>> pipeline = RayPipeline.load_from_yaml(path="my_pipelines.yaml", pipeline_name="my_query_pipeline")
>>> pipeline.run(query="What is the capital of Germany?")
By default, a RayPipeline creates a local Ray Serve instance. To connect to an existing Ray instance,
set the `address` parameter when creating the RayPipeline instance.
<a name="pipeline.RayPipeline.__init__"></a>
#### \_\_init\_\_
```python
| __init__(address: str = None, **kwargs)
```
**Arguments**:
- `address`: The IP address for the Ray cluster. If set to None, a local Ray instance is started.
- `kwargs`: Optional parameters for initializing Ray.
<a name="pipeline.RayPipeline.load_from_yaml"></a>
#### load\_from\_yaml
```python
| @classmethod
| load_from_yaml(cls, path: Path, pipeline_name: Optional[str] = None, overwrite_with_env_variables: bool = True, address: Optional[str] = None, **kwargs)
```
Load Pipeline from a YAML file defining the individual components and how they're tied together to form
a Pipeline. A single YAML can declare multiple Pipelines, in which case an explicit `pipeline_name` must
be passed.
Here's a sample configuration:
```yaml
| version: '0.8'
|
| components:    # define all the building-blocks for Pipeline
| - name: MyReader       # custom-name for the component; helpful for visualization & debugging
|   type: FARMReader    # Haystack Class name for the component
|   params:
|     no_ans_boost: -10
|     model_name_or_path: deepset/roberta-base-squad2
| - name: MyESRetriever
|   type: ElasticsearchRetriever
|   params:
|     document_store: MyDocumentStore    # params can reference other components defined in the YAML
|     custom_query: null
| - name: MyDocumentStore
|   type: ElasticsearchDocumentStore
|   params:
|     index: haystack_test
|
| pipelines:    # multiple Pipelines can be defined using the components from above
| - name: my_query_pipeline    # a simple extractive-qa Pipeline
|   nodes:
|   - name: MyESRetriever
|     inputs: [Query]
|   - name: MyReader
|     inputs: [MyESRetriever]
```
**Arguments**:
- `path`: path of the YAML file.
- `pipeline_name`: if the YAML contains multiple pipelines, the pipeline_name to load must be set.
- `overwrite_with_env_variables`: Overwrite the YAML configuration with environment variables. For example,
to change index name param for an ElasticsearchDocumentStore, an env
variable 'MYDOCSTORE_PARAMS_INDEX=documents-2021' can be set. Note that an
`_` sign must be used to specify nested hierarchical properties.
- `address`: The IP address for the Ray cluster. If set to None, a local Ray instance is started.
<a name="pipeline._RayDeploymentWrapper"></a>
## \_RayDeploymentWrapper Objects
```python
class _RayDeploymentWrapper()
```
Ray Serve supports calling the `__init__` methods of classes to create "deployment" instances.
In the case of Haystack, some Components like Retrievers have complex init methods that need objects
like Document Stores.
This wrapper class encapsulates the initialization of Components. Given a Component Class
name, it creates an instance using the YAML Pipeline config.
<a name="pipeline._RayDeploymentWrapper.__init__"></a>
#### \_\_init\_\_
```python
| __init__(pipeline_config: dict, component_name: str)
```
Create an instance of Component.
**Arguments**:
- `pipeline_config`: Pipeline YAML parsed as a dict.
- `component_name`: Component Class name.
<a name="pipeline._RayDeploymentWrapper.__call__"></a>
#### \_\_call\_\_
```python
| __call__(*args, **kwargs)
```
Ray calls this method, which is then redirected to the corresponding component's run().
<a name="pipeline.Docs2Answers"></a>
## Docs2Answers Objects
```python
class Docs2Answers(BaseComponent)
```
This Node is used to convert retrieved documents into the predicted answers format.
It is useful for situations where you are calling a Retriever-only pipeline via REST API.
This ensures that your output is in a compatible format.
<a name="pipeline.MostSimilarDocumentsPipeline"></a>
## MostSimilarDocumentsPipeline Objects
```python
class MostSimilarDocumentsPipeline(BaseStandardPipeline)
```
<a name="pipeline.MostSimilarDocumentsPipeline.__init__"></a>
#### \_\_init\_\_
```python
| __init__(document_store: BaseDocumentStore)
```
Initialize a Pipeline for finding the most similar documents to a given document.
This pipeline can be helpful if you already show a relevant document to your end users and they want to search for similar ones.
**Arguments**:
- `document_store`: Document Store instance with already stored embeddings.
<a name="pipeline.MostSimilarDocumentsPipeline.run"></a>
#### run
```python
| run(document_ids: List[str], top_k: int = 5)
```
**Arguments**:
- `document_ids`: IDs of the documents to find similar documents for.
- `top_k`: How many document IDs to return for each input document.
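A search of this kind can be sketched as nearest-neighbour lookup over stored embeddings. The code below is a hedged illustration: `most_similar` and `cosine` are hypothetical helpers, the embeddings live in a plain dict, and cosine similarity is an assumption, since the actual measure depends on the document store.

```python
import math
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(
    embeddings: Dict[str, List[float]], document_ids: List[str], top_k: int = 5
) -> List[List[str]]:
    """Return, for each input id, the ids of its top_k most similar documents."""
    results = []
    for doc_id in document_ids:
        query_vector = embeddings[doc_id]
        scored = [
            (cosine(query_vector, vector), other_id)
            for other_id, vector in embeddings.items()
            if other_id != doc_id  # never return the input document itself
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        results.append([other_id for _, other_id in scored[:top_k]])
    return results


embeddings = {"d1": [1.0, 0.0], "d2": [0.9, 0.1], "d3": [0.0, 1.0]}
print(most_similar(embeddings, ["d1"], top_k=1))  # [['d2']]
```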