---
title: "Retrievers"
id: experimental-retrievers-api
description: "Sweep through Document Stores and return a set of candidate documents that are relevant to the query."
slug: "/experimental-retrievers-api"
---

## Module haystack\_experimental.components.retrievers.chat\_message\_retriever

### ChatMessageRetriever

Retrieves chat messages from the underlying ChatMessageStore.

Usage example:
```python
from haystack.dataclasses import ChatMessage
from haystack_experimental.components.retrievers import ChatMessageRetriever
from haystack_experimental.chat_message_stores.in_memory import InMemoryChatMessageStore

messages = [
    ChatMessage.from_assistant("Hello, how can I help you?"),
    ChatMessage.from_user("Hi, I have a question about Python. What is a Protocol?"),
]

message_store = InMemoryChatMessageStore()
message_store.write_messages(messages)
retriever = ChatMessageRetriever(message_store)

result = retriever.run()

print(result["messages"])
```

#### ChatMessageRetriever.\_\_init\_\_

```python
def __init__(message_store: ChatMessageStore, last_k: int = 10)
```

Create the ChatMessageRetriever component.

**Arguments**:

- `message_store`: An instance of a ChatMessageStore.
- `last_k`: The number of last messages to retrieve. Defaults to 10 messages if not specified.

#### ChatMessageRetriever.to\_dict

```python
def to_dict() -> Dict[str, Any]
```

Serializes the component to a dictionary.

**Returns**:

Dictionary with serialized data.

#### ChatMessageRetriever.from\_dict

```python
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "ChatMessageRetriever"
```

Deserializes the component from a dictionary.

**Arguments**:

- `data`: The dictionary to deserialize from.

**Returns**:

The deserialized component.

#### ChatMessageRetriever.run

```python
@component.output_types(messages=List[ChatMessage])
def run(last_k: Optional[int] = None) -> Dict[str, List[ChatMessage]]
```

Run the ChatMessageRetriever.

**Arguments**:

- `last_k`: The number of last messages to retrieve.
This parameter takes precedence over the `last_k` parameter passed to the ChatMessageRetriever constructor.
If unspecified, the `last_k` value passed to the constructor is used.

**Raises**:

- `ValueError`: If `last_k` is not None and is less than 1.

**Returns**:

- `messages` - The retrieved chat messages.

## Module haystack\_experimental.components.retrievers.multi\_query\_embedding\_retriever

### MultiQueryEmbeddingRetriever

A component that retrieves documents using multiple queries in parallel with an embedding-based retriever.

This component takes a list of text queries, converts them to embeddings using a query embedder,
and then uses an embedding-based retriever to find relevant documents for each query in parallel.
The results are combined and sorted by relevance score.

### Usage example

```python
from haystack import Document
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.document_stores.types import DuplicatePolicy
from haystack.components.embedders import SentenceTransformersTextEmbedder
from haystack.components.embedders import SentenceTransformersDocumentEmbedder
from haystack.components.retrievers import InMemoryEmbeddingRetriever
from haystack.components.writers import DocumentWriter
from haystack_experimental.components.retrievers import MultiQueryEmbeddingRetriever

documents = [
    Document(content="Renewable energy is energy that is collected from renewable resources."),
    Document(content="Solar energy is a type of green energy that is harnessed from the sun."),
    Document(content="Wind energy is another type of green energy that is generated by wind turbines."),
    Document(content="Geothermal energy is heat that comes from the sub-surface of the earth."),
    Document(content="Biomass energy is produced from organic materials, such as plant and animal waste."),
    Document(content="Fossil fuels, such as coal, oil, and natural gas, are non-renewable energy sources."),
]

# Populate the document store
doc_store = InMemoryDocumentStore()
doc_embedder = SentenceTransformersDocumentEmbedder(model="sentence-transformers/all-MiniLM-L6-v2")
doc_embedder.warm_up()
doc_writer = DocumentWriter(document_store=doc_store, policy=DuplicatePolicy.SKIP)
documents = doc_embedder.run(documents)["documents"]
doc_writer.run(documents=documents)

# Run the multi-query retriever
in_memory_retriever = InMemoryEmbeddingRetriever(document_store=doc_store, top_k=1)
query_embedder = SentenceTransformersTextEmbedder(model="sentence-transformers/all-MiniLM-L6-v2")

multi_query_retriever = MultiQueryEmbeddingRetriever(
    retriever=in_memory_retriever,
    query_embedder=query_embedder,
    max_workers=3
)
# Warm up the wrapped query embedder (and the retriever, if it needs it) before running
multi_query_retriever.warm_up()

queries = ["Geothermal energy", "natural gas", "turbines"]
result = multi_query_retriever.run(queries=queries)
for doc in result["documents"]:
    print(f"Content: {doc.content}, Score: {doc.score}")
# >> Content: Geothermal energy is heat that comes from the sub-surface of the earth., Score: 0.8509603046266574
# >> Content: Renewable energy is energy that is collected from renewable resources., Score: 0.42763211298893034
# >> Content: Solar energy is a type of green energy that is harnessed from the sun., Score: 0.40077417016494354
# >> Content: Fossil fuels, such as coal, oil, and natural gas, are non-renewable energy sources., Score: 0.3774863680995796
# >> Content: Wind energy is another type of green energy that is generated by wind turbines., Score: 0.3091423972562246
# >> Content: Biomass energy is produced from organic materials, such as plant and animal waste., Score: 0.25173074243668087
```

#### MultiQueryEmbeddingRetriever.\_\_init\_\_

```python
def __init__(*,
             retriever: EmbeddingRetriever,
             query_embedder: TextEmbedder,
             max_workers: int = 3) -> None
```

Initialize MultiQueryEmbeddingRetriever.

**Arguments**:

- `retriever`: The embedding-based retriever to use for document retrieval.
- `query_embedder`: The query embedder to convert text queries to embeddings.
- `max_workers`: Maximum number of worker threads for parallel processing. Default is 3.

#### MultiQueryEmbeddingRetriever.warm\_up

```python
def warm_up() -> None
```

Warm up the query embedder and the retriever, if either has a `warm_up` method.

#### MultiQueryEmbeddingRetriever.run

```python
@component.output_types(documents=list[Document])
def run(
    queries: list[str],
    retriever_kwargs: Optional[dict[str, Any]] = None
) -> dict[str, list[Document]]
```

Retrieve documents using multiple queries in parallel.

**Arguments**:

- `queries`: List of text queries to process.
- `retriever_kwargs`: Optional dictionary of arguments to pass to the retriever's run method.

**Returns**:

A dictionary containing:
- `documents`: List of retrieved documents sorted by relevance score.

#### MultiQueryEmbeddingRetriever.to\_dict

```python
def to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns**:

A dictionary representing the serialized component.

#### MultiQueryEmbeddingRetriever.from\_dict

```python
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "MultiQueryEmbeddingRetriever"
```

Deserializes the component from a dictionary.

**Arguments**:

- `data`: The dictionary to deserialize from.

**Returns**:

The deserialized component.

## Module haystack\_experimental.components.retrievers.multi\_query\_text\_retriever

### MultiQueryTextRetriever

A component that retrieves documents using multiple queries in parallel with a text-based retriever.

This component takes a list of text queries and uses a text-based retriever to find relevant documents for each
query in parallel, using a thread pool to manage concurrent execution. The results are combined and sorted by
relevance score.

You can use this component in combination with the `QueryExpander` component to enhance the retrieval process.
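The fan-out pattern described above (one retrieval per query on a thread pool, then a merge sorted by score) can be sketched with the standard library alone. This is an illustration under stated assumptions, not the component's implementation: `Doc`, `CORPUS`, `toy_retrieve`, and `multi_query_retrieve` are hypothetical names, the word-overlap scoring is invented for the example, and keeping only the best-scoring copy of a document returned by several queries is an assumption, since the description only states that results are combined and sorted.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document (not haystack's Document)."""
    content: str
    score: float


# Toy corpus, purely illustrative.
CORPUS = [
    "Solar energy is harnessed from the sun.",
    "Wind energy is generated by wind turbines.",
    "Geothermal energy is heat from the earth.",
]


def toy_retrieve(query: str, top_k: int = 2) -> list[Doc]:
    """Hypothetical retriever: score = fraction of query words found in the text."""
    words = set(query.lower().split())
    scored = []
    for text in CORPUS:
        text_words = set(text.lower().rstrip(".").split())
        overlap = len(words & text_words)
        if overlap:
            scored.append(Doc(content=text, score=overlap / len(words)))
    scored.sort(key=lambda d: d.score, reverse=True)
    return scored[:top_k]


def multi_query_retrieve(queries: list[str], max_workers: int = 3) -> list[Doc]:
    """Run one retrieval per query on a thread pool, then merge and sort the results."""
    best: dict[str, Doc] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map runs the calls concurrently and yields results in query order.
        for docs in pool.map(toy_retrieve, queries):
            for doc in docs:
                # Assumption: when several queries return the same document,
                # keep the copy with the highest score.
                current = best.get(doc.content)
                if current is None or doc.score > current.score:
                    best[doc.content] = doc
    return sorted(best.values(), key=lambda d: d.score, reverse=True)


results = multi_query_retrieve(
    ["geothermal energy heat", "wind power", "energy from wind"]
)
for doc in results:
    print(f"{doc.score:.2f}  {doc.content}")
```

Because the per-query calls are independent, a thread pool suits retrievers that spend most of their time waiting on I/O; `max_workers` caps how many queries are in flight at once.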
### Usage example

```python
from haystack import Document
from haystack.components.writers import DocumentWriter
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.document_stores.types import DuplicatePolicy
from haystack.components.retrievers import InMemoryBM25Retriever
from haystack_experimental.components.query import QueryExpander
from haystack_experimental.components.retrievers.multi_query_text_retriever import MultiQueryTextRetriever

documents = [
    Document(content="Renewable energy is energy that is collected from renewable resources."),
    Document(content="Solar energy is a type of green energy that is harnessed from the sun."),
    Document(content="Wind energy is another type of green energy that is generated by wind turbines."),
    Document(content="Hydropower is a form of renewable energy using the flow of water to generate electricity."),
    Document(content="Geothermal energy is heat that comes from the sub-surface of the earth.")
]

# Populate the document store
document_store = InMemoryDocumentStore()
doc_writer = DocumentWriter(document_store=document_store, policy=DuplicatePolicy.SKIP)
doc_writer.run(documents=documents)

# Run the multi-query retriever
in_memory_retriever = InMemoryBM25Retriever(document_store=document_store, top_k=1)
multiquery_retriever = MultiQueryTextRetriever(retriever=in_memory_retriever)
results = multiquery_retriever.run(queries=["renewable energy?", "Geothermal", "Hydropower"])
for doc in results["documents"]:
    print(f"Content: {doc.content}, Score: {doc.score}")
# >> Content: Geothermal energy is heat that comes from the sub-surface of the earth., Score: 1.6474448833731097
# >> Content: Hydropower is a form of renewable energy using the flow of water to generate electricity., Score: 1.6157822790079805
# >> Content: Renewable energy is energy that is collected from renewable resources., Score: 1.5255309812344944
```

#### MultiQueryTextRetriever.\_\_init\_\_

```python
def __init__(*, retriever: TextRetriever, max_workers: int = 3) -> None
```

Initialize MultiQueryTextRetriever.

**Arguments**:

- `retriever`: The text-based retriever to use for document retrieval.
- `max_workers`: Maximum number of worker threads for parallel processing. Default is 3.

#### MultiQueryTextRetriever.warm\_up

```python
def warm_up() -> None
```

Warm up the retriever if it has a `warm_up` method.

#### MultiQueryTextRetriever.run

```python
@component.output_types(documents=list[Document])
def run(
    queries: list[str],
    retriever_kwargs: Optional[dict[str, Any]] = None
) -> dict[str, list[Document]]
```

Retrieve documents using multiple queries in parallel.

**Arguments**:

- `queries`: List of text queries to process.
- `retriever_kwargs`: Optional dictionary of arguments to pass to the retriever's run method.

**Returns**:

A dictionary containing:
- `documents`: List of retrieved documents sorted by relevance score.

#### MultiQueryTextRetriever.to\_dict

```python
def to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns**:

The serialized component as a dictionary.

#### MultiQueryTextRetriever.from\_dict

```python
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "MultiQueryTextRetriever"
```

Deserializes the component from a dictionary.

**Arguments**:

- `data`: The dictionary to deserialize from.

**Returns**:

The deserialized component.