mirror of
https://github.com/deepset-ai/haystack.git
synced 2025-12-16 17:48:19 +00:00
187 lines
5.3 KiB
Plaintext
---
title: "Data Classes"
id: "data-classes"
description: "In Haystack, there are a handful of core classes that are regularly used in many different places. These are classes that carry data through the system and you are likely to interact with these as either the input or output of your pipeline."
---

Haystack uses data classes to help components communicate with each other in a simple and modular way, so that data flows smoothly through Haystack pipelines. This page covers the data classes available in Haystack: `Answer` (along with its variants `ExtractedAnswer` and `GeneratedAnswer`), `ByteStream`, `ChatMessage`, `Document`, `StreamingChunk`, `SparseEmbedding`, and `Tool`, and explains how each contributes to the Haystack ecosystem.

For the detailed parameters, see the [Data Classes](ref:data-classes-api) API reference.

## Answer

### Overview

The `Answer` class serves as the base for responses generated within Haystack. It contains the answer's data, the originating query, and additional metadata.

### Key Features

- Adaptable data handling that accommodates any data type (`data`).
- Query tracking for contextual relevance (`query`).
- Metadata support for describing the answer in detail (`meta`).

### Attributes

```python
@dataclass(frozen=True)
class Answer:
    data: Any
    query: str
    meta: Dict[str, Any]
```
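
As a minimal sketch of what the listing above implies, the following stand-in class copies the three attributes and the `frozen=True` flag shown there. It is not imported from Haystack, and the sample query and metadata are made up for illustration. It shows that a frozen dataclass rejects attribute reassignment after construction.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import Any, Dict


# Stand-in that copies the attribute listing above; NOT Haystack's own class.
@dataclass(frozen=True)
class Answer:
    data: Any
    query: str
    meta: Dict[str, Any]


answer = Answer(data=42, query="What is six times seven?", meta={"source": "demo"})

try:
    answer.data = 43  # frozen dataclasses reject attribute reassignment
    was_rejected = False
except FrozenInstanceError:
    was_rejected = True

print(was_rejected)
```

Note that freezing only protects the attribute bindings: a mutable value such as the `meta` dictionary can still be changed in place.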

## ExtractedAnswer

### Overview

`ExtractedAnswer` is a subclass of `Answer` for answers extracted from Documents. It adds attributes that describe where the answer came from.

### Key Features

- Includes a reference to the originating `Document`.
- A `score` attribute that quantifies the answer's confidence level.
- Optional start and end offsets (`document_offset`, `context_offset`) for pinpointing the answer's location within the source.

### Attributes

```python
@dataclass
class ExtractedAnswer:
    query: str
    score: float
    data: Optional[str] = None
    document: Optional[Document] = None
    context: Optional[str] = None
    document_offset: Optional["Span"] = None
    context_offset: Optional["Span"] = None
    meta: Dict[str, Any] = field(default_factory=dict)
```
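
To illustrate how the offsets relate to the answer text, here is a minimal sketch that uses stand-in classes rather than Haystack's own. It assumes that `Span` is a simple pair of `start` and `end` character offsets, and it omits `document` and `document_offset` for brevity. The score value is purely illustrative.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


# Hypothetical stand-ins for illustration; NOT imported from Haystack.
@dataclass
class Span:
    start: int  # assumed: inclusive character offset
    end: int    # assumed: exclusive character offset


@dataclass
class ExtractedAnswer:
    query: str
    score: float
    data: Optional[str] = None
    context: Optional[str] = None
    context_offset: Optional[Span] = None
    meta: Dict[str, Any] = field(default_factory=dict)


context = "Haystack is an open source framework built by deepset."
start = context.index("deepset")
answer = ExtractedAnswer(
    query="Who built Haystack?",
    score=0.93,  # illustrative value
    data="deepset",
    context=context,
    context_offset=Span(start, start + len("deepset")),
)

# Slicing the context with the offsets recovers the answer text.
span = answer.context_offset
recovered = context[span.start:span.end]
print(recovered)
```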

## GeneratedAnswer

### Overview

`GeneratedAnswer` extends the `Answer` class to accommodate answers generated from multiple Documents.

### Key Features

- Handles string-type data.
- Links to a list of `Document` objects, enhancing answer traceability.

### Attributes

```python
@dataclass
class GeneratedAnswer:
    data: str
    query: str
    documents: List[Document]
    meta: Dict[str, Any] = field(default_factory=dict)
```
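
The traceability mentioned above can be sketched as follows. Both classes here are reduced stand-ins defined for this example, not imports from Haystack, and the document IDs and contents are made up.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


# Hypothetical stand-ins for illustration; NOT imported from Haystack.
@dataclass
class Document:
    id: str
    content: str


@dataclass
class GeneratedAnswer:
    data: str
    query: str
    documents: List[Document]
    meta: Dict[str, Any] = field(default_factory=dict)


sources = [
    Document(id="doc-1", content="Paris is the capital of France."),
    Document(id="doc-2", content="France is a country in Europe."),
]
answer = GeneratedAnswer(
    data="The capital of France is Paris.",
    query="What is the capital of France?",
    documents=sources,
)

# The answer keeps references to its sources, so it can be traced back.
source_ids = [doc.id for doc in answer.documents]
print(source_ids)
```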

## ByteStream

### Overview

`ByteStream` is Haystack's abstraction for binary objects and is used to handle binary data in various formats.

### Key Features

- Holds binary data and associated metadata.
- Optional MIME type specification for flexibility.
- Helper methods for working with files and strings (`to_file`, `from_file_path`, `from_string`).

### Attributes

```python
@dataclass(frozen=True)
class ByteStream:
    data: bytes
    metadata: Dict[str, Any] = field(default_factory=dict, hash=False)
    mime_type: Optional[str] = field(default=None)
```

### Example

```python
from haystack.dataclasses.byte_stream import ByteStream

# Load the raw bytes of a local image file into a ByteStream
image = ByteStream.from_file_path("dog.jpg")
```
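
The round trip between strings, files, and bytes can be sketched with the standard library alone. The class below is a simplified stand-in that borrows the method names listed above. Its method bodies and signatures are assumptions made for this example and are not Haystack's implementation.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Dict, Optional


# Simplified stand-in for illustration; NOT Haystack's implementation.
@dataclass(frozen=True)
class ByteStream:
    data: bytes
    metadata: Dict[str, Any] = field(default_factory=dict, hash=False)
    mime_type: Optional[str] = field(default=None)

    @classmethod
    def from_string(cls, text: str, encoding: str = "utf-8") -> "ByteStream":
        return cls(data=text.encode(encoding))

    @classmethod
    def from_file_path(cls, path: Path) -> "ByteStream":
        return cls(data=path.read_bytes())

    def to_file(self, path: Path) -> None:
        path.write_bytes(self.data)


original = ByteStream.from_string("hello, haystack")

# Write the bytes to a temporary file, then read them back.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "greeting.txt"
    original.to_file(target)
    restored = ByteStream.from_file_path(target)

print(restored.data == original.data)
```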

## ChatMessage

`ChatMessage` is the central abstraction for representing a message for an LLM. It contains a role, metadata, and several types of content, including text, tool calls, and tool call results.

Read the detailed documentation for the `ChatMessage` data class on the dedicated [ChatMessage](doc:chatmessage) page.

## Document

### Overview

`Document` represents a central data abstraction in Haystack, capable of holding text, tables, and binary data.

### Key Features

- Unique ID for each document.
- Multiple content types are supported: text (`content`) and binary data (`blob`).
- Custom metadata and scoring for advanced document management.
- Optional embedding for AI-based applications.

### Attributes

```python
@dataclass
class Document(metaclass=_BackwardCompatible):
    id: str = field(default="")
    content: Optional[str] = field(default=None)
    blob: Optional[ByteStream] = field(default=None)
    meta: Dict[str, Any] = field(default_factory=dict)
    score: Optional[float] = field(default=None)
    embedding: Optional[List[float]] = field(default=None)
    sparse_embedding: Optional[SparseEmbedding] = field(default=None)
```

### Example

```python
from haystack import Document

# Create a single document with text content and a 768-dimensional embedding
document = Document(content="Here are the contents of your document", embedding=[0.1] * 768)
```
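
As a rough sketch of how metadata and scores might be used downstream, the example below filters a few documents by a metadata key and ranks them by score. The `Document` class here is a reduced stand-in defined for this example, not the Haystack class, and the `lang` key, contents, and scores are made up.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


# Reduced stand-in for illustration; NOT imported from Haystack.
@dataclass
class Document:
    content: Optional[str] = None
    meta: Dict[str, Any] = field(default_factory=dict)
    score: Optional[float] = None


docs: List[Document] = [
    Document(content="Intro to pipelines", meta={"lang": "en"}, score=0.42),
    Document(content="Was sind Pipelines?", meta={"lang": "de"}, score=0.77),
    Document(content="Retrievers explained", meta={"lang": "en"}, score=0.91),
]

# Keep English documents only, then rank them by descending score.
english = [d for d in docs if d.meta.get("lang") == "en"]
ranked = sorted(english, key=lambda d: d.score or 0.0, reverse=True)
top_contents = [d.content for d in ranked]
print(top_contents)
```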

## StreamingChunk

### Overview

`StreamingChunk` represents one part of a streamed LLM response, which makes it possible to process the response in real time while it is still being generated.

### Key Features

- String-based content representation.
- Accompanying metadata for additional context and management.

### Attributes

```python
@dataclass
class StreamingChunk:
    content: str
    metadata: Dict[str, Any] = field(default_factory=dict, hash=False)
```
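
A minimal sketch of consuming chunks as they arrive is shown below. The class is a stand-in that mirrors the attribute listing above, the `on_chunk` callback is a hypothetical name, and the stream is simulated with a fixed list of fragments.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


# Stand-in mirroring the attribute listing above; NOT imported from Haystack.
@dataclass
class StreamingChunk:
    content: str
    metadata: Dict[str, Any] = field(default_factory=dict, hash=False)


received: List[str] = []


def on_chunk(chunk: StreamingChunk) -> None:
    """Hypothetical callback: collect each fragment as it arrives."""
    received.append(chunk.content)


# Simulated stream; a real LLM would emit these fragments over time.
for position, piece in enumerate(["Hay", "stack ", "streams ", "text."]):
    on_chunk(StreamingChunk(content=piece, metadata={"index": position}))

full_response = "".join(received)
print(full_response)
```

Joining the collected fragments in arrival order reconstructs the complete response.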

## SparseEmbedding

### Overview

The `SparseEmbedding` class represents a sparse embedding: a vector where most values are zeros.

### Attributes

- `indices`: List of indices of non-zero elements in the embedding.
- `values`: List of values of non-zero elements in the embedding.
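
To make the relationship between `indices` and `values` concrete, the sketch below expands a sparse embedding into its dense form. The class is a stand-in with the two attributes described above, and `to_dense` is a hypothetical helper written for this example, not part of Haystack.

```python
from dataclasses import dataclass
from typing import List


# Stand-in with the two attributes described above; NOT imported from Haystack.
@dataclass
class SparseEmbedding:
    indices: List[int]
    values: List[float]


def to_dense(embedding: SparseEmbedding, size: int) -> List[float]:
    """Hypothetical helper: expand a sparse embedding into a dense list."""
    dense = [0.0] * size
    for index, value in zip(embedding.indices, embedding.values):
        dense[index] = value
    return dense


sparse = SparseEmbedding(indices=[1, 4], values=[0.5, 0.25])
dense = to_dense(sparse, size=6)
print(dense)
```

Only the two non-zero entries are stored, which is why the sparse form stays compact even when the full vector is very long.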

## Tool

`Tool` is a data class representing a tool that Language Models can prepare a call for.

Read the detailed documentation for the `Tool` data class on a dedicated [Tool](doc:tool) page.