<a id="schema"></a>
|
|
|
|
# Module schema
|
|
|
|
<a id="schema.Document"></a>
|
|
|
|
## Document
|
|
|
|
```python
|
|
@dataclass
|
|
class Document()
|
|
```
|
|
|
|
<a id="schema.Document.__init__"></a>
|
|
|
|
#### Document.\_\_init\_\_
|
|
|
|
```python
|
|
def __init__(content: Union[str, pd.DataFrame], content_type: Literal["text", "table", "image", "audio"] = "text", id: Optional[str] = None, score: Optional[float] = None, meta: Dict[str, Any] = None, embedding: Optional[np.ndarray] = None, id_hash_keys: Optional[List[str]] = None)
|
|
```
|
|
|
|
One of the core data classes in Haystack. It's used to represent documents / passages in a standardized way within Haystack.

Documents are stored in DocumentStores, are returned by Retrievers, are the input for Readers and are used in
many other places that manipulate or interact with document-level data.

Note: There can be multiple Documents originating from one file (e.g. PDF), if you split the text
into smaller passages. We'll have one Document per passage in this case.

Each document has a unique ID. This can be supplied by the user or generated automatically.
It's particularly helpful for handling of duplicates and referencing documents in other objects (e.g. Labels).

There's an easy option to convert from/to dicts via `from_dict()` and `to_dict()`.

**Arguments**:

- `content`: Content of the document. For most cases, this will be text, but it can be a table or image.
- `content_type`: One of "text", "table" or "image". Haystack components can use this to adjust their
handling of Documents and check compatibility.
- `id`: Unique ID for the document. If not supplied by the user, we'll generate one automatically by
creating a hash from the supplied text. This behaviour can be further adjusted by `id_hash_keys`.
- `score`: The relevance score of the Document determined by a model (e.g. Retriever or Re-Ranker).
If the model's `scale_score` was set to True (default), the score is in the unit interval (range of [0,1]), where 1 means extremely relevant.
- `meta`: Meta fields for a document like name, url, or author in the form of a custom dict (any keys and values allowed).
- `embedding`: Vector encoding of the text.
- `id_hash_keys`: Generate the document id from a custom list of strings that refer to the document's attributes.
If you want to ensure you don't have duplicate documents in your DocumentStore but texts are
not unique, you can modify the metadata and pass e.g. "meta" to this field (e.g. ["content", "meta"]).
In this case the id will be generated by using the content and the defined metadata.

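A minimal usage sketch (the content and metadata values below are made up for illustration):

```python
from haystack.schema import Document

doc = Document(
    content="Berlin is the capital of Germany.",
    content_type="text",
    meta={"name": "germany_facts.txt"},
    id_hash_keys=["content", "meta"],  # derive the id from both content and metadata
)
print(doc.id)         # auto-generated hash
print(doc.to_dict())  # plain-dict representation, see to_dict() below
```
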
<a id="schema.Document.to_dict"></a>
|
|
|
|
#### Document.to\_dict
|
|
|
|
```python
|
|
def to_dict(field_map={}) -> Dict
|
|
```
|
|
|
|
Convert Document to dict. An optional field_map can be supplied to change the names of the keys in the
|
|
|
|
resulting dict. This way you can work with standardized Document objects in Haystack, but adjust the format that
|
|
they are serialized / stored in other places (e.g. elasticsearch)
|
|
Example:
|
|
| doc = Document(content="some text", content_type="text")
|
|
| doc.to_dict(field_map={"custom_content_field": "content"})
|
|
| >>> {"custom_content_field": "some text", content_type": "text"}
|
|
|
|
**Arguments**:
|
|
|
|
- `field_map`: Dict with keys being the custom target keys and values being the standard Document attributes
|
|
|
|
**Returns**:
|
|
|
|
dict with content of the Document
|
|
|
|
<a id="schema.Document.from_dict"></a>
|
|
|
|
#### Document.from\_dict
|
|
|
|
```python
|
|
@classmethod
|
|
def from_dict(cls, dict: Dict[str, Any], field_map: Dict[str, Any] = {}, id_hash_keys: Optional[List[str]] = None) -> Document
|
|
```
|
|
|
|
Create Document from dict. An optional field_map can be supplied to adjust for custom names of the keys in the
|
|
|
|
input dict. This way you can work with standardized Document objects in Haystack, but adjust the format that
|
|
they are serialized / stored in other places (e.g. elasticsearch)
|
|
Example:
|
|
| my_dict = {"custom_content_field": "some text", content_type": "text"}
|
|
| Document.from_dict(my_dict, field_map={"custom_content_field": "content"})
|
|
|
|
**Arguments**:
|
|
|
|
- `field_map`: Dict with keys being the custom target keys and values being the standard Document attributes
|
|
|
|
**Returns**:
|
|
|
|
dict with content of the Document
|
|
|
|
<a id="schema.Document.__lt__"></a>
|
|
|
|
#### Document.\_\_lt\_\_
|
|
|
|
```python
|
|
def __lt__(other)
|
|
```
|
|
|
|
Enable sorting of Documents by score
|
|
|
|
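For illustration, `__lt__` is what lets built-in sorting order Documents by score (a sketch, assuming `retrieved_docs` is a hypothetical list of scored Documents):

```python
# Highest-scoring Documents first
ranked_docs = sorted(retrieved_docs, reverse=True)
```
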
<a id="schema.SpeechDocument"></a>
|
|
|
|
## SpeechDocument
|
|
|
|
```python
|
|
@dataclass
|
|
class SpeechDocument(Document)
|
|
```
|
|
|
|
Text-based document that also contains some accessory audio information
|
|
(either generated from the text with text to speech nodes, or extracted
|
|
from an audio source containing spoken words).
|
|
|
|
Note: for documents of this type the primary information source is *text*,
|
|
so this is _not_ an audio document. The embeddings are computed on the textual
|
|
representation and will work with regular, text-based nodes and pipelines.
|
|
|
|
<a id="schema.Span"></a>
|
|
|
|
## Span
|
|
|
|
```python
|
|
@dataclass
|
|
class Span()
|
|
```
|
|
|
|
<a id="schema.Span.end"></a>
|
|
|
|
#### end
|
|
|
|
Defining a sequence of characters (Text span) or cells (Table span) via start and end index.
|
|
|
|
For extractive QA: Character where answer starts/ends
|
|
For TableQA: Cell where the answer starts/ends (counted from top left to bottom right of table)
|
|
|
|
**Arguments**:
|
|
|
|
- `start`: Position where the span starts
|
|
- `end`: Position where the spand ends
|
|
|
|
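A small sketch of how such offsets could be expressed (the concrete indices are invented):

```python
from haystack.schema import Span

# Extractive QA: the answer text starts at character 25 and ends at character 30
answer_offsets = Span(start=25, end=30)
```
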
<a id="schema.Answer"></a>
|
|
|
|
## Answer
|
|
|
|
```python
|
|
@dataclass
|
|
class Answer()
|
|
```
|
|
|
|
<a id="schema.Answer.meta"></a>
|
|
|
|
#### meta
|
|
|
|
The fundamental object in Haystack to represent any type of Answers (e.g. extractive QA, generative QA or TableQA).
|
|
|
|
For example, it's used within some Nodes like the Reader, but also in the REST API.
|
|
|
|
**Arguments**:
|
|
|
|
- `answer`: The answer string. If there's no possible answer (aka "no_answer" or "is_impossible) this will be an empty string.
|
|
- `type`: One of ("generative", "extractive", "other"): Whether this answer comes from an extractive model
|
|
(i.e. we can locate an exact answer string in one of the documents) or from a generative model
|
|
(i.e. no pointer to a specific document, no offsets ...).
|
|
- `score`: The relevance score of the Answer determined by a model (e.g. Reader or Generator).
|
|
In the range of [0,1], where 1 means extremely relevant.
|
|
- `context`: The related content that was used to create the answer (i.e. a text passage, part of a table, image ...)
|
|
- `offsets_in_document`: List of `Span` objects with start and end positions of the answer **in the
|
|
document** (as stored in the document store).
|
|
For extractive QA: Character where answer starts => `Answer.offsets_in_document[0].start
|
|
For TableQA: Cell where the answer starts (counted from top left to bottom right of table) => `Answer.offsets_in_document[0].start
|
|
(Note that in TableQA there can be multiple cell ranges that are relevant for the answer, thus there can be multiple `Spans` here)
|
|
- `offsets_in_context`: List of `Span` objects with start and end positions of the answer **in the
|
|
context** (i.e. the surrounding text/table of a certain window size).
|
|
For extractive QA: Character where answer starts => `Answer.offsets_in_document[0].start
|
|
For TableQA: Cell where the answer starts (counted from top left to bottom right of table) => `Answer.offsets_in_document[0].start
|
|
(Note that in TableQA there can be multiple cell ranges that are relevant for the answer, thus there can be multiple `Spans` here)
|
|
- `document_id`: ID of the document that the answer was located it (if any)
|
|
- `meta`: Dict that can be used to associate any kind of custom meta data with the answer.
|
|
In extractive QA, this will carry the meta data of the document where the answer was found.
|
|
|
|
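A hedged construction sketch (all concrete values below are invented for illustration):

```python
from haystack.schema import Answer, Span

answer = Answer(
    answer="Paris",
    type="extractive",
    score=0.93,
    context="The capital of France is Paris.",
    offsets_in_document=[Span(start=25, end=30)],
    offsets_in_context=[Span(start=25, end=30)],
    document_id="doc-42",
    meta={"name": "france_article.txt"},
)
```
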
<a id="schema.Answer.__lt__"></a>
|
|
|
|
#### Answer.\_\_lt\_\_
|
|
|
|
```python
|
|
def __lt__(other)
|
|
```
|
|
|
|
Enable sorting of Answers by score
|
|
|
|
<a id="schema.SpeechAnswer"></a>
|
|
|
|
## SpeechAnswer
|
|
|
|
```python
|
|
@dataclass
|
|
class SpeechAnswer(Answer)
|
|
```
|
|
|
|
Text-based answer that also contains some accessory audio information
|
|
(either generated from the text with text to speech nodes, or extracted
|
|
from an audio source containing spoken words).
|
|
|
|
Note: for answer of this type the primary information source is *text*,
|
|
so this is _not_ an audio document. The embeddings are computed on the textual
|
|
representation and will work with regular, text-based nodes and pipelines.
|
|
|
|
<a id="schema.Label"></a>
|
|
|
|
## Label
|
|
|
|
```python
|
|
@dataclass
|
|
class Label()
|
|
```
|
|
|
|
<a id="schema.Label.__init__"></a>
|
|
|
|
#### Label.\_\_init\_\_
|
|
|
|
```python
|
|
def __init__(query: str, document: Document, is_correct_answer: bool, is_correct_document: bool, origin: Literal["user-feedback", "gold-label"], answer: Optional[Answer], id: Optional[str] = None, no_answer: Optional[bool] = None, pipeline_id: Optional[str] = None, created_at: Optional[str] = None, updated_at: Optional[str] = None, meta: Optional[dict] = None, filters: Optional[dict] = None)
|
|
```
|
|
|
|
Object used to represent label/feedback in a standardized way within Haystack.
|
|
|
|
This includes labels from dataset like SQuAD, annotations from labeling tools,
|
|
or, user-feedback from the Haystack REST API.
|
|
|
|
**Arguments**:
|
|
|
|
- `query`: the question (or query) for finding answers.
|
|
- `document`:
|
|
- `answer`: the answer object.
|
|
- `is_correct_answer`: whether the sample is positive or negative.
|
|
- `is_correct_document`: in case of negative sample(is_correct_answer is False), there could be two cases;
|
|
incorrect answer but correct document & incorrect document. This flag denotes if
|
|
the returned document was correct.
|
|
- `origin`: the source for the labels. It can be used to later for filtering.
|
|
- `id`: Unique ID used within the DocumentStore. If not supplied, a uuid will be generated automatically.
|
|
- `no_answer`: whether the question in unanswerable.
|
|
- `pipeline_id`: pipeline identifier (any str) that was involved for generating this label (in-case of user feedback).
|
|
- `created_at`: Timestamp of creation with format yyyy-MM-dd HH:mm:ss.
|
|
Generate in Python via time.strftime("%Y-%m-%d %H:%M:%S").
|
|
- `created_at`: Timestamp of update with format yyyy-MM-dd HH:mm:ss.
|
|
Generate in Python via time.strftime("%Y-%m-%d %H:%M:%S")
|
|
- `meta`: Meta fields like "annotator_name" in the form of a custom dict (any keys and values allowed).
|
|
- `filters`: filters that should be applied to the query to rule out non-relevant documents. For example, if there are different correct answers
|
|
in a DocumentStore depending on the retrieved document and the answer in this label is correct only on condition of the filters.
|
|
|
|
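A hedged example of creating a user-feedback label (the query, document, and answer are invented):

```python
from haystack.schema import Answer, Document, Label

label = Label(
    query="What is the capital of France?",
    document=Document(content="The capital of France is Paris."),
    answer=Answer(answer="Paris", type="extractive"),
    is_correct_answer=True,
    is_correct_document=True,
    origin="user-feedback",
)
```
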
<a id="schema.MultiLabel"></a>
|
|
|
|
## MultiLabel
|
|
|
|
```python
|
|
@dataclass
|
|
class MultiLabel()
|
|
```
|
|
|
|
<a id="schema.MultiLabel.__init__"></a>
|
|
|
|
#### MultiLabel.\_\_init\_\_
|
|
|
|
```python
|
|
def __init__(labels: List[Label], drop_negative_labels=False, drop_no_answers=False)
|
|
```
|
|
|
|
There are often multiple `Labels` associated with a single query. For example, there can be multiple annotated
|
|
|
|
answers for one question or multiple documents contain the information you want for a query.
|
|
This class is "syntactic sugar" that simplifies the work with such a list of related Labels.
|
|
It stored the original labels in MultiLabel.labels and provides additional aggregated attributes that are
|
|
automatically created at init time. For example, MultiLabel.no_answer allows you to easily access if any of the
|
|
underlying Labels provided a text answer and therefore demonstrates that there is indeed a possible answer.
|
|
|
|
**Arguments**:
|
|
|
|
- `labels`: A list of labels that belong to a similar query and shall be "grouped" together
|
|
- `drop_negative_labels`: Whether to drop negative labels from that group (e.g. thumbs down feedback from UI)
|
|
- `drop_no_answers`: Whether to drop labels that specify the answer is impossible
|
|
|
|
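A short sketch (here `labels_for_query` stands for a hypothetical list of Label objects sharing the same query):

```python
from haystack.schema import MultiLabel

multi_label = MultiLabel(
    labels=labels_for_query,
    drop_negative_labels=True,  # e.g. ignore thumbs-down feedback
    drop_no_answers=False,
)
print(multi_label.labels)     # the original Label objects
print(multi_label.no_answer)  # aggregated flag described above
```
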
<a id="schema.EvaluationResult"></a>
|
|
|
|
## EvaluationResult
|
|
|
|
```python
|
|
class EvaluationResult()
|
|
```
|
|
|
|
<a id="schema.EvaluationResult.__init__"></a>
|
|
|
|
#### EvaluationResult.\_\_init\_\_
|
|
|
|
```python
|
|
def __init__(node_results: Dict[str, pd.DataFrame] = None) -> None
|
|
```
|
|
|
|
A convenience class to store, pass, and interact with results of a pipeline evaluation run (for example `pipeline.eval()`).

Detailed results are stored as one dataframe per node. This class makes them more accessible and provides
convenience methods to work with them.
For example, you can calculate eval metrics, get detailed reports, or simulate different top_k settings:

```python
| eval_results = pipeline.eval(...)
|
| # derive detailed metrics
| eval_results.calculate_metrics()
|
| # show summary of incorrect queries
| eval_results.wrong_examples()
```

Each row of the underlying DataFrames contains either an answer or a document that has been retrieved during evaluation.
Rows are enriched with basic information like rank, query, type, or node.
Additional answer or document-specific evaluation information, like gold labels
and metrics showing whether the row matches the gold labels, are included, too.
The DataFrames have the following schema:
- multilabel_id: The ID of the multilabel, which is unique for the pair of query and filters.
- query: The actual query string.
- filters: The filters used with the query.
- gold_answers (answers only): The expected answers.
- answer (answers only): The actual answer.
- context: The content of the document (the surrounding context of the answer for QA).
- exact_match (answers only): A metric showing if the answer exactly matches the gold label.
- f1 (answers only): A metric showing how well the answer overlaps with the gold label on a token basis.
- sas (answers only, optional): A metric showing how well the answer matches the gold label on a semantic basis.
- exact_match_context_scope (answers only): exact_match with enforced context match.
- f1_context_scope (answers only): f1 with enforced context scope match.
- sas_context_scope (answers only): sas with enforced context scope match.
- exact_match_document_scope (answers only): exact_match with enforced document scope match.
- f1_document_scope (answers only): f1 with enforced document scope match.
- sas_document_scope (answers only): sas with enforced document scope match.
- exact_match_document_id_and_context_scope (answers only): exact_match with enforced document and context scope match.
- f1_document_id_and_context_scope (answers only): f1 with enforced document and context scope match.
- sas_document_id_and_context_scope (answers only): sas with enforced document and context scope match.
- gold_contexts: The contents of the gold documents.
- gold_id_match (documents only): A metric showing whether one of the gold document IDs matches the document.
- context_match (documents only): A metric showing whether one of the gold contexts matches the document content.
- answer_match (documents only): A metric showing whether the document contains the answer.
- gold_id_or_answer_match (documents only): A Boolean operation specifying that there should be either `'gold_id_match' OR 'answer_match'`.
- gold_id_and_answer_match (documents only): A Boolean operation specifying that there should be both `'gold_id_match' AND 'answer_match'`.
- gold_id_or_context_match (documents only): A Boolean operation specifying that there should be either `'gold_id_match' OR 'context_match'`.
- gold_id_and_context_match (documents only): A Boolean operation specifying that there should be both `'gold_id_match' AND 'context_match'`.
- gold_id_and_context_and_answer_match (documents only): A Boolean operation specifying that there should be `'gold_id_match' AND 'context_match' AND 'answer_match'`.
- context_and_answer_match (documents only): A Boolean operation specifying that there should be both `'context_match' AND 'answer_match'`.
- rank: A rank or 1-based-position in the result list.
- document_id: The ID of the document that has been retrieved or that contained the answer.
- gold_document_ids: The IDs of the documents to be retrieved.
- custom_document_id: The custom ID of the document (specified by `custom_document_id_field`) that has been retrieved or that contained the answer.
- gold_custom_document_ids: The custom document IDs (specified by `custom_document_id_field`) to be retrieved.
- offsets_in_document (answers only): The position or offsets within the document where the answer was found.
- gold_offsets_in_documents (answers only): The position or offsets of the gold answer within the document.
- gold_answers_exact_match (answers only): exact_match values per gold_answer.
- gold_answers_f1 (answers only): f1 values per gold_answer.
- gold_answers_sas (answers only): sas values per gold answer.
- gold_documents_id_match: The document ID match per gold label (if `custom_document_id_field` has been specified, custom IDs are used).
- gold_contexts_similarity: Context similarity per gold label.
- gold_answers_match (documents only): Specifies whether the document contains an answer per gold label.
- type: Possible values: 'answer' or 'document'.
- node: The node name.
- eval_mode: Specifies whether the evaluation was executed in integrated or isolated mode.
Check pipeline.eval()'s add_isolated_node_eval parameter for more information.

**Arguments**:

- `node_results`: The evaluation DataFrames per pipeline node.

<a id="schema.EvaluationResult.calculate_metrics"></a>
|
|
|
|
#### EvaluationResult.calculate\_metrics
|
|
|
|
```python
|
|
def calculate_metrics(simulated_top_k_reader: int = -1, simulated_top_k_retriever: int = -1, document_scope: Literal[
|
|
"document_id",
|
|
"context",
|
|
"document_id_and_context",
|
|
"document_id_or_context",
|
|
"answer",
|
|
"document_id_or_answer",
|
|
] = "document_id_or_answer", eval_mode: Literal["integrated", "isolated"] = "integrated", answer_scope: Literal["any", "context", "document_id", "document_id_and_context"] = "any") -> Dict[str, Dict[str, float]]
|
|
```
|
|
|
|
Calculates proper metrics for each node.

For Nodes that return Documents, the default metrics are:
- mrr (`Mean Reciprocal Rank <https://en.wikipedia.org/wiki/Mean_reciprocal_rank>`_)
- map (`Mean Average Precision <https://en.wikipedia.org/wiki/Evaluation_measures_%28information_retrieval%29#Mean_average_precision>`_)
- ndcg (`Normalized Discounted Cumulative Gain <https://en.wikipedia.org/wiki/Discounted_cumulative_gain>`_)
- precision (Precision: How many of the returned documents were relevant?)
- recall_multi_hit (Recall according to Information Retrieval definition: How many of the relevant documents were retrieved per query?)
- recall_single_hit (Recall for Question Answering: How many of the queries returned at least one relevant document?)

For Nodes that return answers, the default metrics are:
- exact_match (How many of the queries returned the exact answer?)
- f1 (How well do the returned results overlap with any gold answer on a token basis?)
- sas, if a SAS model has been provided when calling `pipeline.eval()` (How semantically similar is the prediction to the gold answers?)

During the eval run, you can simulate lower top_k values for Reader and Retriever than the actual values.
For example, you can calculate `top_1_f1` for Reader nodes by setting `simulated_top_k_reader=1`.

If you applied `simulated_top_k_retriever` to a Reader node, you should treat the results with caution as they can differ heavily from an actual eval run with a corresponding `top_k_retriever`.

**Arguments**:

- `simulated_top_k_reader`: Simulates the `top_k` parameter of the Reader.
- `simulated_top_k_retriever`: Simulates the `top_k` parameter of the Retriever.
Note: There might be a discrepancy between simulated Reader metrics and an actual Pipeline run with Retriever `top_k`.
- `eval_mode`: The input the Node was evaluated on.
Usually a Node gets evaluated on the prediction provided by its predecessor Nodes in the Pipeline (`value='integrated'`).
However, as the quality of the Node can heavily depend on the Node's input and thus the predecessor's quality,
you might want to simulate a perfect predecessor in order to get an independent upper bound of the quality of your Node.
For example, when evaluating the Reader, use `value='isolated'` to simulate a perfect Retriever in an ExtractiveQAPipeline.
Possible values are: `integrated`, `isolated`.
The default value is `integrated`.
- `document_scope`: A criterion for deciding whether documents are relevant or not.
You can select between:
- 'document_id': Specifies that the document ID must match. You can specify a custom document ID through `pipeline.eval()`'s `custom_document_id_field` param.
A typical use case is Document Retrieval.
- 'context': Specifies that the content of the document must match. Uses fuzzy matching (see `pipeline.eval()`'s `context_matching_...` params).
A typical use case is Document-Independent Passage Retrieval.
- 'document_id_and_context': A Boolean operation specifying that both `'document_id' AND 'context'` must match.
A typical use case is Document-Specific Passage Retrieval.
- 'document_id_or_context': A Boolean operation specifying that either `'document_id' OR 'context'` must match.
A typical use case is Document Retrieval having sparse context labels.
- 'answer': Specifies that the document contents must include the answer. The selected `answer_scope` is enforced automatically.
A typical use case is Question Answering.
- 'document_id_or_answer' (default): A Boolean operation specifying that either `'document_id' OR 'answer'` must match.
This is intended to be a proper default value in order to support both main use cases:
- Document Retrieval
- Question Answering
The default value is 'document_id_or_answer'.
- `answer_scope`: Specifies the scope in which a matching answer is considered correct.
You can select between:
- 'any' (default): Any matching answer is considered correct.
- 'context': The answer is only considered correct if its context matches as well.
Uses fuzzy matching (see `pipeline.eval()`'s `context_matching_...` params).
- 'document_id': The answer is only considered correct if its document ID matches as well.
You can specify a custom document ID through `pipeline.eval()`'s `custom_document_id_field` param.
- 'document_id_and_context': The answer is only considered correct if its document ID and its context match as well.
The default value is 'any'.
In Question Answering, to enforce that the retrieved document is considered correct whenever the answer is correct, set `document_scope` to 'answer' or 'document_id_or_answer'.

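A hedged usage sketch (node names like "Retriever" and "Reader" depend on your pipeline; `eval_result` is assumed to come from `pipeline.eval()`):

```python
metrics = eval_result.calculate_metrics(
    simulated_top_k_reader=1,                # report top_1 metrics for the Reader
    document_scope="document_id_or_answer",  # default document relevance criterion
    answer_scope="any",                      # default answer correctness criterion
)
print(metrics["Retriever"]["recall_single_hit"])
print(metrics["Reader"]["f1"])
```
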
<a id="schema.EvaluationResult.wrong_examples"></a>
|
|
|
|
#### EvaluationResult.wrong\_examples
|
|
|
|
```python
|
|
def wrong_examples(node: str, n: int = 3, simulated_top_k_reader: int = -1, simulated_top_k_retriever: int = -1, document_scope: Literal[
|
|
"document_id",
|
|
"context",
|
|
"document_id_and_context",
|
|
"document_id_or_context",
|
|
"answer",
|
|
"document_id_or_answer",
|
|
] = "document_id_or_answer", document_metric: str = "recall_single_hit", answer_metric: str = "f1", eval_mode: Literal["integrated", "isolated"] = "integrated", answer_scope: Literal["any", "context", "document_id", "document_id_and_context"] = "any") -> List[Dict]
|
|
```
|
|
|
|
Returns the worst performing queries.

Worst performing queries are calculated based on the metric
that is either a document metric or an answer metric according to the node type.

Lower top_k values for reader and retriever than the actual values during the eval run can be simulated.
See calculate_metrics() for more information.

**Arguments**:

- `simulated_top_k_reader`: simulates top_k param of reader
- `simulated_top_k_retriever`: simulates top_k param of retriever.
Remarks: there might be a discrepancy between simulated reader metrics and an actual pipeline run with retriever top_k.
- `document_metric`: the document metric worst queries are calculated with.
Values can be: 'recall_single_hit', 'recall_multi_hit', 'mrr', 'map', 'precision'.
- `answer_metric`: the answer metric worst queries are calculated with.
Values can be: 'f1', 'exact_match' and 'sas' if the evaluation was made using a SAS model.
- `eval_mode`: the input the node was evaluated on.
Usually nodes get evaluated on the prediction provided by their predecessor nodes in the pipeline (value='integrated').
However, as the quality of the node itself can heavily depend on the node's input and thus the predecessor's quality,
you might want to simulate a perfect predecessor in order to get an independent upper bound of the quality of your node.
For example, when evaluating the reader use value='isolated' to simulate a perfect retriever in an ExtractiveQAPipeline.
Values can be 'integrated', 'isolated'.
Default value is 'integrated'.
- `document_scope`: A criterion for deciding whether documents are relevant or not.
You can select between:
- 'document_id': Specifies that the document ID must match. You can specify a custom document ID through `pipeline.eval()`'s `custom_document_id_field` param.
A typical use case is Document Retrieval.
- 'context': Specifies that the content of the document must match. Uses fuzzy matching (see `pipeline.eval()`'s `context_matching_...` params).
A typical use case is Document-Independent Passage Retrieval.
- 'document_id_and_context': A Boolean operation specifying that both `'document_id' AND 'context'` must match.
A typical use case is Document-Specific Passage Retrieval.
- 'document_id_or_context': A Boolean operation specifying that either `'document_id' OR 'context'` must match.
A typical use case is Document Retrieval having sparse context labels.
- 'answer': Specifies that the document contents must include the answer. The selected `answer_scope` is enforced automatically.
A typical use case is Question Answering.
- 'document_id_or_answer' (default): A Boolean operation specifying that either `'document_id' OR 'answer'` must match.
This is intended to be a proper default value in order to support both main use cases:
- Document Retrieval
- Question Answering
The default value is 'document_id_or_answer'.
- `answer_scope`: Specifies the scope in which a matching answer is considered correct.
You can select between:
- 'any' (default): Any matching answer is considered correct.
- 'context': The answer is only considered correct if its context matches as well.
Uses fuzzy matching (see `pipeline.eval()`'s `context_matching_...` params).
- 'document_id': The answer is only considered correct if its document ID matches as well.
You can specify a custom document ID through `pipeline.eval()`'s `custom_document_id_field` param.
- 'document_id_and_context': The answer is only considered correct if its document ID and its context match as well.
The default value is 'any'.
In Question Answering, to enforce that the retrieved document is considered correct whenever the answer is correct, set `document_scope` to 'answer' or 'document_id_or_answer'.

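A brief sketch of how this might be called (the node name "Reader" is only an assumption about your pipeline; `eval_result` comes from `pipeline.eval()`):

```python
worst_queries = eval_result.wrong_examples(node="Reader", n=3, answer_metric="f1")
for example in worst_queries:
    print(example)  # dict describing one poorly answered query
```
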
<a id="schema.EvaluationResult.save"></a>
|
|
|
|
#### EvaluationResult.save
|
|
|
|
```python
|
|
def save(out_dir: Union[str, Path])
|
|
```
|
|
|
|
Saves the evaluation result.
|
|
|
|
The result of each node is saved in a separate csv with file name {node_name}.csv to the out_dir folder.
|
|
|
|
**Arguments**:
|
|
|
|
- `out_dir`: Path to the target folder the csvs will be saved.
|
|
|
|
<a id="schema.EvaluationResult.load"></a>
|
|
|
|
#### EvaluationResult.load
|
|
|
|
```python
|
|
@classmethod
|
|
def load(cls, load_dir: Union[str, Path])
|
|
```
|
|
|
|
Loads the evaluation result from disk. Expects one csv file per node. See save() for further information.
|
|
|
|
**Arguments**:
|
|
|
|
- `load_dir`: The directory containing the csv files.
|
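A small round-trip sketch (the folder name is arbitrary; `eval_result` is assumed to come from `pipeline.eval()`):

```python
from pathlib import Path
from haystack.schema import EvaluationResult

out_dir = Path("my_eval_results")
eval_result.save(out_dir)                  # writes one {node_name}.csv per node
reloaded = EvaluationResult.load(out_dir)  # reads the csv files back into an EvaluationResult
```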