---
title: "AnswerExactMatchEvaluator"
id: answerexactmatchevaluator
slug: "/answerexactmatchevaluator"
description: "The `AnswerExactMatchEvaluator` evaluates answers predicted by Haystack pipelines using ground truth labels. It checks character by character whether a predicted answer exactly matches the ground truth answer. This metric is called the exact match."
---

# AnswerExactMatchEvaluator

The `AnswerExactMatchEvaluator` evaluates answers predicted by Haystack pipelines using ground truth labels. It checks character by character whether a predicted answer exactly matches the ground truth answer. This metric is called the exact match.

| | |
| --- | --- |
| **Most common position in a pipeline** | On its own or in an evaluation pipeline, after a separate pipeline has generated the inputs for the Evaluator |
| **Mandatory run variables** | "ground_truth_answers": A list of strings containing the ground truth answers <br /> <br />"predicted_answers": A list of strings containing the predicted answers to be evaluated |
| **Output variables** | A dictionary containing: <br /> <br />\- `score`: A number from 0.0 to 1.0 representing the proportion of predicted answers that exactly matched the corresponding ground truth answer <br /> <br />\- `individual_scores`: A list of 0s and 1s, where 1 means the predicted answer matched its ground truth answer |
| **API reference** | [Evaluators](/reference/evaluators-api) |
| **GitHub link** | https://github.com/deepset-ai/haystack/blob/main/haystack/components/evaluators/answer_exact_match.py |

## Overview

You can use the `AnswerExactMatchEvaluator` component to evaluate answers predicted by a Haystack pipeline, such as an extractive question answering pipeline, against ground truth labels. Because the `AnswerExactMatchEvaluator` checks whether a predicted answer exactly matches the ground truth answer, it is not suited to evaluating answers generated by LLMs, for example, in a RAG pipeline. Use `FaithfulnessEvaluator` or `SASEvaluator` for those instead.

The `AnswerExactMatchEvaluator` requires no parameters to initialize.

Note that only _one_ predicted answer is compared to _one_ ground truth answer at a time. The component does not support multiple ground truth answers for the same question or multiple answers predicted for the same question.
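
Conceptually, `individual_scores` marks each position where the predicted string equals its ground truth string, and `score` is the mean of those values. Below is a minimal illustrative sketch of that calculation in plain Python on the same example data; it is not the component's internal implementation, just an equivalent computation:

```python
# Illustrative sketch only: an equivalent plain-Python calculation,
# not the AnswerExactMatchEvaluator's actual implementation.
ground_truth_answers = ["Berlin", "Paris"]
predicted_answers = ["Berlin", "Lyon"]

# Each predicted answer is compared to the ground truth at the same index.
individual_scores = [
    1 if predicted == expected else 0
    for predicted, expected in zip(predicted_answers, ground_truth_answers)
]
score = sum(individual_scores) / len(individual_scores)

print(individual_scores)  # [1, 0]
print(score)              # 0.5
```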

## Usage

### On its own

Below is an example of using an `AnswerExactMatchEvaluator` component to evaluate two answers and compare them to ground truth answers.

```python
from haystack.components.evaluators import AnswerExactMatchEvaluator

evaluator = AnswerExactMatchEvaluator()
result = evaluator.run(
    ground_truth_answers=["Berlin", "Paris"],
    predicted_answers=["Berlin", "Lyon"],
)

print(result["individual_scores"])
## [1, 0]
print(result["score"])
## 0.5
```

### In a pipeline

Below is an example where we use an `AnswerExactMatchEvaluator` and a `SASEvaluator` in a pipeline to evaluate two answers and compare them to ground truth answers. Running a pipeline instead of the individual components simplifies calculating more than one metric.

```python
from haystack import Pipeline
from haystack.components.evaluators import AnswerExactMatchEvaluator, SASEvaluator

pipeline = Pipeline()
em_evaluator = AnswerExactMatchEvaluator()
sas_evaluator = SASEvaluator()
pipeline.add_component("em_evaluator", em_evaluator)
pipeline.add_component("sas_evaluator", sas_evaluator)

ground_truth_answers = ["Berlin", "Paris"]
predicted_answers = ["Berlin", "Lyon"]

result = pipeline.run(
    {
        "em_evaluator": {
            "ground_truth_answers": ground_truth_answers,
            "predicted_answers": predicted_answers,
        },
        "sas_evaluator": {
            "ground_truth_answers": ground_truth_answers,
            "predicted_answers": predicted_answers,
        },
    }
)

for evaluator in result:
    print(result[evaluator]["individual_scores"])
## [1, 0]
## [array([[0.99999994]], dtype=float32), array([[0.51747656]], dtype=float32)]

for evaluator in result:
    print(result[evaluator]["score"])
## 0.5
## 0.7587383
```
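
If you only need one of the metrics, you can also index the pipeline result directly by the name the component was registered under with `add_component`, instead of looping over all evaluators. Assuming the `result` from the example above:

```python
# Access a single evaluator's output by its component name.
print(result["em_evaluator"]["score"])   # 0.5
print(result["sas_evaluator"]["score"])  # ~0.76; the exact value depends on the SAS model
```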