---
title: "AnswerExactMatchEvaluator"
id: answerexactmatchevaluator
slug: "/answerexactmatchevaluator"
description: "The `AnswerExactMatchEvaluator` evaluates answers predicted by Haystack pipelines using ground truth labels. It checks character by character whether a predicted answer exactly matches the ground truth answer. This metric is called the exact match."
---
# AnswerExactMatchEvaluator
The `AnswerExactMatchEvaluator` evaluates answers predicted by Haystack pipelines using ground truth labels. It checks character by character whether a predicted answer exactly matches the ground truth answer. This metric is called the exact match.
| | |
| --- | --- |
| **Most common position in a pipeline** | On its own or in an evaluation pipeline. To be used after a separate pipeline that has generated the inputs for the Evaluator. |
| **Mandatory run variables** | "ground_truth_answers": A list of strings containing the ground truth answers <br /> <br />"predicted_answers": A list of strings containing the predicted answers to be evaluated |
| **Output variables** | A dictionary containing: <br /> <br />- `score`: A number from 0.0 to 1.0 representing the proportion of questions in which any predicted answer matched the ground truth answers <br /> <br />- `individual_scores`: A list of 0s and 1s, where 1 means that the predicted answer matched one of the ground truths |
| **API reference** | [Evaluators](/reference/evaluators-api) |
| **GitHub link** | https://github.com/deepset-ai/haystack/blob/main/haystack/components/evaluators/answer_exact_match.py |
## Overview
You can use the `AnswerExactMatchEvaluator` component to evaluate answers predicted by a Haystack pipeline, such as an extractive question answering pipeline, against ground truth labels. Because the `AnswerExactMatchEvaluator` checks whether a predicted answer exactly matches the ground truth answer, it is not suited to evaluating answers generated by LLMs, for example, in a RAG pipeline. Use `FaithfulnessEvaluator` or `SASEvaluator` instead.
The `AnswerExactMatchEvaluator` doesn't require any parameters to initialize.
Note that only _one_ predicted answer is compared to _one_ ground truth answer at a time. The component does not support multiple ground truth answers for the same question or multiple answers predicted for the same question.
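Because the comparison is an exact, character-by-character string match, differences in casing, punctuation, or surrounding whitespace count as mismatches. If that is too strict for your data, you can normalize both lists before passing them to the component. The snippet below is a minimal sketch: the `normalize` helper is illustrative and not part of Haystack; only the `run` inputs shown in the table above are assumed.
```python
from haystack.components.evaluators import AnswerExactMatchEvaluator

def normalize(answers):
    # Illustrative preprocessing (not part of Haystack): lowercase and strip
    # surrounding whitespace so purely cosmetic differences don't count as mismatches.
    return [answer.strip().lower() for answer in answers]

evaluator = AnswerExactMatchEvaluator()

# "berlin " differs from "Berlin" only in case and whitespace,
# so it would score 0 without normalization.
result = evaluator.run(
    ground_truth_answers=normalize(["Berlin", "Paris"]),
    predicted_answers=normalize(["berlin ", "Lyon"]),
)
print(result["score"])  # 0.5: one of the two normalized answers matches
```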
## Usage
### On its own
Below is an example of using an `AnswerExactMatchEvaluator` component to evaluate two answers and compare them to ground truth answers.
```python
from haystack.components.evaluators import AnswerExactMatchEvaluator

evaluator = AnswerExactMatchEvaluator()
result = evaluator.run(
    ground_truth_answers=["Berlin", "Paris"],
    predicted_answers=["Berlin", "Lyon"],
)

print(result["individual_scores"])
# [1, 0]
print(result["score"])
# 0.5
```
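Here, `individual_scores` marks the first answer as an exact match (1) and the second as a mismatch (0), and `score` is the proportion of matches, so one match out of two answers gives an overall score of 0.5.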
### In a pipeline
Below is an example where we use an `AnswerExactMatchEvaluator` and a `SASEvaluator` in a pipeline to evaluate two answers and compare them to ground truth answers. Running a pipeline instead of the individual components simplifies calculating more than one metric.
```python
from haystack import Pipeline
from haystack.components.evaluators import AnswerExactMatchEvaluator, SASEvaluator

pipeline = Pipeline()
em_evaluator = AnswerExactMatchEvaluator()
sas_evaluator = SASEvaluator()
pipeline.add_component("em_evaluator", em_evaluator)
pipeline.add_component("sas_evaluator", sas_evaluator)

ground_truth_answers = ["Berlin", "Paris"]
predicted_answers = ["Berlin", "Lyon"]

result = pipeline.run(
    {
        "em_evaluator": {
            "ground_truth_answers": ground_truth_answers,
            "predicted_answers": predicted_answers,
        },
        "sas_evaluator": {
            "ground_truth_answers": ground_truth_answers,
            "predicted_answers": predicted_answers,
        },
    }
)

for evaluator in result:
    print(result[evaluator]["individual_scores"])
# [1, 0]
# [array([[0.99999994]], dtype=float32), array([[0.51747656]], dtype=float32)]

for evaluator in result:
    print(result[evaluator]["score"])
# 0.5
# 0.7587383
```
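If you want both aggregate metrics in a single summary, you can collect them from the pipeline's result dictionary, which maps each component name to its output. A minimal sketch, reusing the component names and values from the example above:
```python
# Collect the aggregate score of each evaluator into one report.
# The keys ("em_evaluator", "sas_evaluator") are the component names used above.
report = {name: output["score"] for name, output in result.items()}
print(report)
# e.g. {'em_evaluator': 0.5, 'sas_evaluator': 0.7587383}
```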