Mirror of https://github.com/deepset-ai/haystack.git, synced 2025-12-13 15:57:24 +00:00
Extend Tutorial 5 with Upper Bound Reader Eval Metrics (#1995)
* print report for closed-domain eval
* Add latest docstring and tutorial changes
* rename parameter and rewrite docs
* Add latest docstring and tutorial changes
* print eval report in separate cell
* Add latest docstring and tutorial changes
* explain when to eval individual components

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
This commit is contained in:
parent 5695d721aa
commit 2c063e960e
@@ -282,6 +282,24 @@ metrics = advanced_eval_result.calculate_metrics()
print(metrics["Reader"]["sas"])
```

## Isolated Evaluation Mode to Understand Upper Bounds of the Reader's Performance

The isolated node evaluation uses labels as input to the reader node instead of the output of the preceding retriever node.

This way, we can additionally calculate the upper bounds of the reader's evaluation metrics.


```python
eval_result_with_upper_bounds = pipeline.eval(
    labels=eval_labels,
    params={"Retriever": {"top_k": 1}},
    add_isolated_node_eval=True
)
```


```python
pipeline.print_eval_report(eval_result_with_upper_bounds)
```


## Evaluation of Individual Components: Retriever

Here we evaluate only the retriever, based on whether the gold_label document is retrieved.
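The added tutorial cells stop at printing the evaluation report. As a rough illustration of what the "upper bound" means in practice, here is a minimal sketch of how the isolated reader metrics could be read out programmatically, assuming this Haystack version exposes the `eval_mode` parameter on `EvaluationResult.calculate_metrics` and reusing the `eval_result_with_upper_bounds` variable from the hunk above:

```python
# Sketch only, not part of this commit: compare the reader's integrated metrics
# (reader fed with the retriever's output) against its isolated metrics
# (reader fed with the gold documents). Assumes calculate_metrics() accepts an
# eval_mode argument in this Haystack version.
integrated_metrics = eval_result_with_upper_bounds.calculate_metrics()
isolated_metrics = eval_result_with_upper_bounds.calculate_metrics(eval_mode="isolated")

print(integrated_metrics["Reader"]["exact_match"])  # what the full pipeline achieves
print(isolated_metrics["Reader"]["exact_match"])    # upper bound with perfect retrieval
```

The gap between the two numbers indicates how much of the reader's error is caused by the retriever missing the right documents rather than by the reader itself.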
@@ -15346,10 +15346,54 @@
   }
  }
 },
 {
  "cell_type": "markdown",
  "source": [
   "## Isolated Evaluation Mode to Understand Upper Bounds of the Reader's Performance\n",
   "The isolated node evaluation uses labels as input to the reader node instead of the output of the preceding retriever node.\n",
   "This way, we can additionally calculate the upper bounds of the reader's evaluation metrics."
  ],
  "metadata": {
   "collapsed": false
  }
 },
 {
  "cell_type": "code",
  "execution_count": null,
  "outputs": [],
  "source": [
   "eval_result_with_upper_bounds = pipeline.eval(\n",
   "    labels=eval_labels,\n",
   "    params={\"Retriever\": {\"top_k\": 1}},\n",
   "    add_isolated_node_eval=True\n",
   ")"
  ],
  "metadata": {
   "collapsed": false,
   "pycharm": {
    "name": "#%%\n"
   }
  }
 },
 {
  "cell_type": "code",
  "execution_count": null,
  "outputs": [],
  "source": [
   "pipeline.print_eval_report(eval_result_with_upper_bounds)"
  ],
  "metadata": {
   "collapsed": false,
   "pycharm": {
    "name": "#%%\n"
   }
  }
 },
 {
  "cell_type": "markdown",
  "source": [
   "## Evaluation of Individual Components: Retriever\n",
   "Sometimes you might want to evaluate individual components, for example, if you don't have a pipeline but only a retriever or a reader with a model that you trained yourself.\n",
   "Here we evaluate only the retriever, based on whether the gold_label document is retrieved."
  ],
  "metadata": {
@@ -179,7 +179,18 @@ def tutorial5_evaluation():
    metrics = advanced_eval_result.calculate_metrics()
    print(metrics["Reader"]["sas"])

    ## Isolated Evaluation Mode to Understand Upper Bounds of the Reader's Performance
    # The isolated node evaluation uses labels as input to the reader node instead of the output of the preceding retriever node.
    # This way, we can additionally calculate the upper bounds of the reader's evaluation metrics.
    eval_result_with_upper_bounds = pipeline.eval(
        labels=eval_labels,
        params={"Retriever": {"top_k": 1}},
        add_isolated_node_eval=True
    )
    pipeline.print_eval_report(eval_result_with_upper_bounds)

    ## Evaluation of Individual Components
    # Sometimes you might want to evaluate individual components, for example, if you don't have a pipeline but only a retriever or a reader with a model that you trained yourself.
    # Evaluate Retriever on its own
    # Here we evaluate only the retriever, based on whether the gold_label document is retrieved.
    retriever_eval_results = retriever.eval(top_k=10, label_index=label_index, doc_index=doc_index)
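The script hunk only evaluates the retriever on its own; for the reader-only case mentioned in the added comment, a minimal sketch (assuming `FARMReader.eval` accepts a document store plus the label and document indices, as in earlier revisions of this tutorial) could look like this:

```python
# Sketch only, not part of this commit: evaluate the reader in isolation,
# mirroring the standalone retriever evaluation above. Assumes reader.eval()
# takes document_store, label_index and doc_index in this Haystack version.
reader_eval_results = reader.eval(
    document_store=document_store,
    label_index=label_index,
    doc_index=doc_index,
)
print(reader_eval_results)  # e.g. exact match and F1 computed on the gold documents
```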