Mirror of https://github.com/deepset-ai/haystack.git
Extend Tutorial 5 with Upper Bound Reader Eval Metrics (#1995)
* print report for closed-domain eval
* Add latest docstring and tutorial changes
* rename parameter and rewrite docs
* Add latest docstring and tutorial changes
* print eval report in separate cell
* Add latest docstring and tutorial changes
* explain when to eval individual components

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
parent 5695d721aa
commit 2c063e960e
@@ -282,6 +282,24 @@ metrics = advanced_eval_result.calculate_metrics()
print(metrics["Reader"]["sas"])
```

## Isolated Evaluation Mode to Understand Upper Bounds of the Reader's Performance

The isolated node evaluation uses labels as input to the reader node instead of the output of the preceding retriever node.
Thereby, we can additionally calculate the upper bounds of the evaluation metrics of the reader.

```python
eval_result_with_upper_bounds = pipeline.eval(
    labels=eval_labels,
    params={"Retriever": {"top_k": 1}},
    add_isolated_node_eval=True
)
```

```python
pipeline.print_eval_report(eval_result_with_upper_bounds)
```
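
Beyond the printed report, the integrated and isolated numbers can also be compared programmatically. A minimal sketch, assuming the `EvaluationResult.calculate_metrics` method of this Haystack version accepts an `eval_mode` switch and exposes an `f1` metric for the reader (both are assumptions here, not shown in the diff). With `top_k=1` the integrated reader metrics are strongly retriever-limited, so the gap to the isolated upper bound shows how much a better retriever could help:

```python
# Integrated mode: the reader was fed the retriever's output
integrated = eval_result_with_upper_bounds.calculate_metrics(eval_mode="integrated")

# Isolated mode: the reader was fed the gold documents, i.e. its upper bound
isolated = eval_result_with_upper_bounds.calculate_metrics(eval_mode="isolated")

# A large gap indicates the retriever, not the reader, is the bottleneck
print("Reader F1 (integrated):", integrated["Reader"]["f1"])
print("Reader F1 (upper bound):", isolated["Reader"]["f1"])
```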

## Evaluation of Individual Components: Retriever

Here we evaluate only the retriever, based on whether the gold_label document is retrieved.
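
The markdown hunk is cut off at this point; the script hunk further down shows the corresponding call. A minimal sketch of the standalone retriever evaluation, assuming the tutorial's `label_index` and `doc_index` Elasticsearch indices and `recall`/`map` keys in the returned dict:

```python
# Recall measures whether the gold-label document appears
# among the top_k retrieved documents
retriever_eval_results = retriever.eval(top_k=10, label_index=label_index, doc_index=doc_index)
print("Retriever Recall:", retriever_eval_results["recall"])
print("Retriever Mean Avg Precision:", retriever_eval_results["map"])
```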
@@ -15346,10 +15346,54 @@
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "## Isolated Evaluation Mode to Understand Upper Bounds of the Reader's Performance\n",
    "The isolated node evaluation uses labels as input to the reader node instead of the output of the preceding retriever node.\n",
    "Thereby, we can additionally calculate the upper bounds of the evaluation metrics of the reader."
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "eval_result_with_upper_bounds = pipeline.eval(\n",
    "    labels=eval_labels,\n",
    "    params={\"Retriever\": {\"top_k\": 1}},\n",
    "    add_isolated_node_eval=True\n",
    ")"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "pipeline.print_eval_report(eval_result_with_upper_bounds)"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "## Evaluation of Individual Components: Retriever\n",
    "Sometimes you might want to evaluate individual components, for example, if you don't have a pipeline but only a retriever or a reader with a model that you trained yourself.\n",
    "Here we evaluate only the retriever, based on whether the gold_label document is retrieved."
   ],
   "metadata": {
@@ -179,7 +179,18 @@ def tutorial5_evaluation():
    metrics = advanced_eval_result.calculate_metrics()
    print(metrics["Reader"]["sas"])

    ## Isolated Evaluation Mode to Understand Upper Bounds of the Reader's Performance
    # The isolated node evaluation uses labels as input to the reader node instead of the output of the preceding retriever node.
    # Thereby, we can additionally calculate the upper bounds of the evaluation metrics of the reader.
    eval_result_with_upper_bounds = pipeline.eval(
        labels=eval_labels,
        params={"Retriever": {"top_k": 1}},
        add_isolated_node_eval=True
    )
    pipeline.print_eval_report(eval_result_with_upper_bounds)

    ## Evaluation of Individual Components
    # Sometimes you might want to evaluate individual components, for example, if you don't have a pipeline but only a retriever or a reader with a model that you trained yourself.
    # Evaluate Retriever on its own
    # Here we evaluate only the retriever, based on whether the gold_label document is retrieved.
    retriever_eval_results = retriever.eval(top_k=10, label_index=label_index, doc_index=doc_index)
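
The diff stops at the retriever call. As its counterpart, a standalone reader evaluation could look like the sketch below. This is an illustration under stated assumptions, not part of the commit: it presumes the `FARMReader.eval` signature of the same Haystack 1.x era, the tutorial's `document_store`, `label_index`, and `doc_index`, a `device` initialized elsewhere in the script, and `EM`/`f1` keys in the returned dict.

```python
# Hypothetical continuation: evaluate the reader on its own by feeding it
# the gold documents stored in the document store (no retriever involved)
reader_eval_results = reader.eval(
    document_store=document_store,
    device=device,  # assumed to be set up earlier in the tutorial
    label_index=label_index,
    doc_index=doc_index,
)
# Exact match and F1 are computed directly against the gold answers
print("Reader Exact Match:", reader_eval_results["EM"])
print("Reader F1-Score:", reader_eval_results["f1"])
```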