EvaluationRunResult.score_report()
metrics
* fixing the DataFrame with the aggregated scores
* fixing tests