tstadel c5540d05ed
Calculation of metrics and presentation of eval results (#1760)
* retriever metrics added

* Add latest docstring and tutorial changes

* answer and document level matching metrics implemented

* Add latest docstring and tutorial changes

* answer related metrics for retriever

* basic reader metrics implemented

* handle no_answers

* fix typing

* fix tests

* fix tests without sas

* first draft for simulated top k

* rename sas and f1 columns in dataframe

* refactoring of EvaluationResult

* Add latest docstring and tutorial changes

* more eval tests added

* fix sas expected value precision

* distinction between ir and qa recall

* EvaluationResult.worst_queries() implemented

* print_evaluation_report() added

* eval report for QA Pipeline improved

* dynamic metrics for worst queries calc

* Add latest docstring and tutorial changes

* method names adjusted

* simple test for print_eval_report() added

* improved documentation

* Add latest docstring and tutorial changes

* minor formatting

* Add latest docstring and tutorial changes

* fix no_answer cases

* adjust one docstring

* Add latest docstring and tutorial changes

* fix no_answer cases for sas

* batchmode for sas implemented

* fix for retriever metrics if there are only no_answers

* fix multilabel tests

* improve documentation for pipeline.eval()

* streamline multilabel aggregates and docs

* Add latest docstring and tutorial changes

* fix multilabel tests

* unify document_id

* add dataframe schema description to EvaluationResult

* Add latest docstring and tutorial changes

* rename worst_queries to wrong_examples

* Add latest docstring and tutorial changes

* make query digesting standard pipelines work with pipeline.eval()

* Add latest docstring and tutorial changes

* tests for multi retriever pipelines added

* remove unnecessary import

* print_eval_report(): support all pipelines without junctions

* Add latest docstring and tutorial changes

* fix typos

* Add latest docstring and tutorial changes

* fix minor simulated_top_k bug and use memory documentstore throughout tests

* sas model param description improved

* Add latest docstring and tutorial changes

* rename recall metrics

* Add latest docstring and tutorial changes

* fix mean average precision link

* Add latest docstring and tutorial changes

* adjust sas description docstring

* Add latest docstring and tutorial changes

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-11-30 19:26:34 +01:00
..
2020-09-18 12:57:32 +02:00
2020-09-18 12:57:32 +02:00
2020-09-18 12:57:32 +02:00
2020-09-18 12:57:32 +02:00
2020-09-18 12:57:32 +02:00