Benchmarks

The tooling provided in this directory allows running benchmarks on reader pipelines, retriever pipelines, and retriever-reader pipelines.

Defining configuration

To run a benchmark, you first need to create a configuration file. This should be a Pipeline YAML file that contains the querying pipeline and, if the querying pipeline includes a retriever, an indexing pipeline as well.

The configuration file should also have a benchmark_config section that includes the following information:

  • labels_file: The path to a SQuAD-formatted JSON or CSV file containing the labels to benchmark against (see the example after this list).
  • documents_directory: The path to a directory containing files intended to be indexed into the document store. This is only necessary for retriever and retriever-reader pipelines.
  • data_url: This is optional. If provided, the benchmarking script will download data from this URL and save it in the data/ directory.
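
For reference, a minimal SQuAD-style labels file might look like the snippet below. This is only a sketch of the standard SQuAD structure; the title, context, question, and answer values are illustrative placeholders, and your actual labels file may contain additional fields.

{
  "data": [
    {
      "title": "Sample document",
      "paragraphs": [
        {
          "context": "Python is a programming language created by Guido van Rossum.",
          "qas": [
            {
              "id": "1",
              "question": "Who created Python?",
              "answers": [{ "text": "Guido van Rossum", "answer_start": 44 }]
            }
          ]
        }
      ]
    }
  ]
}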

Here is an example of what a configuration file for a retriever-reader pipeline might look like:

components:
  - name: DocumentStore
    type: ElasticsearchDocumentStore
  - name: TextConverter
    type: TextConverter
  - name: Reader
    type: FARMReader
    params:
      model_name_or_path: deepset/roberta-base-squad2-distilled
  - name: Retriever
    type: BM25Retriever
    params:
      document_store: DocumentStore
      top_k: 10

pipelines:
  - name: indexing
    nodes:
      - name: TextConverter
        inputs: [File]
      - name: Retriever
        inputs: [TextConverter]
      - name: DocumentStore
        inputs: [Retriever]
  - name: querying
    nodes:
      - name: Retriever
        inputs: [Query]
      - name: Reader
        inputs: [Retriever]

benchmark_config:
  data_url: http://example.com/data.tar.gz
  documents_directory: /path/to/documents
  labels_file: /path/to/labels.csv

Running benchmarks

Once you have your configuration file, you can run benchmarks by using the run.py script.

python run.py [--output OUTPUT] config

The script takes the following arguments:

  • config: This is the path to your configuration file.
  • --output: This is an optional path where benchmark results should be saved. If not provided, the script will create a JSON file with the same name as the specified config file.
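
For example, assuming the configuration above is saved as retriever_reader.yml (the file name is illustrative), you could run:

python run.py --output results.json retriever_reader.yml

This runs the benchmark defined in retriever_reader.yml and writes the results to results.json.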

Metrics

The benchmarks yield the following metrics:

  • Reader pipelines:
    • Exact match score
    • F1 score
    • Total querying time
    • Seconds/query
  • Retriever pipelines:
    • Recall
    • Mean average precision
    • Total querying time
    • Seconds/query
    • Queries/second
    • Total indexing time
    • Number of indexed Documents/second
  • Retriever-Reader pipelines:
    • Exact match score
    • F1 score
    • Total querying time
    • Seconds/query
    • Total indexing time
    • Number of indexed Documents/second

You can find more details about the performance metrics in our evaluation guide.
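
Because the results are saved as JSON, you can also inspect them programmatically. The snippet below is a minimal sketch: the results.json file name matches the --output value used in the example above, and the code deliberately avoids assuming any particular field names, since the exact keys depend on the pipeline type.

import json

# Load the benchmark results produced by run.py
with open("results.json") as f:
    results = json.load(f)

# Print every top-level entry; the exact keys depend on the pipeline type
for key, value in results.items():
    print(f"{key}: {value}")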