mirror of https://github.com/deepset-ai/haystack.git synced 2025-11-01 10:19:23 +00:00

ci: Add Github workflow to automate benchmark runs (#5399)

* Add config files

* log benchmarks to stdout

* Add top-k and batch size to configs

* Add batch size to configs

* fix: don't download files if they already exist

* Add batch size to configs

* refine script

* Remove configs using 1m docs

* update run script

* update run script

* update run script

* datadog integration

* remove out folder

* gitignore benchmarks output

* test: send benchmarks to datadog

* remove uncommented lines in script

* feat: take branch/tag argument for benchmark setup script

* fix: run.sh should ignore errors

* Add GH workflow to run benchmarks periodically

* Remove unused script

* Adapt cml.yml

* Adapt cml.yml

* Rename cml.yml to benchmarks.yml

* Revert "Rename cml.yml to benchmarks.yml"

This reverts commit 897299433a71a55827124728adff5de918d46d21.

* remove benchmarks.yml

* Use same file extension for all config files

* Use checkout@v3

* Run benchmarks sequentially

* Add timeout-minutes parameter

* Remove changes unrelated to datadog

* Apply black

* use haystack-oss aws account

* Update test/benchmarks/utils.py

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* PR feedback

* fix aws credentials step

* Fix path

* check docker

* Allow spinning up containers from within container

* Allow spinning up containers from within container

* Separate launching doc stores from benchmarks

* Remove docker related commands

* run only retrievers

* change port

* Revert "change port"

This reverts commit 6e5bcebb1d16e03ba7672be7e8a089084c7fc3a7.

* Run opensearch benchmark only

* Run weaviate benchmark only

* Run bm25 benchmarks only

* Changes host of doc stores

* add step to get docker logs

* Revert "add step to get docker logs"

This reverts commit c10e6faa76bde5df406a027203bd775d18c93c90.

* Install docker

* Launch doc store containers from within runner container

* Remove kill command

* Change host

* dump docker logs

* change port

* Add cloud startup script

* dump docker logs

* add network param

* add network to startup.sh

* check cluster health

* move steps

* change port

* try using services

* check cluster health

* use services

* run only weaviate

* change host

* Upload benchmark results as artifacts

* Update configs

* Delete index after benchmark run

* Use correct index name

* Run only failing config

* Use smaller batch size

* Increase memory for opensearch

* Reduce batch size further

* Provide more storage

* Reduce batch size

* dump docker logs

* add java opts

* Spin up only opensearch container

* Create separate job for each doc store

* Run benchmarks sequentially

* Set working directory

* Account for reader benchmarks not doing indexing

* Change key of reader metrics

* Apply PR feedback

* Remove whitespace

* Adapt workflow to changes in datadog scripts

* Adapt workflow to changes in datadog scripts

* Increase memory for opensearch

* Reduce batch size

* Add preprocessing_batch_size to Readers

* Remove unrelated change

* Move order

* Fix path

* Manually terminate EC2 instance

* Manually terminate EC2 instance

* Manually terminate EC2 instance

* Always terminate runner

* Always terminate runner

* Remove unnecessary terminate-runner job

* Add cron schedule

* Disable telemetry

* Rename cml.yml to benchmarks.yml

---------

Co-authored-by: rjanjua <rohan.janjua@gmail.com>
Co-authored-by: Paul Steppacher <p.steppacher91@gmail.com>
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>

2023-08-17 12:56:45 +02:00

| Name | Last commit | Date |
|---|---|---|
| configs | ci: Add Github workflow to automate benchmark runs (#5399) | 2023-08-17 12:56:45 +02:00 |
| datadog | ci: Add Github workflow to automate benchmark runs (#5399) | 2023-08-17 12:56:45 +02:00 |
| distillation_config.json | chore: fix all EOF (#3852) | 2023-01-16 12:34:50 +01:00 |
| model_distillation.py | style: Update black (#4101) | 2023-02-08 15:34:43 +01:00 |
| nq_to_squad.py | refactor: remove direct logging without a logger (#4253) | 2023-02-23 20:42:42 +01:00 |
| reader.py | test: Add scripts to send benchmark results to datadog (#5432) | 2023-08-03 10:09:00 +02:00 |
| README.md | refactor: Adapt running benchmarks (#5007) | 2023-05-26 18:48:11 +02:00 |
| retriever_reader.py | ci: Add Github workflow to automate benchmark runs (#5399) | 2023-08-17 12:56:45 +02:00 |
| retriever.py | ci: Add Github workflow to automate benchmark runs (#5399) | 2023-08-17 12:56:45 +02:00 |
| run.py | ci: Add Github workflow to automate benchmark runs (#5399) | 2023-08-17 12:56:45 +02:00 |
| templates.py | chore: remove deprecated MilvusDocumentStore (#4951) | 2023-05-19 16:37:38 +02:00 |
| utils.py | ci: Add Github workflow to automate benchmark runs (#5399) | 2023-08-17 12:56:45 +02:00 |

README.md

Benchmarks

The tooling provided in this directory allows running benchmarks on reader pipelines, retriever pipelines, and retriever-reader pipelines.

Defining configuration

To run a benchmark, you first need to create a configuration file. This file should be a Pipeline YAML file that contains the querying pipeline and, if the querying pipeline includes a retriever, the indexing pipeline as well.

The configuration file should also have a benchmark_config section that includes the following information:

- labels_file: The path to a SQuAD-formatted JSON or CSV file that contains the labels to benchmark against.
- documents_directory: The path to a directory containing the files to be indexed into the document store. This is only necessary for retriever and retriever-reader pipelines.
- data_url: Optional. If provided, the benchmarking script downloads data from this URL and saves it in the data/ directory.
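
The rules above can be sketched as a small check on an already-parsed configuration. This is only an illustration: `check_benchmark_config` and its `needs_indexing` flag are hypothetical names, not part of the benchmarking scripts, and the check assumes the YAML file has already been loaded into a plain dictionary.

```python
from typing import Any, Dict, List


def check_benchmark_config(config: Dict[str, Any], needs_indexing: bool) -> List[str]:
    """Return a list of problems; an empty list means the section looks usable.

    Hypothetical helper for illustration only. `needs_indexing` should be True
    for retriever and retriever-reader pipelines.
    """
    section = config.get("benchmark_config")
    if not isinstance(section, dict):
        return ["missing benchmark_config section"]

    problems: List[str] = []

    # labels_file is always needed and should be a JSON or CSV file.
    labels_file = section.get("labels_file")
    if not labels_file:
        problems.append("labels_file is required")
    elif not str(labels_file).lower().endswith((".json", ".csv")):
        problems.append("labels_file should be a JSON or CSV file")

    # documents_directory only matters when documents are indexed.
    if needs_indexing and not section.get("documents_directory"):
        problems.append("documents_directory is required when the pipeline has a retriever")

    # data_url is optional, so its absence is never a problem.
    return problems
```

For example, a reader-only configuration that sets just `labels_file` would pass with `needs_indexing=False` and report a missing `documents_directory` with `needs_indexing=True`.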

Here is an example of what a configuration file for a retriever-reader pipeline might look like:

components:
  - name: DocumentStore
    type: ElasticsearchDocumentStore
  - name: TextConverter
    type: TextConverter
  - name: Reader
    type: FARMReader
    params:
      model_name_or_path: deepset/roberta-base-squad2-distilled
  - name: Retriever
    type: BM25Retriever
    params:
      document_store: DocumentStore
      top_k: 10

pipelines:
  - name: indexing
    nodes:
      - name: TextConverter
        inputs: [File]
      - name: Retriever
        inputs: [TextConverter]
      - name: DocumentStore
        inputs: [Retriever]
  - name: querying
    nodes:
      - name: Retriever
        inputs: [Query]
      - name: Reader
        inputs: [Retriever]

benchmark_config:
  data_url: http://example.com/data.tar.gz
  documents_directory: /path/to/documents
  labels_file: /path/to/labels.csv
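
To make the link between a configuration and the three pipeline kinds concrete, here is a rough sketch that classifies a parsed configuration by looking at the nodes of its querying pipeline. The function name `pipeline_kind` is hypothetical, and the rule that component type names end in "Retriever" or "Reader" is an assumption taken from the example above, not a statement of how run.py decides.

```python
from typing import Any, Dict


def pipeline_kind(config: Dict[str, Any]) -> str:
    """Classify the querying pipeline as 'reader', 'retriever', or 'retriever_reader'.

    Hypothetical helper for illustration; expects the YAML already parsed into a dict.
    """
    # Map each component name to its declared type.
    types = {c["name"]: c["type"] for c in config.get("components", [])}

    # Find the pipeline named "querying", as in the example configuration.
    querying = next((p for p in config.get("pipelines", []) if p["name"] == "querying"), None)
    if querying is None:
        raise ValueError("no querying pipeline found")

    node_types = [types.get(node["name"], "") for node in querying["nodes"]]

    # Assumption: type names end in "Retriever" or "Reader" (e.g. BM25Retriever, FARMReader).
    has_retriever = any(t.endswith("Retriever") for t in node_types)
    has_reader = any(t.endswith("Reader") for t in node_types)

    if has_retriever and has_reader:
        return "retriever_reader"
    if has_retriever:
        return "retriever"
    if has_reader:
        return "reader"
    raise ValueError("querying pipeline has neither a retriever nor a reader")
```

Applied to the example configuration, the querying pipeline contains both a BM25Retriever and a FARMReader, so the sketch would classify it as a retriever-reader pipeline.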

Running benchmarks

Once you have your configuration file, you can run benchmarks by using the run.py script.

python run.py [--output OUTPUT] config

The script takes the following arguments:

- config: The path to your configuration file.
- --output: An optional path where the benchmark results should be saved. If not provided, the script creates a JSON file with the same name as the specified config file.
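
The command-line interface described above could be reproduced with argparse roughly as follows. This is a sketch, not the actual contents of run.py: the function name `parse_args` is illustrative, and the default output path assumes that "the same name" means the config file's base name with a .json suffix in the current directory.

```python
import argparse
from pathlib import Path
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse `[--output OUTPUT] config` and fill in a default output path."""
    parser = argparse.ArgumentParser(description="Run a benchmark from a configuration file.")
    parser.add_argument("config", type=Path, help="Path to the configuration file.")
    parser.add_argument("--output", type=Path, default=None, help="Where to save the results.")
    args = parser.parse_args(argv)

    if args.output is None:
        # Assumption: reuse the config's base name and swap the suffix for .json.
        args.output = Path(args.config.with_suffix(".json").name)
    return args


if __name__ == "__main__":
    parsed = parse_args()
    print(f"config={parsed.config} output={parsed.output}")
```

Under this assumption, running the sketch with `configs/reader.yml` and no `--output` would write to `reader.json`.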

Metrics

The benchmarks yield the following metrics:

Reader pipelines:
- Exact match score
- F1 score
- Total querying time
- Seconds/query
Retriever pipelines:
- Recall
- Mean average precision
- Total querying time
- Seconds/query
- Queries/second
- Total indexing time
- Number of indexed Documents/second
Retriever-Reader pipelines:
- Exact match score
- F1 score
- Total querying time
- Seconds/query
- Total indexing time
- Number of indexed Documents/second
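
The timing-based metrics in these lists are simple ratios of counts and elapsed time. The sketch below shows one way to derive them; the function name `throughput_metrics` and the dictionary keys are illustrative, and the benchmarking scripts may name or round these values differently.

```python
from typing import Dict


def throughput_metrics(
    n_queries: int,
    querying_time: float,
    n_docs: int = 0,
    indexing_time: float = 0.0,
) -> Dict[str, float]:
    """Derive per-query and per-document rates from totals (times in seconds).

    Hypothetical helper for illustration only.
    """
    if n_queries <= 0 or querying_time <= 0:
        raise ValueError("n_queries and querying_time must be positive")

    metrics = {
        "querying_time": querying_time,
        "seconds_per_query": querying_time / n_queries,
        "queries_per_second": n_queries / querying_time,
    }

    # Indexing metrics only apply when documents were actually indexed,
    # which is not the case for reader-only benchmarks.
    if n_docs > 0 and indexing_time > 0:
        metrics["indexing_time"] = indexing_time
        metrics["docs_per_second"] = n_docs / indexing_time
    return metrics
```

For instance, 100 queries answered in 20 seconds correspond to 0.2 seconds/query and 5 queries/second.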

You can find more details about the performance metrics in our evaluation guide.