mirror of https://github.com/deepset-ai/haystack.git synced 2025-08-22 15:38:01 +00:00

Benchmarks

To start all benchmarks (e.g. for a new Haystack release), run:

python run.py --reader --retriever_index --retriever_query --update_json --save_markdown

For custom runs, you can specify which components and processes to benchmark with the following flags:

python run.py [--reader] [--retriever_index] [--retriever_query] [--ci] [--update_json] [--save_markdown]

where

**--reader** will trigger the speed and accuracy benchmarks for the reader. Here we simply use the SQuAD dev set.

**--retriever_index** will trigger the indexing benchmarks.

**--retriever_query** will trigger querying benchmarks (embeddings will be loaded from file instead of being computed on the fly)

**--ci** will cause the benchmarks to run on a smaller slice of each dataset and a smaller subset of Retrievers / Readers / DocumentStores.

**--update_json** will cause the script to update the JSON files in docs/_src/benchmarks so that the benchmarks shown on the website are updated.

**--save_markdown** will save the results as a Markdown file in addition to the default CSV.
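As a rough illustration of how boolean flags like these are typically wired up, here is a minimal sketch using Python's `argparse`. It is not the actual content of run.py; the function name `build_parser` and the help strings are assumptions made for this example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags documented above (not the real run.py)."""
    parser = argparse.ArgumentParser(description="Benchmark runner (illustrative sketch)")
    flags = [
        ("--reader", "run the reader speed and accuracy benchmarks"),
        ("--retriever_index", "run the indexing benchmarks"),
        ("--retriever_query", "run the querying benchmarks"),
        ("--ci", "run on a smaller slice of the data and fewer components"),
        ("--update_json", "update the JSON files used by the website"),
        ("--save_markdown", "also save the results as Markdown"),
    ]
    for flag, help_text in flags:
        # store_true makes each flag an opt-in switch that defaults to False
        parser.add_argument(flag, action="store_true", help=help_text)
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere
args = build_parser().parse_args(["--reader", "--ci"])
print(args.reader, args.ci, args.retriever_index)
```

Because every flag defaults to `False`, omitting all of them would select no benchmark in this sketch; how the real script handles that case is not covered here.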

Results will be stored in this directory as:

retriever_index_results.csv and retriever_index_results.md
retriever_query_results.csv and retriever_query_results.md
reader_results.csv and reader_results.md
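To make the relationship between each CSV file and its Markdown counterpart concrete, the following sketch converts CSV text into a Markdown table using only the standard library. The helper name `csv_to_markdown`, the column names, and the sample values are made up for illustration; the benchmark script may produce its Markdown differently.

```python
import csv
import io


def csv_to_markdown(csv_text: str) -> str:
    """Render CSV text as a Markdown table (hypothetical helper, not part of the benchmarks)."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ""
    header, *body = rows
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)


# Placeholder data: these columns and numbers are invented, not real benchmark results
sample = "Model,F1\nexample-model,0.5\n"
print(csv_to_markdown(sample))
```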

Temporary quick fix for larger runs

For larger indexing runs (500k docs), the default Elasticsearch / OpenSearch container that Haystack spawns might run out of memory (OOM). To avoid this, start the containers manually with more memory assigned before you trigger the benchmark script:

docker start opensearch > /dev/null 2>&1 || docker run -d -p 9201:9200 -p 9600:9600 -e "discovery.type=single-node" -e "OPENSEARCH_JAVA_OPTS=-Xms4096m -Xmx4096m" --name opensearch opensearchproject/opensearch:2.2.1

and

docker start elasticsearch > /dev/null 2>&1 || docker run -d -p 9200:9200 -e "discovery.type=single-node" -e "ES_JAVA_OPTS=-Xms4096m -Xmx4096m" --name elasticsearch elasticsearch:7.9.2
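If you need to vary the heap size between runs, the `docker run` part of the two commands above can be generated rather than edited by hand. The sketch below only assembles the argument list and does not start anything; `docker_run_command` and its parameters are hypothetical names introduced for this example.

```python
import shlex


def docker_run_command(name, image, ports, java_opts_var, heap_mb):
    """Build a `docker run` argument list like the commands above (hypothetical helper)."""
    cmd = ["docker", "run", "-d"]
    for host_port, container_port in ports:
        cmd += ["-p", f"{host_port}:{container_port}"]
    cmd += ["-e", "discovery.type=single-node"]
    # Use the same value for -Xms and -Xmx so the JVM heap is fixed at heap_mb megabytes
    cmd += ["-e", f"{java_opts_var}=-Xms{heap_mb}m -Xmx{heap_mb}m"]
    cmd += ["--name", name, image]
    return cmd


cmd = docker_run_command(
    name="elasticsearch",
    image="elasticsearch:7.9.2",
    ports=[(9200, 9200)],
    java_opts_var="ES_JAVA_OPTS",
    heap_mb=4096,
)
# shlex.join quotes arguments that contain spaces, so the result can be pasted into a shell
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would launch the container; the OpenSearch variant would use `OPENSEARCH_JAVA_OPTS` and the port mappings shown above.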