* initial CML test
* Update cml.yaml
* WIP test workflow
* switch to general ubuntu ami
* disable gpu for tests
* rm gpu info
* update token env
* switch github token
* add postgres
* test db connection
* fix typo
* remove tty
* add sleep for db
* debug runner
* debug removal postgres
* debug: reset to working commit
* debug: change github token
* switch to new bot token
* debug token
* add back postgres
* adjust network runner docker
* add elastic
* fix typo
* adjust working dir
* fix benchmark execution
* enable s3 downloads
* add query benchmark. fix path
* add saving of markdown files
* cat md files. add faiss+dpr. increase n_queries
* switch to GPU instance
* switch availability zone
* switch to public aws DL ami
* increase volume size
* rm faiss. fix error logging
* save markdown files
* add reader benchmarks
* add download of squad data
* correct reader metric normalization
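Reader EM/F1 metrics are typically computed on normalized answer strings. A minimal sketch of the standard SQuAD-style normalization (lowercase, strip punctuation and articles, collapse whitespace); whether the commit above uses exactly this variant is an assumption:

```python
import re
import string

def normalize_answer(s):
    """SQuAD-style normalization: lowercase, drop punctuation and articles, collapse whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in set(string.punctuation))
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())

print(normalize_answer("The  Eiffel Tower!"))  # → eiffel tower
```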
* fix newlines between reports
* fix max_docs for reader eval data. remove max_docs from ci run config
* fix mypy. switch workflow trigger
* try trigger for label
* change trigger syntax
* debug machine shutdown with test workflow
* add es and postgres to test workflow
* Revert "add es and postgres to test workflow"
This reverts commit 6f038d3d7f12eea924b54529e61b192858eaa9d5.
* Revert "debug machine shutdown with test workflow"
This reverts commit db70eabae8850b88e1d61fd79b04d4f49d54990a.
* fix typo in action. set benchmark config back to original
* add time and perf benchmark for es
* Add retriever benchmarking
* Add Reader benchmarking
* add nq to squad conversion
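The SQuAD JSON layout that such a conversion targets nests answers inside question/paragraph/article levels. An illustrative converter sketch — the flat input record shape here is assumed, not taken from the actual Natural Questions dump:

```python
def to_squad(records):
    """Wrap flat (question, context, answer) records in SQuAD's nested JSON layout."""
    return {
        "version": "v2.0",
        "data": [
            {
                "title": r.get("title", ""),
                "paragraphs": [
                    {
                        "context": r["context"],
                        "qas": [
                            {
                                "id": str(i),
                                "question": r["question"],
                                "is_impossible": r["answer"] is None,
                                "answers": [] if r["answer"] is None else [
                                    {
                                        "text": r["answer"],
                                        # answer_start is the character offset of the span
                                        "answer_start": r["context"].index(r["answer"]),
                                    }
                                ],
                            }
                        ],
                    }
                ],
            }
            for i, r in enumerate(records)
        ],
    }

sample = [{"question": "Who wrote Faust?",
           "context": "Faust was written by Goethe.",
           "answer": "Goethe"}]
out = to_squad(sample)
print(out["data"][0]["paragraphs"][0]["qas"][0]["answers"][0])
```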
* add conversion stats
* clean benchmarks
* Add link to dataset
* Update imports
* add initial support for negative passages
* Refactor test
* set max_seq_len
* cleanup benchmark
* begin retriever speed benchmarking
* Add support for retriever query index benchmarking
* improve reader eval, retriever speed benchmarking
* improve retriever speed benchmarking
* Add retriever accuracy benchmark
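Retriever accuracy is usually reported as recall@top_k: the fraction of queries for which at least one relevant document appears in the top-k retrieved results. A minimal sketch (function name and data shapes are illustrative, not from this PR):

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of queries with at least one relevant doc in the top-k retrieved ids."""
    hits = sum(1 for ret, rel in zip(retrieved, relevant) if set(ret[:k]) & set(rel))
    return hits / len(relevant)

retrieved = [["d1", "d2", "d3"], ["d9", "d4", "d7"]]  # per-query ranked result ids
relevant = [["d2"], ["d5"]]                            # per-query gold ids
print(recall_at_k(retrieved, relevant, k=3))  # → 0.5
```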
* Add neg doc shuffling
* Add top_n
* 3x speedup of SQL. add postgres docker run. make negative doc shuffling a param. add more logging
* Add models to sweep
* add option for faiss index type
* remove unneeded line
* change faiss to faiss_flat
* begin automatic benchmark script
* remove existing postgres docker for benchmarking
* Add data processing scripts
* Remove shuffle in script because data is already shuffled
* switch hnsw setup from 256 to 128
* change es similarity to dot product by default
* Error includes stack trace
* Change ES default timeout
* remove delete_docs() from timing for indexing
* Add support for website export
* update website on push to benchmarks
* add complete benchmarks results
* new json format
* remove NaN, as it is not a valid JSON token
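Strict JSON has no NaN token, yet Python's `json.dumps` emits `NaN` by default unless `allow_nan=False` is set. A minimal sketch of scrubbing NaN from results before serialization — the result fields are illustrative, not from this PR:

```python
import json
import math

def drop_nan(obj):
    """Recursively replace NaN floats with None so the output is strict JSON (null)."""
    if isinstance(obj, float) and math.isnan(obj):
        return None
    if isinstance(obj, dict):
        return {k: drop_nan(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [drop_nan(v) for v in obj]
    return obj

results = {"map": 0.71, "recall": float("nan")}
# allow_nan=False makes json raise instead of silently emitting an invalid NaN token
print(json.dumps(drop_nan(results), allow_nan=False))  # → {"map": 0.71, "recall": null}
```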
* fix benchmarking for faiss hnsw queries. do sql calls in update_embeddings() as batches
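Issuing one SQL statement per document is a common indexing bottleneck; batching the calls, as the commit above does, cuts round-trips drastically. A hypothetical sketch using stdlib `sqlite3` (the PR itself targets a Postgres-backed document store, and this schema is invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE document (id TEXT PRIMARY KEY, embedding BLOB)")

# one executemany round-trip instead of 1000 separate execute() calls
conn.executemany(
    "INSERT INTO document (id, embedding) VALUES (?, ?)",
    [(str(i), bytes(8)) for i in range(1000)],
)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM document").fetchone()[0]
print(count)  # → 1000
```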
* update benchmarks for hnsw 128,20,80
* don't delete full index in delete_all_documents()
* update texts for charts
* update recall column for retriever
* change scale and add units to desc
* add units to legend
* add axis titles. update desc
* add html tags
Co-authored-by: deepset <deepset@Crenolape.localdomain>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
Co-authored-by: PiffPaffM <markuspaff.mp@gmail.com>