18 Commits

Author SHA1 Message Date
Branden Chan
363be65a78
Implement OpenSearch ANN (#1225)
* Simplify ODES init

* Add arguments to ES init and create script

* Rename similarity_fn_name and add util fn

* Create OpenSearchDocumentStore

* Specify params of OpenSearch HNSW

* Add better argument handling

* Update opensearch index mapping

* Edit opensearch default port

* Fix HNSW mapping

* Force small HNSW params

* Implement auto start and stopping of document store services

* Fix starting and stopping of ds service

* Restore HNSW params

* Add opensearch query benchmarks

* Add write wait time

* Revert wait time

* Add timeout

* Update benchmarks

* Update benchmarks

* Update benchmarks json

* Update documentation

* Update documentation

* Fix similarity name

* Improve argument passing

* Improve stopping and starting of service
2021-07-26 10:52:52 +02:00
Branden Chan
b555bc525c
Remove duplicate run (#1132) 2021-06-02 13:58:55 +02:00
Branden Chan
9356f637d4
Update Milvus benchmarks (#1128)
* Update Milvus benchmarks

* Add sentence transformers

* Update sentence transformers index results

* Remove duplicate row
2021-06-02 13:09:45 +02:00
Branden Chan
77d4c2ca1c
Benchmark milvus (#850)
* Add milvus benchmarking support

* Add latest docstring and tutorial changes

* Edit config

* Disable docker interactive mode

* Add milvus index type support

* Adjust FAISS and Milvus node branching

* Remove duplicate in config

* Revert method for speedup

* Add latest docstring and tutorial changes

* Add latest benchmark run

* Add latest docstring and tutorial changes

* Add json files

* Revert "Add latest docstring and tutorial changes"

This reverts commit e2efa5f08aa4fb55bbeeed42aa76817d63fc8923.

* Add latest docstring and tutorial changes

* Revert "Add latest docstring and tutorial changes"

This reverts commit b085a679b9d5f175e91c2c59565e73c5dec1374b.

* Fix typo

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-04-13 14:54:15 +02:00
Markus Paff
b87daed62b
fixed link to dpr (#962) 2021-04-13 09:45:04 +02:00
Branden Chan
f3a3b73d9b
Choose correct similarity fns during benchmark runs & re-run benchmarks (#773)
* Adapt to new dataset_from_dicts return signature

* rename fn

* Align similarity fn in benchmark doc store

* Better choice of similarity fn

* Increase postgres wait time

* Add more expected returned variables

* update benchmark results

* Fix typo

* update all benchmark runs

* multiply stats by 100

* Specify similarity fns for website

Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-02-03 11:45:18 +01:00
Malte Pietsch
149d98a0fd
Add latest benchmark run (#652)
* add latest benchmark run

* update templates and fix small json errors

* Change scale

Co-authored-by: brandenchan <brandenchan@icloud.com>
2020-12-10 16:25:51 +01:00
Branden Chan
2aa3c071fd
Remove column in benchmark website (#608)
* Make benchmarks clearer

* remove column
2020-11-19 12:18:47 +01:00
Branden Chan
827a40b12a
Make benchmarks clearer (#606) 2020-11-19 10:31:43 +01:00
Malte Pietsch
50709a3f9d
Fix retriever mAP benchmarks 2020-11-02 19:55:58 +01:00
Branden Chan
3793205aa3 Merge branch 'master' into fix_website 2020-10-29 10:29:25 +01:00
Branden Chan
2ba5417f8e Fix metric for benchmarks website page 2020-10-29 10:26:48 +01:00
Branden Chan
7c81dfdc3a Address reviewer comments 2020-10-27 12:41:11 +01:00
brandenchan
d3743d00e9 Merge branch 'master' into automate_benchmarks 2020-10-21 17:48:10 +02:00
Markus Paff
2531c8e061
Add versioning docs (#495)
* add time and perf benchmark for es

* Add retriever benchmarking

* Add Reader benchmarking

* add nq to squad conversion

* add conversion stats

* clean benchmarks

* Add link to dataset

* Update imports

* add first support for neg psgs

* Refactor test

* set max_seq_len

* cleanup benchmark

* begin retriever speed benchmarking

* Add support for retriever query index benchmarking

* improve reader eval, retriever speed benchmarking

* improve retriever speed benchmarking

* Add retriever accuracy benchmark

* Add neg doc shuffling

* Add top_n

* 3x speedup of SQL. add postgres docker run. make shuffle neg a param. add more logging

* Add models to sweep

* add option for faiss index type

* remove unneeded line

* change faiss to faiss_flat

* begin automatic benchmark script

* remove existing postgres docker for benchmarking

* Add data processing scripts

* Remove shuffle in script bc data already shuffled

* switch hnsw setup from 256 to 128

* change es similarity to dot product by default

* Error includes stack trace

* Change ES default timeout

* remove delete_docs() from timing for indexing

* Add support for website export

* update website on push to benchmarks

* add complete benchmarks results

* new json format

* removed NaN as it is not a valid JSON token

* versioning for docs

* unsaved changes

* cleaning

* cleaning

* Edit format of benchmarks data

* update also jsons in v0.4.0

Co-authored-by: brandenchan <brandenchan@icloud.com>
Co-authored-by: deepset <deepset@Crenolape.localdomain>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2020-10-19 11:46:51 +02:00
Branden Chan
1cebcb7dda
Create time and performance benchmarks for all readers and retrievers (#339)
* add time and perf benchmark for es

* Add retriever benchmarking

* Add Reader benchmarking

* add nq to squad conversion

* add conversion stats

* clean benchmarks

* Add link to dataset

* Update imports

* add first support for neg psgs

* Refactor test

* set max_seq_len

* cleanup benchmark

* begin retriever speed benchmarking

* Add support for retriever query index benchmarking

* improve reader eval, retriever speed benchmarking

* improve retriever speed benchmarking

* Add retriever accuracy benchmark

* Add neg doc shuffling

* Add top_n

* 3x speedup of SQL. add postgres docker run. make shuffle neg a param. add more logging

* Add models to sweep

* add option for faiss index type

* remove unneeded line

* change faiss to faiss_flat

* begin automatic benchmark script

* remove existing postgres docker for benchmarking

* Add data processing scripts

* Remove shuffle in script bc data already shuffled

* switch hnsw setup from 256 to 128

* change es similarity to dot product by default

* Error includes stack trace

* Change ES default timeout

* remove delete_docs() from timing for indexing

* Add support for website export

* update website on push to benchmarks

* add complete benchmarks results

* new json format

* removed NaN as it is not a valid JSON token

* fix benchmarking for faiss hnsw queries. do sql calls in update_embeddings() as batches

* update benchmarks for hnsw 128,20,80

* don't delete full index in delete_all_documents()

* update texts for charts

* update recall column for retriever

* change scale and add units to desc

* add units to legend

* add axis titles. update desc

* add html tags

Co-authored-by: deepset <deepset@Crenolape.localdomain>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
Co-authored-by: PiffPaffM <markuspaff.mp@gmail.com>
2020-10-12 13:34:42 +02:00
Markus Paff
25f34babce
Separate data and view for benchmarks (#451)
* separate data and view for benchmarks

* fixed typo
2020-10-06 10:30:19 +02:00
Branden Chan
7fdb85d63a
Create documentation website (#272)
* Skeleton of doc website

* Flesh out documentation pages

* Split concepts into their own rst files

* add tutorial rsts

* Consistent level 1 markdown headers in tutorials

* Change theme to readthedocs

* Turn bullet points into prose

* Populate sections

* Add more text

* Add more sphinx files

* Add more retriever documentation

* combined all documentation in one structure

* rename of src to _src as it was ignored by git

* Incorporate MP2's changes

* add benchmark bar charts

* Adapt docstrings in Readers

* Improvements to intro, creation of glossary

* Adapt docstrings in Retrievers

* Adapt docstrings in Finder

* Adapt Docstrings of Finder

* Updates to text

* Edit text

* update doc strings

* proofread tutorials

* Edit text

* Edit text

* Add stacked chart

* populate graph with data

* Switch Documentation to markdown (#386)

* add way to generate markdown files to sphinx

* changed from rst to markdown and extended sphinx for it

* fix spelling

* Clean titles

* delete file

* change spelling

* add sections to document store usage

* add basic rest api docs

* fix readme in setup.py

* Update Tutorials

* Change section names

* add windows note to pip install

* update intro

* new renderer for markdown files

* Fix typos

* delete dpr_utils.py

* fix windows note in get started

* Fix docstrings

* deleted rest api docs in api

* fixed typo

* Fix docstring

* revert readme to rst

* Fix readme

* Update setup.py

Co-authored-by: deepset <deepset@Crenolape.localdomain>
Co-authored-by: PiffPaffM <markuspaff.mp@gmail.com>
Co-authored-by: Bogdan Kostić <bogdankostic@web.de>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2020-09-18 12:57:32 +02:00