* add time and perf benchmark for es
* Add retriever benchmarking
* Add Reader benchmarking
* add nq to squad conversion
* add conversion stats
* clean benchmarks
* Add link to dataset
* Update imports
* add first support for neg psgs
* Refactor test
* set max_seq_len
* cleanup benchmark
* begin retriever speed benchmarking
* Add support for retriever query index benchmarking
* improve reader eval, retriever speed benchmarking
* improve retriever speed benchmarking
* Add retriever accuracy benchmark
* Add neg doc shuffling
* Add top_n
* 3x speedup of SQL. add postgres docker run. make shuffle neg a param. add more logging
* Add models to sweep
* add option for faiss index type
* remove unneeded line
* change faiss to faiss_flat
* begin automatic benchmark script
* remove existing postgres docker for benchmarking
* Add data processing scripts
* Remove shuffle in script bc data already shuffled
* switch hnsw setup from 256 to 128
* change es similarity to dot product by default
* Error includes stack trace
* Change ES default timeout
* remove delete_docs() from timing for indexing
* Add support for website export
* update website on push to benchmarks
* add complete benchmarks results
* new json format
* removed NaN as it is not a valid json token
* fix benchmarking for faiss hnsw queries. do sql calls in update_embeddings() as batches
* update benchmarks for hnsw 128,20,80
* don't delete full index in delete_all_documents()
* update texts for charts
* update recall column for retriever
* change scale and add units to desc
* add units to legend
* add axis titles. update desc
* add html tags
Co-authored-by: deepset <deepset@Crenolape.localdomain>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
Co-authored-by: PiffPaffM <markuspaff.mp@gmail.com>
* remove phi normalization
* add special case for hnsw
* rename vector_size to vector_dim
* fix loading. fix extra dim in tests
* switch to new ES syntax for vector similarity
* 3x sql speed up. cascade deletes. add train_index()
* add docstrings. remove vector_dim from load()
* delete docs from faiss and sql
* fix delete of docs in test
* relax type hint for faiss index
* rename metric to metric_type
Co-authored-by: lalitpagaria <19303690+lalitpagaria@users.noreply.github.com>
- Fix issue where update_embeddings always creates a new FAISS index instead of clearing the existing one. Creating a new index may not free the memory used by the old one, which can cause a memory leak.
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
* fix gpu CMD and set tag to latest
* update dockerfiles. resolve race condition of index creation with multiple workers
* update dockerfiles for preload. remove try catch for elastic index creation
* add back try/catch. disable multiproc in default config to comply with --preload of gunicorn
* change to pip3 for GPU dockerfile
* remove --preload for gpu
* Skeleton of doc website
* Flesh out documentation pages
* Split concepts into their own rst files
* add tutorial rsts
* Consistent level 1 markdown headers in tutorials
* Change theme to readthedocs
* Turn bullet points into prose
* Populate sections
* Add more text
* Add more sphinx files
* Add more retriever documentation
* combined all documentation in one structure
* rename of src to _src as it was ignored by git
* Incorporate MP2's changes
* add benchmark bar charts
* Adapt docstrings in Readers
* Improvements to intro, creation of glossary
* Adapt docstrings in Retrievers
* Adapt docstrings in Finder
* Adapt Docstrings of Finder
* Updates to text
* Edit text
* update doc strings
* proof read tutorials
* Edit text
* Edit text
* Add stacked chart
* populate graph with data
* Switch Documentation to markdown (#386)
* add way to generate markdown files to sphinx
* changed from rst to markdown and extended sphinx for it
* fix spelling
* Clean titles
* delete file
* change spelling
* add sections to document store usage
* add basic rest api docs
* fix readme in setup.py
* Update Tutorials
* Change section names
* add windows note to pip install
* update intro
* new renderer for markdown files
* Fix typos
* delete dpr_utils.py
* fix windows note in get started
* Fix docstrings
* deleted rest api docs in api
* fixed typo
* Fix docstring
* revert readme to rst
* Fix readme
* Update setup.py
Co-authored-by: deepset <deepset@Crenolape.localdomain>
Co-authored-by: PiffPaffM <markuspaff.mp@gmail.com>
Co-authored-by: Bogdan Kostić <bogdankostic@web.de>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
* bump farm version to 0.4.8
* move back to original transformers pipeline
* remove dpr_utils and use transformers implementation
* update tutorial notebooks