haystack

mirror of https://github.com/deepset-ai/haystack.git synced 2025-07-28 19:29:40 +00:00

Author	SHA1	Message	Date
Malte Pietsch	3a7d029fdd	Fix Opensearch field type (flattened -> nested) (#1609) * fix field type flattened -> nested. change default port from 9201 to 9200 * change port in benchmarks	2021-10-19 14:40:53 +02:00
Julian Risch	f9d2f786ca	Replace FARM import statements; add dependencies (#1492) * Replace FARM import statements; add dependencies * Add InferenceProc., TextCl.Proc., TextPairCl.Proc. * Remove FARMRanker, add type annotations, rename max_sample * Add sample_to_features_text for InferenceProc. * Fix type annotations: model_name_or_path is str not Path * Fix mypy errors: implement _create_dataset in TextCl.Proc. * Add task_type "embeddings" in Inferencer * Allow loading AdaptiveModel for embedding task * Add SQuAD eval metrics; enable InferenceProc for embedding task * Add baskets as param to log_samples and handle empty basket list in log_samples * Remove unused dependencies * Remove FARMClassifier (doc classifier) due to ref to TextClassificationHead * Remove FARMRanker and Classifier from doc generation scripts Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-09-28 16:34:24 +02:00
MichelBartels	da2e8da561	Adding multi gpu support for DPR inference (#1414) * Added support for Multi-GPU inference to DPR including benchmark * fixed multi gpu * added batch size to benchmark to better reflect multi gpu capabilities * remove unnecessary entry in config.json * fixed typos * fixed config name * update benchmark to use DEVICES constant * changed multi gpu parameters and updated docstring * adds silent fallback on cpu * update doc string, warning and config Co-authored-by: Michel Bartels <kontakt@michelbartels.com> Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>	2021-09-10 13:25:02 +02:00
Branden Chan	363be65a78	Implement OpenSearch ANN (#1225) * Simplify ODES init * Add arguments to ES init and create script * Rename similarity_fn_name and add util fn * Create OpenSearchDocumentStore * Specify params of Open Search HNSW * Add better argument handling * Update opensearch index mapping * Edit opensearch default port * Fix HNSW mapping * Force small HNSW params * Implement auto start and stopping of document store services * Fix starting and stopping of ds service * Restore HNSW params * Add opensearch query benchmarks * Add write wait time * Revert wait time * Add timeout * Update benchmarks * Update benchmarks * Update benchmarks json * Update documentation * Update documentation * Fix similarity name * Improve argument passing * Improve stopping and starting of service	2021-07-26 10:52:52 +02:00
Branden Chan	c513865566	Add L2 support for FAISS HNSW (#1138)	2021-06-04 11:05:18 +02:00
Branden Chan	77d4c2ca1c	Benchmark milvus (#850) * Add milvus benchmarking support * Add latest docstring and tutorial changes * Edit config * Disable docker interactive mode * Add milvus index type support * Adjust FAISS and Milvus node branching * Remove duplicate in config * Revert method for speedup * Add latest docstring and tutorial changes * Add latest benchmark run * Add latest docstring and tutorial changes * Add json files * Revert "Add latest docstring and tutorial changes" This reverts commit e2efa5f08aa4fb55bbeeed42aa76817d63fc8923. * Add latest docstring and tutorial changes * Revert "Add latest docstring and tutorial changes" This reverts commit b085a679b9d5f175e91c2c59565e73c5dec1374b. * Fix typo Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-04-13 14:54:15 +02:00
Timo Moeller	837dea4e6d	Integrate sentence transformers into benchmarks (#843) * Integrate sentence transformers into benchmarks * Add doc store asserts * switch data downloads from s3 client to https. add license info * Fix mypy, revert config Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-04-09 17:24:16 +02:00
Branden Chan	f3a3b73d9b	Choose correct similarity fns during benchmark runs & re-run benchmarks (#773) * Adapt to new dataset_from_dicts return signature * rename fn * Align similarity fn in benchmark doc store * Better choice of similarity fn * Increase postgres wait time * Add more expected returned variables * update benchmark results * Fix typo * update all benchmark runs * multiply stats by 100 * Specify similarity fns for website Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>	2021-02-03 11:45:18 +01:00
Malte Pietsch	216787ed34	Fix benchmarks (#648) * disable fasttokenizer, increase ES timeout for delete requests * add session.close() * fix deletion of docs	2020-12-02 16:59:42 +01:00
Malte Pietsch	0acafc403a	Automate benchmarks via CML (#518) * initial test cml * Update cml.yaml * WIP test workflow * switch to general ubuntu ami * switch to general ubuntu ami * disable gpu for tests * rm gpu infos * rm gpu infos * update token env * switch github token * add postgres * test db connection * fix typo * remove tty * add sleep for db * debug runner * debug removal postgres * debug: reset to working commit * debug: change github token * switch to new bot token * debug token * add back postgres * adjust network runner docker * add elastic * fix typo * adjust working dir * fix benchmark execution * enable s3 downloads * add query benchmark. fix path * add saving of markdown files * cat md files. add faiss+dpr. increase n_queries * switch to GPU instance * switch availability zone * switch to public aws DL ami * increase volume size * rm faiss. fix error logging * save markdown files * add reader benchmarks * add download of squad data * correct reader metric normalization * fix newlines between reports * fix max_docs for reader eval data. remove max_docs from ci run config * fix mypy. switch workflow trigger * try trigger for label * try trigger for label * change trigger syntax * debug machine shutdown with test workflow * add es and postgres to test workflow * Revert "add es and postgres to test workflow" This reverts commit 6f038d3d7f12eea924b54529e61b192858eaa9d5. * Revert "debug machine shutdown with test workflow" This reverts commit db70eabae8850b88e1d61fd79b04d4f49d54990a. * fix typo in action. set benchmark config back to original	2020-11-18 18:28:17 +01:00
Branden Chan	fbacdfd263	Add logging of error, add n_docs assert	2020-10-22 15:45:46 +02:00
brandenchan	87e5f06fa8	add automatic json update	2020-10-21 17:59:44 +02:00
brandenchan	d3743d00e9	Merge branch 'master' into automate_benchmarks	2020-10-21 17:48:10 +02:00
Lalit Pagaria	63c12371b9	Change arg "model" to "model_name_or_path" in TransformersReader (#510) * Consistent parameter naming for TransformersReader along with removing unused imports as well. * Addressing review comments	2020-10-21 17:15:35 +02:00
brandenchan	6d60cc9451	add automation pipeline	2020-10-15 18:12:17 +02:00
Branden Chan	1cebcb7dda	Create time and performance benchmarks for all readers and retrievers (#339) * add time and perf benchmark for es * Add retriever benchmarking * Add Reader benchmarking * add nq to squad conversion * add conversion stats * clean benchmarks * Add link to dataset * Update imports * add first support for neg psgs * Refactor test * set max_seq_len * cleanup benchmark * begin retriever speed benchmarking * Add support for retriever query index benchmarking * improve reader eval, retriever speed benchmarking * improve retriever speed benchmarking * Add retriever accuracy benchmark * Add neg doc shuffling * Add top_n * 3x speedup of SQL. add postgres docker run. make shuffle neg a param. add more logging * Add models to sweep * add option for faiss index type * remove unneeded line * change faiss to faiss_flat * begin automatic benchmark script * remove existing postgres docker for benchmarking * Add data processing scripts * Remove shuffle in script bc data already shuffled * switch hnsw setup from 256 to 128 * change es similarity to dot product by default * Error includes stack trace * Change ES default timeout * remove delete_docs() from timing for indexing * Add support for website export * update website on push to benchmarks * add complete benchmarks results * new json format * removed NaN as is not a valid json token * fix benchmarking for faiss hnsw queries. do sql calls in update_embeddings() as batches * update benchmarks for hnsw 128,20,80 * don't delete full index in delete_all_documents() * update texts for charts * update recall column for retriever * change scale and add units to desc * add units to legend * add axis titles. update desc * add html tags Co-authored-by: deepset <deepset@Crenolape.localdomain> Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai> Co-authored-by: PiffPaffM <markuspaff.mp@gmail.com>	2020-10-12 13:34:42 +02:00

16 Commits