haystack

mirror of https://github.com/deepset-ai/haystack.git synced 2025-07-27 19:00:35 +00:00

Author	SHA1	Message	Date
Sara Zan	a59bca3661	Apply black formatting (#2115 ) * Testing black on ui/ * Applying black on docstores * Add latest docstring and tutorial changes * Create a single GH action for Black and docs to reduce commit noise to the minimum, slightly refactor the OpenAPI action too * Remove comments * Relax constraints on pydoc-markdown * Split temporary black from the docs. Pydoc-markdown was obsolete and needs a separate PR to upgrade * Fix a couple of bugs * Add a type: ignore that was missing somehow * Give path to black * Apply Black * Apply Black * Relocate a couple of type: ignore * Update documentation * Make Linux CI run after applying Black * Triggering Black * Apply Black * Remove dependency, does not work well * Remove manually double trailing commas * Update documentation Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-02-03 13:43:18 +01:00
Sara Zan	eab475bb5d	Rename every occurrence of 'embed_passages' with 'embed_documents' (#1667 ) * Rename every occurrence of 'embed_passages' with 'embed_documents' * Remove aliased method embed_documents Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-10-28 12:17:56 +02:00
Sara Zan	13510aa753	Refactoring of the `haystack` package (#1624 ) * Files moved, imports all broken * Fix most imports and docstrings into * Fix the paths to the modules in the API docs * Add latest docstring and tutorial changes * Add a few pipelines that were lost in the inports * Fix a bunch of mypy warnings * Add latest docstring and tutorial changes * Create a file_classifier module * Add docs for file_classifier * Fixed most circular imports, now the REST API can start * Add latest docstring and tutorial changes * Tackling more mypy issues * Reintroduce from FARM and fix last mypy issues hopefully * Re-enable old-style imports * Fix some more import from the top-level package in an attempt to sort out circular imports * Fix some imports in tests to new-style to prevent failed class equalities from breaking tests * Change document_store into document_stores * Update imports in tutorials * Add latest docstring and tutorial changes * Probably fixes summarizer tests * Improve the old-style import allowing module imports (should work) * Try to fix the docs * Remove dedicated KnowledgeGraph page from autodocs * Remove dedicated GraphRetriever page from autodocs * Fix generate_docstrings.sh with an updated list of yaml files to look for * Fix some more modules in the docs * Fix the document stores docs too * Fix a small issue on Tutorial14 * Add latest docstring and tutorial changes * Add deprecation warning to old-style imports * Remove stray folder and import Dict into dense.py * Change import path for MLFlowLogger * Add old loggers path to the import path aliases * Fix debug output of convert_ipynb.py * Fix circular import on BaseRetriever * Missed one merge block * re-run tutorial 5 * Fix imports in tutorial 5 * Re-enable squad_to_dpr CLI from the root package and move get_batches_from_generator into document_stores.base * Add latest docstring and tutorial changes * Fix typo in utils __init__ * Fix a few more imports * Fix benchmarks too * New-style imports in test_knowledge_graph * Rollback setup.py * Rollback squad_to_dpr too Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-10-25 15:50:23 +02:00
Lalit Pagaria	5dbd899a93	Experimental changes to support Milvus 2.x (#1473 ) * Experimental changes to support Milvus 2.x * Milvus 2.0 need other containers hence adding them * Add latest docstring and tutorial changes * Fixing tests * Correcting use of list collections * correcting connection close * Removing connection close logic * removing flush * using collection instead of connection * fixing describe collection * Fixing insert, query and search based on new signature * Making mypy happy * Fixing one test case * Fixing search and embedding fetch based on newer api * Implementing delete vector id function * Wrapping up final changes * Add latest docstring and tutorial changes * Correcting requirements.txt * removing empty line in requirements.txt * add docstring and exception for delete * add docstring. condition import on env var. raise exception for deletion * fix typo * change delete signature * ignore typing for import Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>	2021-10-25 10:39:48 +02:00
Malte Pietsch	3a7d029fdd	Fix Opensearch field type (flattened -> nested) (#1609 ) * fix field type flattened -> nested. change default port from 9201 to 9200 * change port in benchmarks	2021-10-19 14:40:53 +02:00
Julian Risch	f9d2f786ca	Replace FARM import statements; add dependencies (#1492 ) * Replace FARM import statements; add dependencies * Add InferenceProc., TextCl.Proc., TextPairCl.Proc. * Remove FARMRanker, add type annotations, rename max_sample * Add sample_to_features_text for InferenceProc. * Fix type annotations: model_name_or_path is str not Path * Fix mypy errors: implement _create_dataset in TextCl.Proc. * Add task_type "embeddings" in Inferencer * Allow loading AdaptiveModel for embedding task * Add SQuAD eval metrics; enable InferenceProc for embedding task * Add baskets as param to log_samples and handle empty basket list in log_samples * Remove unused dependencies * Remove FARMClassifier (doc classificer) due to ref to TextClassificationHead * Remove FARMRanker and Classifier from doc generation scripts Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-09-28 16:34:24 +02:00
MichelBartels	da2e8da561	Adding multi gpu support for DPR inference (#1414 ) * Added support for Multi-GPU inference to DPR including benchmark * fixed multi gpu * added batch size to benchmark to better reflect multi gpu capabilities * remove unnecessary entry in config.json * fixed typos * fixed config name * update benchmark to use DEVICES constant * changed multi gpu parameters and updated docstring * adds silent fallback on cpu * update doc string, warning and config Co-authored-by: Michel Bartels <kontakt@michelbartels.com> Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>	2021-09-10 13:25:02 +02:00
Branden Chan	363be65a78	Implement OpenSearch ANN (#1225 ) * Simplify ODES init * Add arguments to ES init and create script * Rename similarity_fn_name and add util fn * Create OpenSearchDocumentStore * Specify params of Open Search HNSW * Add better argument handling * Update opensearch index mapping * Edit opensearch default port * Fix HNSW mapping * Force small HNSW params * Implement auto start and stopping of document store services * Fix starting and stopping of ds service * Restore HNSW params * Add opensearch query benchmarks * Add write wait time * Revert wait time * Add timeout * Update benchmarks * Update benchmarks * Update benchmarks json * Update documentation * Update documentation * Fix similarity name * Improve argument passing * Improve stopping and starting of service	2021-07-26 10:52:52 +02:00
Branden Chan	c513865566	Add L2 support for FAISS HNSW (#1138 )	2021-06-04 11:05:18 +02:00
Branden Chan	77d4c2ca1c	Benchmark milvus (#850 ) * Add milvus benchmarking support * Add latest docstring and tutorial changes * Edit config * Disable docker interactive mode * Add milvus index type support * Adjust FAISS and Milvus node branching * Remove duplicate in config * Revert method for speedup * Add latest docstring and tutorial changes * Add latest benchmark run * Add latest docstring and tutorial changes * Add json files * Revert "Add latest docstring and tutorial changes" This reverts commit e2efa5f08aa4fb55bbeeed42aa76817d63fc8923. * Add latest docstring and tutorial changes * Revert "Add latest docstring and tutorial changes" This reverts commit b085a679b9d5f175e91c2c59565e73c5dec1374b. * Fix typo Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-04-13 14:54:15 +02:00
Timo Moeller	837dea4e6d	Integrate sentence transformers into benchmarks (#843 ) * Integrate sentence transformers into benchmarks * Add doc store asserts * switch data downloads from s3 client to https. add license info * Fix mypy, revert config Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-04-09 17:24:16 +02:00
Branden Chan	f3a3b73d9b	Choose correct similarity fns during benchmark runs & re-run benchmarks (#773 ) * Adapt to new dataset_from_dicts return signature * rename fn * Align similarity fn in benchmark doc store * Better choice of similarity fn * Increase postgres wait time * Add more expected returned variables * update benchmark results * Fix typo * update all benchmark runs * multiply stats by 100 * Specify similarity fns for website Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>	2021-02-03 11:45:18 +01:00
Malte Pietsch	216787ed34	Fix benchmarks (#648 ) * disable fasttokenizer, increase ES timeout for delete requests * add session.close() * fix deletion of docs	2020-12-02 16:59:42 +01:00
Malte Pietsch	0acafc403a	Automate benchmarks via CML (#518 ) * initial test cml * Update cml.yaml * WIP test workflow * switch to general ubuntu ami * switch to general ubuntu ami * disable gpu for tests * rm gpu infos * rm gpu infos * update token env * switch github token * add postgres * test db connection * fix typo * remove tty * add sleep for db * debug runner * debug removal postgres * debug: reset to working commit * debug: change github token * switch to new bot token * debug token * add back postgres * adjust network runner docker * add elastic * fix typo * adjust working dir * fix benchmark execution * enable s3 downloads * add query benchmark. fix path * add saving of markdown files * cat md files. add faiss+dpr. increase n_queries * switch to GPU instance * switch availability zone * switch to public aws DL ami * increase volume size * rm faiss. fix error logging * save markdown files * add reader benchmarks * add download of squad data * correct reader metric normalization * fix newlines between reports * fix max_docs for reader eval data. remove max_docs from ci run config * fix mypy. switch workflow trigger * try trigger for label * try trigger for label * change trigger syntax * debug machine shutdown with test workflow * add es and postgres to test workflow * Revert "add es and postgres to test workflow" This reverts commit 6f038d3d7f12eea924b54529e61b192858eaa9d5. * Revert "debug machine shutdown with test workflow" This reverts commit db70eabae8850b88e1d61fd79b04d4f49d54990a. * fix typo in action. set benchmark config back to original	2020-11-18 18:28:17 +01:00
Branden Chan	fbacdfd263	Add logging of error, add n_docs assert	2020-10-22 15:45:46 +02:00
brandenchan	87e5f06fa8	add automatic json update	2020-10-21 17:59:44 +02:00
brandenchan	d3743d00e9	Merge branch 'master' into automate_benchmarks	2020-10-21 17:48:10 +02:00
Lalit Pagaria	63c12371b9	Change arg "model" to "model_name_or_path" in TransformersReader (#510 ) * Consistent parameter naming for TransformersReader along with removing unused imports as well. * Addressing review comments	2020-10-21 17:15:35 +02:00
brandenchan	6d60cc9451	add automation pipeline	2020-10-15 18:12:17 +02:00
Branden Chan	1cebcb7dda	Create time and performance benchmarks for all readers and retrievers (#339 ) * add time and perf benchmark for es * Add retriever benchmarking * Add Reader benchmarking * add nq to squad conversion * add conversion stats * clean benchmarks * Add link to dataset * Update imports * add first support for neg psgs * Refactor test * set max_seq_len * cleanup benchmark * begin retriever speed benchmarking * Add support for retriever query index benchmarking * improve reader eval, retriever speed benchmarking * improve retriever speed benchmarking * Add retriever accuracy benchmark * Add neg doc shuffling * Add top_n * 3x speedup of SQL. add postgres docker run. make shuffle neg a param. add more logging * Add models to sweep * add option for faiss index type * remove unneeded line * change faiss to faiss_flat * begin automatic benchmark script * remove existing postgres docker for benchmarking * Add data processing scripts * Remove shuffle in script bc data already shuffled * switch hnsw setup from 256 to 128 * change es similarity to dot product by default * Error includes stack trace * Change ES default timeout * remove delete_docs() from timing for indexing * Add support for website export * update website on push to benchmarks * add complete benchmarks results * new json format * removed NaN as is not a valid json token * fix benchmarking for faiss hnsw queries. do sql calls in update_embeddings() as batches * update benchmarks for hnsw 128,20,80 * don't delete full index in delete_all_documents() * update texts for charts * update recall column for retriever * change scale and add units to desc * add units to legend * add axis titles. update desc * add html tags Co-authored-by: deepset <deepset@Crenolape.localdomain> Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai> Co-authored-by: PiffPaffM <markuspaff.mp@gmail.com>	2020-10-12 13:34:42 +02:00

20 Commits