haystack

mirror of https://github.com/deepset-ai/haystack.git synced 2025-09-16 03:34:46 +00:00

Author	SHA1	Message	Date
Sara Zan	009c89fc53	Revert "Make the docstring bot work only on master" (#2114 ) * Revert "Make the docstring bot work only on master (#2078)" This reverts commit 649d07405770cd59696d0120107a3b2f0aafe7c2. * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-02-02 16:08:34 +01:00
Sebastián Ramírez	3c768071d5	✨ Add JSON Schema autogeneration for Pipeline YAML files (#2020 ) * 🎨 Update type annotations to allow their extraction for JSON Schema * ✨ Add main script doing all the work to generate the JSON Schema * ➕ Add GitHub Action dependency to generate JSON Schema * ✨ Update JSON Schema generation script to allow easily generating the schema without making a PR * 👷 Add GitHub Action to generate JSON Schema * 💚 Fix CI GitHub Action * 💚 Update GitHub Action environment variables * ✨ Add initial JSON Schema * Add latest docstring and tutorial changes * 🐛 Do not allow extra params not defined in each model * ♻️ Make any additional properties invalid * ✨ Make other additional properties invalid in all the levels in pipelines * ♻️ Do not include Base classes as possible nodes * 🍱 Update JSON Schema Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-02-02 15:00:41 +01:00
Julian Risch	3245cdef1d	Add faiss dependency to tutorial 12 (#2109 )	2022-02-02 14:19:08 +01:00
mathislucka	88771b2bee	Provide option to recreate es doc store on initialization (#2084 ) * provide option to recreate es doc store on initialization * Add latest docstring and tutorial changes * Label expects more arguments * Label expects also an answer Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>	2022-02-02 11:03:15 +01:00
Sara Zan	649d074057	Make the docstring bot work only on master (#2078 )	2022-02-01 14:09:55 +01:00
MichelBartels	525884e4cf	do not apply data parallel twice (#2095 )	2022-02-01 12:24:51 +01:00
MichelBartels	e0c072d6fd	Distribute intermediate layer distillation loss calculation over multiple GPUs (#2090 ) * distribute tinybert loss calculation * improve doc string * undo unnecessary change * fix for only one gpu * adding type hints * making sure model distillation still works without gpu * fix bug * fixing type hints	2022-02-01 09:47:00 +01:00
Sowmiya Jaganathan	7d769d8bf1	Fixed the Search Field mapping in ElasticSearch DocumentStore (#2080 ) * Review changes * Added the synonym analyser for search fields * Added the review requests. * Added the synonyms the OpenSearchDocumentStore and review requests.	2022-01-31 11:11:20 +01:00
bogdankostic	bbb65a19bd	Add Tapas reader with scores (#1997 ) * Add Tapas reader with scores * Adapt possible answer spans * Add latest docstring and tutorial changes * Remove unused imports * Adapt scoring * Add latest docstring and tutorial changes * Fix mypy * Infer model architecture from config * Adapt answer score calculation * Add latest docstring and tutorial changes * Fix mypy Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-01-31 10:23:12 +01:00
Malte Pietsch	ee6b8d0688	Add ADR template for transparent architecture decisions (#2072 ) * add adr template for decisions * Add latest docstring and tutorial changes * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>	2022-01-28 17:33:53 +01:00
Kristof Herrmann	7764b6992c	DC SDK - load pipeline from deepset cloud (#2013 ) * initial load_from_dc * typo * adjusted api endpoint * removed kwargs * added _load_from_dict * refactor pipeline loading mechanism * renaming load_from_dc api * renaming * fixed errors * fix comments and environment variable overrides * Add latest docstring and tutorial changes * fix outdated YAML examples * Add latest docstring and tutorial changes * Introduce readonly DCDocumentStore (without labels support) (#1991) * minimal DCDocumentStore * support filters * implement get_documents_by_id * handle not existing documents * add docstrings * auth added * add tests * generate docs * Add latest docstring and tutorial changes * add responses to dev dependencies * fix tests * support query() and quey_by_embedding() * Add latest docstring and tutorial changes * query tests added * read api_key and api_endpoint from env * Add latest docstring and tutorial changes * support query() and quey_by_embedding() * query tests added * Add latest docstring and tutorial changes * Add latest docstring and tutorial changes * support dynamic similarity and return_embedding values * Add latest docstring and tutorial changes * adjust KeywordDocumentStore description * refactoring * Add latest docstring and tutorial changes * implement get_document_count and raise on all not implemented methods * Add latest docstring and tutorial changes * don't use abbreviation DC in comments and errors * Add latest docstring and tutorial changes * docstring added to KeywordDocumentStore * Add latest docstring and tutorial changes * enhanced api key set * split tests into two parts * change setup.py in order to work around build cache * added link * Add latest docstring and tutorial changes * rename DCDocumentStore to DeepsetCloudDocumentStore * Add latest docstring and tutorial changes * remove dc.py * reinsert link to docs * fix imports * Add latest docstring and tutorial changes * better test structure Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: ArzelaAscoIi <kristof.herrmann@rwth-aachen.de> * introduce DeepsetCloudAdapter * Add latest docstring and tutorial changes * introduce DeepsetCloudClient * Add latest docstring and tutorial changes * use json api for pipeline_config * indexing pipeline test added * pseudo change to force cache eviction * revert pseudo change to force cache eviction * remove conftest duplicates * minor formatting and docstring fixes * fix tests when MOCK_DC=False Co-authored-by: Thomas Stadelmann <thomas.stadelmann@deepset.ai> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: tstadel <60758086+tstadel@users.noreply.github.com>	2022-01-28 17:32:56 +01:00
Sara Zan	07cf3c614a	Disable cache on the CI (#2083 ) * Disable cache on the CI * Reintroduce paths * Add most files to the cache key * remove date and path from cache key * Try double install with cache * Try to cache more stuff, on a per-commit basis * Fix windows CI too * Add comment on how to speed up the CI with better caching	2022-01-28 17:21:23 +01:00
tstadel	1b1e44e771	install haystack in editable mode for ci (#2082 )	2022-01-28 09:59:28 +01:00
Sara Zan	713771095b	Autogenerate OpenAPI specs file (#2047 ) * Add docstrings to the REST API endpoint to have them included in the OpenAPI specs * Attempt at make GitHub CI generate the OpenAPI specs * Missing __init__.py was breaking rest_api import * Add comment on dummy pipeline * Create separate workflow file for the OpenAPI specs generation Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Markus Paff <markuspaff.mp@gmail.com>	2022-01-27 13:06:01 +01:00
Sara Zan	3c02aa50d0	Remove run_docker_gpu.sh (#2003 ) * Remove run_docker_gpu.sh * remove shell formatting check from CI Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>	2022-01-27 12:20:43 +01:00
Sara Zan	9af1292cda	Remove stray requirements.txt files and update README.md (#2075 ) * Remove stray requirements.txt files and update README.md * Remove requirement files * Add details about pip bug and link to setup.cfg Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-01-27 11:22:14 +01:00
AhmedIdr	488c3e9e52	pass faiss batch_size to sqldocumentstore (#2061 )	2022-01-26 19:35:16 +01:00
Julian Risch	5079c6847a	Convert doc embedding from ndarray to list of float for REST API (#1901 ) * convert ndarray doc embedding to list of float * check type of embedding of each doc individually * Fix in case documents is None	2022-01-26 18:20:44 +01:00
Sara Zan	d470b9d0bd	Improve dependency management (#1994 ) * Fist attempt at using setup.cfg for dependency management * Trying the new package on the CI and in Docker too * Add composite extras_require * Add the safe_import function for document store imports and add some try-catch statements on rest_api and ui imports * Fix bug on class import and rephrase error message * Introduce typing for optional modules and add type: ignore in sparse.py * Include importlib_metadata backport for py3.7 * Add colab group to extra_requires * Fix pillow version * Fix grpcio * Separate out the crawler as another extra * Make paths relative in rest_api and ui * Update the test matrix in the CI * Add try catch statements around the optional imports too to account for direct imports * Never mix direct deps with self-references and add ES deps to the base install * Refactor several paths in tests to make them insensitive to the execution path * Include tstadel review and re-introduce Milvus1 in the tests suite, to fix * Wrap pdf conversion utils into safe_import * Update some tutorials and rever Milvus1 as default for now, see #2067 * Fix mypy config Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-01-26 18:12:55 +01:00
MichelBartels	4cc37548e3	Fix finetuning notebook augmentation (#2071 ) * fix data augmentation path in finetuning notebook * Add latest docstring and tutorial changes * make distillation possible with other models than BERT * use smaller dataset for distillation in finetuning tutorial * Add latest docstring and tutorial changes * make data augmentation in finetuning faster * update language models forward doc strings * fix return type of language models * remove debug output Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-01-26 17:49:14 +01:00
Sowmiya Jaganathan	c4fff19018	Supported Highlighting in Elasticsearch (#1930 ) * Supported Highlighting * Review changes * add example to docstrings * Add latest docstring and tutorial changes * Add latest docstring and tutorial changes Co-authored-by: sowmiya-emplay <sowmiya.j@emplay.net> Co-authored-by: Thomas Stadelmann <thomas.stadelmann@deepset.ai> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: tstadel <60758086+tstadel@users.noreply.github.com>	2022-01-26 17:35:33 +01:00
Adrien Wald	2edc421a09	Add `top_k_join` parameter to `JoinDocuments.run` (#2065 ) * add top_k_join parameter to JoinDocuments.run * test JoinDocuments concatenate with top_k_join parameter * test two different top_k_join parameters	2022-01-26 17:30:16 +01:00
mathislucka	5b7e906e85	fix: get_documents_by_id should return docs for all passed ids (#2064 ) * doc store should return all documents matching ids passed to get_documents_by_id * test for get_document_by_id should be named correctly * add test for get_documents_by_id * Add latest docstring and tutorial changes * document es query limit * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-01-26 12:39:04 +01:00
Julian Risch	0f34983f74	fix answer is not subscriptable error (#2069 ) * fix answer is not subscriptable error * fix answer is not subscriptable in script	2022-01-26 11:45:45 +01:00
tstadel	8a32d8da92	Introduce readonly DCDocumentStore (without labels support) (#1991 ) * minimal DCDocumentStore * support filters * implement get_documents_by_id * handle not existing documents * add docstrings * auth added * add tests * generate docs * Add latest docstring and tutorial changes * add responses to dev dependencies * fix tests * support query() and quey_by_embedding() * Add latest docstring and tutorial changes * query tests added * read api_key and api_endpoint from env * Add latest docstring and tutorial changes * support query() and quey_by_embedding() * query tests added * Add latest docstring and tutorial changes * Add latest docstring and tutorial changes * support dynamic similarity and return_embedding values * Add latest docstring and tutorial changes * adjust KeywordDocumentStore description * refactoring * Add latest docstring and tutorial changes * implement get_document_count and raise on all not implemented methods * Add latest docstring and tutorial changes * don't use abbreviation DC in comments and errors * Add latest docstring and tutorial changes * docstring added to KeywordDocumentStore * Add latest docstring and tutorial changes * enhanced api key set * split tests into two parts * change setup.py in order to work around build cache * added link * Add latest docstring and tutorial changes * rename DCDocumentStore to DeepsetCloudDocumentStore * Add latest docstring and tutorial changes * remove dc.py * reinsert link to docs * fix imports * Add latest docstring and tutorial changes * better test structure Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: ArzelaAscoIi <kristof.herrmann@rwth-aachen.de>	2022-01-25 20:36:28 +01:00
Sara Zan	d147443cb1	Pin Milvus to <2.0.0 (#2063 )	2022-01-25 17:12:56 +01:00
MichelBartels	5b6b0cef77	Add UnlabeledTextProcessor (#2054 ) * add UnlabeledTextProcessor * allow choosing processor when finetuning or distilling * fix type hint * Add latest docstring and tutorial changes * improve segment id computation for UnlabeledTextProcessor * add text and documentation * change batch size parameter for intermediate layer distillation * Add latest docstring and tutorial changes * fix distillation dim mapping * remove unnecessary changes * removed confusing parameter * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-01-25 14:54:34 +01:00
Julian Risch	c6f23dce88	upgrade haystack version number to 1.1.0 (#2039 ) * upgrade haystack version number to 1.1.0 * copy docs to new version folder v1.1.0	2022-01-20 13:45:38 +01:00
tstadel	50317d74bd	Add ndcg and eval_mode to docs (#2038 ) * add ndcg and eval_mode to docstrings and reorder dataframe columns in docs * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-01-20 13:02:46 +01:00
MichelBartels	e8cd5ea943	Add distillation to finetuning tutorial (#2025 ) * Add finetuning tutorial * Add latest docstring and tutorial changes * fix typo * Add latest docstring and tutorial changes * improve distillation explanation in finetuning tutorial * Add latest docstring and tutorial changes * allow augment_squad.py to be easier to call from within python * Update Tutorial2_Finetune_a_model_on_your_data.py * fix squad augmentation test Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-01-20 12:18:32 +01:00
oryx1729	cb881b6fa9	Disable pip cache for Dockerfiles (#2015 )	2022-01-19 10:26:17 +01:00
Kristof Herrmann	6267476015	Bugfix - save_to_yaml for OpenSearchDocumentStore (#2017 ) * fix save_to_yaml * add link to issue * added generic implementation * added type * remove not used imports	2022-01-19 10:10:50 +01:00
Yorick van Zweeden	ea10d011ab	Replace SessionState with Streamlit built-in (#2006 ) * Replace SessionState with Streamlit built-in * Set session state to default if absent Co-authored-by: Yorick van Zweeden <git@yorickvanzweeden.nl>	2022-01-18 14:59:42 +01:00
MichelBartels	0cca2b97cd	distinguish intermediate layer & prediction layer distillation phases with different parameters (#2001 ) * add parameters to allow for different hyperparameters in stage 1 and 2 of tinybert distillation * Add latest docstring and tutorial changes * improve default parameters * Add latest docstring and tutorial changes * split up distillation method * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-01-14 20:40:38 +01:00
tstadel	f42d2e8ba0	Add nDCG to `pipeline.eval()`'s document metrics (#2008 ) * add ndcg metric * fix merge * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-01-14 18:36:41 +01:00
Julian Risch	2c063e960e	Extend Tutorial 5 with Upper Bound Reader Eval Metrics (#1995 ) * print report for closed-domain eval * Add latest docstring and tutorial changes * rename parameter and rewrite docs * Add latest docstring and tutorial changes * print eval report in separate cell * Add latest docstring and tutorial changes * explain when to eval individual components Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-01-14 16:29:18 +01:00
Julian Risch	5695d721aa	update link to annotation tool docu (#2005 ) * update link to annotation tool docu * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-01-14 16:10:59 +01:00
Julian Risch	a3147cae47	Add isolated node eval mode in pipeline eval (#1962 ) * run predictions on ground-truth docs in reader * build dataframe for closed/open domain eval * fix looping through multilabel * fix looping through multilabel's list of labels * simplify collecting relevant docs * switch closed-domain eval off by default * Add latest docstring and tutorial changes * handle edge case params not given * renaming & generate pipeline eval report * add test case for closed-domain eval metrics * Add latest docstring and tutorial changes * test report of closed-domain eval * report closed-domain metrics only for answer metrics not doc metrics * refactoring * fix mypy & remove comment * add second for-loop & use answer as method input * renaming & add separate loop building docs eval df * Add latest docstring and tutorial changes * source /home/tstad/miniconda3/bin/activatechange column order for evaluatation dataframe (#1957) conda activate haystack-dev2 * change column order for evaluatation dataframe * added missing eval column node_input * generic order for both document and answer returning nodes; ensure no columns get lost Co-authored-by: tstadel <60758086+tstadel@users.noreply.github.com> * fix column reordering after renaming of node_input * simplify tests & add docu * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: ju-gu <87523290+ju-gu@users.noreply.github.com> Co-authored-by: tstadel <60758086+tstadel@users.noreply.github.com> Co-authored-by: Thomas Stadelmann <thomas.stadelmann@deepset.ai>	2022-01-14 14:37:16 +01:00
Sara Zan	e28bf618d7	Implement proper FK in `MetaDocumentORM` and `MetaLabelORM` to work on PostgreSQL (#1990 ) * Properly fix MetaDocumentORM and MetaLabelORM with composite foreign key constraints * update_document_meta() was not using index properly * Exclude ES and Memory from the cosine_sanity_check test * move ensure_ids_are_correct_uuids in conftest and move one test back to faiss & milvus suite Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-01-14 13:48:58 +01:00
MichelBartels	3e4dbbb32c	Align similarity scores across document stores (#1967 ) * align document store similarity functions * remove unnecessary imports * undone accidental change * stopped weaviate from pretending to support dot product similarity * stopped weaviate from pretending to support dot product similarity * Add latest docstring and tutorial changes * fix fixture params for document stores * use cosine similarity for most tests * fix cosine similarity test * fix faiss test * fix weaviate test * fix accidental deletion * fix document_store fixture * test fix; shouldn't be merged * fix test_normalize_embeddings_diff_shapes * probably a better fix * fix for parameter combinations * revert new pytest_generate_tests functionality * simplify pytest_generate_tests * normalize embeddings for test_dpr_embedding * add to faiss doc that embeddings are normalized * Add latest docstring and tutorial changes * remove unnecessary parameters and add comments * simplify two lines of memory.py into one * test similarity scores with smaller language model * fix test_similarity_score Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-01-12 19:28:20 +01:00
Manos Papathanasiou	965b9614db	Upgrade pillow version to 9.0.0 (#1992 )	2022-01-12 09:59:51 +01:00
Dmitry Goryunov	79fdda8a7c	Remove hard-coded variables from the Tutorial 15 (#1984 ) * Remove hard-coded variables from the Tutorial 15 * Fix missing comma * Add latest docstring and tutorial changes * Fix formatting in Tutorial15_TableQA.ipynb * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-01-11 17:55:20 +01:00
tstadel	c861fdb2ce	Enable batch mode for SAS cross encoders (#1987 ) * enable batch mode for sas cross encoders * fix mypy * comment on top_1 values added	2022-01-11 17:54:43 +01:00
Sara Zan	9c3d9b4885	Add models to demo docker image (#1978 ) * Add utility to cache models and nltk data & modify Dockerfiles to use it * Fix punkt data not being cached	2022-01-11 16:37:45 +01:00
tstadel	192e03be33	Fix elasticsearch scores if they are 0.0 (#1980 ) * fix elasticsearch zero scores * remove unnecessary None check	2022-01-11 09:35:02 +01:00
Mathew Kuriakose	a44b6c18c0	Unify vector_dim and embedding_dim parameter in Document Store (#1922 ) * Refactored code to unify vector_dim and embedding_dim parameter in DocumentStores * Unit test cases updated to use `embedding_dim` instead of `vector_dim` * Unit test case update to use embedding_dim instead of vector_dim * Add latest docstring and tutorial changes * Put usage of `vector_dim` param in same if-block as corresponding warning Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: bogdankostic <bogdankostic@web.de>	2022-01-10 18:10:32 +01:00
Benjamin Bossan	00dc30ae54	Use scikit-learn, not sklearn, in requirements.txt (#1974 )	2022-01-10 09:56:34 +01:00
ju-gu	b7041941df	change column order for evaluatation dataframe (#1957 ) * change column order for evaluatation dataframe * added missing eval column node_input * generic order for both document and answer returning nodes; ensure no columns get lost Co-authored-by: tstadel <60758086+tstadel@users.noreply.github.com>	2022-01-07 14:13:28 +01:00
oryx1729	5b3f693562	Fix Dockerfile-GPU (#1969 )	2022-01-06 11:13:04 +01:00
mathislucka	db76a5c5c6	fix UserWarning from slow tensor conversion (#1948 ) * fix UserWarning from slow tensor conversion * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-01-05 22:42:54 +01:00

2539 Commits