haystack

mirror of https://github.com/deepset-ai/haystack.git synced 2025-07-19 06:52:56 +00:00

Author	SHA1	Message	Date
Sara Zan	a095aea21e	Reintroduce push on master trigger for Linux CI (#2127 ) * Reintroduce push on master trigger with Linux CI * Reintroduce trigger for freshly opened PRs too	2022-02-04 18:06:23 +01:00
Sara Zan	859a87f71a	Remove requirements file (#2128 )	2022-02-04 18:05:47 +01:00
Buruk Aregawi	d3c776843f	Speed up query_by_embedding in InMemoryDocumentStore. (#2091 ) * Speed up query_by_embedding in InMemoryDocumentStore. * Make sure query and document embeddings are of the same dtype since they can vary. * Handle cases where there are 0 and 1 documents. * Don't put entire embedding matrix on GPU at once. Use separate get_score functions for the CPU and GPU. * Norm the vectors in get_scores_numpy in a safer way. * Apply Black * Incorporate missing factor of 4 in memory use calculation. * Apply Black Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-02-04 17:17:17 +01:00
tstadel	518a439482	OpenSearchDocumentStore: Extend similarity support (#2070 ) * get rid of global space_type setting * full_similarity_support * fallback to exact vector similarity * cone_embedding_field() instead of full_similarity_support * multiple embedding fields handling * update documentation and messages * revert unnecessary changes * Add latest docstring and tutorial changes * typo * Add latest docstring and tutorial changes * update docs * Add latest docstring and tutorial changes * improve messages * further improve messages * support l2 in ElasticsearchDocumentStore * Apply Black * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>	2022-02-04 16:37:08 +01:00
Sara Zan	c6bfb1c1d4	Remove rest_api extra frpom Dockerfile-GPU (#2122 )	2022-02-04 16:06:40 +01:00
Sara Zan	957e78ed9e	Upgrade `pydoc-markdown` & refactor GitHub Actions (#2117 ) * Upgrade pydoc-markdown and fix the YAMLs to work with it * Pin pydoc-markdown to major version * Generalize pydoc-markdown workflow * Make a single Action to perform all tasks that require committing into the local branch * Merge the code updates and the docs in the Linux CI to prevent the bot from always show the pipeline as green * Installing Jupyter deps for Black * Build cache before running generation tasks * Add check not to run the code generation on master * Simplify push action * Add more test deps in setup.cfg and remove from GH Action workflow * Remove forced upgrades on pip install Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-02-04 15:45:09 +01:00
bogdankostic	f062911040	Extend metadata filtering support in `ElasticsearchDocumentStore` (#2108 ) * Add extended filtering to ESDocumentStore * Add Docstrings * Fix definition of filter queries * Fix mypy * Add tests * Add latest docstring and tutorial changes * Adapt Docstrings * Adapt tests to added test_docs * Adapt tests to added test_docs * Adapt tests to added test_docs * Adapt tests to added test_docs * Add filtering utils for same representation in all doc stores * Apply balck formatting * Update documentation * Fix mypy * Apply Black * Fix mypy * Adopt Doc Strings * Add more tests * Apply Black * Allow filtering in OpenSearchDocStore * Update documentation * Adapt Docstrings * Update documentation Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-02-04 13:43:12 +01:00
mathislucka	34f9308e1a	Simplify SQuAD data to df conversion (#2124 ) * Conversion to df does not need initialization * Apply Black * fix test case * Apply Black Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-02-04 12:37:56 +01:00
Julian Risch	53decdcefb	Allow different filters per query in pipeline evaluation (#2068 ) * add filters attribute to labels and use in eval * Add latest docstring and tutorial changes * overwrite params if None * populate filters from Label to MultiLabel * add query_id in eval df and deepcopy params for each label * fix mypy * add test for aggregating filters in multilabel * use query ids also in answers df * loop through unique query_ids * hash filters and query text as id * Add latest docstring and tutorial changes * fix top_k reader eval * Apply Black * rename query_id to id/multilabel_id * Apply Black * json dump filters in dataframe * add filters and id to wrong_examples() * Apply Black Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>	2022-02-03 19:19:05 +01:00
Buruk Aregawi	1fa682ac73	Fixed performance bug. Using a list where a set is needed. (#2125 )	2022-02-03 18:58:28 +01:00
Sara Zan	a59bca3661	Apply black formatting (#2115 ) * Testing black on ui/ * Applying black on docstores * Add latest docstring and tutorial changes * Create a single GH action for Black and docs to reduce commit noise to the minimum, slightly refactor the OpenAPI action too * Remove comments * Relax constraints on pydoc-markdown * Split temporary black from the docs. Pydoc-markdown was obsolete and needs a separate PR to upgrade * Fix a couple of bugs * Add a type: ignore that was missing somehow * Give path to black * Apply Black * Apply Black * Relocate a couple of type: ignore * Update documentation * Make Linux CI run after applying Black * Triggering Black * Apply Black * Remove dependency, does not work well * Remove manually double trailing commas * Update documentation Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-02-03 13:43:18 +01:00
tstadel	9974593c5e	Fix Seq2SeqGenerator return type (#2099 ) * return proper Answer objs * fix docstrings * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-02-03 00:20:24 +01:00
Sara Zan	3a6e64b2a3	Make FileTypeClassifier more flexible (#2101 ) * Make FileTypeClassifier more flexible * Make supported_types a init parameter * Add tests and fix a couple of bugs * Formatting * Fix mypy * Implement feedback	2022-02-02 17:51:04 +01:00
Sara Zan	767f0025c6	Make `ui` and `rest` proper packages (#2098 ) * Adding simple setup.py to ui/ and rest_api and remove respective extras from main setup.cfg * Make 'pip install rest_api/' fetch the local Haystack instead of downloading from pypi * Add some comments to the new setup.py files and fix the Dockerfiles * Add version info to 'farm-haystack-ui' * Fix the OpenAPI Specs workflow * Install rest_api and ui properly on the CI too * Make the workflow see changes on every setup file * Fix workflow cache keys * Add license to rest_api and ui	2022-02-02 16:14:12 +01:00
Sara Zan	009c89fc53	Revert "Make the docstring bot work only on master" (#2114 ) * Revert "Make the docstring bot work only on master (#2078)" This reverts commit 649d07405770cd59696d0120107a3b2f0aafe7c2. * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-02-02 16:08:34 +01:00
Sebastián Ramírez	3c768071d5	✨ Add JSON Schema autogeneration for Pipeline YAML files (#2020 ) * 🎨 Update type annotations to allow their extraction for JSON Schema * ✨ Add main script doing all the work to generate the JSON Schema * ➕ Add GitHub Action dependency to generate JSON Schema * ✨ Update JSON Schema generation script to allow easily generating the schema without making a PR * 👷 Add GitHub Action to generate JSON Schema * 💚 Fix CI GitHub Action * 💚 Update GitHub Action environment variables * ✨ Add initial JSON Schema * Add latest docstring and tutorial changes * 🐛 Do not allow extra params not defined in each model * ♻️ Make any additional properties invalid * ✨ Make other additional properties invalid in all the levels in pipelines * ♻️ Do not include Base classes as possible nodes * 🍱 Update JSON Schema Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-02-02 15:00:41 +01:00
Julian Risch	3245cdef1d	Add faiss dependency to tutorial 12 (#2109 )	2022-02-02 14:19:08 +01:00
mathislucka	88771b2bee	Provide option to recreate es doc store on initialization (#2084 ) * provide option to recreate es doc store on initialization * Add latest docstring and tutorial changes * Label expects more arguments * Label expects also an answer Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>	2022-02-02 11:03:15 +01:00
Sara Zan	649d074057	Make the docstring bot work only on master (#2078 )	2022-02-01 14:09:55 +01:00
MichelBartels	525884e4cf	do not apply data parallel twice (#2095 )	2022-02-01 12:24:51 +01:00
MichelBartels	e0c072d6fd	Distribute intermediate layer distillation loss calculation over multiple GPUs (#2090 ) * distribute tinybert loss calculation * improve doc string * undo unnecessary change * fix for only one gpu * adding type hints * making sure model distillation still works without gpu * fix bug * fixing type hints	2022-02-01 09:47:00 +01:00
Sowmiya Jaganathan	7d769d8bf1	Fixed the Search Field mapping in ElasticSearch DocumentStore (#2080 ) * Review changes * Added the synonym analyser for search fields * Added the review requests. * Added the synonyms the OpenSearchDocumentStore and review requests.	2022-01-31 11:11:20 +01:00
bogdankostic	bbb65a19bd	Add Tapas reader with scores (#1997 ) * Add Tapas reader with scores * Adapt possible answer spans * Add latest docstring and tutorial changes * Remove unused imports * Adapt scoring * Add latest docstring and tutorial changes * Fix mypy * Infer model architecture from config * Adapt answer score calculation * Add latest docstring and tutorial changes * Fix mypy Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-01-31 10:23:12 +01:00
Malte Pietsch	ee6b8d0688	Add ADR template for transparent architecture decisions (#2072 ) * add adr template for decisions * Add latest docstring and tutorial changes * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>	2022-01-28 17:33:53 +01:00
Kristof Herrmann	7764b6992c	DC SDK - load pipeline from deepset cloud (#2013 ) * initial load_from_dc * typo * adjusted api endpoint * removed kwargs * added _load_from_dict * refactor pipeline loading mechanism * renaming load_from_dc api * renaming * fixed errors * fix comments and environment variable overrides * Add latest docstring and tutorial changes * fix outdated YAML examples * Add latest docstring and tutorial changes * Introduce readonly DCDocumentStore (without labels support) (#1991) * minimal DCDocumentStore * support filters * implement get_documents_by_id * handle not existing documents * add docstrings * auth added * add tests * generate docs * Add latest docstring and tutorial changes * add responses to dev dependencies * fix tests * support query() and quey_by_embedding() * Add latest docstring and tutorial changes * query tests added * read api_key and api_endpoint from env * Add latest docstring and tutorial changes * support query() and quey_by_embedding() * query tests added * Add latest docstring and tutorial changes * Add latest docstring and tutorial changes * support dynamic similarity and return_embedding values * Add latest docstring and tutorial changes * adjust KeywordDocumentStore description * refactoring * Add latest docstring and tutorial changes * implement get_document_count and raise on all not implemented methods * Add latest docstring and tutorial changes * don't use abbreviation DC in comments and errors * Add latest docstring and tutorial changes * docstring added to KeywordDocumentStore * Add latest docstring and tutorial changes * enhanced api key set * split tests into two parts * change setup.py in order to work around build cache * added link * Add latest docstring and tutorial changes * rename DCDocumentStore to DeepsetCloudDocumentStore * Add latest docstring and tutorial changes * remove dc.py * reinsert link to docs * fix imports * Add latest docstring and tutorial changes * better test structure Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: ArzelaAscoIi <kristof.herrmann@rwth-aachen.de> * introduce DeepsetCloudAdapter * Add latest docstring and tutorial changes * introduce DeepsetCloudClient * Add latest docstring and tutorial changes * use json api for pipeline_config * indexing pipeline test added * pseudo change to force cache eviction * revert pseudo change to force cache eviction * remove conftest duplicates * minor formatting and docstring fixes * fix tests when MOCK_DC=False Co-authored-by: Thomas Stadelmann <thomas.stadelmann@deepset.ai> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: tstadel <60758086+tstadel@users.noreply.github.com>	2022-01-28 17:32:56 +01:00
Sara Zan	07cf3c614a	Disable cache on the CI (#2083 ) * Disable cache on the CI * Reintroduce paths * Add most files to the cache key * remove date and path from cache key * Try double install with cache * Try to cache more stuff, on a per-commit basis * Fix windows CI too * Add comment on how to speed up the CI with better caching	2022-01-28 17:21:23 +01:00
tstadel	1b1e44e771	install haystack in editable mode for ci (#2082 )	2022-01-28 09:59:28 +01:00
Sara Zan	713771095b	Autogenerate OpenAPI specs file (#2047 ) * Add docstrings to the REST API endpoint to have them included in the OpenAPI specs * Attempt at make GitHub CI generate the OpenAPI specs * Missing __init__.py was breaking rest_api import * Add comment on dummy pipeline * Create separate workflow file for the OpenAPI specs generation Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Markus Paff <markuspaff.mp@gmail.com>	2022-01-27 13:06:01 +01:00
Sara Zan	3c02aa50d0	Remove run_docker_gpu.sh (#2003 ) * Remove run_docker_gpu.sh * remove shell formatting check from CI Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>	2022-01-27 12:20:43 +01:00
Sara Zan	9af1292cda	Remove stray requirements.txt files and update README.md (#2075 ) * Remove stray requirements.txt files and update README.md * Remove requirement files * Add details about pip bug and link to setup.cfg Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-01-27 11:22:14 +01:00
AhmedIdr	488c3e9e52	pass faiss batch_size to sqldocumentstore (#2061 )	2022-01-26 19:35:16 +01:00
Julian Risch	5079c6847a	Convert doc embedding from ndarray to list of float for REST API (#1901 ) * convert ndarray doc embedding to list of float * check type of embedding of each doc individually * Fix in case documents is None	2022-01-26 18:20:44 +01:00
Sara Zan	d470b9d0bd	Improve dependency management (#1994 ) * Fist attempt at using setup.cfg for dependency management * Trying the new package on the CI and in Docker too * Add composite extras_require * Add the safe_import function for document store imports and add some try-catch statements on rest_api and ui imports * Fix bug on class import and rephrase error message * Introduce typing for optional modules and add type: ignore in sparse.py * Include importlib_metadata backport for py3.7 * Add colab group to extra_requires * Fix pillow version * Fix grpcio * Separate out the crawler as another extra * Make paths relative in rest_api and ui * Update the test matrix in the CI * Add try catch statements around the optional imports too to account for direct imports * Never mix direct deps with self-references and add ES deps to the base install * Refactor several paths in tests to make them insensitive to the execution path * Include tstadel review and re-introduce Milvus1 in the tests suite, to fix * Wrap pdf conversion utils into safe_import * Update some tutorials and rever Milvus1 as default for now, see #2067 * Fix mypy config Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-01-26 18:12:55 +01:00
MichelBartels	4cc37548e3	Fix finetuning notebook augmentation (#2071 ) * fix data augmentation path in finetuning notebook * Add latest docstring and tutorial changes * make distillation possible with other models than BERT * use smaller dataset for distillation in finetuning tutorial * Add latest docstring and tutorial changes * make data augmentation in finetuning faster * update language models forward doc strings * fix return type of language models * remove debug output Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-01-26 17:49:14 +01:00
Sowmiya Jaganathan	c4fff19018	Supported Highlighting in Elasticsearch (#1930 ) * Supported Highlighting * Review changes * add example to docstrings * Add latest docstring and tutorial changes * Add latest docstring and tutorial changes Co-authored-by: sowmiya-emplay <sowmiya.j@emplay.net> Co-authored-by: Thomas Stadelmann <thomas.stadelmann@deepset.ai> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: tstadel <60758086+tstadel@users.noreply.github.com>	2022-01-26 17:35:33 +01:00
Adrien Wald	2edc421a09	Add `top_k_join` parameter to `JoinDocuments.run` (#2065 ) * add top_k_join parameter to JoinDocuments.run * test JoinDocuments concatenate with top_k_join parameter * test two different top_k_join parameters	2022-01-26 17:30:16 +01:00
mathislucka	5b7e906e85	fix: get_documents_by_id should return docs for all passed ids (#2064 ) * doc store should return all documents matching ids passed to get_documents_by_id * test for get_document_by_id should be named correctly * add test for get_documents_by_id * Add latest docstring and tutorial changes * document es query limit * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-01-26 12:39:04 +01:00
Julian Risch	0f34983f74	fix answer is not subscriptable error (#2069 ) * fix answer is not subscriptable error * fix answer is not subscriptable in script	2022-01-26 11:45:45 +01:00
tstadel	8a32d8da92	Introduce readonly DCDocumentStore (without labels support) (#1991 ) * minimal DCDocumentStore * support filters * implement get_documents_by_id * handle not existing documents * add docstrings * auth added * add tests * generate docs * Add latest docstring and tutorial changes * add responses to dev dependencies * fix tests * support query() and quey_by_embedding() * Add latest docstring and tutorial changes * query tests added * read api_key and api_endpoint from env * Add latest docstring and tutorial changes * support query() and quey_by_embedding() * query tests added * Add latest docstring and tutorial changes * Add latest docstring and tutorial changes * support dynamic similarity and return_embedding values * Add latest docstring and tutorial changes * adjust KeywordDocumentStore description * refactoring * Add latest docstring and tutorial changes * implement get_document_count and raise on all not implemented methods * Add latest docstring and tutorial changes * don't use abbreviation DC in comments and errors * Add latest docstring and tutorial changes * docstring added to KeywordDocumentStore * Add latest docstring and tutorial changes * enhanced api key set * split tests into two parts * change setup.py in order to work around build cache * added link * Add latest docstring and tutorial changes * rename DCDocumentStore to DeepsetCloudDocumentStore * Add latest docstring and tutorial changes * remove dc.py * reinsert link to docs * fix imports * Add latest docstring and tutorial changes * better test structure Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: ArzelaAscoIi <kristof.herrmann@rwth-aachen.de>	2022-01-25 20:36:28 +01:00
Sara Zan	d147443cb1	Pin Milvus to <2.0.0 (#2063 )	2022-01-25 17:12:56 +01:00
MichelBartels	5b6b0cef77	Add UnlabeledTextProcessor (#2054 ) * add UnlabeledTextProcessor * allow choosing processor when finetuning or distilling * fix type hint * Add latest docstring and tutorial changes * improve segment id computation for UnlabeledTextProcessor * add text and documentation * change batch size parameter for intermediate layer distillation * Add latest docstring and tutorial changes * fix distillation dim mapping * remove unnecessary changes * removed confusing parameter * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-01-25 14:54:34 +01:00
Julian Risch	c6f23dce88	upgrade haystack version number to 1.1.0 (#2039 ) * upgrade haystack version number to 1.1.0 * copy docs to new version folder v1.1.0	2022-01-20 13:45:38 +01:00
tstadel	50317d74bd	Add ndcg and eval_mode to docs (#2038 ) * add ndcg and eval_mode to docstrings and reorder dataframe columns in docs * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-01-20 13:02:46 +01:00
MichelBartels	e8cd5ea943	Add distillation to finetuning tutorial (#2025 ) * Add finetuning tutorial * Add latest docstring and tutorial changes * fix typo * Add latest docstring and tutorial changes * improve distillation explanation in finetuning tutorial * Add latest docstring and tutorial changes * allow augment_squad.py to be easier to call from within python * Update Tutorial2_Finetune_a_model_on_your_data.py * fix squad augmentation test Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-01-20 12:18:32 +01:00
oryx1729	cb881b6fa9	Disable pip cache for Dockerfiles (#2015 )	2022-01-19 10:26:17 +01:00
Kristof Herrmann	6267476015	Bugfix - save_to_yaml for OpenSearchDocumentStore (#2017 ) * fix save_to_yaml * add link to issue * added generic implementation * added type * remove not used imports	2022-01-19 10:10:50 +01:00
Yorick van Zweeden	ea10d011ab	Replace SessionState with Streamlit built-in (#2006 ) * Replace SessionState with Streamlit built-in * Set session state to default if absent Co-authored-by: Yorick van Zweeden <git@yorickvanzweeden.nl>	2022-01-18 14:59:42 +01:00
MichelBartels	0cca2b97cd	distinguish intermediate layer & prediction layer distillation phases with different parameters (#2001 ) * add parameters to allow for different hyperparameters in stage 1 and 2 of tinybert distillation * Add latest docstring and tutorial changes * improve default parameters * Add latest docstring and tutorial changes * split up distillation method * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-01-14 20:40:38 +01:00
tstadel	f42d2e8ba0	Add nDCG to `pipeline.eval()`'s document metrics (#2008 ) * add ndcg metric * fix merge * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-01-14 18:36:41 +01:00
Julian Risch	2c063e960e	Extend Tutorial 5 with Upper Bound Reader Eval Metrics (#1995 ) * print report for closed-domain eval * Add latest docstring and tutorial changes * rename parameter and rewrite docs * Add latest docstring and tutorial changes * print eval report in separate cell * Add latest docstring and tutorial changes * explain when to eval individual components Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-01-14 16:29:18 +01:00


3803 Commits