haystack

mirror of https://github.com/deepset-ai/haystack.git synced 2025-07-20 07:21:09 +00:00

Author	SHA1	Message	Date
Adrien Wald	c401e86099	Use `ElasticsearchDocumentStore.get_all_documents` in `ElasticsearchFilterOnlyRetriever.retrieve` (#2151 ) * use get_all_documents in ElasticsearchFilterOnlyRetriever.retrieve * Update Documentation & Code Style * add test case for es_filter_only retriever * Update Documentation & Code Style * fix test by adding empty string for query * Update Documentation & Code Style * add explicit name of argument "query" Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Julian Risch <julian.risch@deepset.ai>	2022-04-25 09:53:48 +02:00
tstadel	25475a68c7	Match answer sorting in `QuestionAnsweringHead` with `FARMReader` (#2414 ) * match no_answer confidence * Update Documentation & Code Style * test added * Update Documentation & Code Style * fix tests * Update Documentation & Code Style * apply penalties of scores to confidences too Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-04-21 11:24:39 +02:00
Malte Pietsch	4bf470286b	Upgrade xpdf to 4.0.4 (#2443 ) * Update minimal gpu docker image to xpdf 4.0.4 * Update Dockerfile-GPU * Update Dockerfile * Update Dockerfile-GPU * Update Dockerfile-GPU-minimal	2022-04-21 10:27:56 +02:00
Malte Pietsch	133a76229b	Add info about execution env to minimal GPU image (#2441 )	2022-04-21 08:30:42 +02:00
Sara Zan	07d7ecbff1	Make `python-magic` fully optional (#2412 ) * Add windows specific package for python-magic * Disable some tests on Windows and add explanatory warning in case of issues with libmagic Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-04-20 09:18:02 +02:00
tstadel	e862400256	Prevent Stackoverflow on Windows CI (#2426 ) * prevent stackoverflow on windows ci * Update Documentation & Code Style * fix is_windows condition * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: ZanSara <sarazanzo94@gmail.com>	2022-04-19 16:10:39 +02:00
Sara Zan	4eec2dc45e	Change YAML version exception into a warning (#2385 ) * Change exception into warning, add strict_version param, and remove compatibility between schemas * Simplify update_json_schema * Rename unstable into master * Prevent validate_config from changing the config to validate * Fix version validation and add tests * Rename master into ignore * Complete parameter rename Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-04-19 16:08:08 +02:00
Sara Zan	8abf11fbd3	Update `pdftotext` also on `pinecone` and `milvus1` CI jobs (#2433 ) * Upgrade pdftotext also on pinecone and milvus1 jobs * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-04-19 16:06:27 +02:00
Sara Zan	ba9c976bfe	Update `pdftotext` link (#2432 ) * Update pdftotext link * Update Documentation & Code Style * Update Tutorial8_Preprocessing.ipynb Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-04-19 14:30:18 +02:00
Sara Zan	929c685cda	Forbid usage of `args` and `kwargs` in any node's `__init__` (#2362 ) Add failing test * Remove `*kwargs` from docstores' `__init__` functions (#2407) Remove kwargs from ESDocStore subclasses * Remove kwargs from subclasses of SQLDocumentStore * Remove kwargs from Weaviate * Revert change in pinecone * Fix tests * Fix retriever test wirh weaviate * Change Exception into DocumentStoreError * Update Documentation & Code Style * Remove `*kwargs` from `FARMReader` (#2413) Remove FARMReader kwargs without trying to replace them functionally * Update Documentation & Code Style * enforce same index values before and after saving/loading eval dataframes (#2398) * Add tests for missing `__init__` and `super().__init__()` in custom nodes (#2350) * Add tests for missing init and super * Update Documentation & Code Style * change in with endswith * Move test in pipeline.py and change test in pipeline_yaml.py * Update Documentation & Code Style * Use caplog to test the warning * Update Documentation & Code Style * move tests into test_pipeline and use get_config * Update Documentation & Code Style * Unmock version name * Improve variadic args test * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-04-14 16:42:02 +02:00
tstadel	46a50fb979	Fix python-magic making Windows CI stuck (#2425 ) * revert python-magic PR #2330 * Revert "revert python-magic PR #2330" This reverts commit 23fa2cc836e36daecd9e77d340dde6e32e25c82b. * remove python-magic dep * use python-magic-bin only * add comment about python-magic-bin	2022-04-14 16:08:55 +02:00
Sebastian	3d42b70fbb	Added macos version of xpdf in tutorial 8 (#2424 ) * Added macos version of xpdf in tutorial 8 * mini-error	2022-04-14 15:31:40 +02:00
Sara Zan	60428020ff	Exclude beir from windows install (#2419 )	2022-04-13 19:06:04 +02:00
Sara Zan	1a81080e8a	Add `apt update` in Linux CI (#2415 ) * Update linux_ci.yml	2022-04-13 15:35:56 +02:00
Sara Zan	d98883b79d	Add tests for missing `__init__` and `super().__init__()` in custom nodes (#2350 ) * Add tests for missing init and super * Update Documentation & Code Style * change in with endswith * Move test in pipeline.py and change test in pipeline_yaml.py * Update Documentation & Code Style * Use caplog to test the warning * Update Documentation & Code Style * move tests into test_pipeline and use get_config * Update Documentation & Code Style * Unmock version name * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-04-13 14:29:05 +02:00
tstadel	73f9ab0f57	enforce same index values before and after saving/loading eval dataframes (#2398 )	2022-04-13 13:35:36 +02:00
Sara Zan	96a538b182	Pylint (import related warnings) and REST API improvements (#2326 ) * remove duplicate imports * fix ungrouped-imports * Fix wrong-import-position * Fix unused-import * pyproject.toml * Working on wrong-import-order * Solve wrong-import-order * fix Pool import * Move open_search_index_to_document_store and elasticsearch_index_to_document_store in elasticsearch.py * remove Converter from modeling * Fix mypy issues on adaptive_model.py * create es_converter.py * remove converter import * change import path in tests * Restructure REST API to not rely on global vars from search.apy and improve tests * Fix openapi generator * Move variable initialization * Change type of FilterRequest.filters Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-04-12 16:41:05 +02:00
Branden Chan	75dcfd3fab	Delete files in docs/_src (#2322 ) * Delete files in _src * Filter unused images and re-add images that were in use in docs/img * Remove all usages of user-images.githubusercontent.com Co-authored-by: ZanSara <sarazanzo94@gmail.com>	2022-04-12 16:19:03 +02:00
Sara Zan	4862bbcd73	Add `devices` alongside `use_gpu` in `FARMReader` (#2294 ) * Make initialize_device_settings take a devices list, and change signature of FARMReader * reintroduce use_gpu and propagate devices to other methods * fix typing for initialize_device_settings Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-04-12 14:21:25 +02:00
Michele Pangrazzi	dd4361c129	Print warning in `EmbeddingRetriever` if sentence-transformers model used with different model format (#2377 ) * ensure correct embedding_encoder is loaded when embedding_model is a sentence-transformers model but model_format is missing or wrong * minor refactoring * do not update model_format and ensure a warning is logged when it could be wrong * Apply black * Apply black Co-authored-by: Michele Pangrazzi <michele@wonderflow.ai> Co-authored-by: bogdankostic <bogdankostic@web.de>	2022-04-12 11:52:27 +02:00
tstadel	8342a6c1d6	Fix eval discrepancies (#2381 ) * fix eval discrepancies * Update Documentation & Code Style * fix reader eval comparison * Update Documentation & Code Style * slightly improve messed up top_n_f1 func * add no_answer hint to reader.eval metrics * fix tut5 * Update Documentation & Code Style * correct doc_relevance_col in tests * Update Documentation & Code Style * redefine recall metrics for no_answers * fix bugs in EvalAnswers Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-04-12 09:24:22 +02:00
MichelBartels	a6927be132	Pass `use_auth_token` to sentence transformers EmbeddingRetriever (#2284 ) * enable auth token for sentence transformers Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-04-11 19:07:32 +02:00
mathislucka	5ac5b4e241	Fix: Auth token not passed for EmbeddingRetriever (#2404 ) * passing auth token allows to access private models * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-04-11 17:28:14 +02:00
tstadel	ab8ba75664	Set ci job timeout to 45 minutes (#2401 )	2022-04-11 16:28:26 +02:00
Branden Chan	4ef099d211	Reduce num REST API workers to accommodate smaller machines (#2400 ) * Reduce num REST API workers from 8 to 2 * Incorporate reviewer feedback	2022-04-11 13:26:27 +02:00
Giannis Kitsos Kalyvianakis	b94d9effaf	extract extension based on file's content (#2330 ) * extract extension based on file's content * Add python-magic dependency * fix the _estimate_extension function and lowercase the file extensions * check if the FileTypeClassifier can be imported * add test and new file types * fix typing * import Optional * revert Optional and make sure a string is always returned * fix test so that it skips markdown files * Emulate Code & Docs action * Generate schemas * Tidy up test code & extensioness files * Improve error messages * Revert schema changes * Emulate black and docs CI again	2022-04-11 09:16:30 +02:00
Sara Zan	ae712fe6bf	Upgrade `weaviate-client` to `3.3.3` and fix `get_all_documents` (#1895 ) * Fix 'bug' on Weaviate only returning max. 100 docs on get_all_documents * Add type * Update Weaviate version on the CI * Fix bug on get_document_count where there are no documents * Add more info in the docstrings of get_all_documents and get_all_documents_generator * Add latest docstring and tutorial changes * Apply Black * Update Documentation & Code Style * Trigger pipeline * Update Documentation & Code Style * Include StefanBogdan feedback * Fix mypy issues and LogicalFilterClause * Add more types * Update Documentation & Code Style * update setup.cfg * Upgrade weaviate containers too * Allow to filter for content field in Weaviate * Use convert_to_weaviate instead of convert_to_pinecone * Fix _get_all_documents_in_index * Update docstrings and docs * Catching an exception in get_document(s)_by_id Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: bogdankostic <bogdankostic@web.de>	2022-04-01 15:37:34 +03:00
Timo Moeller	3459020600	Add confidence filtering to FARMReader (#2376 ) Add confidence filtering to FARMReader	2022-03-31 15:18:05 +02:00
tstadel	3561037e82	Use cache for hf requests during CI (#2379 ) * increase all_close tolerance for milvus2, improve assertion infos * use request-cache for huggingface	2022-03-31 12:36:45 +02:00
Sara Zan	57bb8c4131	Update launch script for Milvus from 1.x to 2.x (#2378 )	2022-03-31 12:03:18 +02:00
MichelBartels	fc1cb63bcc	Fix RouteDocuments documentation (#2380 ) * fix RouteDocuments documentation * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-03-31 11:45:02 +02:00
tstadel	5b52690c5c	Increase all_close tolerance for milvus2, improve assertion infos (#2375 )	2022-03-31 11:41:13 +02:00
Florian Hardow	a273c3a51d	EvaluationSetClient for deepset cloud to fetch evaluation sets and la… (#2345 ) * EvaluationSetClient for deepset cloud to fetch evaluation sets and labels for one specific evaluation set * make DeepsetCloudDocumentStore able to fetch uploaded evaluation set names * fix missing renaming of get_evaluation_set_names in DeepsetCloudDocumentStore * update documentation for evaluation set functionality in deepset cloud document store * DeepsetCloudDocumentStore tests for evaluation set functionality * rename index to evaluation_set_name for DeepsetCloudDocumentStore evaluation set functionality * raise DeepsetCloudError when no labels were found for evaluation set * make use of .get_with_auto_paging in EvaluationSetClient * Return result of get_with_auto_paging() as it parses the response already * Make schema import source more specific * fetch all evaluation sets for a workspace in deepset Cloud * Rename evaluation_set_name to label_index * make use of generator functionality for fetching labels * Update Documentation & Code Style * Adjust function input for DeepsetCloudDocumentStore.get_all_labels, adjust tests for it, fix typos, make linter happy * Match error message with pytest.raises * Update Documentation & Code Style * DeepsetCloudDocumentStore.get_labels_count raises DeepsetCloudError when no evaluation set was found to count labels on * remove unneeded import in tests * DeepsetCloudDocumentStore tests, make reponse bodies a string through json.dumps * DeepsetcloudDocumentStore.get_label_count - move raise to return * stringify uuid before json.dump as uuid is not serilizable * DeepsetcloudDocumentStore - adjust response mocking in tests * DeepsetcloudDocumentStore - json dump response body in test * DeepsetCloudDocumentStore introduce label_index, EvaluationSetClient rename label_index to evaluation_set * Update Documentation & Code Style * DeepsetCloudDocumentStore rename evaluation_set to evaluation_set_response as there is a name clash with the input variable * DeepsetCloudDocumentStore - rename missed variable in test * DeepsetCloudDocumentStore - rename missed label_index to index in doc string, rename label_index to evaluation_set in EvaluationSetClient * Update Documentation & Code Style * DeepsetCloudDocumentStore - update docstrings for EvaluationSetClient * DeepsetCloudDocumentStore - fix typo in doc string Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-03-31 08:59:58 +02:00
bogdankostic	ca988917c9	Fix `TableReader` for tables without rows (#2369 ) * Skip tables without rows * Update Documentation & Code Style * Add tests Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-03-30 17:02:39 +02:00
MichelBartels	eb514a6167	Add evaluation and document conversion to tutorial 15 (#2325 ) * update tutorial 15 with newer features * Update Documentation & Code Style * fix tutorial 15 * update telemetry with tutorial changes * Update Documentation & Code Style * remove error output * add output * update non-notebook tutorial 15 * Update Documentation & Code Style * delete distracting output from tutorial 15 notebook * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-03-29 17:09:05 +02:00
bogdankostic	834f8c4902	Change return types of indexing pipeline nodes (#2342 ) * Change return types of file converters * Change return types of preprocessor * Change return types of crawler * Adapt utils to functions to new return types * Adapt __init__.py to new method names * Prevent circular imports * Update Documentation & Code Style * Let DocStores' run method accept Documents * Adapt tests to new return types * Update Documentation & Code Style * Put "# type: ignore" to right place * Remove id_hash_keys property from Document primitive * Update Documentation & Code Style * Adapt tests to new return types and missing id_hash_keys property * Fix mypy * Fix mypy * Adapt PDFToTextOCRConverter * Remove id_hash_keys from RestAPI tests * Update Documentation & Code Style * Rename tests * Remove redundant setting of content_type="text" * Add DeprecationWarning * Add id_hash_keys to elasticsearch_index_to_document_store * Change document type from dict to Docuemnt in PreProcessor test * Fix file path in Tutorial 5 * Remove added output in Tutorial 5 * Update Documentation & Code Style * Fix file_paths in Tutorial 9 + fix gz files in fetch_archive_from_http * Adapt tutorials to new return types * Adapt tutorial 14 to new return types * Update Documentation & Code Style * Change assertions to HaystackErrors * Import HaystackError correctly Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-03-29 13:53:35 +02:00
tstadel	a73717b2ea	Support conjunctive queries in sparse retrieval (#2361 ) * support conjunctive queries in sparse retrieval * fix typo * test added * Update Documentation & Code Style * fix test_DeepsetCloudDocumentStore_query Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-03-28 22:10:50 +02:00
mkkuemmel	04b56f0b1c	Replace dpr with embeddingretriever tut14 (#2336 ) * add updated graph images for tutorial14 * ipynb: replaced DPR with EmbeddingRetriever, added TODO for further inspection of failing code * Revert "ipynb: replaced DPR with EmbeddingRetriever, added TODO for further inspection of failing code" This reverts commit f4b6f3e1dbbedfd1bbe5e0e33645899dbea5d924. * ipynb: replaced DPR with EmbeddingRetriever, added TODO for further inspection of failing code * ipynb: quick fix to avoid failure in print_answers * py: quick fix to avoid failure in print_answers * Update Documentation & Code Style * ipynb: remove DPR, remove images * Revert "ipynb: remove DPR, remove images" This reverts commit dfa1e7585da6743fcf97488405c356bf935a976d. * ipynb: remove DPR, remove images * py: replace DPR with EmbeddingRetriever * Update Documentation & Code Style * correcting a typo * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: TuanaCelik <tuana.celik@deepset.ai>	2022-03-28 16:54:49 +02:00
tstadel	b20a1f874b	Fix sparse retrieval with filters returns results without any text-match (#2359 ) * use "must" instead of "should" for query-matching * Update Documentation & Code Style * fix mypy issue * fix finding of new pylint version * add test * fix test_retrieval Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-03-25 17:53:42 +01:00
Julian Risch	a398094243	update version to next release candidate (#2355 ) * update version to next release candidate * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-03-25 12:06:35 +01:00
Raphaël Merx	4ebb71d42d	Fix link to squad_to_dpr.py in DPR train tutorial (#2334 ) * Fix link to squad_to_dpr.py in DPR train tutorial * update tutorial 9	2022-03-25 12:05:12 +01:00
Julian Risch	70bbb649a7	change docu text about how to opt-out (#2358 ) * change docu text about how to opt-out * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-03-25 11:59:39 +01:00
Julian Risch	bf71f03ff2	release v1.3.0 and re-add Makefile (#2354 ) * release v1.3.0 and re-add Makefile * Update Documentation & Code Style * make BaseKnowledgeGraph abstract to remove it from the JSON schema * Logging paths for JSON schema generation * Add debug command in autoforma.yml * Typo * Update Documentation & Code Style * Fix schema path in CI * Update Documentation & Code Style * Remove debug statement from autoformat.yml * Reintroduce compatibility between 1.3.0 and 1.2.1rc0 schema Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: ZanSara <sarazanzo94@gmail.com> v1.3.0	2022-03-23 17:22:06 +01:00
Julian Risch	cec0137693	Change document attribute from text to content (#2352 ) * Change document attribute from text to content * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-03-23 16:55:01 +03:00
Chris Byrd	3b2001e66f	Set provider parameter when instantiating onnxruntime.InferenceSession (#1976 ) * Set provider parameter when instantiating onnxruntime.InferenceSession fixes #1973 * Change device type to torch.device * set type annotation of device to torch.device everywhere * Apply Black * Change types of device and devices params across the codebase * Update Documentation & Code Style * Add type: ignore in the right location * Update Documentation & Code Style * Add type: ignore * feedback * Update Documentation & Code Style * feedback 2 * Fix convert_to_transformers * Fix syntax error * Update Documentation & Code Style * Consider augment and load_glove user-facing as well * Update Documentation & Code Style * Fix mypy * Update Documentation & Code Style Co-authored-by: Julian Risch <julian.risch@deepset.ai> Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>	2022-03-23 12:08:56 +01:00
tstadel	851fe1cf07	Fix `normalize_embedding` using numba (#2347 ) * fix normalize_embedding using numba * Update Documentation & Code Style * fix too-many-public-methods pylint msg Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-03-22 23:04:55 +01:00
bogdankostic	7e6ff8a205	Run Pinecone tests only if files related to Pinecone changed (#2343 ) * Run Pinecone tests only if files related to Pinecone changed * Change in pinecone.py that will be reverted * Revert change in pinecone.py * Test Pinecone also when filter_utils.py changes	2022-03-22 15:58:12 +01:00
tstadel	d438011432	fix launch scripts (#2341 )	2022-03-22 10:48:29 +01:00
Branden Chan	6233dfce2f	Let SquadData support data from Annotation Tool (#2329 ) * Support data from Annotation Tool * Update Documentation & Code Style * Incorporate reviewer feedback * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>	2022-03-22 10:17:25 +01:00
Julian Risch	7ffeccece6	Fix tutorial dataset paths (#2340 ) * fix tutorial 4 dataset path * fix tutorial 8 dataset path * fix tutorial 10 event * Update Documentation & Code Style * fix send event for tutorial 15 * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-03-22 09:19:50 +01:00

3803 Commits