haystack

Mirror of https://github.com/deepset-ai/haystack.git, synced 2025-08-10 09:39:22 +00:00

Author	SHA1	Message	Date
Fabian	61ebe4b5dc	fix: authenticate with aws4auth if set in OpenSearchDocumentStore (#3741 ) * bug(OpenSearchDocumentStore): fix authenticate with aws4auth if set. Rearrange check to authenticate with aws4auth before username and password, as the username is set to "admin" by default. * Make username check less restrictive * Fix test, do not use mocked _init_client function * Add warning for aws4auth and username to ElasticSearchDocumentStore Co-authored-by: Julian Risch <julian.risch@deepset.ai>	2023-01-24 10:01:39 +01:00
Zoltan Fedor	e447bd728a	feat: adding the ability to use Ray Serve async functionality (#3769 ) * Adding the ability to call the Ray pipeline from concurrent apps with async This is to fix #2968 * Fixes: mypy + pylint (`invalid-overridden-method`) * Simplifying - no real need for an `AsyncRayPipeline` anymore * Moving the new `run_async` method to the `RayPipeline` * Cleanup * [EMPTY] Re-trigger CI	2023-01-23 16:23:09 +01:00
Benjamin BERNARD	eed009eddb	feat: Add `CsvTextConverter` (#3587 ) * feat: Add Csv2Documents, EmbedDocuments nodes and FAQ indexing pipeline Fixes #3550, allow user to build full FAQ using YAML pipeline description and with CSV import and indexing. * feat: Add Csv2Documents, EmbedDocuments nodes and FAQ indexing pipeline Fix linter issues mypy and pylint. * feat: Add Csv2Documents, EmbedDocuments nodes and FAQ indexing pipeline Fix linter issues mypy. * implement proposal's feedback * tidy up for merge * use BaseConverter * use BaseConverter * pylint * black * Revert "black" This reverts commit e1c45cb1848408bd52a630328750cb67c8eb7110. * black * add check for column names * add check for column names * add tests * fix tests * address lists of paths * typo * remove duplicate line Co-authored-by: ZanSara <sarazanzo94@gmail.com>	2023-01-23 15:56:36 +01:00
ZanSara	94f660c56f	feat: store `id_hash_keys` in `Document` objects to make documents clonable (#3697 ) * store id_hash_keys in Document objects * fix id_hash_keys calls throughout codebase * generate schema * fix es * fix weaviate * backward compatible * openapi schema * remove unused deprecation warning * remove unused imports * openapi * unused var * Apply suggestions from code review Co-authored-by: bogdankostic <bogdankostic@web.de> * Update haystack/schema.py * Apply suggestions from code review Co-authored-by: bogdankostic <bogdankostic@web.de> * Update haystack/schema.py * review feedback * trailing spaces * pylint * add deprecation test Co-authored-by: bogdankostic <bogdankostic@web.de>	2023-01-23 15:00:52 +01:00
Stefano Fiorucci	b910df7ec7	feat: `ImageToText` (caption generator) (#3859 ) * first draft * fix pylint and mypy * retry w mypy * mypy :-) * rem unused import * incorporate feedback and initial tests * better tests * fix import order * fix docstring * other fix docstring * more and better tests Co-authored-by: ZanSara <sarazanzo94@gmail.com>	2023-01-23 11:59:56 +01:00
ZanSara	90c877a559	bug: `mypy` should ignore files in `test/` (#3894 ) * exclude files in test/ * verify that the CI ignores test files * dont fail in case of no files	2023-01-19 18:12:26 +01:00
Vladimir Blagojevic	4c28253955	feat: PromptNode - implement stop words (#3884 )	2023-01-19 12:26:15 +01:00
Vladimir Blagojevic	e2fb82b148	refactor: Move invocation_context from meta to own pipeline variable (#3888 )	2023-01-19 11:17:06 +01:00
ZanSara	6f5a2fb1da	fix: remove string validation in YAML (#3854 ) * remove string validation in YAML * unused import * fix import * remove tests * fix tests	2023-01-19 10:06:53 +01:00
Ahmed Nabil	12e057837b	Adding condition to `pinecone` object. (#3768 ) * Adding condition to `pinecone` object. While you can assign any values to `PineconeDocumentStore`'s parameter `pinecone_index`, it must have another condition to prevent that from happening. * Added test, and changed the code to make sure the pinecone idx variable has correct instance * fixed black error Co-authored-by: Mayank Jobanputra <mayankjobanputra@gmail.com>	2023-01-19 01:34:44 +05:30
ZanSara	6af4f14fe0	feat: preprocessor raises warning when doc length exceeds threshold (#3837 ) * add warning for excessive length * improve test * review feedback * fix test * move into _process_single	2023-01-17 13:48:28 +01:00
ZanSara	9e457db2e9	test: add version deprecation fixture (#3851 ) * add fixture * Update test/conftest.py * remove +2 and add tests * few typos * more cases * Update test/conftest.py	2023-01-16 15:36:14 +01:00
ZanSara	3ffdb0a9a3	chore: fix all EOF (#3852 ) * fix all eof * fix test * fix test * fix test * typo * fix sample * fix sample * add logs * fix page_dynamic_result.txt	2023-01-16 12:34:50 +01:00
Massimiliano Pippi	fa4404baa0	fix: ignore non-serializable params when hashing pipeline objects (#3842 ) * ignore non-serializable params when hashing pipeline objects * make tests more clear	2023-01-11 17:11:41 +01:00
Stefano Fiorucci	be31178892	fix: make the crawler runnable and testable on Windows (#3830 ) * fix crawler and try to run CI * more compact expression * try to fix * improve naming regex * revert regex * make test_url compatible with Windows * better conditional expression	2023-01-10 20:27:28 +01:00
Tobias Wochinger	dea10a51d3	fix: gracefully handle `FileExistsError` during `Preprocessor` resource download (#3816 ) * fix: use temp path for downloading punkt resources * fix: gracefully handle file exists error during download	2023-01-10 11:22:49 +01:00
Zoltan Fedor	0288e1be76	bug: The `PromptNode` handles all parameters as lists without checking if they are in fact lists (#3820 )	2023-01-10 08:08:17 +01:00
tstadel	6ca88bfd23	fix: Despite return_embedding=False SearchEngineDocumentStore.query retrieves embedding_field (#3662 ) * fix: Despite return_embedding=False SearchEngineDocumentStore.query retrieves embedding_field * fix pylint * add tests * fix mypy * fix merge * format * fix pylint * move tests to SearchEngineDocumentStoreTestAbstract * move missed constants * add mocked_document_store fixture to TestElasticsearchDocumentStore * fix mocked_document_store * fix get_all_documents tests for elasticsearch>=7.16 * fix tests * fix tests try 2	2023-01-09 11:58:23 +01:00
Sebastian	5b0b338175	fix: Ensure eval mode for TableReader model for predictions (#3743 ) * Adding model.eval() calls to prediction functions in table reader * Add unit test to check if model is set in train mode that inference time prediction still works.	2023-01-09 11:07:06 +01:00
Sebastian	659020fcac	fix: Convert table cells to strings for compatibility with TableReader (#3762 ) * Add table = table.astype(str) to make sure cells are converted to strings to be compatible with the TableReader * Turn more strings into ints * Make sure answer text is always a string.	2023-01-09 10:42:11 +01:00
tstadel	4a0a054164	fix: linefeeds in custom_query (#3813 ) * fix linefeeds in custom_query * add double quote test case	2023-01-05 17:13:04 +01:00
Julian Risch	0c2d13f1b8	bug: skip validating empty embeddings (#3774 ) * skip validating empty embeddings * skip batches without embeddings to update * add unit test with mocked retriever	2023-01-05 15:13:57 +01:00
Sebastian	e84fae2894	Migrating to use native Pytorch AMP (#2827 ) * Started making changes to use native Pytorch AMP * Updated compute_loss functions to use torch.cuda.amp.autocast * Updating docstrings * Add use_amp to trainer_checkpoint * Removed mentions of apex and started to add the necessary warnings * Removing unused instances of use_amp variable * Added fast training test for FARMReader. Needed to add max_query_length as a parameter in FARMReader.__init__ and FARMReader.train * Make max_query_length optional in FARMReader.train * Update lg Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> Co-authored-by: agnieszka-m <amarzec13@gmail.com>	2023-01-05 09:14:28 +01:00
Julian Risch	a2c160e7d8	bug: skip empty documents in reader (#3773 ) * skip empty documents * test eval_batch and account for tables	2023-01-03 15:50:14 +01:00
Julian Risch	b155297a06	feat: change PipelineConfigError to DocumentStoreError with more details (#3783 )	2023-01-02 19:40:45 +01:00
Vladimir Blagojevic	bebd6b26ec	Improve robustness of PromptNode unit tests (#3747 )	2023-01-02 16:28:56 +01:00
bogdankostic	594d2a10f8	fix: Fix `predict_batch` in `TransformersReader` for single nested Document list (#3748 ) * Fix restoring of list structure * Add tests	2022-12-29 11:48:18 +01:00
Stefano Fiorucci	136928714c	refactor: remove deprecated parameters from `Summarizer` (#3740 ) * remove deprecated parameters * remove deprecation/removal test	2022-12-29 15:37:47 +05:30
tstadel	6c067b2b4f	feat: make `score_script` first class citizen via `knn_engine` param (#3284 ) * OpenSearchDocumentStore: make score_script accessible via knn_engine * blacken * fix tests * fix format * fix naming of 'score_script' consistently * fix tests * fix test * fix ef_search tests * always validate index * improve clone_embedding_field * fix pylint * reformat * remove port * update tests * set no_implicit_optional = false * fix mypy * fix test * refactorings * reformat * fix and refactor tests * better tests * create search_field mappings * remove no_implicit_optional = false * skip validation for custom mapping * format * Apply suggestions from docs code review Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Apply tougher suggestions from code review * fix messages * fix typos * update tests * Update haystack/document_stores/opensearch.py Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * fix tests * fix ef_search validation * add test for ef_search nmslib * fix assert_not_called Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>	2022-12-27 15:24:31 +01:00
Mayank Jobanputra	76a16807d5	fix: Fixed local reader model loading (#3663 ) * Fixed local loading issue	2022-12-24 03:46:36 +05:30
Sebastian	756e0114e6	refactor: Remove duplicate code in TableReader (#3708 ) * Refactor table reader to use util functions to reduce code duplication. * Expanding the tests for the table reader * Adding types * Updating tests to work for RCIReader * Fix bug in RCIReader. Saving the wrong queries list. * Update _flatten_inputs to not change input variable * Remove duplicate code	2022-12-21 14:33:19 +01:00
Vladimir Blagojevic	9ebf164cfd	feat: Expand LLM support with PromptModel, PromptNode, and PromptTemplate (#3667 ) Co-authored-by: ZanSara <sarazanzo94@gmail.com>	2022-12-20 11:21:26 +01:00
Zoltan Fedor	e143f7cc36	Fixing broken BM25 support with Weaviate - fixes #3720 (#3723 ) * Fixing broken BM25 support with Weaviate - fixes #3720 Unfortunately the BM25 support with Weaviate got broken with Haystack v1.11.0+, which is getting fixed with this commit. Please see more under issue #3720. * Fixing mypy issue - method signature wasn't matching the base class * Mypy related test fix Mypy forced me to set the signature of the `query` method of the Weaviate document store to the same as its parent, the `KeywordDocumentStore`, where the `query` param is `Optional`, but has NO default value, so it must be provided (as None) at runtime. I am not quite sure why the abstract method's `query` param was set without a default value while its type is `Optional`, but I didn't want to change that, so instead I have changed the Weaviate tests. * Adding a note regarding an upcoming fix in Weaviate v1.17.0 * Apply suggestions from code review * revert * [EMPTY] Re-trigger CI	2022-12-19 17:24:46 +01:00
Vladimir Blagojevic	56803e5465	feat: Enable text-embedding-ada-002 for EmbeddingRetriever (#3721 ) * Enable text-embedding-ada-002 for EmbeddingRetriever * Easier to understand code, more unit tests	2022-12-19 17:06:48 +01:00
Stefano Fiorucci	5b9c661155	feat: add `index` parameter to `TfidfRetriever` (#3666 ) * first draft to add index param to tfidf * better mypy handling * Revert "better mypy handling" This reverts commit 91a22516320f9dcbeae53827ec69f9dc51e1785c. * new check in auto_fit * new check also in retrieve * better dict typings * new test and improvements to other test * remove unnecessary lambda * improve test * remove newline from openapi json * fix test * language fix Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * language fix 2 Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * language fix 3 Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * language fix 4 Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * language fix 5 Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * language fix 6 Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * explicit index value handling * fix test * better error messages Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>	2022-12-19 12:07:49 +01:00
Stefano Fiorucci	e1401f79b6	refactor: improve Multilabel design (#3658 ) * first try and new test * fix test * fix unused import * remove comments * no more dataclass * add __eq__ and extend test * better design from review * Update schema.py * fix black * fix openapi * fix openapi 2 * new try to fix openapi * remove newline from openapi json	2022-12-13 10:45:56 +01:00
James Briggs	520b23ec1b	fix: pinecone metadata format (#3660 ) * fix for multilevel metadata dictionaries * add metadata dict formatting to update function * typing * added check for labels meta * added more info to input parameters * added test for multilayer metadata * removed todo	2022-12-13 10:11:24 +01:00
tstadel	600dc2d611	refactor: filters type (#3682 ) * consolidate filters type * remove unnecessary optionals * fix mypy * fix pylint * fix pylint * move FilterType to schema * remove Optional from FilterType * move to Dict[str, Any] * Revert "move to Dict[str, Any]" This reverts commit e8c561bb7885949e19825697fa4c469945f90ce5. * fix mypy * fix pylint * revert isort changes in elasticsearch * remove todos in milvus.py * remove todos in sql.py * add aggregate_labels tests * consolidate aggregate_labels tests * remove superfluous type todos * remove ALL superfluous #todos	2022-12-12 14:04:29 +01:00
Unai Garay Maestre	77cea8b140	feat: Adds all_terms_must_match parameter to BM25Retriever at runtime (#3627 ) * Adds all_terms_must_match implementation and tests * Adds all_terms_must_match as Optional Signed-off-by: Unai Garay <unaigaraymaestre@gmail.com> * Avoid mypy error and follow pattern checking var is None * Mypy works ok on this file now * added mypy ignores to BaseRetriever * ignoring all overrides for this file * Updates sparse retriever `all_terms_must_match` docstring Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Updates sparse retriever `all_terms_must_match` docstring Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Updates sparse retriever `all_terms_must_match` docstring Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Updates sparse retrieve_batch `all_terms_must_match` docstring Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Updates sparse retrieve_batch `all_terms_must_match` docstring Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Updates sparse retrieve_batch `all_terms_must_match` docstring Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * marked elasticsearch Signed-off-by: Unai Garay <unaigaraymaestre@gmail.com> Co-authored-by: Mayank Jobanputra <mayankjobanputra@gmail.com> Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>	2022-12-08 17:18:43 +05:30
tstadel	c1c1c97bb2	feat: add query_by_embedding_batch (#3546 ) * add query_by_embedding_batch * fix mypy * fix pylint * add test * move query_by_embedding_batch to search_engine * fix and add tests * fix pylint * remove Retriever query logs * add test for multimodal batch retrieval * allow for np.ndarray	2022-12-08 08:28:43 +01:00
Sebastian	25bf95d47f	Update table reader tests to include checking the score of answers. (#3641 )	2022-12-07 07:30:49 -08:00
Sara Zan	fc89f6ea74	fix: revert Weaviate query with filters and improve tests (#3646 ) * revert weaviate query with filters and improve tests * pylint * upgrade weaviate container * use latest docker tag * fix text * fix text	2022-12-06 14:48:58 +01:00
Vladimir Blagojevic	e4c3817d01	Adjust get_type() method for pipelines (#3657 )	2022-12-02 14:48:47 +01:00
Julian Risch	adb580b6b7	feat: add offsets_in_context to evaluation result (#3640 ) * add offsets_in_context to eval result * extend test case	2022-11-30 11:43:42 +01:00
Massimiliano Pippi	b20f808119	refactor: move more tests to the base class (#3637 ) * move more tests to the base class * skip tests where unsupported * do not pass index label explicitly * skip test for Pinecone	2022-11-29 08:43:27 +01:00
Mayank Jobanputra	95cf666a20	refactor: change MultiModal retriever to be of type DenseRetriever (#3598 ) * changed Multimodal retriever to be of type DenseRetriever * format fix * Pylint fix * Added embed_queries and tests	2022-11-28 19:24:22 +01:00
Massimiliano Pippi	6f9a0f2215	use 9200 as the default port in launch_opensearch (#3630 )	2022-11-28 19:06:45 +05:30
Sara Zan	eb7b9452d0	refactor: Weaviate query with filters (#3628 )	2022-11-28 12:26:33 +01:00
Massimiliano Pippi	c6890c3e86	chore: remove redundant tests (#3620 ) * remove redundant tests * skip test on win * fix missing import * revert mistake * revert	2022-11-25 20:55:21 +05:30
Massimiliano Pippi	a15af7f8c3	refactor: Move `InMemoryDocumentStore` tests to their own class (#3614 ) * move tests to their own class * move more tests * add specific job * fix test * Update test/document_stores/test_memory.py Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai> Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>	2022-11-23 15:33:46 +01:00


1204 Commits