haystack

mirror of https://github.com/deepset-ai/haystack.git synced 2025-10-29 00:39:05 +00:00

Author	SHA1	Message	Date
Daniel Bichuetti	5187cc1801	refactor: Remove the pin from the espnet module and fix the audio node tests. (#4128 ) * fix: fix audio tests + unbound some dependencies * fix: update for Python 3.8 * refactor: change numpy assertion * feat: add voice recog. support on audio tests * fix: fix var assignement * chore: dummy commit * fix: fix sndfile error * refactor: change skip reason * refactor: hardcode variable * refactor: unpin numpy * fix: pin numpy only for audio	2023-02-16 22:12:17 +05:30
Sebastian	9a26942952	feat: Add model_kwargs option to PromptNode (#4151 ) * Add input option to PromptNode to allow the passing of default kwargs * Add yaml test for model_kwargs parameter	2023-02-15 18:46:26 +01:00
Sebastian	75ef959678	feat: Update OpenAIAnswerGenerator defaults and with learnings from PromptNode (#4038 ) * added instruction_prompt and update defaults * Change back max_tokens * Code formatting * Starting to update instruction_prompt to be a PromptTemplate * Using PromptTemplate in OpenAIAnswerGenerator * Removed hardcoded value * pylint and make examples and examples_context optional prompt parameters * Added new test for when prompt length goes past max token limit * Improve doc strings. * Make "text-davinci-003" the new default model * Renaming variable to prompt_template and name to question-answering-with-examples * Reduced repetitive code. * Added some comments to explain key logic for future debuggers * Update docs for max_tokens and increase defaul * Updating variable name to prompt_template and docs. * Updated test and handled Answer case where no documents are used. * Slight update to docs. * Adding more doc strings * lg updates * Blackify --------- Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai> Co-authored-by: agnieszka-m <amarzec13@gmail.com>	2023-02-12 00:08:07 +01:00
Vladimir Blagojevic	d839b9314f	Update PromptTemplate tests (#4131 )	2023-02-10 15:24:01 +01:00
Jack Butler	e6b6f70ae2	fix: Fix `TableTextRetriever` for input consisting of tables only (#4048 ) * fix: update kwargs for TriAdaptiveModel * fix: squeeze batch for TTR inference * test: add test for ttr + dataframe case * test: update and reorganise ttr tests * refactor: make triadaptive model handle shapes * refactor: remove duplicate reshaping * refactor: rename test with duplicate name * fix: add device assignment back to TTR * fix: remove duplicated vars in test --------- Co-authored-by: bogdankostic <bogdankostic@web.de>	2023-02-09 11:38:16 +01:00
Silvano Cerza	274746db07	style: Update black (#4101 ) * Update black version * Format file with new black style * Update black pre-commit hook version	2023-02-08 15:34:43 +01:00
Sebastian	1bbf10a376	Remove double batching in retrieve_batch (#4014 ) * Removed double batching around embed_queries * Add back tests for retrieve_batch for dpr and embedding retrievers * Updated table-text-retriever to not double batch * Fixing pylint * Update to test * Remove code breaking test * Updating dev comment to be clearer	2023-02-08 14:39:20 +01:00
Sebastian	01d39df863	feat: Update allowed models to be used with Prompt Node (#4018 ) * Update allowed models to be used with Prompt Node * Added try except block around the config to skip over OpenAI models. * Fixing tests * Adding warning message * Adding test for different HF models that could be used in prompt node	2023-02-08 12:47:52 +01:00
Stefano Fiorucci	5c009c2a1a	feat: OpenAI - warn users if `max_tokens` is too short (#4094 ) * warn users if max_tokens is too short * skip test if not API KEY * add counters * correctly run precommit	2023-02-08 10:39:40 +01:00
tstadel	92c58cfda1	feat: Support multiple document_ids in Answer object (for generative QA) (#4062 ) * initial version without shapers * set document_ids for BaseGenerator * introduce question-answering-with-references template * better prompt * make PromptTemplate control output_variable * update schema * fix add_doc_meta_data_to_answer * Revert "fix add_doc_meta_data_to_answer" This reverts commit b994db423ad8272c140ce2b785cf359d55383ff9. * fix add_doc_meta_data_to_answer * fix eval * fix pylint * fix pinecone * fix other tests * fix test * fix flaky test * Revert "fix flaky test" This reverts commit 7ab04275ffaaaca96b4477325ba05d5f34d38775. * adjust docstrings * make Label loading backward-compatible * fix Label backward compatibility for pinecone * fix Label backward compatibility for search engines * fix Label backward compatibility for deepset Cloud * fix tests * fix None issue * fix test_write_feedback * add tests for legacy label support * add document_id test for pinecone * reduce unnecessary contents * add comment to pinecone test	2023-02-08 08:37:22 +01:00
Vladimir Blagojevic	3273a2714d	fix: Add PromptTemplate __repr__ method (#4058 ) Co-authored-by: ZanSara <sarazanzo94@gmail.com>	2023-02-07 08:14:32 +01:00
ZanSara	9009a9ae58	feat: add `Shaper` (#3880 ) * Shaper initial version * Inital pydoc * Add more unit tests * Fix pydoc, expand Shaper pydoc with YAML example * Minor fix * Improve pydoc * More unit tests with prompt node * Describe Shaper functions in pydoc * More pydoc * Use pytest.raises instead of catching errors * Improve test_function_invocation_order unit test * pylint fixes * Improve run_batch handling * simpler version, initial stub * stubbing tests * promptnode compatibility * add tests * simplify * fix promptnode tests * pylint * mypy * fix corner case & mypy * mypy * review feedback * tests * Add lg updates * add rename * pylint * Add complex unit test with two PNs and ICMs in between (#3921) Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com> * docstring * fix tests * add join_lists * add documents_to_strings * fix tests * allow lists of input values * doc review feedback * do not use locals() * Update with minor lg changes * fix corner case in ICM * fix merge * review feedback * answers conversions * mypy * add tests * generative answers * forgot to commit --------- Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com> Co-authored-by: agnieszka-m <amarzec13@gmail.com>	2023-02-01 18:36:13 +01:00
Zoltan Fedor	2b1849f525	fix: Add a verbose option to PromptNode to let users understand the prompts being used #2 (#3898 ) * fix: Add a verbose option to PromptNode to let users understand the prompts being used #2 * Add comments and refactoring todo note * Fix logging-fstring-interpolation pylint * Update haystack/nodes/prompt/prompt_node.py Co-authored-by: Massimiliano Pippi <mpippi@gmail.com> --------- Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com> Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>	2023-01-31 09:33:47 +01:00
Daniel Bichuetti	3009ac2988	feat: Add page range support to PDF converters. (#3965 ) * feat: add start and eng page to PDF converters * docs: add missing docstrings * refactor: change list set up, add docstrings and comment * fix: add missing parameter * tests: add page range basic test * tests: test correct page numbers * tests: remove OCR page range test Poppler and Tesseract not installed on CI fix: remove mobile change error	2023-01-30 14:09:22 +01:00
hsm207	08ec059b14	refactor: use weaviate client to build BM25 query (#3939 ) * refactor: use weaviate client to build BM25 query * refactor: remove manual BM25 query building * refactor: apply BM25 to the content_field only * test: update weaviate BM25 retrieval test case update to account for lack of stemming --------- Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>	2023-01-30 10:07:07 +01:00
Tuana Celik	93312138de	fix: removing code block in `MarkdownConverter` (#3960 ) * first attempt to add frontmatter of markdown to the metadata * remove bug fix * running black and pre-commit * moving the import line * adding a test * adding pydoc * fix to removing code blocks in markdown converter * adding a test * fixing a test * improving tests * adding language to code block	2023-01-27 15:25:54 +01:00
Tuana Celik	790e9acd3e	feat: add frontmatter to meta in `MarkdownConverter` (#3953 ) * first attempt to add frontmatter of markdown to the metadata * remove bug fix * running black and pre-commit * moving the import line * adding a test * adding pydoc	2023-01-26 17:15:02 +01:00
Vladimir Blagojevic	ec85207cf7	Remove __eq__ and __hash__ from PromptNode (#3923 )	2023-01-26 13:38:35 +01:00
Vladimir Blagojevic	b945eaeabd	PromptNode: expose output_variable, adjust unit tests (#3892 )	2023-01-26 11:01:11 +01:00
ZanSara	0e471d5e5a	fix: change model in distillation test (#3944 ) * change model * change layer count * move promptnode tests in integration * fix marker	2023-01-25 23:32:11 +05:30
Vladimir Blagojevic	4d8b1d0b22	refactor: Improve stop_words handling, add unit test cases (#3918 ) * Improve stop_words handling, add unit test cases * Update test/nodes/test_prompt_node.py Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com> Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>	2023-01-24 12:52:41 +01:00
Benjamin BERNARD	eed009eddb	feat: Add `CsvTextConverter` (#3587 ) * feat: Add Csv2Documents, EmbedDocuments nodes and FAQ indexing pipeline Fixes #3550, allow user to build full FAQ using YAML pipeline description and with CSV import and indexing. * feat: Add Csv2Documents, EmbedDocuments nodes and FAQ indexing pipeline Fix linter issues mypy and pylint. * feat: Add Csv2Documents, EmbedDocuments nodes and FAQ indexing pipeline Fix linter issues mypy. * implement proposal's feedback * tidy up for merge * use BaseConverter * use BaseConverter * pylint * black * Revert "black" This reverts commit e1c45cb1848408bd52a630328750cb67c8eb7110. * black * add check for column names * add check for column names * add tests * fix tests * address lists of paths * typo * remove duplicate line Co-authored-by: ZanSara <sarazanzo94@gmail.com>	2023-01-23 15:56:36 +01:00
Stefano Fiorucci	b910df7ec7	feat: `ImageToText` (caption generator) (#3859 ) * first draft * fix pylint and mypy * retry w mypy * mypy :-) * rem unused import * incorporate feedback and initial tests * better tests * fix import order * fix docstring * other fix docstring * more and better tests Co-authored-by: ZanSara <sarazanzo94@gmail.com>	2023-01-23 11:59:56 +01:00
Vladimir Blagojevic	4c28253955	feat: PromptNode - implement stop words (#3884 )	2023-01-19 12:26:15 +01:00
Vladimir Blagojevic	e2fb82b148	refactor: Move invocation_context from meta to own pipeline variable (#3888 )	2023-01-19 11:17:06 +01:00
ZanSara	6af4f14fe0	feat: preprocessor raises warning when doc length exceeds threshold (#3837 ) * add warning for excessive lenght * improve test * review feedback * fix test * move into _process_single	2023-01-17 13:48:28 +01:00
ZanSara	3ffdb0a9a3	chore: fix all EOF (#3852 ) * fix all eof * fix test * fix test * fix test * typo * fix sample * fix sample * add logs * fix page_dynamic_result.txt	2023-01-16 12:34:50 +01:00
Stefano Fiorucci	be31178892	fix: make the crawler runnable and testable on Windows (#3830 ) * fix crawler and try to run CI * more compact expression * try to fix * improve naming regex * revert regex * make test_url compatible wirh Windows * better conditional expression	2023-01-10 20:27:28 +01:00
Tobias Wochinger	dea10a51d3	fix: gracefully handle `FileExistsError` during `Preprocessor` resource download (#3816 ) * fix: use temp path for downloading punkt resources * fix: gracefully handle file exists error during download	2023-01-10 11:22:49 +01:00
Zoltan Fedor	0288e1be76	bug: The `PromptNode` handles all parameters as lists without checking if they are in fact lists (#3820 )	2023-01-10 08:08:17 +01:00
Sebastian	5b0b338175	fix: Ensure eval mode for TableReader model for predictions (#3743 ) * Adding model.eval() calls to prediction functions in table reader * Add unit test to check if model is set in train mode that inference time prediction still works.	2023-01-09 11:07:06 +01:00
Sebastian	659020fcac	fix: Convert table cells to strings for compatibility with TableReader (#3762 ) * Add table = table.astype(str) to make sure cells are converted into to strings to be compatible witht the TableReader * Turn more strings into ints * Make sure answer text is always a string.	2023-01-09 10:42:11 +01:00
tstadel	4a0a054164	fix: linefeeds in custom_query (#3813 ) * fix linefeeds in custom_query * add double quote test case	2023-01-05 17:13:04 +01:00
Julian Risch	0c2d13f1b8	bug: skip validating empty embeddings (#3774 ) * skip validating empty embeddings * skip batches without embeddings to update * add unit test with mocked retriever	2023-01-05 15:13:57 +01:00
Sebastian	e84fae2894	Migrating to use native Pytorch AMP (#2827 ) * Started making changes to use native Pytorch AMP * Updated compute_loss functions to use torch.cuda.amp.autocast * Updating docstrings * Add use_amp to trainer_checkpoint * Removed mentions of apex and started to add the necessary warnings * Removing unused instances of use_amp variable * Added fast training test for FARMReader. Needed to add max_query_length as a parameter in FARMReader.__init__ and FARMReader.train * Make max_query_length optional in FARMReader.train * Update lg Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> Co-authored-by: agnieszka-m <amarzec13@gmail.com>	2023-01-05 09:14:28 +01:00
Julian Risch	a2c160e7d8	bug: skip empty documents in reader (#3773 ) * skip empty documents * test eval_batch and account for tables	2023-01-03 15:50:14 +01:00
Vladimir Blagojevic	bebd6b26ec	Improve robustness of PromptNode unit tests (#3747 )	2023-01-02 16:28:56 +01:00
bogdankostic	594d2a10f8	fix: Fix `predict_batch` in `TransformersReader` for single nested Document list (#3748 ) * Fix restoring of list structure * Add tests	2022-12-29 11:48:18 +01:00
Stefano Fiorucci	136928714c	refactor: remove deprecated parameters from `Summarizer` (#3740 ) * remove deprecated parameters * remove deprecation/removal test	2022-12-29 15:37:47 +05:30
Mayank Jobanputra	76a16807d5	fix: Fixed local reader model loading (#3663 ) * Fixed local loading issue	2022-12-24 03:46:36 +05:30
Sebastian	756e0114e6	refactor: Remove duplicate code in TableReader (#3708 ) * Refactor table reader to use util functions to reduce code duplication. * Expanding the tests for the table reader * Adding types * Updating tests to work for RCIReader * Fix bug in RCIReader. Saving the wrong queries list. * Update _flatten_inputs to not change input variable * Remove duplicate code	2022-12-21 14:33:19 +01:00
Vladimir Blagojevic	9ebf164cfd	feat: Expand LLM support with PromptModel, PromptNode, and PromptTemplate (#3667 ) Co-authored-by: ZanSara <sarazanzo94@gmail.com>	2022-12-20 11:21:26 +01:00
Zoltan Fedor	e143f7cc36	Fixing broken BM25 support with Weaviate - fixes #3720 (#3723 ) * Fixing broken BM25 support with Weaviate - fixes #3720 Unfortunately the BM25 support with Weaviate got broken with Haystack v1.11.0+, which is getting fixed with this commit. Please see more under issue #3720. * Fixing mypy issue - method signature wasn't matching the base class * Mypy related test fix Mypy forced me to set the signature of the `query` method of the Weaviate document store to the same as its parent, the `KeywordDocumentStore`, where the `query` parame is `Optional`, but has NO default value, so it must be provided (as None) at runtime. I am not quite sure why the abstract method's `query` param was set without a default value while its type is `Optional`, but I didn't want to change that, so instead I have changed the Weaviate tests. * Adding a note regarding an upcomming fix in Weaviate v1.17.0 * Apply suggestions from code review * revert * [EMPTY] Re-trigger CI	2022-12-19 17:24:46 +01:00
Vladimir Blagojevic	56803e5465	feat: Enable text-embedding-ada-002 for EmbeddingRetriever (#3721 ) * Enable text-embedding-ada-002 for EmbeddingRetriever * Easier to understand code, more unit tests	2022-12-19 17:06:48 +01:00
Stefano Fiorucci	5b9c661155	feat: add `index` parameter to `TfidfRetriever` (#3666 ) * first draft to add index param to tfidf * better mypy handling * Revert "better mypy handling" This reverts commit 91a22516320f9dcbeae53827ec69f9dc51e1785c. * new check in auto_fit * new check also in retrieve * better dict typings * new test and improvements to other test * remove unnecessary lambda * improve test * remove newline from openapi json * fix test * language fix Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * language fix 2 Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * language fix 3 Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * language fix 4 Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * language fix 5 Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * language fix 6 Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * explicit index value handling * fix test * better error messages Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>	2022-12-19 12:07:49 +01:00
tstadel	600dc2d611	refactor: filters type (#3682 ) * consolidate filters type * remove unnecessary optionals * fix mypy * fix pylint * fix pylint * move FilterType to schema * remove Optional from FilterType * move to Dict[str, Any] * Revert "move to Dict[str, Any]" This reverts commit e8c561bb7885949e19825697fa4c469945f90ce5. * fix mypy * fix pylint * revert isort changes in elasticsearch * remove todos in milvus.py * remove todos in sql.py * add aggregate_labels tests * consolidate aggregate_labels tests * remove superfluous type todos * remove ALL superfluous #todos	2022-12-12 14:04:29 +01:00
Unai Garay Maestre	77cea8b140	feat: Adds all_terms_must_match parameter to BM25Retriever at runtime (#3627 ) * Adds all_terms_must_match implementation and tests * Adds all_terms_must_match as Optional Signed-off-by: Unai Garay <unaigaraymaestre@gmail.com> * Avoid mypy error and follow pattern checking var is None * Mypy works ok on this file now * added mypy ignores to BaseRetriever * ignoring all overrides for this file * Updates sparse retriever `all_terms_must_match` docstring Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Updates sparse retriever `all_terms_must_match` docstring Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Updates sparse retriever `all_terms_must_match` docstring Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Updates sparse retrieve_batch `all_terms_must_match` docstring Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Updates sparse retrieve_batch `all_terms_must_match` docstring Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Updates sparse retrieve_batch `all_terms_must_match` docstring Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * marked elasticsearch Signed-off-by: Unai Garay <unaigaraymaestre@gmail.com> Co-authored-by: Mayank Jobanputra <mayankjobanputra@gmail.com> Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>	2022-12-08 17:18:43 +05:30
tstadel	c1c1c97bb2	feat: add query_by_embedding_batch (#3546 ) * add query_by_embedding_batch * fix mypy * fix pylint * add test * move query_by_embedding_batch to search_engine * fix and add tests * fix pylint * remove Retriever query logs * add test for multimodal batch retrieval * allow for np.ndarray	2022-12-08 08:28:43 +01:00
Sebastian	25bf95d47f	Update table reader tests to include checking the score of answers. (#3641 )	2022-12-07 07:30:49 -08:00
Mayank Jobanputra	95cf666a20	refactor: change MultiModal retriever to be of type DenseRetriever (#3598 ) * changed Multimodal retriever to be of type DenseRetriever * format fix * Pylint fix * Added embed_queries and tests	2022-11-28 19:24:22 +01:00

110 Commits