haystack

mirror of https://github.com/deepset-ai/haystack.git synced 2025-08-09 09:08:19 +00:00

Author	SHA1	Message	Date
Silvano Cerza	4a93517eb4	test: Fix deprecation fixture (#4219 ) * Fix deprecation fixture * Update docstring * Update docstring --------- Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>	2023-02-27 09:55:03 +01:00
ZanSara	13c4ff1b52	refactor: remove direct logging without a logger (#4253 ) * remove direct logging without a logger * add custom pylint checker * add test * pylint * improve checker message * mypy * remove test * add checker for basicConfig * more logging missed * ignore basicConfig * move out logger * move out statement * remove logging configuration	2023-02-23 20:42:42 +01:00
Stefano Fiorucci	5e85f33bd3	refactor: Remove deprecated nodes `EvalDocuments` and `EvalAnswers` (#4194 ) * remove deprecated classed and update test * remove deprecated classed and update test * remove unused code * remove unused import * remove empty evaluator node * unused import :-) * move sas to metrics	2023-02-23 15:26:17 +01:00
Massimiliano Pippi	722dead1b2	fix agents tests (#4237 )	2023-02-23 13:03:45 +01:00
Massimiliano Pippi	764eaa035f	skip summarizer tests to reduce pressure (#4241 )	2023-02-23 09:50:24 +01:00
ZanSara	f816efa50c	feat: reduce and focus telemetry (#4087 ) * simplified telemetry and docker containers detection * pylint * mypy * mypy * Add new credentials and metadata * remove prints * mypy * remove comment * simplify inout len measurement * black * removed old telemetry, to revert * reintroduce env function * reintroduce old telemetry * fix telemetry selection * telemetry for promptnode * telemetry for some training methods * telemetry for eval and distillation * mypy & pylint * review * Update lg * mypy * improve docstrings * pylint * mypy * fix test * linting * remove old tests --------- Co-authored-by: agnieszka-m <amarzec13@gmail.com>	2023-02-22 19:02:47 +01:00
Daniel Bichuetti	e0b0fe1bc3	feat!: Increase Crawler standardization regarding Pipelines (#4122 ) * feat!(Crawler): Integrate Crawler in the Pipeline. +Output Documents +Optional file saving +Optional Document meta about file path * refactor: add Optional decl. * chore: dummy commit * chore: dummy commit * refactor: improve overwrite flow * refactor: change custom file path meta logic + add test * Update haystack/nodes/connector/crawler.py Co-authored-by: Massimiliano Pippi <mpippi@gmail.com> * Update haystack/nodes/connector/crawler.py Co-authored-by: Massimiliano Pippi <mpippi@gmail.com> * Update haystack/nodes/connector/crawler.py Co-authored-by: Massimiliano Pippi <mpippi@gmail.com> * Update haystack/nodes/connector/crawler.py Co-authored-by: Massimiliano Pippi <mpippi@gmail.com> * Update haystack/nodes/connector/crawler.py Co-authored-by: Massimiliano Pippi <mpippi@gmail.com> --------- Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com> Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>	2023-02-22 17:34:19 +01:00
tstadel	32b2abf9d5	fix: add option to not override results by `Shaper` (#4231 ) * add option to shaper and support answers * remove publish restrictions on outputs * support list	2023-02-22 14:36:58 +01:00
Massimiliano Pippi	262c9771f4	relax test assertion (#4229 )	2023-02-22 12:37:09 +01:00
Massimiliano Pippi	40f772a9b0	refact: move the first batch of unit tests into the proper job (#4216 ) * move the first batch of unit tests into the proper job * leftover	2023-02-21 17:00:02 +01:00
Julian Risch	5ce7a404ac	feat: Add Agent (#4148 ) * initial Agent implementation * mypy and pylint fixes * add missing ABC import * improved prompt template * refactor and shorten run method * refactor and shorten run method * add tests for extracting * fix mixed up tool_input/observation & make tests more robust * fix bug with max_iterations and update prompt template * allow setting prompt_template in Agent init * remove example yml for agent * add final prediction to transcript * add transcript to errors and accept PromptTemplate in init * simplify if else to elif Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com> * add checks for max_iter<2 and empty list returned by prompt node --------- Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>	2023-02-21 14:27:40 +01:00
Sebastian	bde01cbf1f	Checking if output keys and output_values are same length and fix bug in storing output keys (#4223 )	2023-02-21 13:36:15 +01:00
Sebastian	2bedb80ba5	Fix for custom template in OpenAIAnswerGenerator (#4220 )	2023-02-21 13:35:17 +01:00
Bijay Gurung	d4b822646e	feat: Add JsonConverter node (#4130 ) * Add JsonConverter node * Update language * JsonConverter: Remove id_hash_keys overwrite when it's None Also, changes in docstring based on review * Update docstring for JsonConverter --------- Co-authored-by: agnieszka-m <amarzec13@gmail.com> Co-authored-by: Sebastian Lee <sebastian.lee@deepset.ai>	2023-02-21 09:23:42 +01:00
bogdankostic	18e7b8399b	refactor: Remove `id_hash_keys` parameter in `from_dict` method (#4207 ) * Remove id_hash_keys parameter in from_dict method * Remove unused import * Adapt `from_dict` of `SpeechDocument` * Revert "Adapt `from_dict` of `SpeechDocument`" This reverts commit 309cbeb7fbb3094c43be76d9e431db9391913144. * Adapt `from_dict` of `SpeechDocument`	2023-02-20 17:37:35 +01:00
tstadel	14578aa54f	feat: add `top_k` to `PromptNode` (#4159 ) * add top_k to PromptNode * fix OpenAI * fix openai test	2023-02-20 14:51:45 +01:00
Sebastian	d129598203	Prompt node/run batch (#4072 ) * Starting to implement first pass at run_batch * Started to add _flatten_input function * First pass at run_batch method. * Fixed bug * Adding tests for run_batch * Update doc strings * Pylint and mypy * Pylint * Fixing mypy * Restructurig of run_batch tests * Add minor lg updates * Adding more tests * Update dev comments and call static method differently * Fixed the setting of output variable * Set output_variable in __init__ of PromptNode * Make a one-liner --------- Co-authored-by: agnieszka-m <amarzec13@gmail.com>	2023-02-20 11:58:13 +01:00
Massimiliano Pippi	83d615a32b	feat: include testing facilities into haystack package (#4182 )	2023-02-17 19:38:03 +01:00
bogdankostic	7eeb3e07bf	feat: Add IVF and Product Quantization support for OpenSearchDocumentStore (#3850 ) * Add IVF and Product Quantization support for OpenSearchDocumentStore * Remove unused import statement * Fix mypy * Adapt doc strings and error messages to account for PQ * Adapt validation of indices * Adapt existing tests * Fix pylint * Add tests * Update lg * Adapt based on PR review comments * Fix Pylint * Adapt based on PR review * Add request_timeout * Adapt based on PR review * Adapt based on PR review * Adapt tests * Pin tenacity * Unpin tenacity * Adapt based on PR comments * Add match to tests --------- Co-authored-by: agnieszka-m <amarzec13@gmail.com>	2023-02-17 10:28:36 +01:00
Daniel Bichuetti	5187cc1801	refactor: Remove the pin from the espnet module and fix the audio node tests. (#4128 ) * fix: fix audio tests + unbound some dependencies * fix: update for Python 3.8 * refactor: change numpy assertion * feat: add voice recog. support on audio tests * fix: fix var assignement * chore: dummy commit * fix: fix sndfile error * refactor: change skip reason * refactor: hardcode variable * refactor: unpin numpy * fix: pin numpy only for audio	2023-02-16 22:12:17 +05:30
Massimiliano Pippi	ec72dd73fc	refactor: complete the document stores test refactoring (#4125 ) * add e2e tests * move tests to their own module * add e2e workflow * pylint * remove from job * fix index field name * skip test on sql * removed unused code * fix embedding tests * adjust test for pinecone * adjust assertions to the new documents * bad copypasta * test * fix tests * fix tests * fix test * fix tests * pylint * update milvus version * remove debug * move graphdb tests under e2e	2023-02-16 09:43:25 +01:00
Sebastian	9a26942952	feat: Add model_kwargs option to PromptNode (#4151 ) * Add input option to PromptNode to allow the passing of default kwargs * Add yaml test for model_kwargs parameter	2023-02-15 18:46:26 +01:00
Stefano Fiorucci	24405f851c	refactor: `InMemoryDocumentStore` - manage documents without embedding & fix mypy errors (#4113 ) * refactoring and test * try to replace error with warning * more expressive and robust get_scores methods * make get_scores methods internal	2023-02-14 17:43:11 +01:00
Sebastian	75ef959678	feat: Update OpenAIAnswerGenerator defaults and with learnings from PromptNode (#4038 ) * added instruction_prompt and update defaults * Change back max_tokens * Code formatting * Starting to update instruction_prompt to be a PromptTemplate * Using PromptTemplate in OpenAIAnswerGenerator * Removed hardcoded value * pylint and make examples and examples_context optional prompt parameters * Added new test for when prompt length goes past max token limit * Improve doc strings. * Make "text-davinci-003" the new default model * Renaming variable to prompt_template and name to question-answering-with-examples * Reduced repetitive code. * Added some comments to explain key logic for future debuggers * Update docs for max_tokens and increase defaul * Updating variable name to prompt_template and docs. * Updated test and handled Answer case where no documents are used. * Slight update to docs. * Adding more doc strings * lg updates * Blackify --------- Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai> Co-authored-by: agnieszka-m <amarzec13@gmail.com>	2023-02-12 00:08:07 +01:00
Vladimir Blagojevic	d839b9314f	Update PromptTemplate tests (#4131 )	2023-02-10 15:24:01 +01:00
bogdankostic	05950719ba	fix: Deduplicate same Documents in isolated evaluation of Reader (#4114 ) * Deduplicate same Documents in one MultiLabel * Add tests * Update label * Update label * Update test * Update test * Revert change to check CI * Revert reversion * Use deepcopy * Update tests	2023-02-10 13:55:14 +01:00
Jack Butler	e6b6f70ae2	fix: Fix `TableTextRetriever` for input consisting of tables only (#4048 ) * fix: update kwargs for TriAdaptiveModel * fix: squeeze batch for TTR inference * test: add test for ttr + dataframe case * test: update and reorganise ttr tests * refactor: make triadaptive model handle shapes * refactor: remove duplicate reshaping * refactor: rename test with duplicate name * fix: add device assignment back to TTR * fix: remove duplicated vars in test --------- Co-authored-by: bogdankostic <bogdankostic@web.de>	2023-02-09 11:38:16 +01:00
bogdankostic	986472c26f	feat: Add BM25 support for tables in InMemoryDocumentStore (#4090 ) * Add BM25 support for tables in InMemoryDocumentStore * Add table type to query method * Fix import order * Adapt tests	2023-02-09 10:47:35 +01:00
Silvano Cerza	274746db07	style: Update black (#4101 ) * Update black version * Format file with new black style * Update black pre-commit hook version	2023-02-08 15:34:43 +01:00
Sebastian	1bbf10a376	Remove double batching in retrieve_batch (#4014 ) * Removed double batching around embed_queries * Add back tests for retrieve_batch for dpr and embedding retrievers * Updated table-text-retriever to not double batch * Fixing pylint * Update to test * Remove code breaking test * Updating dev comment to be clearer	2023-02-08 14:39:20 +01:00
Sebastian	01d39df863	feat: Update allowed models to be used with Prompt Node (#4018 ) * Update allowed models to be used with Prompt Node * Added try except block around the config to skip over OpenAI models. * Fixing tests * Adding warning message * Adding test for different HF models that could be used in prompt node	2023-02-08 12:47:52 +01:00
Stefano Fiorucci	5c009c2a1a	feat: OpenAI - warn users if `max_tokens` is too short (#4094 ) * warn users if max_tokens is too short * skip test if not API KEY * add counters * correctly run precommit	2023-02-08 10:39:40 +01:00
tstadel	92c58cfda1	feat: Support multiple document_ids in Answer object (for generative QA) (#4062 ) * initial version without shapers * set document_ids for BaseGenerator * introduce question-answering-with-references template * better prompt * make PromptTemplate control output_variable * update schema * fix add_doc_meta_data_to_answer * Revert "fix add_doc_meta_data_to_answer" This reverts commit b994db423ad8272c140ce2b785cf359d55383ff9. * fix add_doc_meta_data_to_answer * fix eval * fix pylint * fix pinecone * fix other tests * fix test * fix flaky test * Revert "fix flaky test" This reverts commit 7ab04275ffaaaca96b4477325ba05d5f34d38775. * adjust docstrings * make Label loading backward-compatible * fix Label backward compatibility for pinecone * fix Label backward compatibility for search engines * fix Label backward compatibility for deepset Cloud * fix tests * fix None issue * fix test_write_feedback * add tests for legacy label support * add document_id test for pinecone * reduce unnecessary contents * add comment to pinecone test	2023-02-08 08:37:22 +01:00
Vladimir Blagojevic	3273a2714d	fix: Add PromptTemplate __repr__ method (#4058 ) Co-authored-by: ZanSara <sarazanzo94@gmail.com>	2023-02-07 08:14:32 +01:00
Jack Butler	f006eded7d	fix: allow Biadaptive & Triadaptive to work with EarlyStopping (#4033 ) * fix: allow str when saving tri/bi-adaptive models * fix: make trainer model loading class-agnostic * test: add test for DPR with EarlyStopping * refactor: simplify model reloading via classmethod --------- Co-authored-by: Julian Risch <julian.risch@deepset.ai>	2023-02-03 11:13:18 +01:00
tstadel	9611b64ec5	fix: document retrieval metrics for non-document_id document_relevance_criteria (#3885 ) * fix document retrieval metrics for all document_relevance_criteria * fix tests * fix eval_batch metrics * small refactorings * evaluate metrics on label level * document retrieval tests added * fix pylint * fix test * support file retrieval * add comment about threshold * rename test	2023-02-02 15:00:07 +01:00
ZanSara	9009a9ae58	feat: add `Shaper` (#3880 ) * Shaper initial version * Inital pydoc * Add more unit tests * Fix pydoc, expand Shaper pydoc with YAML example * Minor fix * Improve pydoc * More unit tests with prompt node * Describe Shaper functions in pydoc * More pydoc * Use pytest.raises instead of catching errors * Improve test_function_invocation_order unit test * pylint fixes * Improve run_batch handling * simpler version, initial stub * stubbing tests * promptnode compatibility * add tests * simplify * fix promptnode tests * pylint * mypy * fix corner case & mypy * mypy * review feedback * tests * Add lg updates * add rename * pylint * Add complex unit test with two PNs and ICMs in between (#3921) Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com> * docstring * fix tests * add join_lists * add documents_to_strings * fix tests * allow lists of input values * doc review feedback * do not use locals() * Update with minor lg changes * fix corner case in ICM * fix merge * review feedback * answers conversions * mypy * add tests * generative answers * forgot to commit --------- Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com> Co-authored-by: agnieszka-m <amarzec13@gmail.com>	2023-02-01 18:36:13 +01:00
Zoltan Fedor	2b1849f525	fix: Add a verbose option to PromptNode to let users understand the prompts being used #2 (#3898 ) * fix: Add a verbose option to PromptNode to let users understand the prompts being used #2 * Add comments and refactoring todo note * Fix logging-fstring-interpolation pylint * Update haystack/nodes/prompt/prompt_node.py Co-authored-by: Massimiliano Pippi <mpippi@gmail.com> --------- Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com> Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>	2023-01-31 09:33:47 +01:00
bogdankostic	1a8fe0031d	feat: Add `use_prefiltering` parameter to `DeepsetCloudDocumentStore` (#3969 ) * Add `use_prefiltering` parameter * Adapt doc string * Pass use_prefiltering via API to dC * Adapt doc string * Adapt test	2023-01-30 15:12:34 +01:00
Daniel Bichuetti	3009ac2988	feat: Add page range support to PDF converters. (#3965 ) * feat: add start and eng page to PDF converters * docs: add missing docstrings * refactor: change list set up, add docstrings and comment * fix: add missing parameter * tests: add page range basic test * tests: test correct page numbers * tests: remove OCR page range test Poppler and Tesseract not installed on CI fix: remove mobile change error	2023-01-30 14:09:22 +01:00
Sebastian	71de0524de	fix: fixed `InMemoryDocumentStore.get_embedding_count` to return correct number (#3980 ) * Fix the embedding count function of InMemoryDocumentStore * Adding some doc strings explaining how many docs with embeddings to expect.	2023-01-30 12:38:30 +01:00
hsm207	08ec059b14	refactor: use weaviate client to build BM25 query (#3939 ) * refactor: use weaviate client to build BM25 query * refactor: remove manual BM25 query building * refactor: apply BM25 to the content_field only * test: update weaviate BM25 retrieval test case update to account for lack of stemming --------- Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>	2023-01-30 10:07:07 +01:00
Tuana Celik	93312138de	fix: removing code block in `MarkdownConverter` (#3960 ) * first attempt to add frontmatter of markdown to the metadata * remove bug fix * running black and pre-commit * moving the import line * adding a test * adding pydoc * fix to removing code blocks in markdown converter * adding a test * fixing a test * improving tests * adding language to code block	2023-01-27 15:25:54 +01:00
Tuana Celik	790e9acd3e	feat: add frontmatter to meta in `MarkdownConverter` (#3953 ) * first attempt to add frontmatter of markdown to the metadata * remove bug fix * running black and pre-commit * moving the import line * adding a test * adding pydoc	2023-01-26 17:15:02 +01:00
Massimiliano Pippi	52b195faf6	increase the timeout for testing (#3957 )	2023-01-26 16:04:43 +01:00
Vladimir Blagojevic	ec85207cf7	Remove __eq__ and __hash__ from PromptNode (#3923 )	2023-01-26 13:38:35 +01:00
Vladimir Blagojevic	b945eaeabd	PromptNode: expose output_variable, adjust unit tests (#3892 )	2023-01-26 11:01:11 +01:00
ZanSara	0e471d5e5a	fix: change model in distillation test (#3944 ) * change model * change layer count * move promptnode tests in integration * fix marker	2023-01-25 23:32:11 +05:30
Mayank Jobanputra	5c53b2bd4a	feat: adding secure loading of models by default for haystack (#3901 ) * adding secure loading of models by default * simplified set function * testing import effect correctly * added appropriate log line, adapted the test * change log string formatting Co-authored-by: Massimiliano Pippi <mpippi@gmail.com> * remove extra closing bracket ) Co-authored-by: Julian Risch <julian.risch@deepset.ai> Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>	2023-01-24 23:01:20 +05:30
Vladimir Blagojevic	4d8b1d0b22	refactor: Improve stop_words handling, add unit test cases (#3918 ) * Improve stop_words handling, add unit test cases * Update test/nodes/test_prompt_node.py Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com> Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>	2023-01-24 12:52:41 +01:00

1204 Commits