haystack

mirror of https://github.com/deepset-ai/haystack.git synced 2025-11-12 08:03:50 +00:00

Author	SHA1	Message	Date
ZanSara	024332f98f	refactor: simplify registration of `PromptModelInvocationLayer` (#4339 ) * use __init_subclass__ and remove registering functions	2023-03-07 20:53:48 +01:00
Sebastian	7d5e7c089c	refactor: Use TableQuestionAnsweringPipeline from transformers (#4303 ) * Added changes from table-qa-pipeline * Moved classes around to make diff to main look nicer. * Cleaned things up. Removed option to return_no_answer (not needed), added docs and added integration marks. * Remove unneeded code * Added fix for test * Add check for document_ids in answer * Prevent passing of empty list to np.mean * Batching doesn't work with TableQAPipeline b/c of HF issue * Cleanup of table reader tests, added check for document ids. * Fixing pylint * More pylint * PR comments --------- Co-authored-by: bogdankostic <bogdankostic@web.de>	2023-03-07 11:46:50 +01:00
Daniel Bichuetti	af6efbdcb0	refactor: Allow flexible document id generation (#4326 )	2023-03-07 07:25:27 +01:00
Zoltan Fedor	4dea9db01e	feat: Report execution time for pipeline components in `_debug` (#4197 ) * Adding execution time to the debug output of pipeline components * Linting issue fix * [EMPTY] Re-trigger CI * fixed test --------- Co-authored-by: Mayank Jobanputra <mayankjobanputra@gmail.com>	2023-03-07 04:45:31 +05:30
tstadel	19311119db	fix: EvalResult load migration (#4289 ) * fix evalresult load migration * handle none values correctly * better None check * improve logic and add test	2023-03-06 20:05:02 +01:00
ZanSara	c802305ccf	test: move tests on standard pipelines in `e2e/` (#4309 ) * move out standard pipelines e2e * fixing unit tests * add test data * feedback * pylint * black	2023-03-06 17:26:19 +01:00
Vladimir Blagojevic	348e7d2dfe	refactor: Separate PromptModelInvocationLayers in providers.py (#4327 ) * Refactor PromptNode, separate PromptModelInvocationLayers in providers.py	2023-03-06 16:34:59 +01:00
Daniel Bichuetti	1548c5ba0f	feat: Add Azure OpenAI embeddings support (#4332 ) * feate: add Azure OpenAI as embedding option * feat: Add Azure OpenAI embeddings support * refactor: check api key * refactor: better type checking for Azure * refactor: enable parallelism + separate and update tests * refactor: string reformat * refactor: explicit typing * refactor: update refs and remove unused code	2023-03-06 13:37:20 +01:00
Sebastian	1a42166978	fix: Prevent going past token limit in OpenAI calls in PromptNode (#4179 ) * Refactoring to remove duplicate code when using OpenAI API * Adding docstrings * Fix mypy issue * Moved retry mechanism to openai_request function in openai_utils * Migrate OpenAI embedding encoder to use the openai_request util function. * Adding docstrings. * pylint import errors * More pylint import errors * Move construction of headers into openai_request and api_key as input variable. * Made _openai_text_completion_tokenization_details so can be resued in PromptNode and OpenAIAnswerGenerator * Add prompt truncation to the PromptNode. * Removed commented out test. * Bump version of tiktoken to 0.2.0 so we can use MODEL_TO_ENCODING to automatically determine correct tokenizer for the requested model * Change one method back to public * Fixed bug in token length truncation. Included answer length into truncation amount. Moved truncation higher up to PromptNode level. * Pylint error * Improved warning message * Added _ensure_token_limit for HFLocalInvocationLayer. Had to remove max_length from base PromptModelInvocationLayer to ensure that max_length has a default value. * Adding tests * Expanded on doc strings * Updated tests * Update docstrings * Update tests, and go back to how USE_TIKTOKEN was used before. * Update haystack/nodes/prompt/prompt_node.py Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Update haystack/nodes/prompt/prompt_node.py Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Update haystack/nodes/prompt/prompt_node.py Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Update haystack/nodes/retriever/_openai_encoder.py Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Update haystack/utils/openai_utils.py Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Update haystack/utils/openai_utils.py Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Updated docstrings, and added integration marks * Remove comment * Update test * Fix test * Update test * Updated openai_request function to work with the azure api * Fixed error in _openai_encodery.py --------- Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>	2023-03-03 13:49:21 +01:00
Vladimir Blagojevic	79bf25aaea	feat: Add Azure as OpenAI endpoint (#4170 ) * Add Azure as OpenAI endpoint --------- Co-authored-by: Sebastian Lee <sebastian.lee@deepset.ai>	2023-03-02 09:55:09 +01:00
Daniel Bichuetti	7c49fffc71	feat: Enable PDFToTextConverter multiprocessing, increase general performance and simplify installation (#4226 ) * refactor: isolate PDF converters * refactor: remove xpdf dependency and fix tests * refactor: add min. version * feat: enable multiprocessing and add tests * fix: remove unused imports * fix: regression when moved code * refactor: use itertools * fix: mypy claims * refactor: double tool support * refactor: add fallback to xpdf * refactor: black formatting * refactor: make superclass signature compatible * refactor: complete removal of xPdf * refactor: regroup Haystack imports and fix regression * refactor: remove original declaration * docs: fix docstrings * tests: add [pdf] to [all] * refactor: remove redundant checks, avoid extra processes * refactor: add deprecation warning * refactor: add pytest mark * tests: change PDF test file * fix: correct pytest mark * refactor: deprecate parameter and add new * tests: change pdf sample * Add minor lg changes to docstrings * Fix default value in doc strings * Update test/nodes/test_file_converter.py Co-authored-by: bogdankostic <bogdankostic@web.de> * tests: fix page count * refactor: add imported function * refactor: change default value * tests: change parameters and fix typo * Unify sort_by_position parameter names --------- Co-authored-by: bogdankostic <bogdankostic@web.de> Co-authored-by: agnieszka-m <amarzec13@gmail.com>	2023-03-01 22:34:38 +01:00
ZanSara	ae04ce3c6a	test: mock all Summarizer tests and move a few into e2e (#4299 ) * stub e2e folders * simplify pipeline test * mocking * unit tests fixed * clean up e2e * pipeline tests work * pylint * leftover * small fix from #2994 and additional tests * review feedback * change summaries * black * revert models and summaries	2023-03-01 17:30:55 +01:00
ZanSara	165a0a5faa	test: mock all `Translator` tests and move one to `e2e` (#4290 ) * mock all translator tests and move one to e2e * typo * extract pipeline tests using translator * remove duplicate test * move generator test in e2e * Update e2e/pipelines/test_extractive_qa.py * pytest.mark.unit * black * remove model name as well * remove unused fixture * rename original and improve pipeline tests * fixes * pylint	2023-03-01 14:52:05 +01:00
Stefano Fiorucci	e8f9b1b65d	test: replace `ElasticsearchDS` with `InMemoryDS` when it makes sense; support `scale_score` in `InMemoryDS` (#4283 ) * replace elasticds with imds - first draft * fix * fix tests and implement scale_score in imds bm25 * add docstrings for scale_score	2023-03-01 11:35:10 +01:00
Malte Pietsch	2a1d73e16d	refactor: Make extraction of "Tool" and "Tool input" for Agent more robust and user-friendly (#4269 ) * adjust [] in prompt template. Add error+docs for Tool name. * fix test * update error message	2023-02-28 20:01:34 +01:00
Massimiliano Pippi	c3a38a59c0	Update test_prompt_node.py (#4281 )	2023-02-28 09:37:40 +01:00
Julian Risch	662441a62b	fix: FARMReader produces Answers with negative start and end position (#4248 )	2023-02-28 09:27:42 +01:00
Sebastian	040d806b42	test: Added integration test for using EntityExtractor in query pipeline (#4117 ) * Added new test for using EntityExtractor in query node and made some fixtures to reduce code duplication. * Reuse ner_node fixture * Added pytest unit markings and swapped over to in memory doc store. * Change to integration tests	2023-02-28 09:20:44 +01:00
Massimiliano Pippi	4b8d195288	refact: mark unit tests under the `test/nodes/*` path (#4235 ) document merger * mark unit tests * revert	2023-02-27 15:00:19 +01:00
Sebastian	efe46b1214	Fix: Allow `torch_dtype="auto"` in PromptNode (#4166 ) * Fix for allowing torch_dtype="auto" * Fix to logic of torch_dtype detection * separate test for dtype	2023-02-27 09:59:27 +01:00
Silvano Cerza	4a93517eb4	test: Fix deprecation fixture (#4219 ) * Fix deprecation fixture * Update docstring * Update docstring --------- Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>	2023-02-27 09:55:03 +01:00
ZanSara	13c4ff1b52	refactor: remove direct logging without a logger (#4253 ) * remove direct logging without a logger * add custom pylint checker * add test * pylint * improve checker message * mypy * remove test * add checker for basicConfig * more logging missed * ignore basicConfig * move out logger * move out statement * remove logging configuration	2023-02-23 20:42:42 +01:00
Stefano Fiorucci	5e85f33bd3	refactor: Remove deprecated nodes `EvalDocuments` and `EvalAnswers` (#4194 ) * remove deprecated classed and update test * remove deprecated classed and update test * remove unused code * remove unused import * remove empty evaluator node * unused import :-) * move sas to metrics	2023-02-23 15:26:17 +01:00
Massimiliano Pippi	722dead1b2	fix agents tests (#4237 )	2023-02-23 13:03:45 +01:00
Massimiliano Pippi	764eaa035f	skip summarizer tests to reduce pressure (#4241 )	2023-02-23 09:50:24 +01:00
ZanSara	f816efa50c	feat: reduce and focus telemetry (#4087 ) * simplified telemetry and docker containers detection * pylint * mypy * mypy * Add new credentials and metadata * remove prints * mypy * remove comment * simplify inout len measurement * black * removed old telemetry, to revert * reintroduce env function * reintroduce old telemetry * fix telemetry selection * telemetry for promptnode * telemetry for some training methods * telemetry for eval and distillation * mypy & pylint * review * Update lg * mypy * improve docstrings * pylint * mypy * fix test * linting * remove old tests --------- Co-authored-by: agnieszka-m <amarzec13@gmail.com>	2023-02-22 19:02:47 +01:00
Daniel Bichuetti	e0b0fe1bc3	feat!: Increase Crawler standardization regarding Pipelines (#4122 ) * feat!(Crawler): Integrate Crawler in the Pipeline. +Output Documents +Optional file saving +Optional Document meta about file path * refactor: add Optional decl. * chore: dummy commit * chore: dummy commit * refactor: improve overwrite flow * refactor: change custom file path meta logic + add test * Update haystack/nodes/connector/crawler.py Co-authored-by: Massimiliano Pippi <mpippi@gmail.com> * Update haystack/nodes/connector/crawler.py Co-authored-by: Massimiliano Pippi <mpippi@gmail.com> * Update haystack/nodes/connector/crawler.py Co-authored-by: Massimiliano Pippi <mpippi@gmail.com> * Update haystack/nodes/connector/crawler.py Co-authored-by: Massimiliano Pippi <mpippi@gmail.com> * Update haystack/nodes/connector/crawler.py Co-authored-by: Massimiliano Pippi <mpippi@gmail.com> --------- Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com> Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>	2023-02-22 17:34:19 +01:00
tstadel	32b2abf9d5	fix: add option to not override results by `Shaper` (#4231 ) * add option to shaper and support answers * remove publish restrictions on outputs * support list	2023-02-22 14:36:58 +01:00
Massimiliano Pippi	262c9771f4	relax test assertion (#4229 )	2023-02-22 12:37:09 +01:00
Massimiliano Pippi	40f772a9b0	refact: move the first batch of unit tests into the proper job (#4216 ) * move the first batch of unit tests into the proper job * leftover	2023-02-21 17:00:02 +01:00
Julian Risch	5ce7a404ac	feat: Add Agent (#4148 ) * initial Agent implementation * mypy and pylint fixes * add missing ABC import * improved prompt template * refactor and shorten run method * refactor and shorten run method * add tests for extracting * fix mixed up tool_input/observation & make tests more robust * fix bug with max_iterations and update prompt template * allow setting prompt_template in Agent init * remove example yml for agent * add final prediction to transcript * add transcript to errors and accept PromptTemplate in init * simplify if else to elif Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com> * add checks for max_iter<2 and empty list returned by prompt node --------- Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>	2023-02-21 14:27:40 +01:00
Sebastian	bde01cbf1f	Checking if output keys and output_values are same length and fix bug in storing output keys (#4223 )	2023-02-21 13:36:15 +01:00
Sebastian	2bedb80ba5	Fix for custom template in OpenAIAnswerGenerator (#4220 )	2023-02-21 13:35:17 +01:00
Bijay Gurung	d4b822646e	feat: Add JsonConverter node (#4130 ) * Add JsonConverter node * Update language * JsonConverter: Remove id_hash_keys overwrite when it's None Also, changes in docstring based on review * Update docstring for JsonConverter --------- Co-authored-by: agnieszka-m <amarzec13@gmail.com> Co-authored-by: Sebastian Lee <sebastian.lee@deepset.ai>	2023-02-21 09:23:42 +01:00
bogdankostic	18e7b8399b	refactor: Remove `id_hash_keys` parameter in `from_dict` method (#4207 ) * Remove id_hash_keys parameter in from_dict method * Remove unused import * Adapt `from_dict` of `SpeechDocument` * Revert "Adapt `from_dict` of `SpeechDocument`" This reverts commit 309cbeb7fbb3094c43be76d9e431db9391913144. * Adapt `from_dict` of `SpeechDocument`	2023-02-20 17:37:35 +01:00
tstadel	14578aa54f	feat: add `top_k` to `PromptNode` (#4159 ) * add top_k to PromptNode * fix OpenAI * fix openai test	2023-02-20 14:51:45 +01:00
Sebastian	d129598203	Prompt node/run batch (#4072 ) * Starting to implement first pass at run_batch * Started to add _flatten_input function * First pass at run_batch method. * Fixed bug * Adding tests for run_batch * Update doc strings * Pylint and mypy * Pylint * Fixing mypy * Restructurig of run_batch tests * Add minor lg updates * Adding more tests * Update dev comments and call static method differently * Fixed the setting of output variable * Set output_variable in __init__ of PromptNode * Make a one-liner --------- Co-authored-by: agnieszka-m <amarzec13@gmail.com>	2023-02-20 11:58:13 +01:00
Massimiliano Pippi	83d615a32b	feat: include testing facilities into haystack package (#4182 )	2023-02-17 19:38:03 +01:00
bogdankostic	7eeb3e07bf	feat: Add IVF and Product Quantization support for OpenSearchDocumentStore (#3850 ) * Add IVF and Product Quantization support for OpenSearchDocumentStore * Remove unused import statement * Fix mypy * Adapt doc strings and error messages to account for PQ * Adapt validation of indices * Adapt existing tests * Fix pylint * Add tests * Update lg * Adapt based on PR review comments * Fix Pylint * Adapt based on PR review * Add request_timeout * Adapt based on PR review * Adapt based on PR review * Adapt tests * Pin tenacity * Unpin tenacity * Adapt based on PR comments * Add match to tests --------- Co-authored-by: agnieszka-m <amarzec13@gmail.com>	2023-02-17 10:28:36 +01:00
Daniel Bichuetti	5187cc1801	refactor: Remove the pin from the espnet module and fix the audio node tests. (#4128 ) * fix: fix audio tests + unbound some dependencies * fix: update for Python 3.8 * refactor: change numpy assertion * feat: add voice recog. support on audio tests * fix: fix var assignement * chore: dummy commit * fix: fix sndfile error * refactor: change skip reason * refactor: hardcode variable * refactor: unpin numpy * fix: pin numpy only for audio	2023-02-16 22:12:17 +05:30
Massimiliano Pippi	ec72dd73fc	refactor: complete the document stores test refactoring (#4125 ) * add e2e tests * move tests to their own module * add e2e workflow * pylint * remove from job * fix index field name * skip test on sql * removed unused code * fix embedding tests * adjust test for pinecone * adjust assertions to the new documents * bad copypasta * test * fix tests * fix tests * fix test * fix tests * pylint * update milvus version * remove debug * move graphdb tests under e2e	2023-02-16 09:43:25 +01:00
Sebastian	9a26942952	feat: Add model_kwargs option to PromptNode (#4151 ) * Add input option to PromptNode to allow the passing of default kwargs * Add yaml test for model_kwargs parameter	2023-02-15 18:46:26 +01:00
Stefano Fiorucci	24405f851c	refactor: `InMemoryDocumentStore` - manage documents without embedding & fix mypy errors (#4113 ) * refactoring and test * try to replace error with warning * more expressive and robust get_scores methods * make get_scores methods internal	2023-02-14 17:43:11 +01:00
Sebastian	75ef959678	feat: Update OpenAIAnswerGenerator defaults and with learnings from PromptNode (#4038 ) * added instruction_prompt and update defaults * Change back max_tokens * Code formatting * Starting to update instruction_prompt to be a PromptTemplate * Using PromptTemplate in OpenAIAnswerGenerator * Removed hardcoded value * pylint and make examples and examples_context optional prompt parameters * Added new test for when prompt length goes past max token limit * Improve doc strings. * Make "text-davinci-003" the new default model * Renaming variable to prompt_template and name to question-answering-with-examples * Reduced repetitive code. * Added some comments to explain key logic for future debuggers * Update docs for max_tokens and increase defaul * Updating variable name to prompt_template and docs. * Updated test and handled Answer case where no documents are used. * Slight update to docs. * Adding more doc strings * lg updates * Blackify --------- Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai> Co-authored-by: agnieszka-m <amarzec13@gmail.com>	2023-02-12 00:08:07 +01:00
Vladimir Blagojevic	d839b9314f	Update PromptTemplate tests (#4131 )	2023-02-10 15:24:01 +01:00
bogdankostic	05950719ba	fix: Deduplicate same Documents in isolated evaluation of Reader (#4114 ) * Deduplicate same Documents in one MultiLabel * Add tests * Update label * Update label * Update test * Update test * Revert change to check CI * Revert reversion * Use deepcopy * Update tests	2023-02-10 13:55:14 +01:00
Jack Butler	e6b6f70ae2	fix: Fix `TableTextRetriever` for input consisting of tables only (#4048 ) * fix: update kwargs for TriAdaptiveModel * fix: squeeze batch for TTR inference * test: add test for ttr + dataframe case * test: update and reorganise ttr tests * refactor: make triadaptive model handle shapes * refactor: remove duplicate reshaping * refactor: rename test with duplicate name * fix: add device assignment back to TTR * fix: remove duplicated vars in test --------- Co-authored-by: bogdankostic <bogdankostic@web.de>	2023-02-09 11:38:16 +01:00
bogdankostic	986472c26f	feat: Add BM25 support for tables in InMemoryDocumentStore (#4090 ) * Add BM25 support for tables in InMemoryDocumentStore * Add table type to query method * Fix import order * Adapt tests	2023-02-09 10:47:35 +01:00
Silvano Cerza	274746db07	style: Update black (#4101 ) * Update black version * Format file with new black style * Update black pre-commit hook version	2023-02-08 15:34:43 +01:00
Sebastian	1bbf10a376	Remove double batching in retrieve_batch (#4014 ) * Removed double batching around embed_queries * Add back tests for retrieve_batch for dpr and embedding retrievers * Updated table-text-retriever to not double batch * Fixing pylint * Update to test * Remove code breaking test * Updating dev comment to be clearer	2023-02-08 14:39:20 +01:00

1524 Commits