haystack

mirror of https://github.com/deepset-ai/haystack.git synced 2025-07-19 06:52:56 +00:00

Author	SHA1	Message	Date
Stefano Fiorucci	7d17ca7391	add DocumentLanguageClassifier API (#4401 )	2023-03-14 09:12:03 +01:00
Vladimir Blagojevic	98256ecf57	Add Whisper node (#4335 ) * Add Whisper node * Add support for audio path, improve tests * Add docs * Improve tests	2023-03-13 16:17:07 +01:00
Daniel Bichuetti	28724e2e25	feat: add automatic OCR detection mechanism and improve performance (#4329 ) * feat: add automatic OCR detection mechanism and improve performance * refactor: add error message * refactor: ignore pdftoppm bad typing * refactor: add Tesseract install. docstrings * fix: check if OCR var. assigned on mp * tests: add path to windows/linux tests * tests: add tessdata path * tests: include matrix ref. * tests: custom Tesseract matrix install * refactor: improve user guide * tests: fix macos path * tests: remove brew formulae version * fix: macos paths * tests: fix macos path * tests: add Tesseract to Windows Path * tests: pytesseract path * tests: macos path * refactor: fix path message and remove extra path from tests * refactor: raise exception when path not found * refactor: expression simplification Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com> * refactor: check ocr parameter * tests: mark as integration * tests: mock deprecation warning * refactor: simplify code Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com> * refactor: change deprecation test Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com> * refactor: add unit patch * refactor: black formatting --------- Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com> Co-authored-by: Mayank Jobanputra <mayankjobanputra@gmail.com>	2023-03-13 20:19:22 +05:30
ZanSara	fd3f3143d4	feat: `LanguageClassifier` (#2994 ) * add language classifier node * Fix a few bugs and general code style * whitespace * first draft and refactoring * draft of classes separation * improve base class * fix invisible character; add some tests * fix and more tests * more docs and tests * move __init__ to base * add transformers node; improve tests * incorporate feedback; little fix to other node * labels_to_languages mapping * better docstrings * use logger instead of logging --------- Co-authored-by: Stanislav Zamecnik <stanislav.zamecnik@telekom.com> Co-authored-by: anakin87 <44616784+anakin87@users.noreply.github.com> Co-authored-by: stazam <zamecnik.stanislav@gmail.com>	2023-03-13 10:30:03 +01:00
Mahipal Singh Rathore	405aee0cfa	Update table.py (#4376 ) Answer should be checked if it is not none before adding id to it	2023-03-13 10:27:59 +01:00
ZanSara	8ea7ba3a94	proposal: drop `BaseComponent` and re-implement `Pipeline` (#4284 ) * draft proposal * pr number * reminder for an agent pipeline example * proposal number * add real query pipeline * add paragraph on validation * wording * add_store * decorator * add rollout process and parameter's hierarchy examples * rename project into application * feedback from the meeting * defer evaluation to another proposal * smaller changes * remove applications for now * u-turn on pipeline.connect() * typo * connect_from/to * update with Malte's feedback	2023-03-13 10:05:59 +01:00
Vladimir Blagojevic	95a48c6c9d	refactor: Simplify agent and tool interaction (#4362 ) * Simplify agent and tool interaction	2023-03-10 18:07:44 +01:00
Stefano Fiorucci	444a3116c4	docs: `TransformersImageToText`- inform about supported models, better exception handling (#4310 ) * better docs, exception handling and tests * Update lg * fix little error --------- Co-authored-by: agnieszka-m <amarzec13@gmail.com>	2023-03-09 15:35:17 +01:00
Mayank Jobanputra	39a20c37fd	fix: hf-tiny-roberta model loading from disk and mypy errors (#4363 ) * Fix mypy failures * Fix try 1 hf model on windows * Fix try 2 hf model on windows --------- Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>	2023-03-09 18:06:09 +05:30
Vítor Bernardes	95851b82fb	fix: Fix `print_answers` for output of query `run_batch` (#4273 ) * fix: Fix `print_answers` for output of query `run_batch` (#4255) * fix: print "Answers" label even with no query list Co-authored-by: Massimiliano Pippi <mpippi@gmail.com> * test: add unit tests for `print_answers` on `run`, `run_batch` output (#4255) --------- Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>	2023-03-09 12:10:50 +01:00
bogdankostic	e3503a92c9	build: Use `uvicorn` instead of `gunicorn` as server in REST API's Dockerfile (#4304 ) * Use uvicorn instead of gunicorn as server * Added comments and changed service names * comments improvised --------- Co-authored-by: Mayank Jobanputra <mayankjobanputra@gmail.com>	2023-03-09 01:46:07 +05:30
Stefano Fiorucci	f90ffb6851	increase MetaDocumentORM value length (#4333 )	2023-03-08 03:15:27 +05:30
Bilge Yücel	9198d5ec42	chore: add `topic:promptnode` label (#4347 )	2023-03-07 21:23:40 +01:00
ZanSara	024332f98f	refactor: simplify registration of `PromptModelInvocationLayer` (#4339 ) * use __init_subclass__ and remove registering functions	2023-03-07 20:53:48 +01:00
Sebastian	7d5e7c089c	refactor: Use TableQuestionAnsweringPipeline from transformers (#4303 ) * Added changes from table-qa-pipeline * Moved classes around to make diff to main look nicer. * Cleaned things up. Removed option to return_no_answer (not needed), added docs and added integration marks. * Remove unneeded code * Added fix for test * Add check for document_ids in answer * Prevent passing of empty list to np.mean * Batching doesn't work with TableQAPipeline b/c of HF issue * Cleanup of table reader tests, added check for document ids. * Fixing pylint * More pylint * PR comments --------- Co-authored-by: bogdankostic <bogdankostic@web.de>	2023-03-07 11:46:50 +01:00
tstadel	d096f03230	proposal: Shapers in Prompt Templates (#4172 ) * add proposal * Update 0000-shaper-in-prompt-template.md * rename proposal file * update proposal according to feedback * add clarification about the number of prompts generated * add section about parsing logic * Revert "add section about parsing logic" This reverts commit 904713558706206637eefe1579420d89663f58b8. * add section about parsing logic * fix typo * improved the detailed design section * fix code section * chore formatting * chore formatting * updated adoption strategy * final typo and expression changes	2023-03-07 09:52:18 +01:00
Tuana Çelik	8cd8ff6cbb	Update README.md (#4340 )	2023-03-07 08:34:21 +01:00
Daniel Bichuetti	af6efbdcb0	refactor: Allow flexible document id generation (#4326 )	2023-03-07 07:25:27 +01:00
Zoltan Fedor	4dea9db01e	feat: Report execution time for pipeline components in `_debug` (#4197 ) * Adding execution time to the debug output of pipeline components * Linting issue fix * [EMPTY] Re-trigger CI * fixed test --------- Co-authored-by: Mayank Jobanputra <mayankjobanputra@gmail.com>	2023-03-07 04:45:31 +05:30
tstadel	19311119db	fix: EvalResult load migration (#4289 ) * fix evalresult load migration * handle none values correctly * better None check * improve logic and add test	2023-03-06 20:05:02 +01:00
Silvano Cerza	9253990bdf	Add workflow to push CI metrics to Datadog (#4336 )	2023-03-06 18:02:24 +01:00
ZanSara	c802305ccf	test: move tests on standard pipelines in `e2e/` (#4309 ) * move out standard pipelines e2e * fixing unit tests * add test data * feedback * pylint * black	2023-03-06 17:26:19 +01:00
Vladimir Blagojevic	348e7d2dfe	refactor: Separate PromptModelInvocationLayers in providers.py (#4327 ) * Refactor PromptNode, separate PromptModelInvocationLayers in providers.py	2023-03-06 16:34:59 +01:00
Daniel Bichuetti	1548c5ba0f	feat: Add Azure OpenAI embeddings support (#4332 ) * feate: add Azure OpenAI as embedding option * feat: Add Azure OpenAI embeddings support * refactor: check api key * refactor: better type checking for Azure * refactor: enable parallelism + separate and update tests * refactor: string reformat * refactor: explicit typing * refactor: update refs and remove unused code	2023-03-06 13:37:20 +01:00
Daniel Bichuetti	c7dddfeaea	chore: add intelijus (#4330 )	2023-03-06 13:12:04 +01:00
Sebastian	1a42166978	fix: Prevent going past token limit in OpenAI calls in PromptNode (#4179 ) * Refactoring to remove duplicate code when using OpenAI API * Adding docstrings * Fix mypy issue * Moved retry mechanism to openai_request function in openai_utils * Migrate OpenAI embedding encoder to use the openai_request util function. * Adding docstrings. * pylint import errors * More pylint import errors * Move construction of headers into openai_request and api_key as input variable. * Made _openai_text_completion_tokenization_details so can be reused in PromptNode and OpenAIAnswerGenerator * Add prompt truncation to the PromptNode. * Removed commented out test. * Bump version of tiktoken to 0.2.0 so we can use MODEL_TO_ENCODING to automatically determine correct tokenizer for the requested model * Change one method back to public * Fixed bug in token length truncation. Included answer length into truncation amount. Moved truncation higher up to PromptNode level. * Pylint error * Improved warning message * Added _ensure_token_limit for HFLocalInvocationLayer. Had to remove max_length from base PromptModelInvocationLayer to ensure that max_length has a default value. * Adding tests * Expanded on doc strings * Updated tests * Update docstrings * Update tests, and go back to how USE_TIKTOKEN was used before. * Update haystack/nodes/prompt/prompt_node.py Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Update haystack/nodes/prompt/prompt_node.py Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Update haystack/nodes/prompt/prompt_node.py Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Update haystack/nodes/retriever/_openai_encoder.py Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Update haystack/utils/openai_utils.py Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Update haystack/utils/openai_utils.py Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Updated docstrings, and added integration marks * Remove comment * Update test * Fix test * Update test * Updated openai_request function to work with the azure api * Fixed error in _openai_encoder.py --------- Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>	2023-03-03 13:49:21 +01:00
Silvano Cerza	18e83b3ed4	Pin requests-cache test dependency to <1.0.0 (#4325 )	2023-03-03 12:47:15 +01:00
bogdankostic	f33829fabf	Remove xpdf dependencies (#4314 )	2023-03-02 11:12:03 +01:00
Vladimir Blagojevic	79bf25aaea	feat: Add Azure as OpenAI endpoint (#4170 ) * Add Azure as OpenAI endpoint --------- Co-authored-by: Sebastian Lee <sebastian.lee@deepset.ai>	2023-03-02 09:55:09 +01:00
Daniel Bichuetti	7c49fffc71	feat: Enable PDFToTextConverter multiprocessing, increase general performance and simplify installation (#4226 ) * refactor: isolate PDF converters * refactor: remove xpdf dependency and fix tests * refactor: add min. version * feat: enable multiprocessing and add tests * fix: remove unused imports * fix: regression when moved code * refactor: use itertools * fix: mypy claims * refactor: double tool support * refactor: add fallback to xpdf * refactor: black formatting * refactor: make superclass signature compatible * refactor: complete removal of xPdf * refactor: regroup Haystack imports and fix regression * refactor: remove original declaration * docs: fix docstrings * tests: add [pdf] to [all] * refactor: remove redundant checks, avoid extra processes * refactor: add deprecation warning * refactor: add pytest mark * tests: change PDF test file * fix: correct pytest mark * refactor: deprecate parameter and add new * tests: change pdf sample * Add minor lg changes to docstrings * Fix default value in doc strings * Update test/nodes/test_file_converter.py Co-authored-by: bogdankostic <bogdankostic@web.de> * tests: fix page count * refactor: add imported function * refactor: change default value * tests: change parameters and fix typo * Unify sort_by_position parameter names --------- Co-authored-by: bogdankostic <bogdankostic@web.de> Co-authored-by: agnieszka-m <amarzec13@gmail.com>	2023-03-01 22:34:38 +01:00
Silvano Cerza	90da7bf4f8	Fix docstring-labeler.yml workflow (#4307 )	2023-03-01 17:49:04 +01:00
ZanSara	ae04ce3c6a	test: mock all Summarizer tests and move a few into e2e (#4299 ) * stub e2e folders * simplify pipeline test * mocking * unit tests fixed * clean up e2e * pipeline tests work * pylint * leftover * small fix from #2994 and additional tests * review feedback * change summaries * black * revert models and summaries	2023-03-01 17:30:55 +01:00
bogdankostic	583d2d8244	Fix search path for Shaper API docs (#4306 )	2023-03-01 16:10:39 +01:00
ZanSara	165a0a5faa	test: mock all `Translator` tests and move one to `e2e` (#4290 ) * mock all translator tests and move one to e2e * typo * extract pipeline tests using translator * remove duplicate test * move generator test in e2e * Update e2e/pipelines/test_extractive_qa.py * pytest.mark.unit * black * remove model name as well * remove unused fixture * rename original and improve pipeline tests * fixes * pylint	2023-03-01 14:52:05 +01:00
Agnieszka Marzec	7e0f9715ba	Docs: Add shaper API (#4288 ) * Add shaper and update category id * Fix the category id * Update category	2023-03-01 14:02:47 +01:00
Stefano Fiorucci	e8f9b1b65d	test: replace `ElasticsearchDS` with `InMemoryDS` when it makes sense; support `scale_score` in `InMemoryDS` (#4283 ) * replace elasticds with imds - first draft * fix * fix tests and implement scale_score in imds bm25 * add docstrings for scale_score	2023-03-01 11:35:10 +01:00
Silvano Cerza	ee74421212	ci: Refactor docs config and generation (#4280 ) * Change docs yml category config * Update docs renderers to fetch categories from Readme.io * Update readme_sync.yml to handle new docs rendering * Remove unnecessary script and related workflow step * Fix sys.exits	2023-03-01 09:51:02 +01:00
Silvano Cerza	6e241262ad	ci: Change docker_release.yml workflow to run after successful PyPi release (#4293 ) * Change docker_release.yml workflow to run after successful PyPi release * Add warning on name change in pypi_release.yml	2023-03-01 09:50:47 +01:00
tstadel	d1c9407a25	fix opensearch delete_index (#4295 )	2023-03-01 08:40:38 +01:00
Malte Pietsch	2a1d73e16d	refactor: Make extraction of "Tool" and "Tool input" for Agent more robust and user-friendly (#4269 ) * adjust [] in prompt template. Add error+docs for Tool name. * fix test * update error message	2023-02-28 20:01:34 +01:00
Massimiliano Pippi	c3a38a59c0	Update test_prompt_node.py (#4281 )	2023-02-28 09:37:40 +01:00
Julian Risch	662441a62b	fix: FARMReader produces Answers with negative start and end position (#4248 )	2023-02-28 09:27:42 +01:00
Sebastian	040d806b42	test: Added integration test for using EntityExtractor in query pipeline (#4117 ) * Added new test for using EntityExtractor in query node and made some fixtures to reduce code duplication. * Reuse ner_node fixture * Added pytest unit markings and swapped over to in memory doc store. * Change to integration tests	2023-02-28 09:20:44 +01:00
Silvano Cerza	5678bb6375	Parallelize Docker build job (#4268 )	2023-02-27 16:03:24 +01:00
Massimiliano Pippi	4b8d195288	refact: mark unit tests under the `test/nodes/*` path (#4235 ) document merger * mark unit tests * revert	2023-02-27 15:00:19 +01:00
Sebastian	efe46b1214	Fix: Allow `torch_dtype="auto"` in PromptNode (#4166 ) * Fix for allowing torch_dtype="auto" * Fix to logic of torch_dtype detection * separate test for dtype	2023-02-27 09:59:27 +01:00
Silvano Cerza	4a93517eb4	test: Fix deprecation fixture (#4219 ) * Fix deprecation fixture * Update docstring * Update docstring --------- Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>	2023-02-27 09:55:03 +01:00
Kshitij Pawar	3d3e9c9b32	Fix: Issue of failure to initialize input_converter in Seq2SeqGenerator when model_file_path is given as folder path on local disk after manual model download (#4213 ) * test * test documentation commit: * added original return statement for linting * removed empty lines * formatted code using black * made changes based on suggestions	2023-02-26 18:13:26 +01:00
Silvano Cerza	2c9e4c5ff9	Remove unnecessary operations in minor_version_release.yml (#4267 )	2023-02-24 14:29:42 +01:00
Silvano Cerza	280414e5c6	Fix OpenAPI specs upload (#4266 )	2023-02-24 10:50:59 +01:00


3803 Commits