1204 Commits

Author SHA1 Message Date
bogdankostic
8c63e295f4
fix: Allow filtering on list fields in InMemoryDocumentStore with all operators (#5208)
* Add support for list fields

* Unskip tests
2023-06-29 12:10:39 +02:00
Massimiliano Pippi
6373e2ea66
refactor: prepare support to Elasticsearch 8 (#5226)
* make  a package

* Update haystack/document_stores/elasticsearch/es7.py

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* do not expose ES types from the package

---------

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-06-29 11:06:20 +02:00
bogdankostic
ed1bad1155
fix: Use add_isolated_node_eval of eval_batch in run_batch (#5223)
* Fix isolated node eval in eval_batch

* Add unit test
2023-06-28 16:51:23 +02:00
Vladimir Blagojevic
bc86f57715
feat: BM25 retrieval for MemoryDocumentStore (#5151) 2023-06-27 17:42:23 +02:00
Massimiliano Pippi
c068e34954
Remove deprecated param return_table_cell (#5218)
* remove deprecated param

* Update haystack/nodes/reader/table.py

Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>

* try

* remove unused functions and ignore mypy error

---------

Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
2023-06-27 16:14:29 +02:00
ZanSara
462f3a5c99
feat: globally disable progress bars (#5207)
* add SilenceableTqdm and update usage

* pylint

* rename module

* add tests
2023-06-27 11:45:17 +02:00
Vladimir Blagojevic
5ee393226d
fix: Support all SageMaker HF text generation models (other than Falcon) (#5205)
* Create SageMaker base class and two implementation subclasses
---------

Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
2023-06-26 19:59:16 +02:00
bogdankostic
82291b56ad
fix: Send batches of query-doc pairs to inference_from_objects (#5125)
* Send batches of query-doc pairs to inference_from_objects

* Use absolute import path

* Add separate preprocessing_batch_size parameter
2023-06-26 14:26:26 +02:00
Vladimir Blagojevic
eb2255c0dd
Rename SageMakerInvocationLayer -> SageMakerHFTextGenerationInvocationLayer (#5204) 2023-06-26 11:03:30 +02:00
Stefano Fiorucci
25d5dedb46
Fix: FARMReader - Consider the max number of labels/answers during training (#5197)
* first draft

* improve it a bit

* unit tests

* PR review, improved tests

* PR review, improved tests 2
2023-06-26 10:14:21 +02:00
Sebastian
f1932492f1
feat: Add CohereRanker node using Cohere reranking endpoint (#5152)
* Started to add CohereRanker node

* Small refactoring of SentenceTransformersRanker node

* Started to add predict_batch method

* Simplified predict_batch code

* Added missing imports

* Undoing a change

* Fix mypy

* Adding unit tests using mocking

* Updated truncation warning message.

* Update doc strings

* Update to docs

* Update haystack/nodes/ranker/cohere.py

Co-authored-by: bogdankostic <bogdankostic@web.de>

* Update haystack/nodes/ranker/cohere.py

Co-authored-by: bogdankostic <bogdankostic@web.de>

* Update haystack/nodes/ranker/cohere.py

Co-authored-by: bogdankostic <bogdankostic@web.de>

* Update haystack/nodes/ranker/cohere.py

Co-authored-by: bogdankostic <bogdankostic@web.de>

* Update haystack/nodes/ranker/cohere.py

Co-authored-by: bogdankostic <bogdankostic@web.de>

* Update haystack/nodes/ranker/cohere.py

Co-authored-by: bogdankostic <bogdankostic@web.de>

* Updating docs to reflect PR discussion

* Update haystack/nodes/ranker/cohere.py

Co-authored-by: Daria Fokina <daria.f93@gmail.com>

---------

Co-authored-by: bogdankostic <bogdankostic@web.de>
Co-authored-by: Daria Fokina <daria.f93@gmail.com>
2023-06-23 16:46:46 +02:00
Malte Pietsch
c9179ed0eb
feat: enable LLMs hosted via AWS SageMaker in PromptNode (#5155)
* Add SageMakerInvocationLayer
---------

Co-authored-by: oryx1729 <78848855+oryx1729@users.noreply.github.com>
Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
2023-06-23 15:33:20 +02:00
ZanSara
31664627eb
feat: hard document length limit at max_chars_check (#5191)
* implement hard cut at max_chars_check

* regenerate ids

* black

* docstring

* black
2023-06-23 12:34:19 +02:00
ZanSara
36192eca72
feat: current_datetime shaper function (#5195)
* current_datetime shaper

* explicitly add current_datetime to the functions allowed in a prompt template
2023-06-23 10:33:34 +02:00
bogdankostic
612c5cd005
chore: Remove add_tool from ToolsManager (#5192)
* Remove add_tool from ToolsManager

* Fix tests
2023-06-23 09:26:06 +02:00
Sebastian
1602f3abdd
test: Adding unit tests to Ranker (#5167)
* adding unit tests for sentence transformers ranker

* Adding more unit tests

* Remove empty line

* Undo static method

* Revert change

* Updated indentation and added match message

* Remove unneeded parenthesis
2023-06-22 15:23:23 +02:00
Michael Feil
cfd703fa3e
fix: model_tokenizer in openai text completion tokenization details (#5104)
* fix: model_tokenizer

* Update test

---------

Co-authored-by: Sebastian Husch Lee <sjrl423@gmail.com>
2023-06-22 14:23:19 +02:00
Stefano Fiorucci
637433841e
chore: remove deprecated Seq2SeqGenerator and RAGenerator (#5180)
* first draft of removal

* more removals

* don't download unused models
2023-06-21 16:38:45 +02:00
Sebastian
7a140c1524
feat: add ensure token limit for direct prompting of ChatGPT (#5166)
* Add support for prompt truncation when using chatgpt if direct prompting is used

* Update tests for test token limit for prompt node

* Update warning message to be correct

* Minor cleanup

* Mark back to integration

* Update count_openai_tokens_messages to reflect changes shown in tiktoken

* Use mocking to avoid request call

* Fix test to make it comply with unit test requirements

* Move tests to respective invocation layers

* Moved fixture to one spot
2023-06-21 15:41:28 +02:00
Vladimir Blagojevic
089187ac8b
fix: Check Agent's prompt template variables and prompt resolver parameters are aligned (#5163)
* Check Agent's prompt template parameters and prompt resolver parameters are aligned

* Lower the logger warning

* Automatically append transcript if needed

* Amend flaky test
2023-06-21 14:34:41 +02:00
Bilge Yücel
6a1b6b1ae3
feat: Update ConversationalAgent (#5065)
* feat: Update ConversationalAgent

* Add Tools
* Add test
* Change default params

* fix tests

* Fix circular import error
* Update conversational-agent prompt
* Add conversational-agent-without-tools to legacy list

* Add warning to add tools to conversational agent

* Add callable tools

* Add example script

* Fix linter errors

* Update ConversationalAgent depending on the existence of tools

* Initialize the base Agent with different arguments when there's tool
* Inject memory to the prompt in both cases, update prompts accordingly

* Override the add_tools method to prevent adding tools to ConversationalAgent without tools

* Update test

* Fix linter error

* Remove unused import

* Update docstrings and api reference

* Fix imports and doc string code snippet

* docstrings update

* Update conversational.py

* Mock PromptNode

* Prevent circular import error

* Add max_steps to the ConversationalAgent

* Update resolver description

* Add prompt_template as parameter

* Change docstring

---------

Co-authored-by: Darja Fokina <daria.f93@gmail.com>
2023-06-20 13:09:21 +03:00
Shukri
916e8452f5
feat!: simplify weaviate auth (#5115)
* feat!: simplify weaviate auth

* docs: explain param precedence

* refactor: simplify _get_embedded_options
2023-06-19 15:46:58 +02:00
Ben Heckmann
1318ac5074
feat: Optional Content Moderation for OpenAI PromptNode & OpenAIAnswerGenerator (#5017)
* #4071 implemented optional content moderation for OpenAI PromptNode

* added two simple integration tests

* improved documentation & renamed _invoke method to _execute_openai_request

* added a flag to check_openai_policy_violation that will return a full dict of all text violations and their categories

* re-implemented the tests as unit tests & without use of the OpenAI APIs

* removed unused patch

* changed check_openai_policy_violation back to only return a bool

* fixed pylint and test error

---------

Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2023-06-19 13:27:11 +02:00
ZanSara
f52477d31b
fix: small improvement to pipeline v2 tests (#5153)
* add missing return

* improve test

* docstring
2023-06-16 12:07:00 +02:00
Vladimir Blagojevic
8d8de65492
Add AgentToolLogger, unit test, and example usage (#5087) 2023-06-15 08:43:20 +02:00
bogdankostic
7731713a1e
test: Add benchmark config files (#5093)
* Add config files

* Add top-k and batch size to configs

* Add batch size to configs

* Add batch size to configs

* Remove configs using 1m docs
2023-06-14 18:15:50 +02:00
Ben Heckmann
60e5d73424
fix: changing document scores (#5090)
* #4653 fix changing scores by returning new document objects from document store queries

* added integration test for InMemoryDocumentStore demonstrating the desired behavior

* Update test/document_stores/test_memory.py
2023-06-14 17:35:46 +02:00
Julian Risch
ce1c9c9ddb
fix: Relax ChatGPT model name check to support gpt-3.5-turbo-0613 (#5142)
* relax model name checking for chatgpt

* add unit tests
2023-06-14 09:53:00 +02:00
Julian Risch
4c8e0b9d4a
fix: PromptNode falls back to empty list of documents if none are provided but expected (#5132)
* add warning, default to empty docs list, tests

* pylint
2023-06-13 16:35:19 +02:00
Silvano Cerza
3b8992968d
test: Skip flaky PromptNode test (#5039)
* Skip flaky PromptNode test

* Add skip reason

* Update test/prompt/test_prompt_node.py

Co-authored-by: bogdankostic <bogdankostic@web.de>

---------

Co-authored-by: bogdankostic <bogdankostic@web.de>
2023-06-13 16:24:29 +02:00
ZanSara
65cdf36d72
chore: block all HTTP requests in CI (#5088) 2023-06-13 14:52:24 +02:00
ZanSara
3c71f0ae3d
chore: mark some unit tests under test/pipeline (#5124)
* mark some unit tests as such

* remove marker
2023-06-12 17:58:31 +02:00
ZanSara
49e037a055
fix: rename requests.py into requests_utils.py (#5099)
* requests.py -> requests_utils.py

* fix tests

* reimport requests

* fix more tests

* review feedback
2023-06-12 12:40:21 +02:00
Vladimir Blagojevic
0cc9ce7522
fix: WebRetriever top_k is ignored in a pipeline (#5106)
* Initial changes

* Add WebSearch, WebRetriever top_k unit tests

* Add exact integration test that failed Tuana

* PR review
2023-06-09 10:42:37 +02:00
Julian Risch
d8a4f20379
feat: Consider prompt_node's default_prompt_template in agent (#5095)
* consider prompt_node's default_prompt_template in agent

* make test a unit test via mocking

* updated docstring
2023-06-08 13:42:28 +02:00
Vladimir Blagojevic
e3b069620b
feat: pass model parameters to HFLocalInvocationLayer via model_kwargs, enabling direct model usage (#4956)
* Simplify HFLocalInvocationLayer, move/add unit tests

* PR feedback

* Better pipeline invocation, add mocked tests

* Minor improvements

* Mock pipeline directly, unit test updates

* PR feedback, change pytest type to integration

* Mock supports unit test

* add full stop

* PR feedback, improve unit tests

* Add mock_get_task fixture

* Further improve unit tests

* Minor unit test improvement

* Add unit tests, increase coverage

* Add unit tests, increase test coverage

* Small optimization, improve _ensure_token_limit unit test

---------

Co-authored-by: Darja Fokina <daria.f93@gmail.com>
2023-06-07 13:34:45 +02:00
Silvano Cerza
a2156ee8fb
fix: Fix handling of streaming response in AnthropicClaudeInvocationLayer (#4993)
* Fix handling of streaming response in AnthropicClaudeInvocationLayer
---------

Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
Co-authored-by: Darja Fokina <daria.f93@gmail.com>
2023-06-07 10:57:36 +02:00
bogdankostic
da1f245a84
feat: Add batch_size parameter and cast timeout_config value to tuple for WeaviateDocumentStore (#5079)
* Add batch_size parameter and cast timeout_config to tuple

* Add unit test

* Remove debug tqdm

* Remove debug tqdm introduced in #5063
2023-06-06 17:06:10 +02:00
Sebastian
1777b22fcb
fix: Ensure eval mode for farm and transformer models for predictions (#3791)
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-06-06 13:06:30 +02:00
Michael Feil
6ea8ae01a2
feat: Allow setting custom api_base for OpenAI nodes (#5033)
* add changes for api_base

* format retriever

* Update haystack/nodes/retriever/dense.py

Co-authored-by: bogdankostic <bogdankostic@web.de>

* Update haystack/nodes/audio/whisper_transcriber.py

Co-authored-by: bogdankostic <bogdankostic@web.de>

* Update haystack/preview/components/audio/whisper_remote.py

Co-authored-by: bogdankostic <bogdankostic@web.de>

* Update haystack/nodes/answer_generator/openai.py

Co-authored-by: bogdankostic <bogdankostic@web.de>

* Update test_retriever.py

* Update test_whisper_remote.py

* Update test_generator.py

* Update test_retriever.py

* reformat with black

* Update haystack/nodes/prompt/invocation_layer/chatgpt.py

Co-authored-by: Daria Fokina <daria.f93@gmail.com>

* Add unit tests

* apply docstring suggestions

---------

Co-authored-by: bogdankostic <bogdankostic@web.de>
Co-authored-by: michaelfeil <me@michaelfeil.eu>
Co-authored-by: Daria Fokina <daria.f93@gmail.com>
2023-06-05 11:32:06 +02:00
bogdankostic
a9a49e2c0a
feat: Add batching for querying in ElasticsearchDocumentStore and OpenSearchDocumentStore (#5063)
* Include benchmark config in output

* Use queries from aggregated labels

* Introduce batching for querying in ElasticsearchDocStore and OpenSearchDocStore

* Fix mypy

* Use self.batch_size in write_documents

* Use 10_000 as default batch size

* Add unit tests for write documents
2023-06-01 18:47:24 +02:00
bogdankostic
c3e59914da
refactor: Delete outdated benchmark files (#5008) 2023-06-01 13:59:12 +02:00
bogdankostic
6774e0ae58
fix: Use queries from aggregated labels in benchmarks (#5054)
* Include benchmark config in output

* Use queries from aggregated labels
2023-06-01 10:49:54 +02:00
ZanSara
89de76d5fe
feat: move cli out from preview (#5055)
* move cli from preview

* readme

* review feedback

* test mocks & import paths

* import path
2023-05-31 18:34:14 +02:00
Silvano Cerza
3fd9e0fd89
feat: Add CLI prompt cache command (#5050)
* Add CLI prompt cache command

* Rename prompt cache to prompt fetch
2023-05-30 18:04:52 +02:00
ZanSara
6249e65bc8
feat: prompts caching from PromptHub (#5048)
* split up prompttemplate init

* caching

* docstring

* add platformdirs

* use user_data_dir

* fix tests

* add tests

* pylint

* mypy
2023-05-30 16:55:48 +02:00
Silvano Cerza
37518c8b8c
chore: Simplify DefaultPromptHandler logic and add tests (#4979)
* Simplify DefaultPromptHandler logic and add tests

Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>

* Remove commented code

* Split single unit test into multiple tests

---------

Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
2023-05-29 12:13:32 +02:00
ZanSara
7e5fa0dd94
fix: Move check for default PromptTemplates in PromptTemplate itself (#5018)
* make prompttemplate load the defaults instead of promptnode

* add test

* fix tenacity decorator

* fix tests

* fix error handling

* mypy

---------

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-05-27 18:05:05 +02:00
bogdankostic
b8ff1052d4
refactor: Adapt running benchmarks (#5007)
* Generate eval result in separate method

* Adapt benchmarking utils

* Adapt running retriever benchmarks

* Adapt error message

* Adapt running reader benchmarks

* Adapt retriever reader benchmark script

* Adapt running benchmarks script

* Adapt README.md

* Raise error if file doesn't exist

* Raise error if path doesn't exist or is a directory

* minor readme update

* Create separate methods for checking if pipeline contains reader or retriever

* Fix reader pipeline case

---------

Co-authored-by: Darja Fokina <daria.f93@gmail.com>
2023-05-26 18:48:11 +02:00
bogdankostic
5633446173
refactor: Add reader-retriever benchmark script (#5006)
* Generate eval result in separate method

* Adapt benchmarking utils

* Adapt running retriever benchmarks

* Adapt error message

* Adapt running reader benchmarks

* Adapt retriever reader benchmark script

* Raise error if file doesn't exist

* Raise error if path doesn't exist or is a directory

* Remove unused line

* Create separate method for getting reader config

* Make use of get_reader_config

* Create separate method for retriever config
2023-05-26 13:54:52 +02:00