993 Commits

Author SHA1 Message Date
Malte Pietsch
c9179ed0eb
feat: enable LLMs hosted via AWS SageMaker in PromptNode (#5155)
* Add SageMakerInvocationLayer
---------

Co-authored-by: oryx1729 <78848855+oryx1729@users.noreply.github.com>
Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
2023-06-23 15:33:20 +02:00
ZanSara
31664627eb
feat: hard document length limit at max_chars_check (#5191)
* implement hard cut at max_chars_check

* regenerate ids

* black

* docstring

* black
2023-06-23 12:34:19 +02:00
ZanSara
36192eca72
feat: current_datetime shaper function (#5195)
* current_datetime shaper

* explicitly add current_datetime to the functions allowed in a prompt template
2023-06-23 10:33:34 +02:00
bogdankostic
612c5cd005
chore: Remove add_tool from ToolsManager (#5192)
* Remove add_tool from ToolsManager

* Fix tests
2023-06-23 09:26:06 +02:00
Sebastian
1602f3abdd
test: Adding unit tests to Ranker (#5167)
* adding unit tests for sentence transformers ranker

* Adding more unit tests

* Remove empty line

* Undo static method

* Revert change

* Updated indentation and added match message

* Remove unneeded paranthesis
2023-06-22 15:23:23 +02:00
Michael Feil
cfd703fa3e
fix: model_tokenizer in openai text completion tokenization details (#5104)
* fix: model_tokenizer

* Update test

---------

Co-authored-by: Sebastian Husch Lee <sjrl423@gmail.com>
2023-06-22 14:23:19 +02:00
Stefano Fiorucci
637433841e
chore: remove deprecated Seq2SeqGenerator and RAGenerator (#5180)
* first draft of removal

* more removals

* don't download unused models
2023-06-21 16:38:45 +02:00
Sebastian
7a140c1524
feat: add ensure token limit for direct prompting of ChatGPT (#5166)
* Add support for prompt truncation when using chatgpt if direct prompting is used

* Update tests for test token limit for prompt node

* Update warning message to be correct

* Minor cleanup

* Mark back to integration

* Update count_openai_tokens_messages to reflect changes shown in tiktoken

* Use mocking to avoid request call

* Fix test to make it comply with unit test requirements

* Move tests to respective invocation layers

* Moved fixture to one spot
2023-06-21 15:41:28 +02:00
Vladimir Blagojevic
089187ac8b
fix: Check Agent's prompt template variables and prompt resolver parameters are aligned (#5163)
* Check Agent's prompt template parameters and prompt resolver parameters are aligned

* Lower the logger warning

* Automatically append transcript if needed

* Amend flaky test
2023-06-21 14:34:41 +02:00
Bilge Yücel
6a1b6b1ae3
feat: Update ConversationalAgent (#5065)
* feat: Update ConversationalAgent

* Add Tools
* Add test
* Change default params

* fix tests

* Fix circular import error
* Update conversational-agent prompt
* Add conversational-agent-without-tools to legacy list

* Add warning to add tools to conversational agent

* Add callable tools

* Add example script

* Fix linter errors

* Update ConversationalAgent depending on the existance of tools

* Initialize the base Agent with different arguments when there's tool
* Inject memory to the prompt in both cases, update prompts accordingly

* Override the add_tools method to prevent adding tools to ConversationalAgent without tools

* Update test

* Fix linter error

* Remove unused import

* Update docstrings and api reference

* Fix imports and doc string code snippet

* docstrings update

* Update conversational.py

* Mock PromptNode

* Prevent circular import error

* Add max_steps to the ConversationalAgent

* Update resolver description

* Add prompt_template as parameter

* Change docstring

---------

Co-authored-by: Darja Fokina <daria.f93@gmail.com>
2023-06-20 13:09:21 +03:00
Shukri
916e8452f5
feat!: simplify weaviate auth (#5115)
* feat!: simplify weaviate auth

* docs: explain param precedence

* refactor: simplify _get_embedded_options
2023-06-19 15:46:58 +02:00
Ben Heckmann
1318ac5074
feat: Optional Content Moderation for OpenAI PromptNode & OpenAIAnswerGenerator (#5017)
* #4071 implemented optional content moderation for OpenAI PromptNode

* added two simple integration tests

* improved documentation & renamed _invoke method to _execute_openai_request

* added a flag to check_openai_policy_violation that will return a full dict of all text violations and their categories

* re-implemented the tests as unit tests & without use of the OpenAI APIs

* removed unused patch

* changed check_openai_policy_violation back to only return a bool

* fixed pylint and test error

---------

Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2023-06-19 13:27:11 +02:00
ZanSara
f52477d31b
fix: small improvement to pipeline v2 tests (#5153)
* add missing return

* improve test

* docstring
2023-06-16 12:07:00 +02:00
Vladimir Blagojevic
8d8de65492
Add AgentToolLogger, unit test, and example usage (#5087) 2023-06-15 08:43:20 +02:00
bogdankostic
7731713a1e
test: Add benchmark config files (#5093)
* Add config files

* Add top-k and batch size to configs

* Add batch size to configs

* Add batch size to configs

* Remove configs using 1m docs
2023-06-14 18:15:50 +02:00
Ben Heckmann
60e5d73424
fix: changing document scores (#5090)
* #4653 fix changing scores by returning new document objects from document store queries

* added integration test for InMemoryDocumentStore demonstrating the desired behavior

* Update test/document_stores/test_memory.py
2023-06-14 17:35:46 +02:00
Julian Risch
ce1c9c9ddb
fix: Relax ChatGPT model name check to support gpt-3.5-turbo-0613 (#5142)
* relax model name checking for chatgpt

* add unit tests
2023-06-14 09:53:00 +02:00
Julian Risch
4c8e0b9d4a
fix: PromptNode falls back to empty list of documents if none are provided but expected (#5132)
* add warning, default to empty docs list, tests

* pylint
2023-06-13 16:35:19 +02:00
Silvano Cerza
3b8992968d
test: Skip flaky PromptNode test (#5039)
* Skip flaky PromptNode test

* Add skip reason

* Update test/prompt/test_prompt_node.py

Co-authored-by: bogdankostic <bogdankostic@web.de>

---------

Co-authored-by: bogdankostic <bogdankostic@web.de>
2023-06-13 16:24:29 +02:00
ZanSara
65cdf36d72
chore: block all HTTP requests in CI (#5088) 2023-06-13 14:52:24 +02:00
ZanSara
3c71f0ae3d
chore: mark some unit tests under test/pipeline (#5124)
* mark some unit tests as such

* remove marker
2023-06-12 17:58:31 +02:00
ZanSara
49e037a055
fix: rename requests.py into requests_utils.py (#5099)
* requests.py -> requests_utils.py

* fix tests

* reimport requrests

* fix more tests

* review feedback
2023-06-12 12:40:21 +02:00
Vladimir Blagojevic
0cc9ce7522
fix: WebRetriever top_k is ignored in a pipeline (#5106)
* Initial changes

* Add WebSearch, WebRetriever top_k unit tests

* Add exact integration test that failed Tuana

* PR review
2023-06-09 10:42:37 +02:00
Julian Risch
d8a4f20379
feat: Consider prompt_node's default_prompt_template in agent (#5095)
* consider prompt_node's default_prompt_template in agent

* make test a unit test via mocking

* updated docstring
2023-06-08 13:42:28 +02:00
Vladimir Blagojevic
e3b069620b
feat: pass model parameters to HFLocalInvocationLayer via model_kwargs, enabling direct model usage (#4956)
* Simplify HFLocalInvocationLayer, move/add unit tests

* PR feedback

* Better pipeline invocation, add mocked tests

* Minor improvements

* Mock pipeline directly,  unit test updates

* PR feedback, change pytest type to integration

* Mock supports unit test

* add full stop

* PR feedback, improve unit tests

* Add mock_get_task fixture

* Further improve unit tests

* Minor unit test improvement

* Add unit tests, increase coverage

* Add unit tests, increase test coverage

* Small optimization, improve _ensure_token_limit unit test

---------

Co-authored-by: Darja Fokina <daria.f93@gmail.com>
2023-06-07 13:34:45 +02:00
Silvano Cerza
a2156ee8fb
fix: Fix handling of streaming response in AnthropicClaudeInvocationLayer (#4993)
* Fix handling of streaming response in AnthropicClaudeInvocationLayer
---------

Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
Co-authored-by: Darja Fokina <daria.f93@gmail.com>
2023-06-07 10:57:36 +02:00
bogdankostic
da1f245a84
feat: Add batch_size parameter and cast timeout_config value to tuple for WeaviateDocumentStore (#5079)
* Add batch_size parameter and cast timeout_config to tuple

* Add unit test

* Remove debug tqdm

* Remove debug tqdm introduced in #5063
2023-06-06 17:06:10 +02:00
Sebastian
1777b22fcb
fix: Ensure eval mode for farm and transformer models for predictions (#3791)
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-06-06 13:06:30 +02:00
Michael Feil
6ea8ae01a2
feat: Allow setting custom api_base for OpenAI nodes (#5033)
* add changes for api_base

* format retriever

* Update haystack/nodes/retriever/dense.py

Co-authored-by: bogdankostic <bogdankostic@web.de>

* Update haystack/nodes/audio/whisper_transcriber.py

Co-authored-by: bogdankostic <bogdankostic@web.de>

* Update haystack/preview/components/audio/whisper_remote.py

Co-authored-by: bogdankostic <bogdankostic@web.de>

* Update haystack/nodes/answer_generator/openai.py

Co-authored-by: bogdankostic <bogdankostic@web.de>

* Update test_retriever.py

* Update test_whisper_remote.py

* Update test_generator.py

* Update test_retriever.py

* reformat with black

* Update haystack/nodes/prompt/invocation_layer/chatgpt.py

Co-authored-by: Daria Fokina <daria.f93@gmail.com>

* Add unit tests

* apply docstring suggestions

---------

Co-authored-by: bogdankostic <bogdankostic@web.de>
Co-authored-by: michaelfeil <me@michaelfeil.eu>
Co-authored-by: Daria Fokina <daria.f93@gmail.com>
2023-06-05 11:32:06 +02:00
bogdankostic
a9a49e2c0a
feat: Add batching for querying in ElasticsearchDocumentStore and OpenSearchDocumentStore (#5063)
* Include benchmark config in output

* Use queries from aggregated labels

* Introduce batching for querying in ElasticsearchDocStore and OpenSearchDocStore

* Fix mypy

* Use self.batch_size in write_documents

* Use 10_000 as default batch size

* Add unit tests for write documents
2023-06-01 18:47:24 +02:00
bogdankostic
c3e59914da
refactor: Delete outdated benchmark files (#5008) 2023-06-01 13:59:12 +02:00
bogdankostic
6774e0ae58
fix: Use queries from aggregated labels in benchmarks (#5054)
* Include benchmark config in output

* Use queries from aggregated labels
2023-06-01 10:49:54 +02:00
ZanSara
89de76d5fe
feat: move cli out from preview (#5055)
* move cli from preview

* readme

* review feedback

* test mocks & import paths

* import path
2023-05-31 18:34:14 +02:00
Silvano Cerza
3fd9e0fd89
feat: Add CLI prompt cache command (#5050)
* Add CLI prompt cache command

* Rename prompt cache to prompt fetch
2023-05-30 18:04:52 +02:00
ZanSara
6249e65bc8
feat: prompts caching from PromptHub (#5048)
* split up prompttemplate init

* caching

* docstring

* add platformdirs

* use user_data_dir

* fix tests

* add tests

* pylint

* mypy
2023-05-30 16:55:48 +02:00
Silvano Cerza
37518c8b8c
chore: Simplify DefaultPromptHandler logic and add tests (#4979)
* Simplify DefaultPromptHandler logic and add tests

Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>

* Remove commented code

* Split single unit test into multiple tests

---------

Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
2023-05-29 12:13:32 +02:00
ZanSara
7e5fa0dd94
fix: Move check for default PromptTemplates in PromptTemplate itself (#5018)
* make prompttemplate load the defaults instead of promptnode

* add test

* fix tenacity decorator

* fix tests

* fix error handling

* mypy

---------

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-05-27 18:05:05 +02:00
bogdankostic
b8ff1052d4
refactor: Adapt running benchmarks (#5007)
* Generate eval result in separate method

* Adapt benchmarking utils

* Adapt running retriever benchmarks

* Adapt error message

* Adapt running reader benchmarks

* Adapt retriever reader benchmark script

* Adapt running benchmarks script

* Adapt README.md

* Raise error if file doesn't exist

* Raise error if path doesn't exist or is a directory

* minor readme update

* Create separate methods for checking if pipeline contains reader or retriever

* Fix reader pipeline case

---------

Co-authored-by: Darja Fokina <daria.f93@gmail.com>
2023-05-26 18:48:11 +02:00
bogdankostic
5633446173
refactor: Add reader-retriever benchmark script (#5006)
* Generate eval result in separate method

* Adapt benchmarking utils

* Adapt running retriever benchmarks

* Adapt error message

* Adapt running reader benchmarks

* Adapt retriever reader benchmark script

* Raise error if file doesn't exist

* Raise error if path doesn't exist or is a directory

* Remove unused line

* Create separate method for getting reader config

* Make use of get_reader_config

* Create separate method for retriever config
2023-05-26 13:54:52 +02:00
bogdankostic
796340e788
refactor: Adapt reader benchmarks (#5005) 2023-05-26 11:40:35 +02:00
bogdankostic
6e10fdab27
refactor: Adapt retriever benchmarks script (#5004)
* Generate eval result in separate method

* Adapt benchmarking utils

* Adapt running retriever benchmarks

* Adapt error message

* Raise error if file doesn't exist

* Raise error if path doesn't exist or is a directory
2023-05-25 15:39:02 +02:00
bogdankostic
c5f0f820cf
refactor: Adapt benchmarking utils (#5003)
* Adapt benchmarking utils

* Adapt error message

* Adapt doc store launcher registry

* Revert "Adapt doc store launcher registry"

This reverts commit e034936363dde760d393fe00cac998a54a0f5152.
2023-05-25 11:19:46 +02:00
Massimiliano Pippi
929b8d1fb0
ci: run Elasticsearch 8.6 in compatibility mode (#3853)
* bump ES version in CI

disable ssl

wait for service to start

set env vars

do not use choco to install ES

re-enable jobs deps

skip test on windows CI because of OOM

allocate more memory for ES

uniform ES installation and use default heap size

skip tests causing OOM

increase job timeout

restore memory limit for ES8

* Use latest elasticsearch version
2023-05-24 18:53:54 +02:00
Silvano Cerza
56d033e7e7
Add back hardcoded default templates (#4998) 2023-05-24 16:50:11 +02:00
bogdankostic
b85bc44c00
Mock request from prompt hub (#5011) 2023-05-24 12:23:49 +02:00
Silvano Cerza
524d2cba36
Fix CohereInvocationLayer _ensure_token_limit not returning resized (#4978)
prompt
2023-05-23 17:58:01 +02:00
Massimiliano Pippi
68924161df
chore: remove deprecated node PDFToTextOCRConverter (#4982)
* remove deprecated node

* remove related test
2023-05-23 16:55:54 +02:00
ZanSara
949b1b63b3
PromptHub integration in PromptNode (#4879)
* initial integration

* upgrade of prompthub

* fix get_prompt_template

* feedback

* add prompthub-py to dependencies

* tests

* mypy

* stray changes

* review feedback

* missing init

* fix test

* move logic in prompttemplate

* linting

* bugfixes

* fix unit tests

* fix cache

* simplify prompttemplate init

* remove unused function

* removing wrong params

* try remove all instances of prompt names

* more tests

* fix agent tests

* more tests

* fix tests

* pylint

* comma

* black

* fix test

* docstring

* review feedback

* review feedback

* fix mocks

* mypy

* fix mocks

* fix reference to missing templates

* feedback

* remove direct references to default template var

* tests

* Update haystack/nodes/prompt/prompt_node.py

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

---------

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-05-23 15:22:58 +02:00
ZanSara
f80ae01174
LocalWhisperTranscriber (v2) (#4909)
* original component

* remove remote parts

* unit tests

* polish docstrings

* fix unit tests

* fix e2e tests

* pylint

* remove check

* review feedback

* add type: ignore

* improve tests

* test stream handling

* upgrade canals and improve tests

* pylint
2023-05-22 18:30:35 +02:00
ZanSara
516db4cb52
RemoteWhisperTranscriber (v2) (#4910)
* original-component

* stub

* fix implementation

* fix tests

* review feedback

* review feedback

* upgrade canals

* upgrade canals

* upgrade canals to fix pipeline test

* remove requests_with_retry

* feedback
2023-05-22 16:02:58 +02:00