1524 Commits

Author SHA1 Message Date
ZanSara
65cdf36d72
chore: block all HTTP requests in CI (#5088) 2023-06-13 14:52:24 +02:00
ZanSara
3c71f0ae3d
chore: mark some unit tests under test/pipeline (#5124)
* mark some unit tests as such

* remove marker
2023-06-12 17:58:31 +02:00
ZanSara
49e037a055
fix: rename requests.py into requests_utils.py (#5099)
* requests.py -> requests_utils.py

* fix tests

* reimport requests

* fix more tests

* review feedback
2023-06-12 12:40:21 +02:00
Vladimir Blagojevic
0cc9ce7522
fix: WebRetriever top_k is ignored in a pipeline (#5106)
* Initial changes

* Add WebSearch, WebRetriever top_k unit tests

* Add exact integration test that failed Tuana

* PR review
2023-06-09 10:42:37 +02:00
Julian Risch
d8a4f20379
feat: Consider prompt_node's default_prompt_template in agent (#5095)
* consider prompt_node's default_prompt_template in agent

* make test a unit test via mocking

* updated docstring
2023-06-08 13:42:28 +02:00
Vladimir Blagojevic
e3b069620b
feat: pass model parameters to HFLocalInvocationLayer via model_kwargs, enabling direct model usage (#4956)
* Simplify HFLocalInvocationLayer, move/add unit tests

* PR feedback

* Better pipeline invocation, add mocked tests

* Minor improvements

* Mock pipeline directly, unit test updates

* PR feedback, change pytest type to integration

* Mock supports unit test

* add full stop

* PR feedback, improve unit tests

* Add mock_get_task fixture

* Further improve unit tests

* Minor unit test improvement

* Add unit tests, increase coverage

* Add unit tests, increase test coverage

* Small optimization, improve _ensure_token_limit unit test

---------

Co-authored-by: Darja Fokina <daria.f93@gmail.com>
2023-06-07 13:34:45 +02:00
Silvano Cerza
a2156ee8fb
fix: Fix handling of streaming response in AnthropicClaudeInvocationLayer (#4993)
* Fix handling of streaming response in AnthropicClaudeInvocationLayer
---------

Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
Co-authored-by: Darja Fokina <daria.f93@gmail.com>
2023-06-07 10:57:36 +02:00
bogdankostic
da1f245a84
feat: Add batch_size parameter and cast timeout_config value to tuple for WeaviateDocumentStore (#5079)
* Add batch_size parameter and cast timeout_config to tuple

* Add unit test

* Remove debug tqdm

* Remove debug tqdm introduced in #5063
2023-06-06 17:06:10 +02:00
Sebastian
1777b22fcb
fix: Ensure eval mode for farm and transformer models for predictions (#3791)
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-06-06 13:06:30 +02:00
Michael Feil
6ea8ae01a2
feat: Allow setting custom api_base for OpenAI nodes (#5033)
* add changes for api_base

* format retriever

* Update haystack/nodes/retriever/dense.py

Co-authored-by: bogdankostic <bogdankostic@web.de>

* Update haystack/nodes/audio/whisper_transcriber.py

Co-authored-by: bogdankostic <bogdankostic@web.de>

* Update haystack/preview/components/audio/whisper_remote.py

Co-authored-by: bogdankostic <bogdankostic@web.de>

* Update haystack/nodes/answer_generator/openai.py

Co-authored-by: bogdankostic <bogdankostic@web.de>

* Update test_retriever.py

* Update test_whisper_remote.py

* Update test_generator.py

* Update test_retriever.py

* reformat with black

* Update haystack/nodes/prompt/invocation_layer/chatgpt.py

Co-authored-by: Daria Fokina <daria.f93@gmail.com>

* Add unit tests

* apply docstring suggestions

---------

Co-authored-by: bogdankostic <bogdankostic@web.de>
Co-authored-by: michaelfeil <me@michaelfeil.eu>
Co-authored-by: Daria Fokina <daria.f93@gmail.com>
2023-06-05 11:32:06 +02:00
bogdankostic
a9a49e2c0a
feat: Add batching for querying in ElasticsearchDocumentStore and OpenSearchDocumentStore (#5063)
* Include benchmark config in output

* Use queries from aggregated labels

* Introduce batching for querying in ElasticsearchDocStore and OpenSearchDocStore

* Fix mypy

* Use self.batch_size in write_documents

* Use 10_000 as default batch size

* Add unit tests for write documents
2023-06-01 18:47:24 +02:00
bogdankostic
c3e59914da
refactor: Delete outdated benchmark files (#5008) 2023-06-01 13:59:12 +02:00
bogdankostic
6774e0ae58
fix: Use queries from aggregated labels in benchmarks (#5054)
* Include benchmark config in output

* Use queries from aggregated labels
2023-06-01 10:49:54 +02:00
ZanSara
89de76d5fe
feat: move cli out from preview (#5055)
* move cli from preview

* readme

* review feedback

* test mocks & import paths

* import path
2023-05-31 18:34:14 +02:00
Silvano Cerza
3fd9e0fd89
feat: Add CLI prompt cache command (#5050)
* Add CLI prompt cache command

* Rename prompt cache to prompt fetch
2023-05-30 18:04:52 +02:00
ZanSara
6249e65bc8
feat: prompts caching from PromptHub (#5048)
* split up prompttemplate init

* caching

* docstring

* add platformdirs

* use user_data_dir

* fix tests

* add tests

* pylint

* mypy
2023-05-30 16:55:48 +02:00
Silvano Cerza
37518c8b8c
chore: Simplify DefaultPromptHandler logic and add tests (#4979)
* Simplify DefaultPromptHandler logic and add tests

Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>

* Remove commented code

* Split single unit test into multiple tests

---------

Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
2023-05-29 12:13:32 +02:00
ZanSara
7e5fa0dd94
fix: Move check for default PromptTemplates in PromptTemplate itself (#5018)
* make prompttemplate load the defaults instead of promptnode

* add test

* fix tenacity decorator

* fix tests

* fix error handling

* mypy

---------

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-05-27 18:05:05 +02:00
bogdankostic
b8ff1052d4
refactor: Adapt running benchmarks (#5007)
* Generate eval result in separate method

* Adapt benchmarking utils

* Adapt running retriever benchmarks

* Adapt error message

* Adapt running reader benchmarks

* Adapt retriever reader benchmark script

* Adapt running benchmarks script

* Adapt README.md

* Raise error if file doesn't exist

* Raise error if path doesn't exist or is a directory

* minor readme update

* Create separate methods for checking if pipeline contains reader or retriever

* Fix reader pipeline case

---------

Co-authored-by: Darja Fokina <daria.f93@gmail.com>
2023-05-26 18:48:11 +02:00
bogdankostic
5633446173
refactor: Add reader-retriever benchmark script (#5006)
* Generate eval result in separate method

* Adapt benchmarking utils

* Adapt running retriever benchmarks

* Adapt error message

* Adapt running reader benchmarks

* Adapt retriever reader benchmark script

* Raise error if file doesn't exist

* Raise error if path doesn't exist or is a directory

* Remove unused line

* Create separate method for getting reader config

* Make use of get_reader_config

* Create separate method for retriever config
2023-05-26 13:54:52 +02:00
bogdankostic
796340e788
refactor: Adapt reader benchmarks (#5005) 2023-05-26 11:40:35 +02:00
bogdankostic
6e10fdab27
refactor: Adapt retriever benchmarks script (#5004)
* Generate eval result in separate method

* Adapt benchmarking utils

* Adapt running retriever benchmarks

* Adapt error message

* Raise error if file doesn't exist

* Raise error if path doesn't exist or is a directory
2023-05-25 15:39:02 +02:00
bogdankostic
c5f0f820cf
refactor: Adapt benchmarking utils (#5003)
* Adapt benchmarking utils

* Adapt error message

* Adapt doc store launcher registry

* Revert "Adapt doc store launcher registry"

This reverts commit e034936363dde760d393fe00cac998a54a0f5152.
2023-05-25 11:19:46 +02:00
Massimiliano Pippi
929b8d1fb0
ci: run Elasticsearch 8.6 in compatibility mode (#3853)
* bump ES version in CI

disable ssl

wait for service to start

set env vars

do not use choco to install ES

re-enable jobs deps

skip test on windows CI because of OOM

allocate more memory for ES

uniform ES installation and use default heap size

skip tests causing OOM

increase job timeout

restore memory limit for ES8

* Use latest elasticsearch version
2023-05-24 18:53:54 +02:00
Silvano Cerza
56d033e7e7
Add back hardcoded default templates (#4998) 2023-05-24 16:50:11 +02:00
bogdankostic
b85bc44c00
Mock request from prompt hub (#5011) 2023-05-24 12:23:49 +02:00
Silvano Cerza
524d2cba36
Fix CohereInvocationLayer _ensure_token_limit not returning resized prompt (#4978)
2023-05-23 17:58:01 +02:00
Massimiliano Pippi
68924161df
chore: remove deprecated node PDFToTextOCRConverter (#4982)
* remove deprecated node

* remove related test
2023-05-23 16:55:54 +02:00
ZanSara
949b1b63b3
PromptHub integration in PromptNode (#4879)
* initial integration

* upgrade of prompthub

* fix get_prompt_template

* feedback

* add prompthub-py to dependencies

* tests

* mypy

* stray changes

* review feedback

* missing init

* fix test

* move logic in prompttemplate

* linting

* bugfixes

* fix unit tests

* fix cache

* simplify prompttemplate init

* remove unused function

* removing wrong params

* try remove all instances of prompt names

* more tests

* fix agent tests

* more tests

* fix tests

* pylint

* comma

* black

* fix test

* docstring

* review feedback

* review feedback

* fix mocks

* mypy

* fix mocks

* fix reference to missing templates

* feedback

* remove direct references to default template var

* tests

* Update haystack/nodes/prompt/prompt_node.py

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

---------

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-05-23 15:22:58 +02:00
ZanSara
f80ae01174
LocalWhisperTranscriber (v2) (#4909)
* original component

* remove remote parts

* unit tests

* polish docstrings

* fix unit tests

* fix e2e tests

* pylint

* remove check

* review feedback

* add type: ignore

* improve tests

* test stream handling

* upgrade canals and improve tests

* pylint
2023-05-22 18:30:35 +02:00
ZanSara
516db4cb52
RemoteWhisperTranscriber (v2) (#4910)
* original-component

* stub

* fix implementation

* fix tests

* review feedback

* review feedback

* upgrade canals

* upgrade canals

* upgrade canals to fix pipeline test

* remove requests_with_retry

* feedback
2023-05-22 16:02:58 +02:00
Vladimir Blagojevic
068a967e5b
feat: HFInferenceEndpointInvocationLayer streaming support (#4819)
* HFInferenceEndpointInvocationLayer streaming support

* Small fixes

* Add unit test

* PR feedback

* Alphabetically sort params

* Convert PromptNode tests to HFInferenceEndpointInvocationLayer invoke tests

* Rewrite streaming with sseclient

* More PR updates

* Implement and test _ensure_token_limit

* Further optimize DefaultPromptHandler

* Fix CohereInvocationLayer mistypes

* PR feedback

* Break up unit tests, simplify

* Simplify unit tests even further

* PR feedback on unit test simplification

* Proper code indentation under patch context manager

* More unit tests, slight adjustments

* Remove unrelated CohereInvocationLayer change

This reverts commit 82337151e8328d982f738e5da9129ff99350ea0c.

* Revert "Further optimize DefaultPromptHandler"

This reverts commit 606a761b6e3333f27df51a304cfbd1906c806e05.

* lg update

mostly full stops at the end of docstrings

---------

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
Co-authored-by: Darja Fokina <daria.f93@gmail.com>
2023-05-22 14:45:53 +02:00
Silvano Cerza
9398183447
Simplify PromptNode generation_kwargs tests (#4975) 2023-05-22 14:28:08 +02:00
Fanli Lin
cd2ea4bc91
feat: enable passing generation_kwargs to the PromptNode in pipeline.run() (#4832)
* add generation_kwargs

* add documentation

* enable max_new_tokens customization

* add code formatting

* add unit test

* fix formatting

* test with black

* add a new unit test

* remove doc and update tests

* unpack generation_kwargs

* fix comment

* update unit test

* remove generation_kwargs

* not pass `generation_kwargs`

* update tests

* add max_length

* fix formatting

* revert

* reformatting
2023-05-22 11:45:06 +02:00
Massimiliano Pippi
8228081e7a
chore: leftovers from removing knowledge graph support (#4974)
* leftovers from removing knowledge graph support

* more leftovers
2023-05-22 10:03:51 +02:00
Massimiliano Pippi
c6ea542b57
chore: remove BaseKnowledgeGraph (#4953)
* remove BaseKnowledgeGraph

* fix pylint
2023-05-21 10:42:02 +02:00
Massimiliano Pippi
4974bf7ab3
chore: remove deprecated MilvusDocumentStore (#4951)
* remove deprecated MilvusDocumentStore

* remove leftovers

* fix pylint
2023-05-19 16:37:38 +02:00
Massimiliano Pippi
85254fe9f6
leftover from merge conflict (#4962) 2023-05-19 16:10:26 +02:00
Vladimir Blagojevic
eb9d14faeb
fix: Adjust tool pattern to support multi-line inputs (#4801)
* Add support for multi line tool input

* Fix failing agent test, additional test_tools_manager.py tests

* Allow empty tool input, add more tests

* More unit tests

* String formatting

* Small str fix
2023-05-18 16:39:31 +02:00
Massimiliano Pippi
58acef77c4
avoid importing the weaviate client directly (#4945) 2023-05-18 16:08:53 +02:00
ZanSara
123ee55a5c
docstring (#4950) 2023-05-18 16:00:02 +02:00
Vladimir Blagojevic
5d7ee2e5e6
feat: Add max_tokens to BaseGenerator params (#4168)
* Add max_tokens to BaseGenerator params

* Make mypy happy

* Rebase and resolve conflicts

* Fix signature issues

* Update lg

* Add a mocked unit test method

* end-of-file-fixer corrected file

* Convert to unit test

* Mark test as integration

* make the test unit

---------

Co-authored-by: agnieszka-m <amarzec13@gmail.com>
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-05-18 15:19:29 +02:00
Shukri
ad162f2e65
feat: Support authentication using AuthBearerToken and AuthClientCredentials in Weaviate (#4028)
* refactor: make the scope param configurable

the scope parameter is used when authenticating using
AuthClientPassword and AuthClientCredentials

* feat: add support for AuthClientCredentials

add support for authenticating using the OIDC Client Credentials
authentication flow

* feat: add support for AuthBearerToken

Add support for authenticating using OIDC and bearer tokens

* Update lg

* refactor how client is built

Signed-off-by: hsm207 <hsm207@users.noreply.github.com>

* unit test the auth methods

Signed-off-by: hsm207 <hsm207@users.noreply.github.com>

* Update test_weaviate.py

* revert formatting change

* Fix type hints

---------

Signed-off-by: hsm207 <hsm207@users.noreply.github.com>
Co-authored-by: John Doe <johndoe@example.com>
Co-authored-by: agnieszka-m <amarzec13@gmail.com>
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-05-18 10:17:11 +02:00
Massimiliano Pippi
3ea784464a
add test case for #4929 (#4936) 2023-05-18 09:12:03 +02:00
Julian Risch
8cfeed095d
build: Remove mmh3 dependency (#4896)
* build: Remove mmh3 dependency

* resolve circular import

* pylint

* make mmh3.py sibling of schema.py

* pylint import order

* pylint

* undo example changes

* increase coverage in modeling module

* increase coverage further

* rename new unit tests
2023-05-17 21:31:08 +02:00
bogdankostic
df46e7fadd
fix: Use AutoTokenizer instead of DPR specific tokenizer (#4898)
* Use AutoTokenizer instead of DPR specific tokenizer

* Adapt TableTextRetriever

* Adapt tests

* Adapt tests
2023-05-17 18:54:34 +02:00
Vladimir Blagojevic
9d52998b25
feat: Add conversational agent (#4931) 2023-05-17 15:19:09 +02:00
tstadel
7625829684
fix: EvaluationResult serialization changes dataframes (#4906)
* fix nan and index values

* add test

* make test for None values after evalresult read explicit
2023-05-16 16:03:09 +02:00
Vladimir Blagojevic
37cadd702a
fix: Make sure summary memory is cumulative (#4932)
* Fix summary memory not being cumulative

* PR feedback - Julian
2023-05-16 13:35:19 +02:00
Stefano Fiorucci
6e0000732d
feat: add BLIP support in TransformersImageToText (#4912)
* add blip support

* fix typo

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

---------

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-05-16 10:57:41 +02:00