1308 Commits

Author SHA1 Message Date
Julian Risch
30fdf2b5df
feat!: Add extra for inference dependencies such as torch (#5147)
* feat!: add extra for inference dependencies such as torch

* add inference extra to 'all' and 'all-gpu' extra

* install inference extra in selected integration tests

* import LazyImport

* review feedback

* add import error messages and update readme

* remove extra dot
2023-06-20 09:54:10 +02:00
Shukri
916e8452f5
feat!: simplify weaviate auth (#5115)
* feat!: simplify weaviate auth

* docs: explain param precedence

* refactor: simplify _get_embedded_options
2023-06-19 15:46:58 +02:00
Ben Heckmann
1318ac5074
feat: Optional Content Moderation for OpenAI PromptNode & OpenAIAnswerGenerator (#5017)
* #4071 implemented optional content moderation for OpenAI PromptNode

* added two simple integration tests

* improved documentation & renamed _invoke method to _execute_openai_request

* added a flag to check_openai_policy_violation that will return a full dict of all text violations and their categories

* re-implemented the tests as unit tests & without use of the OpenAI APIs

* removed unused patch

* changed check_openai_policy_violation back to only return a bool

* fixed pylint and test error

---------

Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2023-06-19 13:27:11 +02:00
erwanlc
97f136b901
Added fix when using Azure OpenAI with gpt-4 (#5105) 2023-06-19 10:17:58 +02:00
ZanSara
f52477d31b
fix: small improvement to pipeline v2 tests (#5153)
* add missing return

* improve test

* docstring
2023-06-16 12:07:00 +02:00
Daria Fokina
23a22be03c
docs: update CLI readme (#5129)
* docs: update CLI readme

Update CLI Readme for easier understanding and more details.

* update cache path

* version as separate command

* resolve comments

* prompthub_cache_path env var

* wording

* Update fetch.py
2023-06-15 16:32:36 +02:00
Vladimir Blagojevic
8d8de65492
Add AgentToolLogger, unit test, and example usage (#5087) 2023-06-15 08:43:20 +02:00
Ben Heckmann
60e5d73424
fix: changing document scores (#5090)
* #4653 fix changing scores by returning new document objects from document store queries

* added integration test for InMemoryDocumentStore demonstrating the desired behavior

* Update test/document_stores/test_memory.py
2023-06-14 17:35:46 +02:00
darionreyes
58c022ef86
fix: increase max token length for openai 16k models (#5145) 2023-06-14 16:24:04 +02:00
ZanSara
20c1f23fff
feat: optional transformers (#5101)
* generalimport -> lazy-imports

* remove generalimport

* fix pdftotextconverter import check

* customize error messages

* pylint

* fix sql.py

* pylint

* Update haystack/document_stores/sql.py

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>

* make contextmanager less verbose

* do not catch syntax errors

* review feedback

* make all torch and transformers import lazy

* fix environment.py

* mypy

* merge leftovers

* fix schema

* pylint

* review feedback

---------

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-06-14 12:00:20 +02:00
Julian Risch
ce1c9c9ddb
fix: Relax ChatGPT model name check to support gpt-3.5-turbo-0613 (#5142)
* relax model name checking for chatgpt

* add unit tests
2023-06-14 09:53:00 +02:00
Julian Risch
4c8e0b9d4a
fix: PromptNode falls back to empty list of documents if none are provided but expected (#5132)
* add warning, default to empty docs list, tests

* pylint
2023-06-13 16:35:19 +02:00
bogdankostic
29a6bfe621
fix: Don't log info message in DataSilo with SquadProcessor about clipping (#5127) 2023-06-13 10:31:39 +02:00
ZanSara
49e037a055
fix: rename requests.py into requests_utils.py (#5099)
* requests.py -> requests_utils.py

* fix tests

* reimport requests

* fix more tests

* review feedback
2023-06-12 12:40:21 +02:00
Julian Risch
72fe43a7cc
build: Move Azure's Form Recognizer dependency to extras (#5096)
* build: Move Azure's Form Recognizer dependency to extras

* try catch imports for AzureConverter

* assign None to failed imports

* use lazy import

* use forward reference in type hints
2023-06-12 12:23:32 +02:00
Vladimir Blagojevic
0cc9ce7522
fix: WebRetriever top_k is ignored in a pipeline (#5106)
* Initial changes

* Add WebSearch, WebRetriever top_k unit tests

* Add exact integration test that failed Tuana

* PR review
2023-06-09 10:42:37 +02:00
Julian Risch
d8a4f20379
feat: Consider prompt_node's default_prompt_template in agent (#5095)
* consider prompt_node's default_prompt_template in agent

* make test a unit test via mocking

* updated docstring
2023-06-08 13:42:28 +02:00
ZanSara
52e7a77595
feat: introduce lazy_import (#5084)
* generalimport -> lazy-imports

* remove generalimport

* fix pdftotextconverter import check

* customize error messages

* pylint

* fix sql.py

* pylint

* Update haystack/document_stores/sql.py

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>

* make contextmanager less verbose

* do not catch syntax errors

* review feedback

* Update haystack/nodes/file_converter/pdf.py

---------

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-06-08 12:11:38 +02:00
ZanSara
5022abb546
chore: Remove stray import (#5097) 2023-06-07 18:07:27 +02:00
Vladimir Blagojevic
e3b069620b
feat: pass model parameters to HFLocalInvocationLayer via model_kwargs, enabling direct model usage (#4956)
* Simplify HFLocalInvocationLayer, move/add unit tests

* PR feedback

* Better pipeline invocation, add mocked tests

* Minor improvements

* Mock pipeline directly, unit test updates

* PR feedback, change pytest type to integration

* Mock supports unit test

* add full stop

* PR feedback, improve unit tests

* Add mock_get_task fixture

* Further improve unit tests

* Minor unit test improvement

* Add unit tests, increase coverage

* Add unit tests, increase test coverage

* Small optimization, improve _ensure_token_limit unit test

---------

Co-authored-by: Darja Fokina <daria.f93@gmail.com>
2023-06-07 13:34:45 +02:00
Silvano Cerza
a2156ee8fb
fix: Fix handling of streaming response in AnthropicClaudeInvocationLayer (#4993)
* Fix handling of streaming response in AnthropicClaudeInvocationLayer

---------

Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
Co-authored-by: Darja Fokina <daria.f93@gmail.com>
2023-06-07 10:57:36 +02:00
bogdankostic
da1f245a84
feat: Add batch_size parameter and cast timeout_config value to tuple for WeaviateDocumentStore (#5079)
* Add batch_size parameter and cast timeout_config to tuple

* Add unit test

* Remove debug tqdm

* Remove debug tqdm introduced in #5063
2023-06-06 17:06:10 +02:00
Sebastian
1777b22fcb
fix: Ensure eval mode for farm and transformer models for predictions (#3791)
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-06-06 13:06:30 +02:00
Michael Feil
6ea8ae01a2
feat: Allow setting custom api_base for OpenAI nodes (#5033)
* add changes for api_base

* format retriever

* Update haystack/nodes/retriever/dense.py

Co-authored-by: bogdankostic <bogdankostic@web.de>

* Update haystack/nodes/audio/whisper_transcriber.py

Co-authored-by: bogdankostic <bogdankostic@web.de>

* Update haystack/preview/components/audio/whisper_remote.py

Co-authored-by: bogdankostic <bogdankostic@web.de>

* Update haystack/nodes/answer_generator/openai.py

Co-authored-by: bogdankostic <bogdankostic@web.de>

* Update test_retriever.py

* Update test_whisper_remote.py

* Update test_generator.py

* Update test_retriever.py

* reformat with black

* Update haystack/nodes/prompt/invocation_layer/chatgpt.py

Co-authored-by: Daria Fokina <daria.f93@gmail.com>

* Add unit tests

* apply docstring suggestions

---------

Co-authored-by: bogdankostic <bogdankostic@web.de>
Co-authored-by: michaelfeil <me@michaelfeil.eu>
Co-authored-by: Daria Fokina <daria.f93@gmail.com>
2023-06-05 11:32:06 +02:00
bogdankostic
9cb83402c4
refactor: Use globally defined request timeout in ElasticsearchDocumentStore and OpenSearchDocumentStore (#5064)
* Include benchmark config in output

* Use queries from aggregated labels

* Introduce batching for querying in ElasticsearchDocStore and OpenSearchDocStore

* Use globally defined timeout

* Fix mypy

* Use self.batch_size in write_documents

* Use 10_000 as default batch size

* Add unit tests for write documents
2023-06-05 09:47:31 +02:00
bogdankostic
a9a49e2c0a
feat: Add batching for querying in ElasticsearchDocumentStore and OpenSearchDocumentStore (#5063)
* Include benchmark config in output

* Use queries from aggregated labels

* Introduce batching for querying in ElasticsearchDocStore and OpenSearchDocStore

* Fix mypy

* Use self.batch_size in write_documents

* Use 10_000 as default batch size

* Add unit tests for write documents
2023-06-01 18:47:24 +02:00
ZanSara
89de76d5fe
feat: move cli out from preview (#5055)
* move cli from preview

* readme

* review feedback

* test mocks & import paths

* import path
2023-05-31 18:34:14 +02:00
Philippe Creux
e209abd48e
Fix doc FARMReader.predict (#5049) 2023-05-31 10:01:43 +02:00
Silvano Cerza
3fd9e0fd89
feat: Add CLI prompt cache command (#5050)
* Add CLI prompt cache command

* Rename prompt cache to prompt fetch
2023-05-30 18:04:52 +02:00
ZanSara
6249e65bc8
feat: prompts caching from PromptHub (#5048)
* split up prompttemplate init

* caching

* docstring

* add platformdirs

* use user_data_dir

* fix tests

* add tests

* pylint

* mypy
2023-05-30 16:55:48 +02:00
Silvano Cerza
ba06bc4805
Unpin typing_extensions and remove all its uses (#5040) 2023-05-29 15:31:34 +02:00
Silvano Cerza
37518c8b8c
chore: Simplify DefaultPromptHandler logic and add tests (#4979)
* Simplify DefaultPromptHandler logic and add tests

Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>

* Remove commented code

* Split single unit test into multiple tests

---------

Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
2023-05-29 12:13:32 +02:00
Fanli Lin
7001aee3fe
fix: prompt_template_resolved.output_variable is NoneType issue (#4976)
* try except instead or

* fix black formatting

* bug fix

* revert back the formatting
2023-05-29 10:48:10 +02:00
ZanSara
7e5fa0dd94
fix: Move check for default PromptTemplates in PromptTemplate itself (#5018)
* make prompttemplate load the defaults instead of promptnode

* add test

* fix tenacity decorator

* fix tests

* fix error handling

* mypy

---------

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-05-27 18:05:05 +02:00
Julian Risch
2ede4d1d1d
build: Remove dill dependency (#4985)
* remove dill dependency

* remove dill from .toml
2023-05-26 17:50:55 +02:00
David Tippett
934db42528
docs: Updating docstrings to say OpenSearch and backlink to correct docs
- Added backlinks to OpenSearch's documentation where documentation was present

Signed-off-by: David Tippett <17506770+dtaivpp@users.noreply.github.com>
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-05-25 16:52:42 +02:00
bogdankostic
aaab925508
feat: Allow setting java options when launching Elasticsearch / OpenSearch (#5002)
* Allow launching Elasticsearch and OpenSearch with java options and deleting Weaviate

* Remove unneeded imports

* Simplify java opts string generation
2023-05-25 10:30:59 +02:00
bogdankostic
19829da01b
refactor: Generate eval result in separate method (#5001)
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-05-25 10:30:41 +02:00
ZanSara
44fd0cff7a
fix: fitz import switcher (#5012)
* fix pymupdf import switcher

* install pdf

* check after the import

* revert workflow change

* pylint

* pylint

* pylint again
2023-05-24 18:58:40 +02:00
Silvano Cerza
56d033e7e7
Add back hardcoded default templates (#4998) 2023-05-24 16:50:11 +02:00
Julian Risch
ae9f384a97
feat: Add prompt_template to conversational agent init params (#4994) 2023-05-24 09:22:29 +02:00
Silvano Cerza
524d2cba36
Fix CohereInvocationLayer _ensure_token_limit not returning resized prompt (#4978) 2023-05-23 17:58:01 +02:00
Massimiliano Pippi
68924161df
chore: remove deprecated node PDFToTextOCRConverter (#4982)
* remove deprecated node

* remove related test
2023-05-23 16:55:54 +02:00
ZanSara
949b1b63b3
PromptHub integration in PromptNode (#4879)
* initial integration

* upgrade of prompthub

* fix get_prompt_template

* feedback

* add prompthub-py to dependencies

* tests

* mypy

* stray changes

* review feedback

* missing init

* fix test

* move logic in prompttemplate

* linting

* bugfixes

* fix unit tests

* fix cache

* simplify prompttemplate init

* remove unused function

* removing wrong params

* try remove all instances of prompt names

* more tests

* fix agent tests

* more tests

* fix tests

* pylint

* comma

* black

* fix test

* docstring

* review feedback

* review feedback

* fix mocks

* mypy

* fix mocks

* fix reference to missing templates

* feedback

* remove direct references to default template var

* tests

* Update haystack/nodes/prompt/prompt_node.py

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

---------

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-05-23 15:22:58 +02:00
Julian Risch
9e4feb6bed
build: Remove tiktoken alternative (#4991)
* remove conditional statements from tiktoken

* remove count_openai_tokens method
2023-05-23 13:05:30 +02:00
Julian Risch
6747f1f0a6
build: Remove SPARQLWrapper and rdflib from generalimport (#4986)
Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>
2023-05-23 12:05:24 +02:00
Silvano Cerza
afc342b5fe
Pin typing_extensions to fix Pydantic issue (#4987) 2023-05-23 11:08:50 +02:00
Silvano Cerza
0fb47b5fda
Fix request_with_retry kwargs (#4980) 2023-05-22 18:36:00 +02:00
ZanSara
f80ae01174
LocalWhisperTranscriber (v2) (#4909)
* original component

* remove remote parts

* unit tests

* polish docstrings

* fix unit tests

* fix e2e tests

* pylint

* remove check

* review feedback

* add type: ignore

* improve tests

* test stream handling

* upgrade canals and improve tests

* pylint
2023-05-22 18:30:35 +02:00
ZanSara
516db4cb52
RemoteWhisperTranscriber (v2) (#4910)
* original-component

* stub

* fix implementation

* fix tests

* review feedback

* review feedback

* upgrade canals

* upgrade canals

* upgrade canals to fix pipeline test

* remove requests_with_retry

* feedback
2023-05-22 16:02:58 +02:00