187 Commits

Author SHA1 Message Date
Stefano Fiorucci
188b2a7f06
feat: support for tools in OpenAIChatGenerator (#8666)
* move chatmsg>openai conversion to chatmsg dataclass

* implementation and tests cleanup

* release note

* try fixing azure chat generator

* add serde test for toolinvoker

* small fix
2024-12-20 14:20:54 +00:00
Stefano Fiorucci
2045f6f16a
try test jsonschema (#8496) 2024-10-29 16:21:51 +01:00
Silvano Cerza
fd1a06d171
Disable tracing when running tests (#7934) 2024-06-26 12:32:05 +02:00
Massimiliano Pippi
10c675d534
chore: add license header to all modules (#7675)
* add license header to modules
* check license header at linting time
2024-05-09 13:40:36 +00:00
Silvano Cerza
de4fca4526
ci: Skip collection of test_json_schema.py to fix CI failures (#7353)
* Skip collection of test_json_schema.py to fix CI failures

* mock chroma instance

* revert

---------

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2024-03-13 16:59:26 +01:00
Tobias Wochinger
6e580e4430
feat: implement pipeline tracing (#7046)
* feat: implement pipeline tracing

* tests: improve test setup for spying tracer

* feat: implement util for type coercion

* fix: trace a after checking pipeline output

* docs: add release notes

* docs: drop unused imports

* refactor: simplify getting raw span

* refactor: implement `ProxyTracer`
2024-02-22 12:52:04 +01:00
Vladimir Blagojevic
4d08be0c2a
feat: Update OpenAI Python Client in Haystack 2.x (#6584)
* Update openai python client

* Add release note

* Consolidate multiple mock_chat_completion into one

* Ensure all components have api_base_url, organization params

* Update tests

* Enable function calling

* Oversight

* Minor fixes, add streaming test mocks

* Apply suggestions from code review

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* metadata -> meta

---------

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2023-12-21 16:21:24 +01:00
Silvano Cerza
8a513f3b8c
test: Add fixture to block requests in tests (#6585)
* Add fixture to block requests in tests

* Mark tests making requests as integration
2023-12-21 08:51:54 +01:00
Silvano Cerza
e6637f5ec2
Fix all tests 2023-11-24 14:48:43 +01:00
Massimiliano Pippi
8adb8bbab8
Remove preview folder in test/
---------

Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
2023-11-24 11:52:55 +01:00
Daniel Fleischer
0cef17ac13
feat: embedding instructions for dense retrieval (#6372)
* Embedding instructions in EmbeddingRetriever

Query and documents embeddings are prefixed with instructions, useful
for retrievers finetuned on specific tasks, such as Q&A.

* Tests

Checking vectors 0th component vs. reference, using different stores.

* Normalizing vectors

* Release notes
2023-11-21 12:56:40 +01:00
DanShatford
07048791aa
feat: allow list of file paths in convert_files_to_docs (#5961)
* feat: allow list of file paths in `convert_files_to_docs`

* Fix validation

* Fix check errors
2023-10-09 20:19:03 +02:00
Christian Clauss
6dd52d91b2
ci: Fix typos discovered by codespell (#5778)
* Fix typos discovered by codespell

* pylint: max-args = 38
2023-09-13 16:14:45 +02:00
Sebastian Husch Lee
b5aef24a7e
feat: Add support for meta fields that are lists when using embed_meta_fields (#5307)
* Add support for meta fields that are lists when using embed_meta_fields

* Make sure unit test doesn't download model

* Adding more unit tests
2023-07-11 17:32:33 +02:00
bogdankostic
0697f5c63e
fix: Support isolated node eval in run_batch in Generators (#5291)
* Add isolated node eval to BaseGenerator's run_batch

* Add unit tests
2023-07-07 10:32:43 +02:00
Silvano Cerza
a1a390056a
Remove requests_cache in tests (#5285) 2023-07-06 13:22:52 +02:00
Sebastian Husch Lee
12f319b4c9
Remove deprecated return_table_cell from conftest.py (#5264) 2023-07-05 09:37:41 +02:00
Stefano Fiorucci
637433841e
chore: remove deprecated Seq2SeqGenerator and RAGenerator (#5180)
* first draft of removal

* more removals

* don't download unused models
2023-06-21 16:38:45 +02:00
ZanSara
65cdf36d72
chore: block all HTTP requests in CI (#5088) 2023-06-13 14:52:24 +02:00
ZanSara
6249e65bc8
feat: prompts caching from PromptHub (#5048)
* split up prompttemplate init

* caching

* docstring

* add platformdirs

* use user_data_dir

* fix tests

* add tests

* pylint

* mypy
2023-05-30 16:55:48 +02:00
ZanSara
949b1b63b3
PromptHub integration in PromptNode (#4879)
* initial integration

* upgrade of prompthub

* fix get_prompt_template

* feedback

* add prompthub-py to dependencies

* tests

* mypy

* stray changes

* review feedback

* missing init

* fix test

* move logic in prompttemplate

* linting

* bugfixes

* fix unit tests

* fix cache

* simplify prompttemplate init

* remove unused function

* removing wrong params

* try remove all instances of prompt names

* more tests

* fix agent tests

* more tests

* fix tests

* pylint

* comma

* black

* fix test

* docstring

* review feedback

* review feedback

* fix mocks

* mypy

* fix mocks

* fix reference to missing templates

* feedback

* remove direct references to default template var

* tests

* Update haystack/nodes/prompt/prompt_node.py

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

---------

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-05-23 15:22:58 +02:00
ZanSara
516db4cb52
RemoteWhisperTranscriber (v2) (#4910)
* original-component

* stub

* fix implementation

* fix tests

* review feedback

* review feedback

* upgrade canals

* upgrade canals

* upgrade canals to fix pipeline test

* remove requests_with_retry

* feedback
2023-05-22 16:02:58 +02:00
Massimiliano Pippi
8228081e7a
chore: leftovers from removing knowledge graph support (#4974)
* leftovers from removing knowledge graph support

* more leftovers
2023-05-22 10:03:51 +02:00
Massimiliano Pippi
4974bf7ab3
chore: remove deprecated MilvusDocumentStore (#4951)
* remove deprecated MilvusDocumentStore

* remove leftovers

* fix pylint
2023-05-19 16:37:38 +02:00
Vladimir Blagojevic
5d7ee2e5e6
feat: Add max_tokens to BaseGenerator params (#4168)
* Add max_tokens to BaseGenerator params

* Make mypy happy

* Rebase and resolve conflicts

* Fix signature issues

* Update lg

* Add a mocked unit test method

* end-of-file-fixer corrected file

* Convert to unit test

* Mark test as integration

* make the test unit

---------

Co-authored-by: agnieszka-m <amarzec13@gmail.com>
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-05-18 15:19:29 +02:00
ZanSara
1b57b96210
refactor!: extract elasticsearch (#4668)
* extract elasticsearch

* update pyproject.toml

* make more import optional

* move MockBaseRetriever in conftest

* install es in the es integration tests
2023-04-26 10:14:20 +02:00
Sebastian
8d9136bad4
feat: Implementation of Table Cell Proposal (#4616)
* Starting adding support for TableCell

* Update tests to use row and col

* Added schema test to check to_dict and from_dict works for Table documents. Also updated Doc.__eq__ to work for tables.

* Update eval test to use TableCell

* Added more schema tests for table docs, labels and answers.

* Add boolean to toggle between Span and TableCell

* Add deprecation message

* Test that table answers work as responses in the rest API

---------

Co-authored-by: agnieszka-m <amarzec13@gmail.com>
2023-04-19 13:14:49 +02:00
Silvano Cerza
f13cc751c3
Block requests_cache in unit tests (#4696) 2023-04-18 16:15:26 +02:00
Silvano Cerza
79727ed31f
Add requests blocker fixture (#4671) 2023-04-14 18:01:30 +02:00
Vladimir Blagojevic
1dcac11133
feat: Add Hugging Face inferencing PromptNode layer (#4641) 2023-04-14 17:59:17 +02:00
ZanSara
174d80ab41
skip tests (#4654) 2023-04-13 17:56:51 +02:00
ZanSara
ba11d1c2a8
refactor!: extract evaluation and statistical dependencies (#4457)
* try-catch sklearn and scipy

* haystack imports

* linting

* mypy

* try to import baseretriever

* remove typing

* unused import

* remove more typing

* pylint

* isolate sql imports for postgres, which we don't use anyway

* remove stats

* replace expit

* als inmemory

* mypy

* feedback

* docker

* expit

* re-add njit
2023-04-12 15:38:56 +02:00
Silvano Cerza
5ac3dffbef
test: Rework conftest (#4614)
* Split root conftest into multiple ones and remove unused fixtures

* Remove some constants and make them fixtures

* Remove unnecessary fixture scoping

* Fix failing whisper tests

* Fix image_file_paths fixture
2023-04-11 10:33:43 +02:00
Silvano Cerza
e85dc79eaa
test: Add pytest fixture to block requests in unit tests (#4433)
* Add pytest fixture to block requests in unit tests

* Mark test correctly as integration

* Fix crawler unit test failing cause it tries to install chromedriver
2023-04-06 18:04:57 +02:00
Silvano Cerza
c3abf73332
refactor: Rework prompt tests (#4600)
* Rework some PromptNode and PromptModel tests

* Remove duplicate code in PromptNode

* Fix mypy

* Fix test cause of missing fixture

* Revert "Fix mypy"

This reverts commit e530295a06cb260d9a8bd89679534958cb3d9776.

* Revert "Remove duplicate code in PromptNode"

This reverts commit 4a678ae81504dcc78a737372c061d12dc8799639.
2023-04-06 14:47:44 +02:00
Vladimir Blagojevic
be25655663
feat: Add agent tools (#4437)
* Initial commit, add search_engine

* Add TopPSampler

* Add more TopPSampler unit tests

* Remove SearchEngineSampler (converted to TopPSampler)

* Add some basic WebSearch unit tests

* Rename unit tests

* Add WebRetriever into agent_tools

* Adjust to WebRetriever

* Add WebRetriever mode [snippet|document]

* Minor changes

* SerperDev: add peopleAlsoAsk search results

* First agent for hotpotqa

* Making WebRetriever work on hotpotqa

* refactor: minor WebRetriever improvements (#4377)

* refactor: remove doc ids rebuild + anticipate cache

* refactor: improve caching, fix Document ids

* Minor WebRetriever improvements

* Overlooked minor fixes

* feat: add Bing API as search engine

* refactor: let kwargs pass-through

* feat: increase search context

* check sampler result, improve batch typing

* refactor: increase mypy compliance

* Fix mypy

* Minor example fixes

* Fix the descriptions

* PR feedback updates

* More fixes

* TopPSampler: handle top p None value, add unit test

* Add top_k to WebSearch

* Use boilerpy3 instead of trafilatura

* Remove date finding

* Add more WebRetriever docs

* Refactor long methods

* making the preprocessor optional

* hide WebSearch and make NeuralWebSearch a pipeline

* remove unused imports

* add WebQAPipeline and split example into two

* change example search engine to SerperDev

* Turn off progress bars in WebRetriever's PreProcessor

* Agent tool examples - final updates

* Add webqa test, search results ranking scores

* Better answer box handling for SerperDev and SerpAPI

* Minor fixes

* pylint

* pylint fixes

* extract TopPSampler from WebRetriever

* use sampler only for WebRetriever modes other than snippet

* add web retriever tests

* add web retriever tests

* exclude rdflib@6.3.2 due to license issues

* add test for preprocessed docs and kwargs examples in docstrings

* Move test_webqa_pipeline to test/pipelines

* change docstring for join_documents_and_scores

* Use WebQAPipeline in examples/web_lfqa.py

* Use WebQAPipeline in examples/web_lfqa.py

* Move test_webqa_pipeline to e2e

* Updated lg

* Sampler added automatically in WebQAPipeline, no need to add it

* Updated lg

* Updated lg

* :ignore Update agent tools examples to new templates (#4503)

* Update examples to new templates

* Add print back

* fix linting and black format issues

---------

Co-authored-by: Daniel Bichuetti <daniel.bichuetti@gmail.com>
Co-authored-by: agnieszka-m <amarzec13@gmail.com>
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2023-03-27 18:14:58 +02:00
tstadel
f8bb270d62
feat: prompt at query time (#4454)
* use outputshapers in prompttemplate

* fix pylint

* first iteration on regex

* implement new promptnode syntax based on f-strings

* finish fstring implementation

* add additional tests

* add security tests

* fix mypy

* fix pylint

* fix test_prompt_templates

* fix test_prompt_template_repr

* fix test_prompt_node_with_custom_invocation_layer

* fix test_invalid_template

* more security tests

* fix test_complex_pipeline_with_all_features

* fix agent tests

* refactor get_prompt_template

* fix test_prompt_template_syntax_parser

* fix test_complex_pipeline_with_all_features

* allow functions in comprehensions

* break out of fstring test

* fix additional tests

* mark new tests as unit tests

* fix agents tests

* convert missing templates

* proper use of get_prompt_template

* refactor and add docstrings

* fix tests

* fix pylint

* fix agents test

* fix tests

* refactor globals

* make allowed functions configurable via env variable

* better dummy variable

* fix special alias

* don't replace special char variables

* more special chars, better docstrings

* cherrypick fix audio tests

* fix test

* rework shapers

* fix pylint

* fix tests

* add new templates

* add reference parsing

* add more shaper tests

* add tests for join and to_string

* fix pylint

* fix pylint

* fix pylint for real

* auto fill shaper function params

* fix reference parsing for multiple references

* fix output variable inference

* consolidate qa prompt template output and make shaper work per-document

* implement prompt at query time

* support serialized PromptTemplates

* fix tests

* add tests for prompt template at query time

* fix types after merge

* fix types after merge

* improve test

* add test for nested shaper syntax in pipelines

* better docstrings

* Correct copilot errors

* found another copilot error

* Another one

* introduce output_parser

* introduce output_parser

* Fix tests for output_parser update

* fix black

* fix tests

* fix tests

* fix tests

* better docstring

* better docstring

* fix test

* fix mypy

* rename RegexAnswerParser to AnswerParser

* rename RegexAnswerParser to AnswerParser

* better docstrings

* better docstrings

* fix docstring example
2023-03-27 14:10:20 +02:00
tstadel
382ca8094e
feat: PromptTemplate extensions (#4378)
* use outputshapers in prompttemplate

* fix pylint

* first iteration on regex

* implement new promptnode syntax based on f-strings

* finish fstring implementation

* add additional tests

* add security tests

* fix mypy

* fix pylint

* fix test_prompt_templates

* fix test_prompt_template_repr

* fix test_prompt_node_with_custom_invocation_layer

* fix test_invalid_template

* more security tests

* fix test_complex_pipeline_with_all_features

* fix agent tests

* refactor get_prompt_template

* fix test_prompt_template_syntax_parser

* fix test_complex_pipeline_with_all_features

* allow functions in comprehensions

* break out of fstring test

* fix additional tests

* mark new tests as unit tests

* fix agents tests

* convert missing templates

* proper use of get_prompt_template

* refactor and add docstrings

* fix tests

* fix pylint

* fix agents test

* fix tests

* refactor globals

* make allowed functions configurable via env variable

* better dummy variable

* fix special alias

* don't replace special char variables

* more special chars, better docstrings

* cherrypick fix audio tests

* fix test

* rework shapers

* fix pylint

* fix tests

* add new templates

* add reference parsing

* add more shaper tests

* add tests for join and to_string

* fix pylint

* fix pylint

* fix pylint for real

* auto fill shaper function params

* fix reference parsing for multiple references

* fix output variable inference

* consolidate qa prompt template output and make shaper work per-document

* fix types after merge

* introduce output_parser

* fix tests

* better docstring

* rename RegexAnswerParser to AnswerParser

* better docstrings
2023-03-27 12:14:11 +02:00
Vladimir Blagojevic
53528c96a0
feat: Add ChatGPT PromptNode layer (#4357)
* Initial ChatGPTInvocationLayer
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
Co-authored-by: agnieszka-m <amarzec13@gmail.com>
Co-authored-by: Sebastian <sjrl@users.noreply.github.com>
2023-03-17 14:16:41 +01:00
Silvano Cerza
9802fb159a
Remove unnecessary imports in conftest.py (#4434) 2023-03-16 10:02:01 +01:00
ZanSara
c802305ccf
test: move tests on standard pipelines in e2e/ (#4309)
* move out standard pipelines e2e

* fixing unit tests

* add test data

* feedback

* pylint

* black
2023-03-06 17:26:19 +01:00
Daniel Bichuetti
1548c5ba0f
feat: Add Azure OpenAI embeddings support (#4332)
* feat: add Azure OpenAI as embedding option

* feat: Add Azure OpenAI embeddings support

* refactor: check api key

* refactor: better type checking for Azure

* refactor: enable parallelism + separate and update tests

* refactor: string reformat

* refactor: explicit typing

* refactor: update refs and remove unused code
2023-03-06 13:37:20 +01:00
Vladimir Blagojevic
79bf25aaea
feat: Add Azure as OpenAI endpoint (#4170)
* Add Azure as OpenAI endpoint
---------

Co-authored-by: Sebastian Lee <sebastian.lee@deepset.ai>
2023-03-02 09:55:09 +01:00
ZanSara
ae04ce3c6a
test: mock all Summarizer tests and move a few into e2e (#4299)
* stub e2e folders

* simplify pipeline test

* mocking

* unit tests fixed

* clean up e2e

* pipeline tests work

* pylint

* leftover

* small fix from #2994 and additional tests

* review feedback

* change summaries

* black

* revert models and summaries
2023-03-01 17:30:55 +01:00
ZanSara
165a0a5faa
test: mock all Translator tests and move one to e2e (#4290)
* mock all translator tests and move one to e2e

* typo

* extract pipeline tests using translator

* remove duplicate test

* move generator test in e2e

* Update e2e/pipelines/test_extractive_qa.py

* pytest.mark.unit

* black

* remove model name as well

* remove unused fixture

* rename original and improve pipeline tests

* fixes

* pylint
2023-03-01 14:52:05 +01:00
Stefano Fiorucci
e8f9b1b65d
test: replace ElasticsearchDS with InMemoryDS when it makes sense; support scale_score in InMemoryDS (#4283)
* replace elasticds with imds - first draft

* fix

* fix tests and implement scale_score in imds bm25

* add docstrings for scale_score
2023-03-01 11:35:10 +01:00
Silvano Cerza
4a93517eb4
test: Fix deprecation fixture (#4219)
* Fix deprecation fixture

* Update docstring

* Update docstring

---------

Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>
2023-02-27 09:55:03 +01:00
Julian Risch
5ce7a404ac
feat: Add Agent (#4148)
* initial Agent implementation

* mypy and pylint fixes

* add missing ABC import

* improved prompt template

* refactor and shorten run method

* refactor and shorten run method

* add tests for extracting

* fix mixed up tool_input/observation & make tests more robust

* fix bug with max_iterations and update prompt template

* allow setting prompt_template in Agent init

* remove example yml for agent

* add final prediction to transcript

* add transcript to errors and accept PromptTemplate in init

* simplify if else to elif

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* add checks for max_iter<2 and empty list returned by prompt node

---------

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-02-21 14:27:40 +01:00
Massimiliano Pippi
ec72dd73fc
refactor: complete the document stores test refactoring (#4125)
* add e2e tests

* move tests to their own module

* add e2e workflow

* pylint

* remove from job

* fix index field name

* skip test on sql

* removed unused code

* fix embedding tests

* adjust test for pinecone

* adjust assertions to the new documents

* bad copypasta

* test

* fix tests

* fix tests

* fix test

* fix tests

* pylint

* update milvus version

* remove debug

* move graphdb tests under e2e
2023-02-16 09:43:25 +01:00
Silvano Cerza
274746db07
style: Update black (#4101)
* Update black version

* Format file with new black style

* Update black pre-commit hook version
2023-02-08 15:34:43 +01:00