261 Commits

Author SHA1 Message Date
Stefano Fiorucci
bcc4104729
refactor: utility function for docstore deserialization (#8226)
* refactor docstore deserialization

* more tests

* reno; headers

* expose key
2024-08-14 13:29:27 +02:00
Vladimir Blagojevic
3318d894c0
Add sede_with_list_output_type_in_pipeline unit test (#8196) 2024-08-13 14:37:24 +02:00
Amna Mubashar
373de97426
Deprecate SentenceWindowRetrieval (#8206) 2024-08-13 13:49:41 +02:00
Nicola Procopio
4c798470b2
added precision parameter to sentence transformers embeddings (#8179)
* added `precision` parameter to sentence transformers embeddings

* fixed test

* Update haystack/components/embedders/sentence_transformers_document_embedder.py

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

* Update test/components/embedders/test_sentence_transformers_text_embedder.py

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

* Update test/components/embedders/test_sentence_transformers_text_embedder.py

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

* fix format

* Update sentence_transformers_text_embedder.py

---------

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
2024-08-09 11:38:47 +02:00
Marie-Luise Klaus
ec02817f14
fix: OutputAdapter from_dict with custom_filters None (#8173)
Co-authored-by: Marie-Luise Klaus <marieluise.klaus@deepset.ai>
2024-08-08 14:02:40 +02:00
Corentin Meyer
58517014ec
fix: DocumentCleaner: keep the \f in text (#8078)
* Keep the \f in Document Cleaner

* Add Reno

* Add Test

* Simplified _remove_empty_lines() code
2024-08-07 14:50:14 +02:00
Marie-Luise Klaus
031b0bfbd8
fix: ChatPromptBuilder from_dict if template is None (#8165)
* fix ChatPromptBuilder from dict if template=None

* fix ChatPromptBuilder from dict if template=None

* leave template None

---------

Co-authored-by: Marie-Luise Klaus <marieluise.klaus@deepset.ai>
2024-08-06 14:48:04 +02:00
Tim Wellbrock
2e2f5f17bb
feat: add unicode normalization & ascii_only mode for DocumentCleaner (#8103)
* feat: add unicode normalization & ascii_only mode for DocumentCleaner.

* feat: add unicode_normalization parameter validation to DocumentCleaner.

* test: fix the unit test to work after code linting.
2024-08-05 13:00:39 +02:00
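The two DocumentCleaner options named in this commit (Unicode normalization and an ascii_only mode) can be illustrated with Python's standard `unicodedata` module. The following is a minimal sketch, not the DocumentCleaner code: the helper name `clean_unicode` is hypothetical, and using NFKD decomposition before dropping non-ASCII characters is an assumption about how an "ASCII only" mode could work.

```python
import unicodedata
from typing import Optional

# Normalization forms accepted by unicodedata.normalize.
SUPPORTED_FORMS = ("NFC", "NFKC", "NFD", "NFKD")


def clean_unicode(text: str, unicode_normalization: Optional[str] = None, ascii_only: bool = False) -> str:
    """Hypothetical helper: optionally normalize Unicode, then optionally reduce to ASCII."""
    if unicode_normalization is not None:
        # Validate the parameter up front, as the commit message describes for DocumentCleaner.
        if unicode_normalization not in SUPPORTED_FORMS:
            raise ValueError(f"unicode_normalization must be one of {SUPPORTED_FORMS}, got {unicode_normalization!r}")
        text = unicodedata.normalize(unicode_normalization, text)
    if ascii_only:
        # Decompose accented characters ("é" -> "e" + combining accent), then drop non-ASCII code points.
        text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    return text
```

For example, `clean_unicode("Café", ascii_only=True)` keeps the base letter and drops only the accent, while NFKC folds compatibility characters such as the "ﬁ" ligature into plain "fi".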
Stefano Fiorucci
e17d0c4192
chore: deprecate to_openai_format and create similar utility functions (#8146)
* deprecate and add new specific functions

* reno
2024-08-02 16:47:17 +02:00
Sebastian Husch Lee
c90495c2e8
feat: Add model and tokenizer kwargs to TransformersSimilarityRanker, SentenceTransformersDocumentEmbedder, SentenceTransformersTextEmbedder (#8145)
* Start adding model and tokenizer kwargs support

* Add model and tokenizer kwargs to doc embedder

* Some updates and fixes in tests

* Fix more tests

* Fix tests

* Add release note

* Fix test

* Add from_dict tests
2024-08-02 10:37:10 +02:00
Vladimir Blagojevic
25d3520f5a
feat: Add AnswerJoiner new component (#8122)
* Initial AnswerJoiner

* Initial tests

* Add release note

* Resolve mypy warning

* Add custom join function

* Serialize custom join function

* Handle all Answer types, add integration test, improve pydoc

* Make fixes

* Add to API docs

* Add more tests

* Update haystack/components/joiners/answer_joiner.py

Co-authored-by: Amna Mubashar <amnahkhan.ak@gmail.com>

* Update docstrings and release notes

* update docstrings

---------

Co-authored-by: Sebastian Husch Lee <sjrl423@gmail.com>
Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>
Co-authored-by: Amna Mubashar <amnahkhan.ak@gmail.com>
Co-authored-by: Darja Fokina <daria.fokina@deepset.ai>
2024-08-01 12:51:17 +02:00
Stefano Fiorucci
3d1ad10385
fix html test (#8127) 2024-07-31 10:59:53 +02:00
Corentin Meyer
1c53aae8f0
fix: Tika converter not yielding page break tags (\f) (#8082)
* Fix TikaConverter not emitting the \f page-break tag: parse in HTML mode, then convert the HTML to text, using the old Haystack 1.X integration as a template.

* Add Reno

* Fix test by making Mock Tika return XML (before parsing)

* refinements and test

---------

Co-authored-by: anakin87 <stefanofiorucci@gmail.com>
2024-07-26 20:13:47 +02:00
Amna Mubashar
e0de423ee0
Rename SentenceWindowRetrieval to SentenceWindowRetriever 2024-07-26 17:46:44 +02:00
Silvano Cerza
3fed1366c4
fix: Fix issue that could lead to RCE if using insecure Jinja templates (#8095)
* Fix issue that could lead to RCE if using insecure Jinja templates

* Add comment explaining exception suppression

* Update release note

* Update release note
2024-07-26 14:02:09 +00:00
Nicola Procopio
47f4db8698
added truncate_dim to sentence transformers embedder (#8077)
* added truncate_dim to sentence transformers embedder

* Update haystack/components/embedders/sentence_transformers_document_embedder.py

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

* Update releasenotes/notes/release-note-2b603a123cd36214.yaml

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

* fixed parameter description

* added test for truncation to text embedder

* fix format

---------

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
2024-07-26 10:39:48 +02:00
Madeesh Kannan
b2aef217da
chore: Remove deprecated DynamicPromptBuilder and DynamicChatPromptBuilder components (#8085) 2024-07-26 10:00:59 +02:00
Amna Mubashar
b374c528b2
Assign streaming_callback to OpenAIGenerator and OpenAIChatGenerator in run() method (#8054)
* Add optional parameter for streaming_callback in run() method
2024-07-24 15:49:19 +02:00
Sebastian Husch Lee
baed478f23
fix: Fix split_start_idx and _split_overlap information in DocumentSplitter (#8046)
* Fix bug in DocumentSplitter and expand tests to catch said bug

* Fix split overlap information calc and actually test it

* Add release notes

* Remove comments

* Same fix in SentenceWindowRetrieval

---------

Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
2024-07-24 15:15:36 +02:00
David S. Batista
0c9dc008f0
fix: improve context relevancy metric (#7964)
* fixing tests

* fixing tests

* updating tests

* updating tests

* updating docstring

* adding release notes

* making the insufficient information more robust

* updating docstring and release notes

* empty list instead of informative string

* Update haystack/components/evaluators/context_relevance.py

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

* Update haystack/components/evaluators/context_relevance.py

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

* fixing tests

* Update haystack/components/evaluators/context_relevance.py

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

* reverting commit

* reverting again commit

* fixing docstrings

* removing deprecation warning

* removing warning import

---------

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
2024-07-22 15:13:46 +02:00
Vladimir Blagojevic
a59de1d7b3
chore: Combined main unblock (#8045)
* Pin structlog to 24.2.0 due to unit test failures

* Remove object init parameter in huggingface_hub unit tests

* Use less restrictive structlog pin

* Add release note
2024-07-19 10:39:10 +02:00
David S. Batista
431aa4a406
updating sentence window retriever tests (#8034)
* updating sentence window retriever tests

* fix
2024-07-16 22:10:55 +02:00
Amna Mubashar
499fbcc59f
Remove Multiplexer and related tests (#8020) 2024-07-16 15:39:40 +02:00
Anushree Bannadabhavi
1f05e633a9
refactor: refactor DocumentJoiner to follow enum pattern for join_mode parameter (#8010)
* refactor document joiner to follow enum pattern for join mode

* Added to_dict and from_dict
2024-07-12 11:29:44 +02:00
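The "enum pattern for join_mode" refactor can be sketched as an `Enum` that accepts either a member or its string value and round-trips through a plain dict. This is an illustrative sketch only: the member names and string values below are assumptions, and `resolve_join_mode`, `join_mode_to_dict` and `join_mode_from_dict` are hypothetical helpers, not DocumentJoiner's actual methods.

```python
from enum import Enum
from typing import Any, Dict, Union


class JoinMode(Enum):
    # Assumed members and values, for illustration only.
    CONCATENATE = "concatenate"
    MERGE = "merge"
    RECIPROCAL_RANK_FUSION = "reciprocal_rank_fusion"
    DISTRIBUTION_BASED_RANK_FUSION = "distribution_based_rank_fusion"

    def __str__(self) -> str:
        return self.value

    @staticmethod
    def from_str(value: str) -> "JoinMode":
        """Map a string such as "merge" to its enum member, rejecting unknown values."""
        mapping = {mode.value: mode for mode in JoinMode}
        if value not in mapping:
            raise ValueError(f"Unknown join mode '{value}'. Supported modes: {list(mapping)}")
        return mapping[value]


def resolve_join_mode(join_mode: Union[str, JoinMode]) -> JoinMode:
    """Hypothetical: let callers pass either a string or a JoinMode."""
    return JoinMode.from_str(join_mode) if isinstance(join_mode, str) else join_mode


def join_mode_to_dict(join_mode: JoinMode) -> Dict[str, Any]:
    """Hypothetical serialization: store the enum as its plain string value."""
    return {"join_mode": str(join_mode)}


def join_mode_from_dict(data: Dict[str, Any]) -> JoinMode:
    """Hypothetical deserialization: rebuild the enum from the stored string."""
    return JoinMode.from_str(data["join_mode"])
```

Storing the string value rather than the enum object keeps serialized pipelines readable and lets `from_dict` reject unknown modes early.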
Madeesh Kannan
8faa3fa465
Revert "fix: make PyPDF backward compatible (#7996)" (#8014)
This reverts commit 58b48e36eb56a896365133ab4a9d8e327989948c.
2024-07-11 13:06:08 +00:00
Ulises M
6f8834d036
feat: add and expose api_params for OpenAIGenerator in LLMEvaluator based classes (#7987)
* initial support for api_params

* add tests and reno

* resolve suggestions and add integration test

* fix mypy

---------

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
2024-07-11 13:14:03 +02:00
David S. Batista
ebfeb571d7
feat: add sentence window retrieval (#7997)
* initial import

* adding tests

* adding license and release notes

* adding missing release notes

* working with any type of doc store

* nit

* adding get_class_object to serialization package

* nit

* refactoring get_class_object()

* refactoring get_class_object()

* changing type and var names

* more refactoring

* Update haystack/core/serialization.py

Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>

* Update haystack/core/serialization.py

Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>

* Update test/core/test_serialization.py

Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>

* more refactoring

* more refactoring

* Pydoc syntax

---------

Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
2024-07-10 13:13:46 +00:00
Sebastian Husch Lee
c121c86c4c
fix: Fix from_dict methods of components using HF models to work with default values (#8003)
* Fix from_dict to work if device isn't provided in init params

* Minor refactoring of from_dict for components that load HF models

* Add tests

* Update tests to test loading with all default parameters

* Add more tests

* Add release notes

* Add unit test for whisper local

* Update reno

* Add fix for ExtractiveReader

* Fix NamedEntityExtractor
2024-07-10 12:18:05 +02:00
tstadel
7e35280d4f
fix: LinkContentFetcher html text encoding (#7975)
* fix: content encoding of LinkContentFetcher

* fix tests

* add reno

* only touch html
2024-07-09 15:28:49 +02:00
Sebastian Husch Lee
583eb8a293
fix: TransformersZeroShotTextRouter and TransformersTextRouter from_dict to work with default value for huggingface_pipeline_kwargs (#8002)
* Fix default value for huggingface_pipeline_kwargs

* Add reno note

* Update HuggingFaceLocalGenerator.from_dict to use the same logic as HuggingFaceLocalChatGenerator.from_dict

* Update tests slightly

* Update release note
2024-07-09 13:32:44 +02:00
Tobias Wochinger
58b48e36eb
fix: make PyPDF backward compatible (#7996)
* fix: make PyPDF backward compatible

* Add release note

---------

Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
2024-07-09 10:08:37 +02:00
Nitanshu Vashistha
cd8a5b98fe
feat: Configure max_retries & timeout for AzureOpenAITextEmbedder (#7993)
max_retries: if not set, it is read from the OPENAI_MAX_RETRIES
env variable or defaults to 5.

timeout: if not set, it is read from the OPENAI_TIMEOUT
env variable or defaults to 30.

Signed-off-by: Nitanshu Vashistha <nitanshu.vzard@gmail.com>
2024-07-09 09:56:46 +02:00
Nitanshu Vashistha
f9d53c5ca8
feat: Configure max_retries and timeout for AzureOpenAIDocumentEmbedder (#7994)
* feat: Configure max_retries & timeout for AzureOpenAIDocumentEmbedder

max_retries: if not set, it is read from the OPENAI_MAX_RETRIES
env variable or defaults to 5.

timeout: if not set, it is read from the OPENAI_TIMEOUT
env variable or defaults to 30.

Signed-off-by: Nitanshu Vashistha <nitanshu.vzard@gmail.com>

* Update retries-and-timeout-for-AzureOpenAIDocumentEmbedder-006fd84204942e43.yaml

* Update haystack/components/embedders/azure_document_embedder.py

* Update haystack/components/embedders/azure_document_embedder.py

---------

Signed-off-by: Nitanshu Vashistha <nitanshu.vzard@gmail.com>
Co-authored-by: David S. Batista <dsbatista@gmail.com>
2024-07-08 22:35:25 +02:00
Nitanshu Vashistha
376336686b
feat: Configure max_retries and timeout for AzureOpenAIChatGenerator (#7988)
* feat: Configure max_retries & timeout for AzureOpenAIChatGenerator

max_retries: if not set, it is read from the OPENAI_MAX_RETRIES
env variable or defaults to 5.

timeout: if not set, it is read from the OPENAI_TIMEOUT
env variable or defaults to 30.

Signed-off-by: Nitanshu Vashistha <nitanshu.vzard@gmail.com>

* Update haystack/components/generators/chat/azure.py

* Update haystack/components/generators/chat/azure.py

* Update max_retries-for-AzureOpenAIChatGenerator-9e49b4c7bec5c72b.yaml

---------

Signed-off-by: Nitanshu Vashistha <nitanshu.vzard@gmail.com>
Co-authored-by: David S. Batista <dsbatista@gmail.com>
2024-07-08 22:34:51 +02:00
Nitanshu Vashistha
167e886f2c
feat: Configure max_retries & timeout for AzureOpenAIGenerator (#7983)
max_retries: if not set, it is read from the OPENAI_MAX_RETRIES
env variable or defaults to 5.

timeout: if not set, it is read from the OPENAI_TIMEOUT
env variable or defaults to 30.

Signed-off-by: Nitanshu Vashistha <nitanshu.vzard@gmail.com>
2024-07-08 11:16:26 +02:00
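The Azure OpenAI commits above (#7983, #7988, #7993, #7994) all describe the same fallback order: an explicit argument wins, otherwise the OPENAI_MAX_RETRIES / OPENAI_TIMEOUT environment variable is used, otherwise 5 retries and a timeout of 30. A minimal sketch of that resolution order follows; `resolve_client_settings` is a hypothetical helper, not a function from the Haystack components.

```python
import os
from typing import Optional, Tuple


def resolve_client_settings(max_retries: Optional[int] = None, timeout: Optional[float] = None) -> Tuple[int, float]:
    """Hypothetical helper: explicit value, then environment variable, then the default."""
    if max_retries is None:
        # Environment values are strings, so convert them explicitly.
        max_retries = int(os.environ.get("OPENAI_MAX_RETRIES", 5))
    if timeout is None:
        timeout = float(os.environ.get("OPENAI_TIMEOUT", 30))
    return max_retries, timeout
```

With neither argument nor environment variable set, this returns `(5, 30.0)`. An explicit argument always overrides the environment.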
Ulises M
e92a0e4beb
feat: Allow Connection of ChatGenerator to AnswerBuilder (#7897)
* initial implementation

* add support for meta and add ChatMessage tests

* explicitly cast types for mypy and update reno

* leave inputs unchanged avoiding side effects

---------

Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2024-07-05 08:21:53 +02:00
Vladimir Blagojevic
0255422eb3
chore: Mark AzureOCRDocumentConverter test_run_with_pdf_file flaky (#7978)
* Disable AzureOCRDocumentConverter test_run_with_pdf_file on osx

* Mark test flaky instead

* Remove import
2024-07-04 16:36:32 +02:00
tstadel
aa46466894
fix: meta from ByteStream input for AzureOCRDocumentConverter (#7955)
* fix: meta from ByteStream input for AzureOCRDocumentConverter

* add test

* add reno

* fix test
2024-07-04 14:42:30 +02:00
Chris Pappalardo
7178aa0253
feat: add custom jinja filter handling to ConditionalRouter (#7957)
* add custom jinja filter handling to ConditionalRouter

* add release notes for custom filters

* align sede to existing patterns and update docstring example

* update sede unit test route condition to be more explicit

---------

Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
2024-07-04 10:08:12 +02:00
Nicola Procopio
cafcf51cb0
Fixed ZeroDivisionError in JoinDocuments (#7972)
* added new strategy DBRF

* fix hook

* fix typos

* added test for DBRF

* fix format

* new release note

* reformatted with black

* Update haystack/components/joiners/document_joiner.py

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

* updated comments

* added type-hint and return type

* fix

* revert for lint problems

* fix

* fix

* fix

* fix

* another attempt

* dict out file

* only output

* fix output

* revert

* removed unused imports

* fix typing

* fixed ZeroDivisionError

* added test

* add release note

* removed try - except

* renamed test

* Update test/components/joiners/test_document_joiner.py

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

* Update haystack/components/joiners/document_joiner.py

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

* fix format error

* removed releasenotes/notes/release-note-9b2bc03a8a398078.yaml

* added comment

---------

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
Co-authored-by: anakin87 <stefanofiorucci@gmail.com>
2024-07-04 10:07:26 +02:00
Nicola Procopio
03d9057e64
Add Distribution based rank fusion mode (#7915)
* added new strategy DBRF

* fix hook

* fix typos

* added test for DBRF

* fix format

* new release note

* reformatted with black

* Update haystack/components/joiners/document_joiner.py

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

* updated comments

* added type-hint and return type

* fix

* revert for lint problems

* fix

* fix

* fix

* fix

* another attempt

* dict out file

* only output

* fix output

* revert

* removed unused imports

* fix typing

---------

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
Co-authored-by: anakin87 <stefanofiorucci@gmail.com>
2024-07-03 13:55:17 +02:00
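Distribution-based rank fusion (#7915) rescales each retriever's scores using that list's own score distribution before combining them. The ZeroDivisionError fixed in #7972 arises when a list has zero spread, for example a single document or identical scores. The sketch below uses the common mean ± 3 standard deviations window. It is an illustration under stated assumptions, not DocumentJoiner's code: `Doc` is a hypothetical stand-in for a Haystack `Document`, and summing normalized scores for duplicate documents is an assumed combination step that may differ from the real component.

```python
import statistics
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Doc:
    # Hypothetical stand-in for a retrieved document.
    id: str
    score: float


def distribution_based_rank_fusion(result_lists: List[List[Doc]]) -> List[Doc]:
    """Normalize each list's scores into [mean - 3*std, mean + 3*std], then sum per document id."""
    fused: Dict[str, float] = {}
    for docs in result_lists:
        if not docs:
            continue
        scores = [doc.score for doc in docs]
        mean = statistics.fmean(scores)
        std = statistics.pstdev(scores)  # population standard deviation
        low, high = mean - 3 * std, mean + 3 * std
        delta = high - low
        for doc in docs:
            # delta is 0 when all scores in the list are equal; guard against ZeroDivisionError.
            normalized = (doc.score - low) / delta if delta != 0 else 0.0
            fused[doc.id] = fused.get(doc.id, 0.0) + normalized
    # Return new Doc objects sorted by fused score, highest first.
    return sorted((Doc(doc_id, score) for doc_id, score in fused.items()), key=lambda d: d.score, reverse=True)
```

Treating a zero-spread list as uninformative (every document gets 0.0) avoids the crash without letting one retriever dominate.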
David S. Batista
186512459d
feat: LLM-based evaluators return meta info from OpenAI (#7947)
* LLM-Evaluator returns metadata from OpenAI

* adding tests

* adding release notes

* updating test

* updating release notes

* fixing live tests

* addressing PR comments

* fixing tests

* Update releasenotes/notes/adding-metadata-info-from-OpenAI-f5309af5f59bb6a7.yaml

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

* Update llm_evaluator.py

---------

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
2024-07-02 11:31:51 +02:00
Vladimir Blagojevic
3068ea258b
Fix whisper test (#7959) 2024-07-01 10:10:19 +02:00
David S. Batista
91f57015c0
feat: adding split_id and split_overlap to DocumentSplitter (#7933)
* wip: adding _split_overlap

* fixing join issue for _split_overlap

* adding tests

* adding release notes

* cleaning and fixing tests

* making mypy happy

* Update haystack/components/preprocessors/document_splitter.py

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* adding docstrings

---------

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2024-06-27 15:07:43 +02:00
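To illustrate the kind of bookkeeping #7933 adds (a split id plus overlap information between neighbouring splits), here is a hypothetical word-level splitter. The function name and the metadata layout are assumptions for illustration and are not necessarily DocumentSplitter's format. Each `_split_overlap` entry names the neighbouring split and the character range, within that neighbour's content, that is shared.

```python
from typing import Any, Dict, List


def split_words_with_overlap(text: str, split_length: int, split_overlap: int) -> List[Dict[str, Any]]:
    """Hypothetical splitter recording split_id, split_idx_start and _split_overlap per chunk."""
    if split_length <= 0 or not 0 <= split_overlap < split_length:
        raise ValueError("Require split_length > 0 and 0 <= split_overlap < split_length.")
    words = text.split(" ")
    # Character offset of each word in the original text (words separated by single spaces).
    offsets: List[int] = []
    pos = 0
    for word in words:
        offsets.append(pos)
        pos += len(word) + 1
    step = split_length - split_overlap
    splits: List[Dict[str, Any]] = []
    for split_id, first in enumerate(range(0, len(words), step)):
        last = min(first + split_length, len(words))
        current: Dict[str, Any] = {
            "content": " ".join(words[first:last]),
            "split_id": split_id,
            "split_idx_start": offsets[first],
            "_split_overlap": [],
        }
        if splits:
            prev = splits[-1]
            prev_end = prev["split_idx_start"] + len(prev["content"])
            overlap_len = prev_end - current["split_idx_start"]
            if overlap_len > 0:
                # Each range is expressed relative to the *other* split's content.
                start_in_prev = current["split_idx_start"] - prev["split_idx_start"]
                current["_split_overlap"].append({"split_id": prev["split_id"], "range": (start_in_prev, len(prev["content"]))})
                prev["_split_overlap"].append({"split_id": split_id, "range": (0, overlap_len)})
        splits.append(current)
        if last == len(words):
            break  # all words covered; further windows would only repeat overlapping words
    return splits
```

For `"a b c d e"` with `split_length=3` and `split_overlap=1`, this yields `"a b c"` and `"c d e"`, and each split records that the shared `"c"` overlaps with the other.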
Vladimir Blagojevic
569b2a87cb
feat: Update LocalWhisperTranscriber, add tests (#7935)
* Update LocalWhisperTranscriber, add tests

* Final touches

* Update haystack/components/audio/whisper_local.py

Co-authored-by: David S. Batista <dsbatista@gmail.com>

* Fix prev commit

* Relax test for tiny model to work

---------

Co-authored-by: David S. Batista <dsbatista@gmail.com>
2024-06-27 12:53:41 +02:00
Vladimir Blagojevic
c2ed275a2d
feat: Improve LinkContentFetcher content type handling (#7920)
* LinkContentFetcher: add more default content type handlers

* Update/add unit test

* Add reno note

* Add image content handler

* Update unit test
2024-06-27 11:45:20 +02:00
Vladimir Blagojevic
535a281eec
feat: Add option to use HF_TOKEN as env var for authentication across all HF components (#7942)
* Read both HF_API_TOKEN and HF_TOKEN env vars in all HF related components

* Add reno note

* Test fixes

* More test updates

* More test updates
2024-06-27 10:31:58 +02:00
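Commit #7942 makes the HF components read both HF_API_TOKEN and HF_TOKEN. Haystack's `Secret.from_env_var` accepts a list of variable names, which is one way to express this. The stdlib-only sketch below shows the "first variable that is set wins" lookup. The helper `resolve_hf_token` is hypothetical, and giving HF_API_TOKEN precedence over HF_TOKEN is an assumption.

```python
import os
from typing import Optional, Sequence


def resolve_hf_token(env_vars: Sequence[str] = ("HF_API_TOKEN", "HF_TOKEN")) -> Optional[str]:
    """Hypothetical: return the value of the first non-empty environment variable, else None."""
    for name in env_vars:
        value = os.environ.get(name)
        if value:
            return value
    # No token found: callers can fall back to anonymous access or raise.
    return None
```

Returning `None` instead of raising mirrors a non-strict lookup, where public models can still be used without a token.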
Sebastian Husch Lee
6836079686
chore: Capitalize DOCX in DOCXToDocument converter (#7931)
* Capitalize DOCX in DOCXToDocument converter

* Update docstrings

* Update test class name

* add releease notes
2024-06-27 08:19:01 +02:00
Amna Mubashar
866e6c8fc2
Add the missing parameter for serialization (#7929)
* Add the missing parameter for serialization

* Updated test

---------

Co-authored-by: Amna Mubashar <amna.mubashar@Amnas-MBP.fritz.box>
2024-06-26 11:07:00 +02:00
David S. Batista
8b9eddcd94
fix: explicitly tell ContextRelevanceEvaluator that each statement should be scored (#7904)
* initial import

* adding release notes

* adding pytest decorator for live test

* make examples more readable

* updating tests

* reverting progress_bar = False
2024-06-25 16:59:37 +02:00