3803 Commits

Author SHA1 Message Date
Agnieszka Marzec
1d4883f178
update docstrings (#8117) 2024-07-30 11:10:36 +02:00
Agnieszka Marzec
42f59fc022
update docstrings (#8115) 2024-07-30 11:08:45 +02:00
Daria Fokina
21de1f87d4
docs: clean up docstrings of AnswerBuilder (#8094)
* answerbuilder docstrings

* update the `replies`

* Apply suggestions from code review

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Update answer_builder.py

---------

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2024-07-30 11:06:39 +02:00
Agnieszka Marzec
e8598befb6
Docs: Update OpenAIGen docstrings and add missing headers (#8105)
* update docstrings

* Update haystack/components/generators/openai.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

---------

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2024-07-30 11:06:17 +02:00
Daria Fokina
92e2377eff
docs: clean up docstrings of FileTypeRouter (#8098)
* upd filetyperouter docstrings

* Suggestions from code review

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* aga's suggestions

---------

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2024-07-30 08:39:08 +02:00
Agnieszka Marzec
8ce7bedf25
Docs: Update DocSplitter docstrings (#8081)
* update docstrings

* Update haystack/components/preprocessors/document_splitter.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* Update haystack/components/preprocessors/document_splitter.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* fix article

---------

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2024-07-29 15:11:12 +02:00
Agnieszka Marzec
abb24c61c2
Docs: Update DocumentEmbedder docstrings (#8112)
* update docstrings

* Update haystack/components/embedders/sentence_transformers_document_embedder.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* Update haystack/components/embedders/sentence_transformers_document_embedder.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* Update haystack/components/embedders/sentence_transformers_document_embedder.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* fix casing

---------

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2024-07-29 15:10:49 +02:00
Agnieszka Marzec
950c632009
Docs: Update DocumentCleaner docstrings (#8106)
* update docstrings

* Update haystack/components/preprocessors/document_cleaner.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* fix article

---------

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2024-07-29 14:45:15 +02:00
Agnieszka Marzec
da81d10060
Docs: Update DocumentJoiner docstrings (#8109)
* update docstrings

* Update haystack/components/joiners/document_joiner.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* Update haystack/components/joiners/document_joiner.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* Update haystack/components/joiners/document_joiner.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* Update haystack/components/joiners/document_joiner.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* Update haystack/components/joiners/document_joiner.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* fix typo

* fix typo

---------

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2024-07-29 14:39:44 +02:00
Corentin Meyer
1c53aae8f0
fix: Tika converter not yielding page break tags (\f) (#8082)
* Fix TikaConverter not having \f page tag by using HTML mode of parsing and then parsing the HTML to text using the old Haystack 1.X integration as template.

* Add Reno

* Fix test by making Mock Tika return XML (before parsing)

* refinements and test

---------

Co-authored-by: anakin87 <stefanofiorucci@gmail.com>
2024-07-26 20:13:47 +02:00
Amna Mubashar
e0de423ee0
Rename SentenceWindowRetrieval to SentenceWindowRetriever 2024-07-26 17:46:44 +02:00
Silvano Cerza
3fed1366c4
fix: Fix issue that could lead to RCE if using unsecure Jinja templates (#8095)
* Fix issue that could lead to RCE if using unsecure Jinja templates

* Add comment explaining exception suppression

* Update release note

* Update release note
2024-07-26 14:02:09 +00:00
Nicola Procopio
47f4db8698
added truncate_dim to sentence transformers embedder (#8077)
* added truncate_dim to sentence transformers embedder

* Update haystack/components/embedders/sentence_transformers_document_embedder.py

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

* Update releasenotes/notes/release-note-2b603a123cd36214.yaml

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

* fixed parameter description

* added test for truncation to text embedder

* fix format

---------

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
2024-07-26 10:39:48 +02:00
Madeesh Kannan
b2aef217da
chore: Remove deprecated DynamicPromptBuilder and DynamicChatPromptBuilder components (#8085) 2024-07-26 10:00:59 +02:00
Daria Fokina
f372ca443c
bm25 retriever docstrings (#8087) 2024-07-25 17:28:21 +02:00
Agnieszka Marzec
1f58ec20a8
Docs: Standardize and improve SentenceTransformersTextEmbedder docstrings (#8060)
* Update docstrings

* format

* add Daria's comments

* Update haystack/components/embedders/sentence_transformers_text_embedder.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* Update haystack/components/embedders/sentence_transformers_text_embedder.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

---------

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2024-07-25 13:56:51 +02:00
Agnieszka Marzec
de728b4877
Docs: Simplify lg + standardize docstrings (#8057)
* Simplify lg + standardize

* Format

* Update formatting

* Fix formatting again

* Fix empty line

* Change formatting

* Format with black

---------

Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
2024-07-25 13:24:42 +02:00
Agnieszka Marzec
855f8e61f3
Docs: Update InMemoryEmbeddingRetriever docstrings (#8068)
* update docstrings

* Update documents to lowercase
2024-07-25 13:24:00 +02:00
Madeesh Kannan
f9e4d5dc58
chore: Deprecate the debug parameter in Pipeline.run (#8075) 2024-07-25 09:58:57 +00:00
Tobias Wochinger
4dde6fbaec
build: unpin structlog (#8071) 2024-07-24 20:58:34 +02:00
Amna Mubashar
b374c528b2
Assign streaming_callback to OpenAIGenerator and OpenAIChatGenerator in run() method (#8054)
* Add optional parameter for streaming_callback in run() method
2024-07-24 15:49:19 +02:00
Sebastian Husch Lee
baed478f23
fix: Fix split_start_idx and _split_overlap information in DocumentSplitter (#8046)
* Fix bug in DocumentSplitter and expand tests to catch said bug

* Fix split overlap information calc and actually test it

* Add release notes

* Remove comments

* Same fix in SentenceWindowRetrieval

---------

Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
2024-07-24 15:15:36 +02:00
Stefano Fiorucci
b36ec0a38c
fix release note (#8070) 2024-07-24 15:03:01 +02:00
Tobias Wochinger
38d38678c7
fix: fix PPTX import (#8069)
* fix: fix PPTX import

* docs: add release notes
2024-07-24 14:50:47 +02:00
Agnieszka Marzec
a022af02bc
Update docstrings (#8066) 2024-07-24 13:54:39 +02:00
Madeesh Kannan
4650263bc3
chore: Remove deprecated init parameters from HTMLToDocument (#8056)
* chore: Remove deprecated init parameters from `HTMLToDocument`

* Fix reno
2024-07-24 13:16:47 +02:00
David S. Batista
0c9dc008f0
fix: improve context relevancy metric (#7964)
* fixing tests

* fixing tests

* updating tests

* updating tests

* updating docstring

* adding release notes

* making the insufficient information more robust

* updating docstring and release notes

* empty list instead of informative string

* Update haystack/components/evaluators/context_relevance.py

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

* Update haystack/components/evaluators/context_relevance.py

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

* fixing tests

* Update haystack/components/evaluators/context_relevance.py

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

* reverting commit

* reverting again commit

* fixing docstrings

* removing deprecation warning

* removing warning import

---------

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
2024-07-22 15:13:46 +02:00
Vladimir Blagojevic
a59de1d7b3
chore: Combined main unblock (#8045)
* Pin structlog to 24.2.0 due to unit test failures

* Remove object init parameter in huggingface_hub unit tests

* Use less restrictive structlog pin

* Add release note
2024-07-19 10:39:10 +02:00
Daria Fokina
913078dfaa
docs: add sentence window retrieval to api reference (#8032)
* docs: add sentence window retrieval to api reference

* deprecating multiplexer
2024-07-17 11:16:58 +02:00
Amna Mubashar
3fa6c253c3
fix: Prevent Pipeline.from_dict from modifying the dictionary parameter passed to it (#8030)
* Updated the pipeline deserialization
2024-07-17 10:28:29 +02:00
David S. Batista
431aa4a406
updating sentence window retriever tests (#8034)
* updating sentence window retriever tests

* fix
2024-07-16 22:10:55 +02:00
David S. Batista
3ed69c4aab
docs: adding example to docstring to SentenceWindowRetrieval (#8031)
* adding example to docstring

* small fix

* Update haystack/components/retrievers/sentence_window_retrieval.py

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

* Update haystack/components/retrievers/sentence_window_retrieval.py

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

* PR comments

* Update haystack/components/retrievers/sentence_window_retrieval.py

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

* PR comments

* PR comments

---------

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
2024-07-16 16:22:26 +02:00
Amna Mubashar
499fbcc59f
Remove Multiplexer and related tests (#8020) 2024-07-16 15:39:40 +02:00
Silvano Cerza
0411cd938a
Fix bug in Pipeline.run() executing Components in a wrong and unexpected order (#8021)
* Fix bug in Pipeline.run() executing Components in a wrong and unexpected order

* Update haystack/core/pipeline/base.py

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

---------

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
2024-07-12 15:30:10 +00:00
Madeesh Kannan
94b806815c
refactor: Improve error messages shown during pipeline deserialization (#8016)
* refactor: Improve error messages shown during pipeline deserialization

* Add link to release notes

* Update release notes link
2024-07-12 14:47:00 +00:00
Anushree Bannadabhavi
1f05e633a9
refactor: refactor DocumentJoiner to follow enum pattern for join_mode parameter (#8010)
* refactor document joiner to follow enum pattern for join mode

* Added to_dict and from_dict
2024-07-12 11:29:44 +02:00
Silvano Cerza
0cec82e55e
refactor: Pipeline.run() (#8019)
* Move utility functions from _enqueue_next_runnable_component (#7895)

* Isolate logic to check if we're stuck in a loop

* Simplify for else

* Add missing return in docstring

* Emit warning when stuck in a loop

* Fix docstring

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

* Add utility function to move Components in queues

* Add function to find next Component to run

* Comment update

* Add missing break in loop

* Make _add_missing_input_defaults less error prone and add tests

* Fix tests

* Update docstring

* Simplify enqueue logic

* Remove unused _enqueue_next_runnable_component function

* Add method to find Component with lazy variadic input or all inputs with defaults

* Simplify _find_next_runnable_lazy_variadic_or_default_component

* Remove unnecessary type ignore

* Split _dequeue_components_that_received_no_input into separate functions

* Fix linting

* Simplify variadic check when running Component

* Simplify code

* Reorganize functions used by Pipeline.run

* Rename variables used in Pipeline.run() for clarity

* Add comment clarifying last_waiting_queue and before_last_waiting_queue

* Add functions to easily update waiting_queue

---------

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
2024-07-12 08:35:23 +00:00
David S. Batista
d02356fe7a
chore: normalise the use of importlib in getting an object from a qualified name string across the codebase (#8012)
* initial import

* cleaning up

* removing unused imports
2024-07-11 16:14:00 +02:00
Madeesh Kannan
8faa3fa465
Revert "fix: make PyPDF backward compatible (#7996)" (#8014)
This reverts commit 58b48e36eb56a896365133ab4a9d8e327989948c.
2024-07-11 13:06:08 +00:00
Ulises M
6f8834d036
feat: add and expose api_params for OpenAIGenerator in LLMEvaluator based classes (#7987)
* initial support for api_params

* add tests and reno

* resolve suggestions and add integration test

* fix mypy

---------

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
2024-07-11 13:14:03 +02:00
David S. Batista
ebfeb571d7
feat: add sentence window retrieval (#7997)
* initial import

* adding tests

* adding license and release notes

* adding missing release notes

* working with any type of doc store

* nit

* adding get_class_object to serialization package

* nit

* refactoring get_class_object()

* refactoring get_class_object()

* changing type and var names

* more refactoring

* Update haystack/core/serialization.py

Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>

* Update haystack/core/serialization.py

Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>

* Update test/core/test_serialization.py

Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>

* more refactoring

* more refactoring

* Pydoc syntax

---------

Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
2024-07-10 13:13:46 +00:00
Sebastian Husch Lee
c121c86c4c
fix: Fix from_dict methods of components using HF models to work with default values (#8003)
* Fix from_dict to work if device isn't provided in init params

* Minor refactoring of from_dict for components that load HF models

* Add tests

* Update tests to test loading with all default parameters

* Add more tests

* Add release notes

* Add unit test for whisper local

* Update reno

* Add fix for ExtractiveReader

* Fix NamedEntityExtractor
2024-07-10 12:18:05 +02:00
Madeesh Kannan
f19131f13a
chore: Deprecate legacy document/metadata filters (#8004) 2024-07-09 16:18:38 +02:00
tstadel
7e35280d4f
fix: LinkContentFetcher html text encoding (#7975)
* fix: content encoding of LinkContentFetcher

* fix tests

* add reno

* only touch html
2024-07-09 15:28:49 +02:00
Sebastian Husch Lee
583eb8a293
fix: TransformersZeroShotTextRouter and TransformersTextRouter from_dict to work with default value for huggingface_pipeline_kwargs (#8002)
* Fix default value for huggingface_pipeline_kwargs

* Add reno note

* Update HuggingFaceLocalGenerator.from_dict to use the same logic as HuggingFaceLocalChatGenerator.from_dict

* Update tests slightly

* Update release note
2024-07-09 13:32:44 +02:00
Tobias Wochinger
58b48e36eb
fix: make PyPDF backward compatible (#7996)
* fix: make PyPDF backward compatible

* Add release note

---------

Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
2024-07-09 10:08:37 +02:00
Nitanshu Vashistha
cd8a5b98fe
feat: Configure max_retries & timeout for AzureOpenAITextEmbedder (#7993)
max_retries: if not set, it is read from the OPENAI_MAX_RETRIES
env variable or defaults to 5.

timeout: if not set, it is read from the OPENAI_TIMEOUT
env variable or defaults to 30.

Signed-off-by: Nitanshu Vashistha <nitanshu.vzard@gmail.com>
2024-07-09 09:56:46 +02:00
Nitanshu Vashistha
f9d53c5ca8
feat: Configure max_retries and timeout for AzureOpenAIDocumentEmbedder (#7994)
* feat: Configure max_retries & timeout for AzureOpenAIDocumentEmbedder

max_retries: if not set, it is read from the OPENAI_MAX_RETRIES
env variable or defaults to 5.

timeout: if not set, it is read from the OPENAI_TIMEOUT
env variable or defaults to 30.

Signed-off-by: Nitanshu Vashistha <nitanshu.vzard@gmail.com>

* Update retries-and-timeout-for-AzureOpenAIDocumentEmbedder-006fd84204942e43.yaml

* Update haystack/components/embedders/azure_document_embedder.py

* Update haystack/components/embedders/azure_document_embedder.py

---------

Signed-off-by: Nitanshu Vashistha <nitanshu.vzard@gmail.com>
Co-authored-by: David S. Batista <dsbatista@gmail.com>
2024-07-08 22:35:25 +02:00
Nitanshu Vashistha
376336686b
feat: Configure max_retries and timeout for AzureOpenAIChatGenerator (#7988)
* feat: Configure max_retries & timeout for AzureOpenAIChatGenerator

max_retries: if not set, it is read from the OPENAI_MAX_RETRIES
env variable or defaults to 5.

timeout: if not set, it is read from the OPENAI_TIMEOUT
env variable or defaults to 30.

Signed-off-by: Nitanshu Vashistha <nitanshu.vzard@gmail.com>

* Update haystack/components/generators/chat/azure.py

* Update haystack/components/generators/chat/azure.py

* Update max_retries-for-AzureOpenAIChatGenerator-9e49b4c7bec5c72b.yaml

---------

Signed-off-by: Nitanshu Vashistha <nitanshu.vzard@gmail.com>
Co-authored-by: David S. Batista <dsbatista@gmail.com>
2024-07-08 22:34:51 +02:00
Haystack Bot
d7a7d9c1fb
Update unstable version to 2.4.0-rc0 (#7992)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2024-07-08 14:32:56 +02:00