3645 Commits

Author SHA1 Message Date
Silvano Cerza
907ee04c58
Remove version choice for workflow dispatch in minor_version_release.yml (#8362) 2024-09-12 16:41:00 +02:00
Giovanni Alzetta, PhD
4106e7e8d1
feat: DocumentSplitter, adding the option to split_by function (#8336)
* Adding splitting function

* Adding test for split by function

* Adding release note for feat adding split by function

* Fixing release note for split_by_function

* Fixing issue with splitting_function non callable

* nit: fixing value error in documentsplitter for split_by

* Add custom serde

---------

Co-authored-by: Giovanni Alzetta <giovannialzetta@gmail.com>
Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
2024-09-12 16:38:37 +02:00
Vladimir Blagojevic
7e9f153e78
chore: Remove all references to old filter syntax (#8342)
* Remove all references to old filter syntax

* More removals

* Lint

* Do not remove test_filter_retriever.py

* Add reno note

* Update ValueError text to match text in haystack-core-integrations
2024-09-12 16:28:31 +02:00
Madeesh Kannan
672bcf7e03
fix: Add constraints to set_input_type(s) based on run method (#8358)
* fix: Prevent the usage of `set_input_type(s)` when the `run` method doesn't have kwargs,
raise if `set_input_type(s)` overrides `run` method parameters

* fix: update components and tests

* reno
2024-09-12 15:58:16 +02:00
Tuana Çelik
349615b291
Update whisper_local.py & whisper.py (#8359) 2024-09-12 14:50:01 +02:00
Silvano Cerza
5514676b5e
feat: Deprecate max_loops_allowed in favour of new argument max_runs_per_component (#8354)
* Deprecate max_loops_allowed in favour of new argument max_runs_per_component

* Add missing test file

* Some enhancements

* Add version that will remove deprecate stuff
2024-09-12 11:00:12 +02:00
Mo Sriha
3016c5ca93
update release note (#8346) 2024-09-11 08:56:59 -05:00
Sebastian Husch Lee
7227bcf9df
feat: TransformersSimilarityRanker add batching across Documents during inference (#8344)
* First pass at adding batch support to TransformersSimilarityRanker

* Add test

* Add reno
2024-09-11 12:47:29 +02:00
Tuana Çelik
675cf43be7
Update sentence_window_retriever.py (#8332)
* Update sentence_window_retriever.py

* Update haystack/components/retrievers/sentence_window_retriever.py

Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>

---------

Co-authored-by: David S. Batista <dsbatista@gmail.com>
Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>
2024-09-10 17:42:28 +00:00
Silvano Cerza
7cedf7e894
Remove unused Slack notification on PyPi release (#8351) 2024-09-10 15:46:41 +02:00
Stefano Fiorucci
69ab8e4de9
fix: fix Pipeline rendering by replacing * with &ast; (#8349)
* replace * with &ast;

* reno
2024-09-10 15:23:45 +02:00
Silvano Cerza
4d67b552e1
Fix Pipeline skipping a Component with Variadic input (#8347)
* Fix Pipeline skipping a Component with Variadic input

* Simplify _find_components_that_will_receive_no_input
2024-09-10 14:59:53 +02:00
Ulises M
145ca89a3f
feat: Expose default_headers and add kwargs for Azure Client (#8244)
* default_headers and azure_kwargs added

* update docstrings

* dont forget about chat generator

* Remove azure_kwargs argument

---------

Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
2024-09-10 10:29:56 +00:00
jpatra72
b126c14e51
feat: Adds support for zero-shot document classification (#7669) (#8193)
* feat: adds support for zero-shot document classification (#7669)

Also, supports multi-label classification

* pytests for zero shot document classification

* release note

* added licence info to py scripts

* updated the format of licence info

* Added doc string and example code

* added review points highlighted in the PR

* Applied suggestions from doc string review

Co-authored-by: Daria Fokina <daria.f93@gmail.com>

* fixed pytest for init

* added output type

* added test for pipeline (de-) serialization

---------

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
Co-authored-by: Daria Fokina <daria.f93@gmail.com>
2024-09-10 11:00:05 +02:00
Silvano Cerza
da49e782e2
chore: Make arrow an optional dependency (#8345)
* Make arrow an optional dependency

* Fix imports
2024-09-09 16:09:51 +02:00
ArzelaAscoIi
720e54970f
fix: make from dict conditional router more resilient (#8343)
* fix: make from dict conditional router more resilient

* refactor: remove

* docs: add release notes

* fix: format
2024-09-09 15:11:52 +02:00
Mo Sriha
75955922b9
feat: Add current date in UTC to PromptBuilder (#8233)
* initial commit

* add unit tests

* add release notes

* update function name
2024-09-09 09:47:03 +02:00
Bilge Yücel
e31b3edda1
Add studio to the readme (#8321)
* Add studio to the readme

* Update README.md
2024-09-06 12:23:38 +01:00
Sebastian Husch Lee
06dd5c2f37
feat (v2): Update so model_max_length updates max_seq_length for Sentence Transformers (#8334)
* Update so model_max_length does what is expected

* Add release notes

* Some fixes

* Another test
2024-09-06 11:37:56 +02:00
Sriniketh J
e98a6fea04
Converter: CSVToDocument (#8328)
* carry forwarded initial commit

* fix: doc strings

* fix: update docstrings

* fix: docstring update

* fix: csv encoding in actions

* fix: line endings through hooks

* fix: converter docs addition
2024-09-06 10:59:12 +02:00
Daria Fokina
a292f0a24e
broken formatting (#8325) 2024-09-04 18:05:56 +02:00
Silvano Cerza
a34869da3f
ci: Fix docstrings linting workflow never running (#8327)
* Fix docstrings linting workflow never running

* Trigger file

* Remove trigger file

* Remove unnecessary trigger
2024-09-04 17:57:38 +02:00
Silvano Cerza
314a6396d3
Add step to verify release notes files are correctly formatted (#8323)
* Add step to verify release notes files are correctly formatted

* Fake release note

* Trigger file

* Fix step not running when it should

* Fix release notes error

* Remove trigger files
2024-09-04 17:37:32 +02:00
David S. Batista
1f3cb68d9f
fix: meta prefix missing in the sentence window retriever filters (#8309)
* initial import

* listing supported doc stores in docstring

* adding release notes
2024-09-03 10:57:11 +02:00
Vladimir Blagojevic
b2c19a8c7a
feat: ChatPromptBuilder copies entire ChatMessage rather than copying content field only (#8317)
* Initial implementation of ChatMessage copy and deepcopy

* Add reno release note

* Satisfy hawkeye

* Remove copy and deepcopy, no need to complicate things

* Add new reno note

* Add unit test
2024-09-02 18:06:38 +02:00
Haystack Bot
9c1ad8e8ea
Update unstable version to 2.6.0-rc0 (#8318)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2024-09-02 16:44:50 +02:00
Silvano Cerza
3e3f79b928
feat: Add unsafe init arg in ConditionalRouter and OutputAdapter to enable previous behaviour (#8176)
* Add unsafe behaviour to OutputAdapter

* Add unsafe behaviour to ConditionalRouter

* Add release notes

* Fix mypy

* Add documentation links

---------

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
v2.6.0-rc0
2024-09-02 14:14:54 +00:00
Alper
e614fa0c62
refactor: Rename deserialize_document_store_in_init_parameters (#8302)
* 8259

* update function name

* rename and update docstring

* fix linting

* add a release note
2024-09-02 11:42:23 +02:00
Alper
7dbc51a3e7
doc: warning added for deprecation of gpt-3.5 as default model for OpenAI generators (#8300)
* warning added for gpt3.5 usage

* Revert "warning added for gpt3.5 usage"

This reverts commit 035a0ab9eaa9306171439fe128a78b7898ffe486.

* update openaigenerator and openaichatgenerator with warnings

* if cond removed

* update description

* adding release notes

* linting

---------

Co-authored-by: David S. Batista <dsbatista@gmail.com>
2024-08-29 09:31:59 +02:00
Julian Risch
51180e060e
chore: Remove emojis from release notes config (#8305) 2024-08-28 16:14:06 +02:00
Stefano Fiorucci
842a7b80a8
rm sentence_window_retrieval (#8303) 2024-08-28 10:51:07 +02:00
David S. Batista
2f3257b77a
chore: removing deprecated SentenceWindowRetrieval (#8294)
* removing deprecated SentenceWindowRetrieval

* adding release notes

* Rename TestSentenceWindowRetrieval to TestSentenceWindowRetriever

---------

Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2024-08-28 10:04:52 +02:00
Stefano Fiorucci
25d333bed3
update transformers (#8296) 2024-08-27 16:04:11 +00:00
Stefano Fiorucci
6b0ee4c193
chore: update test dependency and LazyImport block to make compatibility with sentence-transformers>=3.0.0 explicit (#8295)
* sentence-transformers-3 update test dep and lazyimport block

* clearer release note
2024-08-27 15:51:03 +00:00
Madeesh Kannan
f0b45c873f
feat: Extend core component machinery to support an async run method (experimental) (#8279)
* feat: Extend core component machinery to support an async run method

* Add reno

* Fix incorrect docstring

* Make `async_run` a coroutine

* Make `supports_async` a dunder field
2024-08-27 14:20:13 +02:00
Madeesh Kannan
1fa30d4aaa
chore: Remove deprecated debug param from Pipeline.run (#8288)
* chore: Remove deprecated `debug` param from `Pipeline.run`

* Fix tests
2024-08-27 11:27:38 +02:00
David S. Batista
b411c14414
feat: The SentenceWindowRetriever now has an extra output key containing all the documents belonging to the context window (#8283)
* initial import

* adding release notes

* linting

* improving docs and release notes

* updating example
2024-08-27 10:30:12 +02:00
dependabot[bot]
83e9542a62
chore(deps): bump fossas/fossa-action from 1.3.3 to 1.4.0 (#8167)
Bumps [fossas/fossa-action](https://github.com/fossas/fossa-action) from 1.3.3 to 1.4.0.
- [Release notes](https://github.com/fossas/fossa-action/releases)
- [Commits](https://github.com/fossas/fossa-action/compare/v1.3.3...v1.4.0)

---
updated-dependencies:
- dependency-name: fossas/fossa-action
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-08-23 17:55:33 +02:00
David S. Batista
acfe28b5ed
docs: updating DocumentSplitter docstring, adding supported DocumentStores (#8270)
* initial import

* adding Chroma with limited support

* updating

* Update document_splitter.py

* Update haystack/components/preprocessors/document_splitter.py

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

* Update haystack/components/preprocessors/document_splitter.py

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

* Update haystack/components/preprocessors/document_splitter.py

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

* Update document_splitter.py

* linting

* Update haystack/components/preprocessors/document_splitter.py

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

---------

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
2024-08-23 14:24:08 +00:00
Souf G
3163fbb835
fix discord link in README.md (#8274)
* fix discord link in README.md

* Update README.md

---------

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
2024-08-23 10:29:56 +02:00
Stefano Fiorucci
2e619f06c8
fix: make meta produced by DOCXToDocument JSON serializable (#8263)
* make meta from DOCXToDocument JSON serializable

* unused import

* update docstrings
2024-08-22 12:24:32 +00:00
dependabot[bot]
0a1a64cb0c
build(deps): bump tj-actions/changed-files from 44 to 45 (#8269)
Bumps [tj-actions/changed-files](https://github.com/tj-actions/changed-files) from 44 to 45.
- [Release notes](https://github.com/tj-actions/changed-files/releases)
- [Changelog](https://github.com/tj-actions/changed-files/blob/main/HISTORY.md)
- [Commits](https://github.com/tj-actions/changed-files/compare/v44...v45)

---
updated-dependencies:
- dependency-name: tj-actions/changed-files
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-08-21 17:00:17 +02:00
Stefano Fiorucci
aca8f09f7d
fix: DOCXToDocument converter - use forward reference to Paragraph (#8260)
* docx paragraph forward ref

* fix
2024-08-21 12:37:43 +02:00
Jon Strutz
471f07c8fe
fix: extract page breaks from .docx files (#8232)
* fix: extract page breaks from .docx files

Context: Currently, DOCXToDocument does not extract page breaks from
word documents. This makes it impossible to do things like split by page
or get correct page number metadata after using something like
DocumentSplitter. For example, if you split by word, the 'page_number'
metadata field will be 1 for all documents.

Solution: Added a method to DOCXToDocument that extracts page breaks
from word documents as '\f' characters so that they are recognized by
DocumentSplitter.

Caveat: Due to the way the python-docx library is set up, you can only
accurately determine the location of the first page break for a given
paragraph. In the rare case that a paragraph contains more than one page
break (which means it is an extremely long paragraph spanning multiple
pages), the 2nd, 3rd, etc. page break locations are not known. To sort
of fix this, I just appended the page break characters to the end of
the paragraph text to keep the overall page number values for the
document consistent.

* Apply suggestions from code review

---------

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
2024-08-21 09:48:02 +00:00
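Note on 471f07c8fe: the page-break handling described in that commit message can be illustrated with a small, self-contained sketch. It is not the DOCXToDocument implementation and does not use python-docx. The helper names (`mark_page_breaks`, `words_with_page_numbers`) and their parameters are hypothetical, and the sketch assumes one reading of the caveat: the first break is placed at its known offset and any further breaks are appended to the end of the paragraph text.

```python
from typing import List, Optional, Tuple

PAGE_BREAK = "\f"  # form feed, the character the commit message says is emitted


def mark_page_breaks(paragraph: str, first_break_at: Optional[int], extra_breaks: int = 0) -> str:
    # Hypothetical helper: insert a form feed at the known offset of the first
    # page break, then append the remaining breaks (whose positions are
    # unknown) at the end so the total page count stays consistent.
    if first_break_at is None:
        return paragraph
    marked = paragraph[:first_break_at] + PAGE_BREAK + paragraph[first_break_at:]
    return marked + PAGE_BREAK * extra_breaks


def words_with_page_numbers(text: str) -> List[Tuple[str, int]]:
    # Split on form feeds first: str.split() with no arguments would treat
    # the form feed as ordinary whitespace and lose the page boundary.
    result = []
    for page_index, page_text in enumerate(text.split(PAGE_BREAK)):
        for word in page_text.split():
            result.append((word, page_index + 1))
    return result


# A paragraph spanning three pages, followed by the next paragraph.
long_paragraph = mark_page_breaks("one two three four", first_break_at=8, extra_breaks=1)
pages = words_with_page_numbers(long_paragraph + "five")
```

In this sketch "three" and "four" are both assigned to page 2 even though the second break may really fall between them. That is the inaccuracy the caveat describes, while the word "five" in the next paragraph still lands on page 3.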
Sebastian Husch Lee
7fd0b6a013
feat: Add min_top_k to TopPSampler (#8228)
* Add feature to Top P Sampler

* Add release notes

* Fix zip call

* Fix mypy

* Restore doc string and make mypy happy hopefully

* Make mypy happy

* PR comment

* Revert change to make mypy happy

* Add back type ignore

* try to fix typing

* Update haystack/components/samplers/top_p.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* Update haystack/components/samplers/top_p.py

---------

Co-authored-by: anakin87 <stefanofiorucci@gmail.com>
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2024-08-21 11:29:23 +02:00
Daria Fokina
35b1215b00
clean up docstrings: WhisperTranscribers (#8235)
* clarify docstrings

* Apply suggestions from code review

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

---------

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2024-08-16 11:28:42 +00:00
Daria Fokina
bbe18cfdaf
clean up docstrings: DocumentLanguageClassifier (#8215)
* doclangclass-strings

* simplify sentence

* simplify sentence 2

* Apply suggestions from code review

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

---------

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2024-08-16 12:45:54 +02:00
Daria Fokina
4a058032e7
clean up docstrings: TransformersTextRouter (#8229)
* Update transformers_text_router.py

* article

* article 2
2024-08-16 12:44:39 +02:00
Daria Fokina
b51bb6e5a9
Update zero_shot_text_router.py (#8231) 2024-08-16 12:43:13 +02:00
Daria Fokina
b5d0bfa9df
Update cache_checker.py (#8237) 2024-08-16 12:22:09 +02:00