3803 Commits

Author SHA1 Message Date
Amna Mubashar
a82586808b Update from_string method 2025-01-09 15:48:34 +01:00
Amna Mubashar
94f9fd1e40 Small fixes 2025-01-09 12:39:07 +01:00
Amna Mubashar
9779bbdbe6 Create custom environment to render haystack dataclasses 2025-01-09 12:13:53 +01:00
Sebastian Husch Lee
28ad78c73d
feat: Add XLSXToDocument converter (#8522)
* Add draft of the Excel To Document converter

* Add license header

* Add release note

* Use Union instead of pipe

* Add openpyxl as additional dep

* Fix zip issue

* few updates from Bijay

* Update deps

* Add markdown test

* Adding more example excels and expanding tests

* Added more tests

* Fix windows test by setting lineterminator

* Addressing PR comments

* PR comments

* Fix linting
2025-01-09 09:03:19 +01:00
Stefano Fiorucci
bc30105fbc
test: reorganize docstore test suite to isolate dataframe tests (#8684)
* reorganize docstore test suite to isolate dataframe tests

* improve docstring

* include FilterDocumentsTestWithDataframe in InMemoryDocumentStore tests
2025-01-08 14:58:52 +00:00
Stefano Fiorucci
5539f6c33f
refactor: improve serialization/deserialization of callables (to handle class methods and static methods) (#8683)
* progress

* refinements

* tidy up

* release note
2025-01-08 11:28:00 +01:00
tstadel
e6059e632e
fix: truncate ByteStream string representation (#8673)
* fix: truncate ByteStream string representation

* add reno

* better reno

* add test

* Update test_byte_stream.py

* apply feedback

* update reno
2025-01-07 19:00:52 +01:00
Bohan Qu
8e3f64717f
feat: use importlib when deserializing callables (#8648) 2025-01-03 15:06:58 +01:00
Stefano Fiorucci
7b4d9ba86e
feat: introduce class method to create ChatMessage from the OpenAI dictionary format (#8670)
* add ChatMessage.from_openai_dict_format

* remove print

* release note

* improve docstring

* separate validation logic

* rm obvious comment
2025-01-02 10:34:41 +00:00
Stefano Fiorucci
3ea128c962
OpenAITextEmbedder - remove unused constants (#8669) 2024-12-21 09:46:30 +01:00
Stefano Fiorucci
99e7e343b2
chore: update links to chatmessage docs (#8667) 2024-12-20 15:33:27 +01:00
Stefano Fiorucci
188b2a7f06
feat: support for tools in OpenAIChatGenerator (#8666)
* move chatmsg>openai conversion to chatmsg dataclass

* implementation and tests cleanup

* release note

* try fixing azure chat generator

* add serde test for toolinvoker

* small fix
2024-12-20 14:20:54 +00:00
Stefano Fiorucci
7dcbf25bd7
feat: add Tool Invoker component (#8664)
* port toolinvoker

* release note
2024-12-20 14:02:42 +01:00
Michele Pangrazzi
c192488bf6
Named entity extractor private models (#8658)
* add 'token' support to NamedEntityExtractor to enable using private models on HF backend

* fix existing error message format

* add release note

* add HF_API_TOKEN to e2e workflow

* add informative comment

* Updated to_dict / from_dict to handle 'token' correctly; added tests

* Fix lint

* Revert unwanted change
2024-12-20 11:15:55 +01:00
Sebastian Husch Lee
286061f005
fix: Move potential nltk download to warm_up (#8646)
* Move potential nltk download to warm_up

* Update tests

* Add release notes

* Fix tests

* Uncomment

* Make mypy happy

* Add RuntimeError message

* Update release notes

---------

Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2024-12-20 10:41:44 +01:00
Stefano Fiorucci
f4d9c2bb91
fix: Make the HuggingFaceLocalChatGenerator compatible with the new ChatMessage; serialize chat_template (#8663)
* message conversion function

* hfapi w tools

* right test file + hf_hub version

* release note

* fix for new chatmessage; serialize chat_template

* feedback
2024-12-19 15:12:12 +01:00
Stefano Fiorucci
2bc58d2987
feat: support for tools in HuggingFaceAPIChatGenerator (#8661)
* message conversion function

* hfapi w tools

* right test file + hf_hub version

* release note

* feedback
2024-12-19 15:04:37 +01:00
David S. Batista
c306bee665
fix: adding missing abbreviations files for SentenceSplitter (#8660)
* adding missing abbreviations files for SentenceSplitter

* fixing tests path
2024-12-19 11:08:29 +01:00
Tobias Wochinger
91619a79c1
fix: fix deserialization issues in multi-threading environments (#8651) 2024-12-18 21:34:57 +01:00
Stefano Fiorucci
96b4a1d2fd
feat: Tool dataclass - unified abstraction to represent tools (#8652)
* draft

* del HF token in tests

* adaptations

* progress

* fix type

* import sorting

* more control on deserialization

* release note

* improvements

* support name field

* fix chatpromptbuilder test

* port Tool from experimental

* release note

* docs upd

* Update tool.py

---------

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2024-12-18 11:36:44 +00:00
Stefano Fiorucci
ea3602643a
feat!: new ChatMessage (#8640)
* draft

* del HF token in tests

* adaptations

* progress

* fix type

* import sorting

* more control on deserialization

* release note

* improvements

* support name field

* fix chatpromptbuilder test

* Update chat_message.py

---------

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2024-12-17 17:02:04 +01:00
David S. Batista
a5b57f4b1f
adding SentenceSplitter to init imports (#8644) 2024-12-16 13:57:41 +01:00
David S. Batista
db89b9a2e5
fix: removing unused import (#8636) 2024-12-13 12:35:58 +01:00
David S. Batista
176db5dbf9
initial import (#8635) 2024-12-13 12:12:40 +01:00
Stefano Fiorucci
f2b5f123b3
del HF token in tests (#8634) 2024-12-13 09:50:23 +01:00
Stefano Fiorucci
2a9a6401d2
chore: pin openai>=1.56.1 (#8632)
* pin openai>=1.56.1

* release note
2024-12-12 16:26:38 +01:00
David S. Batista
3f77d3ab6c
!feat: unify NLTKDocumentSplitter and DocumentSplitter (#8617)
* wip: initial import

* wip: refactoring

* wip: refactoring tests

* wip: refactoring tests

* making all NLTKSplitter related tests work

* refactoring

* docstrings

* refactoring and removing NLTKDocumentSplitter

* fixing tests for custom sentence tokenizer

* fixing tests for custom sentence tokenizer

* cleaning up

* adding release notes

* reverting some changes

* cleaning up tests

* fixing serialisation and adding tests

* cleaning up

* wip

* renaming and cleaning

* adding NLTK files

* updating docstring

* adding import to init

* Update haystack/components/preprocessors/document_splitter.py

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

* updating tests

* wip

* adding sentence/period change warning

* fixing LICENSE header

* Update haystack/components/preprocessors/document_splitter.py

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

---------

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
2024-12-12 14:22:27 +00:00
David S. Batista
6cceaac15f
docs: add deprecation warning nltk document splitter (#8628)
* adding deprecation warning

* adding release notes

* adding release notes

* updating message

* Update haystack/components/preprocessors/nltk_document_splitter.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

---------

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2024-12-12 15:16:54 +01:00
Stefano Fiorucci
04fc187bc4
chore: remove deprecation warnings related to store_full_path (#8626)
* remove deprecation warnings related to store_full_path

* unused imports
2024-12-12 09:27:19 +01:00
Michele Pangrazzi
21d53d0ec6
update default value of 'store_full_path' to False in converters (#8619) 2024-12-10 16:03:38 +01:00
dependabot[bot]
c78eb9be4e
build(deps): bump readmeio/rdme from 8 to 9 (#8615)
Bumps [readmeio/rdme](https://github.com/readmeio/rdme) from 8 to 9.
- [Release notes](https://github.com/readmeio/rdme/releases)
- [Changelog](https://github.com/readmeio/rdme/blob/next/CHANGELOG.md)
- [Commits](https://github.com/readmeio/rdme/compare/v8...v9)

---
updated-dependencies:
- dependency-name: readmeio/rdme
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-12-10 13:22:08 +01:00
David S. Batista
248dccbdd3
chore: fixing pylint issues (#8610)
* initial import

* fixing internal methods

* fixing some internal methods

* modify _preprocess

* fixed internal methods

---------

Co-authored-by: anakin87 <stefanofiorucci@gmail.com>
2024-12-09 16:53:37 +00:00
Anton Pelykh
6f983a22ca
fix: add missing stream mime type assignment to the LinkContentFetcher (#8596)
* add missing stream mime type assignment to the `LinkContentFetcher`

* fix release note fmt

---------

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
2024-12-09 14:51:14 +00:00
Stefano Fiorucci
09adf856dc
rm openapi spec util (#8613) 2024-12-09 10:59:21 +01:00
ArzelaAscoIi
ed2f37da60
fix: docstring for normalization (#8604)
* fix: docstring for normalization

* chore: add reno

* fixing docstrings and adding pylint disable too many args

---------

Co-authored-by: David S. Batista <dsbatista@gmail.com>
2024-12-06 17:13:30 +01:00
Michele Pangrazzi
b32f85cca2
remove deprecated 'converter' init parameter from PyPDFToDocument component (#8609) 2024-12-06 15:43:43 +01:00
David S. Batista
3da5bac8c4
refactor: converting some DocumentJoiner methods to staticmethod (#8606)
* converting some methods to static, since they don't change/depend on state of the object

* adding release notes

* removing tab
2024-12-06 10:28:41 +01:00
David S. Batista
e349a7f2fc
docs: complete docstring for DocumentJoiner code example (#8593)
* initial import

* changing a method to static

* reverting staticmethod
2024-12-05 14:04:34 +00:00
David S. Batista
2282c26f17
feat!: SentenceWindowRetriever returns List[Document] with docs ordered by split_idx_start (#8590)
* initial import

* adding a few pylint disable

* adding tests

* fixing integration tests

* adding release notes

* fixing types and docstrings
2024-12-04 16:55:56 +01:00
David S. Batista
f0638b2868
refactor: moving SentenceSplitter outside NLTKDocumentSplitter (#8599)
* initial import

* fixing imports and renaming file

* fixing imports path

* adding condition to check NLTK successfully imported

* adding one class inside the NLTK imported condition
2024-12-04 10:44:36 +01:00
David S. Batista
c5ef0b2956
chore: adding a deprecation warning on the SentenceWindowRetriever (#8597)
* linting

* improving message

* fixing header

* adding deprecation in the release notes
2024-12-03 17:41:19 +01:00
Julian Risch
41369b9e0a
chore: Mention breaking changes in PR template (#8602) 2024-12-03 17:18:48 +01:00
Amna Mubashar
4c8eb54049
feat: Add store_full_path to converters (3/3) (#8585)
* Add store_full_path params
2024-12-03 13:48:56 +05:00
Stefano Fiorucci
de7099e560
ci: add job to check imports (#8594)
* try checking imports

* clarify error message

* better fmt

* do not show complete list of successfully imported packages

* refinements

* relnote

* add missing forward references

* better function name

* linting

* fix linting

* Update .github/utils/check_imports.py

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

---------

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
2024-11-29 14:00:59 +00:00
Madeesh Kannan
163c06f3d6
chore: Revert change to deserialization error in Pipeline (#8591) 2024-11-28 13:28:52 +01:00
Stefano Fiorucci
c8685aa141
refactor: update components to access ChatMessage.text instead of content (#8589)
* introduce text property and deprecate content

* release note

* use chatmessage.text

* release note

* linting
2024-11-28 10:16:07 +00:00
Stefano Fiorucci
fb1baf4921
refactor: ChatMessage - introduce text property and deprecate content (#8588)
* introduce text property and deprecate content

* release note

* minor test refactoring

---------

Co-authored-by: Michele Pangrazzi <xmikex83@gmail.com>
2024-11-28 09:53:02 +00:00
Stefano Fiorucci
51c1390426
chore: use class methods to create ChatMessage (#8581)
* use class methods to build messages

* fix failing format
2024-11-28 09:35:24 +00:00
Silvano Cerza
473f7bef11 Change Pipeline.from_dict error message 2024-11-28 10:15:06 +01:00
Stefano Fiorucci
fb42c035c5
feat: PyPDFToDocument - add new customization parameters (#8574)
* deprecate converter in pypdf

* fix linting of MetaFieldGroupingRanker

* linting

* pypdftodocument: add customization params

* fix mypy

* incorporate feedback
2024-11-26 16:37:59 +01:00