Amna Mubashar
a82586808b
Update from_string method
2025-01-09 15:48:34 +01:00
Amna Mubashar
94f9fd1e40
Small fixes
2025-01-09 12:39:07 +01:00
Amna Mubashar
9779bbdbe6
Create custom environment to render haystack dataclasses
2025-01-09 12:13:53 +01:00
Sebastian Husch Lee
28ad78c73d
feat: Add XLSXToDocument converter ( #8522 )
...
* Add draft of the Excel To Document converter
* Add license header
* Add release note
* Use Union instead of pipe
* Add openpyxl as additional dep
* Fix zip issue
* few updates from Bijay
* Update deps
* Add markdown test
* Adding more example excels and expanding tests
* Added more tests
* Fix windows test by setting lineterminator
* Addressing PR comments
* PR comments
* Fix linting
2025-01-09 09:03:19 +01:00
Stefano Fiorucci
bc30105fbc
test: reorganize docstore test suite to isolate dataframe tests ( #8684 )
...
* reorganize docstore test suite to isolate dataframe tests
* improve docstring
* include FilterDocumentsTestWithDataframe in InMemoryDocumentStore tests
2025-01-08 14:58:52 +00:00
Stefano Fiorucci
5539f6c33f
refactor: improve serialization/deserialization of callables (to handle class methods and static methods) ( #8683 )
...
* progress
* refinements
* tidy up
* release note
2025-01-08 11:28:00 +01:00
tstadel
e6059e632e
fix: truncate ByteStream string representation ( #8673 )
...
* fix: truncate ByteStream string representation
* add reno
* better reno
* add test
* Update test_byte_stream.py
* apply feedback
* update reno
2025-01-07 19:00:52 +01:00
Bohan Qu
8e3f64717f
feat: use importlib when deserializing callables ( #8648 )
2025-01-03 15:06:58 +01:00
Stefano Fiorucci
7b4d9ba86e
feat: introduce class method to create ChatMessage from the OpenAI dictionary format ( #8670 )
...
* add ChatMessage.from_openai_dict_format
* remove print
* release note
* improve docstring
* separate validation logic
* rm obvious comment
2025-01-02 10:34:41 +00:00
Stefano Fiorucci
3ea128c962
OpenAITextEmbedder - remove unused constants ( #8669 )
2024-12-21 09:46:30 +01:00
Stefano Fiorucci
99e7e343b2
chore: update links to chatmessage docs ( #8667 )
2024-12-20 15:33:27 +01:00
Stefano Fiorucci
188b2a7f06
feat: support for tools in OpenAIChatGenerator ( #8666 )
...
* move chatmsg>openai conversion to chatmsg dataclass
* implementation and tests cleanup
* release note
* try fixing azure chat generator
* add serde test for toolinvoker
* small fix
2024-12-20 14:20:54 +00:00
Stefano Fiorucci
7dcbf25bd7
feat: add Tool Invoker component ( #8664 )
...
* port toolinvoker
* release note
2024-12-20 14:02:42 +01:00
Michele Pangrazzi
c192488bf6
Named entity extractor private models ( #8658 )
...
* add 'token' support to NamedEntityExtractor to enable using private models on HF backend
* fix existing error message format
* add release note
* add HF_API_TOKEN to e2e workflow
* add informative comment
* Updated to_dict / from_dict to handle 'token' correctly ; Added tests
* Fix lint
* Revert unwanted change
2024-12-20 11:15:55 +01:00
Sebastian Husch Lee
286061f005
fix: Move potential nltk download to warm_up ( #8646 )
...
* Move potential nltk download to warm_up
* Update tests
* Add release notes
* Fix tests
* Uncomment
* Make mypy happy
* Add RuntimeError message
* Update release notes
---------
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2024-12-20 10:41:44 +01:00
Stefano Fiorucci
f4d9c2bb91
fix: Make the HuggingFaceLocalChatGenerator compatible with the new ChatMessage; serialize chat_template ( #8663 )
...
* message conversion function
* hfapi w tools
* right test file + hf_hub version
* release note
* fix for new chatmessage; serialize chat_template
* feedback
2024-12-19 15:12:12 +01:00
Stefano Fiorucci
2bc58d2987
feat: support for tools in HuggingFaceAPIChatGenerator ( #8661 )
...
* message conversion function
* hfapi w tools
* right test file + hf_hub version
* release note
* feedback
2024-12-19 15:04:37 +01:00
David S. Batista
c306bee665
fix: adding missing abbreviations files for SentenceSplitter ( #8660 )
...
* adding missing abbreviations files for SentenceSplitter
* fixing tests path
2024-12-19 11:08:29 +01:00
Tobias Wochinger
91619a79c1
fix: fix deserialization issues in multi-threading environments ( #8651 )
2024-12-18 21:34:57 +01:00
Stefano Fiorucci
96b4a1d2fd
feat: Tool dataclass - unified abstraction to represent tools ( #8652 )
...
* draft
* del HF token in tests
* adaptations
* progress
* fix type
* import sorting
* more control on deserialization
* release note
* improvements
* support name field
* fix chatpromptbuilder test
* port Tool from experimental
* release note
* docs upd
* Update tool.py
---------
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2024-12-18 11:36:44 +00:00
Stefano Fiorucci
ea3602643a
feat!: new ChatMessage ( #8640 )
...
* draft
* del HF token in tests
* adaptations
* progress
* fix type
* import sorting
* more control on deserialization
* release note
* improvements
* support name field
* fix chatpromptbuilder test
* Update chat_message.py
---------
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2024-12-17 17:02:04 +01:00
David S. Batista
a5b57f4b1f
adding SentenceSplitter to init imports ( #8644 )
2024-12-16 13:57:41 +01:00
David S. Batista
db89b9a2e5
fix: removing unused import ( #8636 )
2024-12-13 12:35:58 +01:00
David S. Batista
176db5dbf9
initial import ( #8635 )
2024-12-13 12:12:40 +01:00
Stefano Fiorucci
f2b5f123b3
del HF token in tests ( #8634 )
2024-12-13 09:50:23 +01:00
Stefano Fiorucci
2a9a6401d2
chore: pin openai>=1.56.1 ( #8632 )
...
* pin openai>=1.56.1
* release note
2024-12-12 16:26:38 +01:00
David S. Batista
3f77d3ab6c
!feat: unify NLTKDocumentSplitter and DocumentSplitter ( #8617 )
...
* wip: initial import
* wip: refactoring
* wip: refactoring tests
* wip: refactoring tests
* making all NLTKSplitter related tests work
* refactoring
* docstrings
* refactoring and removing NLTKDocumentSplitter
* fixing tests for custom sentence tokenizer
* fixing tests for custom sentence tokenizer
* cleaning up
* adding release notes
* reverting some changes
* cleaning up tests
* fixing serialisation and adding tests
* cleaning up
* wip
* renaming and cleaning
* adding NLTK files
* updating docstring
* adding import to init
* Update haystack/components/preprocessors/document_splitter.py
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
* updating tests
* wip
* adding sentence/period change warning
* fixing LICENSE header
* Update haystack/components/preprocessors/document_splitter.py
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
---------
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
2024-12-12 14:22:27 +00:00
David S. Batista
6cceaac15f
docs: add deprecation warning nltk document splitter ( #8628 )
...
* adding deprecation warning
* adding release notes
* adding release notes
* updating message
* Update haystack/components/preprocessors/nltk_document_splitter.py
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
---------
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2024-12-12 15:16:54 +01:00
Stefano Fiorucci
04fc187bc4
chore: remove deprecation warnings related to store_full_path ( #8626 )
...
* remove deprecation warnings related to store_full_path
* unused imports
2024-12-12 09:27:19 +01:00
Michele Pangrazzi
21d53d0ec6
update default value of 'store_full_path' to False in converters ( #8619 )
2024-12-10 16:03:38 +01:00
dependabot[bot]
c78eb9be4e
build(deps): bump readmeio/rdme from 8 to 9 ( #8615 )
...
Bumps [readmeio/rdme](https://github.com/readmeio/rdme ) from 8 to 9.
- [Release notes](https://github.com/readmeio/rdme/releases )
- [Changelog](https://github.com/readmeio/rdme/blob/next/CHANGELOG.md )
- [Commits](https://github.com/readmeio/rdme/compare/v8...v9 )
---
updated-dependencies:
- dependency-name: readmeio/rdme
dependency-type: direct:production
update-type: version-update:semver-major
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-12-10 13:22:08 +01:00
David S. Batista
248dccbdd3
chore: fixing pylint issues ( #8610 )
...
* initial import
* fixing internal methods
* fixing some internal methods
* modify _preprocess
* fixed internal methods
---------
Co-authored-by: anakin87 <stefanofiorucci@gmail.com>
2024-12-09 16:53:37 +00:00
Anton Pelykh
6f983a22ca
fix: add missing stream mime type assignment to the LinkContentFetcher ( #8596 )
...
* add missing stream mime type assignment to the `LinkContentFetcher`
* fix release note fmt
---------
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
2024-12-09 14:51:14 +00:00
Stefano Fiorucci
09adf856dc
rm openapi spec util ( #8613 )
2024-12-09 10:59:21 +01:00
ArzelaAscoIi
ed2f37da60
fix: docstring for normalization ( #8604 )
...
* fix: docstring for normalization
* chore: add reno
* fixing docstrings and adding pylint disable too many args
---------
Co-authored-by: David S. Batista <dsbatista@gmail.com>
2024-12-06 17:13:30 +01:00
Michele Pangrazzi
b32f85cca2
remove deprecated 'converter' init parameter from PyPDFToDocument component ( #8609 )
2024-12-06 15:43:43 +01:00
David S. Batista
3da5bac8c4
refactor: converting some DocumentJoiner methods to staticmethod ( #8606 )
...
* converting some methods to static, since they don't change/depend on state of the object
* adding release notes
* removing tab
2024-12-06 10:28:41 +01:00
David S. Batista
e349a7f2fc
docs: complete docstring for DocumentJoiner code example ( #8593 )
...
* initial import
* changing a method to static
* reverting staticmethod
2024-12-05 14:04:34 +00:00
David S. Batista
2282c26f17
feat!: SentenceWindowRetriever returns List[Document] with docs ordered by split_idx_start ( #8590 )
...
* initial import
* adding a few pylint disable
* adding tests
* fixing integration tests
* adding release notes
* fixing types and docstrings
2024-12-04 16:55:56 +01:00
David S. Batista
f0638b2868
refactor: moving SentenceSplitter outside NLTKDocumentSplitter ( #8599 )
...
* initial import
* fixing imports and renaming file
* fixing imports path
* adding condition to check NLTK successfully imported
* adding one class inside the NLTK imported condition
2024-12-04 10:44:36 +01:00
David S. Batista
c5ef0b2956
chore: adding a deprecation warning on the SentenceWindowRetriever ( #8597 )
...
* linting
* improving message
* fixing header
* adding deprecation in the release notes
2024-12-03 17:41:19 +01:00
Julian Risch
41369b9e0a
chore: Mention breaking changes in PR template ( #8602 )
2024-12-03 17:18:48 +01:00
Amna Mubashar
4c8eb54049
feat: Add store_full_path to converters (3/3) ( #8585 )
...
* Add store_full_path params
2024-12-03 13:48:56 +05:00
Stefano Fiorucci
de7099e560
ci: add job to check imports ( #8594 )
...
* try checking imports
* clarify error message
* better fmt
* do not show complete list of successfully imported packages
* refinements
* relnote
* add missing forward references
* better function name
* linting
* fix linting
* Update .github/utils/check_imports.py
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
---------
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
2024-11-29 14:00:59 +00:00
Madeesh Kannan
163c06f3d6
chore: Revert change to deserialization error in Pipeline ( #8591 )
2024-11-28 13:28:52 +01:00
Stefano Fiorucci
c8685aa141
refactor: update components to access ChatMessage.text instead of content ( #8589 )
...
* introduce text property and deprecate content
* release note
* use chatmessage.text
* release note
* linting
2024-11-28 10:16:07 +00:00
Stefano Fiorucci
fb1baf4921
refactor: ChatMessage - introduce text property and deprecate content ( #8588 )
...
* introduce text property and deprecate content
* release note
* minor test refactoring
---------
Co-authored-by: Michele Pangrazzi <xmikex83@gmail.com>
2024-11-28 09:53:02 +00:00
Stefano Fiorucci
51c1390426
chore: use class methods to create ChatMessage ( #8581 )
...
* use class methods to build messages
* fix failing format
2024-11-28 09:35:24 +00:00
Silvano Cerza
473f7bef11
Change Pipeline.from_dict error message
2024-11-28 10:15:06 +01:00
Stefano Fiorucci
fb42c035c5
feat: PyPDFToDocument - add new customization parameters ( #8574 )
...
* deprecate converter in pypdf
* fix linting of MetaFieldGroupingRanker
* linting
* pypdftodocument: add customization params
* fix mypy
* incorporate feedback
2024-11-26 16:37:59 +01:00