Sebastian Husch Lee
28ad78c73d
feat: Add XLSXToDocument converter ( #8522 )
...
* Add draft of the Excel To Document converter
* Add license header
* Add release note
* Use Union instead of pipe
* Add openpyxl as additional dep
* Fix zip issue
* few updates from Bijay
* Update deps
* Add markdown test
* Adding more example excels and expanding tests
* Added more tests
* Fix windows test by setting lineterminator
* Addressing PR comments
* PR comments
* Fix linting
2025-01-09 09:03:19 +01:00
Stefano Fiorucci
bc30105fbc
test: reorganize docstore test suite to isolate dataframe tests ( #8684 )
...
* reorganize docstore test suite to isolate dataframe tests
* improve docstring
* include FilterDocumentsTestWithDataframe in InMemoryDocumentStore tests
2025-01-08 14:58:52 +00:00
Stefano Fiorucci
5539f6c33f
refactor: improve serialization/deserialization of callables (to handle class methods and static methods) ( #8683 )
...
* progress
* refinements
* tidy up
* release note
2025-01-08 11:28:00 +01:00
tstadel
e6059e632e
fix: truncate ByteStream string representation ( #8673 )
...
* fix: truncate ByteStream string representation
* add reno
* better reno
* add test
* Update test_byte_stream.py
* apply feedback
* update reno
2025-01-07 19:00:52 +01:00
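The truncation idea in the commit above can be sketched in isolation. This is not Haystack's ByteStream; the class name Blob, its fields, and the 100-byte limit are assumptions chosen for illustration only.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass(repr=False)  # keep the hand-written __repr__ below
class Blob:
    """Hypothetical stand-in for a byte container with metadata."""

    data: bytes
    meta: Dict[str, Any] = field(default_factory=dict)
    mime_type: Optional[str] = None

    def __repr__(self) -> str:
        limit = 100  # assumed cut-off, not taken from the commit
        shown = self.data[:limit]
        # Mark the output as truncated only when bytes were actually dropped.
        suffix = "..." if len(self.data) > limit else ""
        return (
            f"{type(self).__name__}(data={shown!r}{suffix}, "
            f"meta={self.meta!r}, mime_type={self.mime_type!r})"
        )
```

With a representation like this, logging a multi-megabyte payload prints a bounded string instead of the whole buffer.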
Bohan Qu
8e3f64717f
feat: use importlib when deserializing callables ( #8648 )
2025-01-03 15:06:58 +01:00
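A minimal sketch of resolving a callable from a dotted path with importlib, which also covers class methods and static methods. The function names serialize_callable and deserialize_callable are hypothetical here and the logic is an assumption about the general technique, not the code merged in these commits.

```python
import importlib
from typing import Callable


def serialize_callable(func: Callable) -> str:
    """Return a dotted path such as 'math.sqrt' for an importable callable."""
    return f"{func.__module__}.{func.__qualname__}"


def deserialize_callable(path: str) -> Callable:
    """Resolve a dotted path by trying the longest importable module prefix first."""
    parts = path.split(".")
    for i in range(len(parts) - 1, 0, -1):
        module_name = ".".join(parts[:i])
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            # The prefix is not a module (for example it names a class); try a shorter one.
            continue
        obj = module
        try:
            # Walk the remaining attributes, e.g. a class and then its method.
            for attr in parts[i:]:
                obj = getattr(obj, attr)
        except AttributeError:
            continue
        if callable(obj):
            return obj
    raise ValueError(f"Could not resolve callable from {path!r}")
```

For example, deserialize_callable("collections.OrderedDict.fromkeys") first fails to import "collections.OrderedDict" as a module, then imports "collections" and walks the attributes to reach the class method.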
Stefano Fiorucci
7b4d9ba86e
feat: introduce class method to create ChatMessage from the OpenAI dictionary format ( #8670 )
...
* add ChatMessage.from_openai_dict_format
* remove print
* release note
* improve docstring
* separate validation logic
* rm obvious comment
2025-01-02 10:34:41 +00:00
Stefano Fiorucci
99e7e343b2
chore: update links to chatmessage docs ( #8667 )
2024-12-20 15:33:27 +01:00
Stefano Fiorucci
188b2a7f06
feat: support for tools in OpenAIChatGenerator ( #8666 )
...
* move chatmsg>openai conversion to chatmsg dataclass
* implementation and tests cleanup
* release note
* try fixing azure chat generator
* add serde test for toolinvoker
* small fix
2024-12-20 14:20:54 +00:00
Stefano Fiorucci
7dcbf25bd7
feat: add Tool Invoker component ( #8664 )
...
* port toolinvoker
* release note
2024-12-20 14:02:42 +01:00
Michele Pangrazzi
c192488bf6
Named entity extractor private models ( #8658 )
...
* add 'token' support to NamedEntityExtractor to enable using private models on HF backend
* fix existing error message format
* add release note
* add HF_API_TOKEN to e2e workflow
* add informative comment
* Updated to_dict / from_dict to handle 'token' correctly ; Added tests
* Fix lint
* Revert unwanted change
2024-12-20 11:15:55 +01:00
Sebastian Husch Lee
286061f005
fix: Move potential nltk download to warm_up ( #8646 )
...
* Move potential nltk download to warm_up
* Update tests
* Add release notes
* Fix tests
* Uncomment
* Make mypy happy
* Add RuntimeError message
* Update release notes
---------
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2024-12-20 10:41:44 +01:00
Stefano Fiorucci
f4d9c2bb91
fix: Make the HuggingFaceLocalChatGenerator compatible with the new ChatMessage; serialize chat_template ( #8663 )
...
* message conversion function
* hfapi w tools
* right test file + hf_hub version
* release note
* fix for new chatmessage; serialize chat_template
* feedback
2024-12-19 15:12:12 +01:00
Stefano Fiorucci
2bc58d2987
feat: support for tools in HuggingFaceAPIChatGenerator ( #8661 )
...
* message conversion function
* hfapi w tools
* right test file + hf_hub version
* release note
* feedback
2024-12-19 15:04:37 +01:00
Tobias Wochinger
91619a79c1
fix: fix deserialization issues in multi-threading environments ( #8651 )
2024-12-18 21:34:57 +01:00
Stefano Fiorucci
96b4a1d2fd
feat: Tool dataclass - unified abstraction to represent tools ( #8652 )
...
* draft
* del HF token in tests
* adaptations
* progress
* fix type
* import sorting
* more control on deserialization
* release note
* improvements
* support name field
* fix chatpromptbuilder test
* port Tool from experimental
* release note
* docs upd
* Update tool.py
---------
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2024-12-18 11:36:44 +00:00
Stefano Fiorucci
ea3602643a
feat!: new ChatMessage ( #8640 )
...
* draft
* del HF token in tests
* adaptations
* progress
* fix type
* import sorting
* more control on deserialization
* release note
* improvements
* support name field
* fix chatpromptbuilder test
* Update chat_message.py
---------
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2024-12-17 17:02:04 +01:00
Stefano Fiorucci
2a9a6401d2
chore: pin openai>=1.56.1 ( #8632 )
...
* pin openai>=1.56.1
* release note
2024-12-12 16:26:38 +01:00
David S. Batista
3f77d3ab6c
feat!: unify NLTKDocumentSplitter and DocumentSplitter ( #8617 )
...
* wip: initial import
* wip: refactoring
* wip: refactoring tests
* wip: refactoring tests
* making all NLTKSplitter related tests work
* refactoring
* docstrings
* refactoring and removing NLTKDocumentSplitter
* fixing tests for custom sentence tokenizer
* fixing tests for custom sentence tokenizer
* cleaning up
* adding release notes
* reverting some changes
* cleaning up tests
* fixing serialisation and adding tests
* cleaning up
* wip
* renaming and cleaning
* adding NLTK files
* updating docstring
* adding import to init
* Update haystack/components/preprocessors/document_splitter.py
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
* updating tests
* wip
* adding sentence/period change warning
* fixing LICENSE header
* Update haystack/components/preprocessors/document_splitter.py
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
---------
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
2024-12-12 14:22:27 +00:00
David S. Batista
6cceaac15f
docs: add deprecation warning nltk document splitter ( #8628 )
...
* adding deprecation warning
* adding release notes
* adding release notes
* updating message
* Update haystack/components/preprocessors/nltk_document_splitter.py
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
---------
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2024-12-12 15:16:54 +01:00
Michele Pangrazzi
21d53d0ec6
update default value of 'store_full_path' to False in converters ( #8619 )
2024-12-10 16:03:38 +01:00
Anton Pelykh
6f983a22ca
fix: add missing stream mime type assignment to the LinkContentFetcher ( #8596 )
...
* add missing stream mime type assignment to the `LinkContentFetcher`
* fix release note fmt
---------
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
2024-12-09 14:51:14 +00:00
ArzelaAscoIi
ed2f37da60
fix: docstring for normalization ( #8604 )
...
* fix: docstring for normalization
* chore: add reno
* fixing docstrings and adding pylint disable too many args
---------
Co-authored-by: David S. Batista <dsbatista@gmail.com>
2024-12-06 17:13:30 +01:00
Michele Pangrazzi
b32f85cca2
remove deprecated 'converter' init parameter from PyPDFToDocument component ( #8609 )
2024-12-06 15:43:43 +01:00
David S. Batista
3da5bac8c4
refactor: converting some DocumentJoiner methods to staticmethod ( #8606 )
...
* converting some methods to static, since they do not change or depend on the state of the object
* adding release notes
* removing tab
2024-12-06 10:28:41 +01:00
David S. Batista
2282c26f17
feat!: SentenceWindowRetriever returns List[Document] with docs ordered by split_idx_start ( #8590 )
...
* initial import
* adding a few pylint disable
* adding tests
* fixing integration tests
* adding release notes
* fixing types and docstrings
2024-12-04 16:55:56 +01:00
David S. Batista
c5ef0b2956
chore: adding a deprecation warning on the SentenceWindowRetriever ( #8597 )
...
* linting
* improving message
* fixing header
* adding deprecation in the release notes
2024-12-03 17:41:19 +01:00
Amna Mubashar
4c8eb54049
feat: Add store_full_path to converters (3/3) ( #8585 )
...
* Add store_full_path params
2024-12-03 13:48:56 +05:00
Stefano Fiorucci
de7099e560
ci: add job to check imports ( #8594 )
...
* try checking imports
* clarify error message
* better fmt
* do not show complete list of successfully imported packages
* refinements
* relnote
* add missing forward references
* better function name
* linting
* fix linting
* Update .github/utils/check_imports.py
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
---------
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
2024-11-29 14:00:59 +00:00
Stefano Fiorucci
c8685aa141
refactor: update components to access ChatMessage.text instead of content ( #8589 )
...
* introduce text property and deprecate content
* release note
* use chatmessage.text
* release note
* linting
2024-11-28 10:16:07 +00:00
Stefano Fiorucci
fb1baf4921
refactor: ChatMessage - introduce text property and deprecate content ( #8588 )
...
* introduce text property and deprecate content
* release note
* minor test refactoring
---------
Co-authored-by: Michele Pangrazzi <xmikex83@gmail.com>
2024-11-28 09:53:02 +00:00
Stefano Fiorucci
51c1390426
chore: use class methods to create ChatMessage ( #8581 )
...
* use class methods to build messages
* fix failing format
2024-11-28 09:35:24 +00:00
Stefano Fiorucci
fb42c035c5
feat: PyPDFToDocument - add new customization parameters ( #8574 )
...
* deprecate converter in pypdf
* fix linting of MetaFieldGroupingRanker
* linting
* pypdftodocument: add customization params
* fix mypy
* incorporate feedback
2024-11-26 16:37:59 +01:00
Stefano Fiorucci
2440a5ee17
chore: PyPDFToDocument - deprecate converter init parameter ( #8569 )
...
* deprecate converter in pypdf
* fix linting of MetaFieldGroupingRanker
* linting
2024-11-26 14:47:04 +01:00
Michele Pangrazzi
f0c3692cf2
Remove is_greedy deprecated argument from @component decorator ( #8580 )
...
* Remove 'is_greedy' deprecated argument from @component decorator
* Remove unused import
2024-11-26 10:44:50 +00:00
Vladimir Blagojevic
59f1e182db
feat: Add variable to specify inputs as optional to ConditionalRouter ( #8568 )
...
* Add optional_variables in ConditionalRouter
* Add reno note
* Add more unit test with various complex scenarios
* Add more unit tests
* Add pylint disable=too-many-positional-arguments
* PR feedback from @sjrl
2024-11-26 10:48:55 +01:00
Matt G
e3b73e048b
fix: bug on tracing where components are in a loop in a pipeline ( #8576 )
...
* Fix to tracing parent spans on loops
* Fix linting
* Add release notes
---------
Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
2024-11-25 14:21:08 +01:00
Silvano Cerza
ab840351f8
Fix DocumentCleaner not preserving Document fields ( #8578 )
2024-11-25 13:08:59 +01:00
Amna Mubashar
9302d3d9f0
feat: Add store_full_path to converters (2/3) ( #8573 )
2024-11-25 15:22:19 +05:00
Sebastian Husch Lee
eace2a99e5
feat: Add Literal["*"] option to required_variables in ChatPromptBuilder and PromptBuilder ( #8572 )
...
* Add new option for required_variables in PromptBuilder and ChatPromptBuilder
* Add reno note
* Add tests
2024-11-22 16:27:50 +01:00
David S. Batista
b5a2fad642
feat: adding Maximum Margin Relevance Ranker ( #8554 )
...
* initial import
* linting
* adding MRR tests
* adding release notes
* fixing tests
* adding linting ignore to cross-encoder ranker
* update docstring
* refactoring
* making strategy Optional instead of Literal
* wip: adding unit tests
* refactoring MMR algorithm
* refactoring tests
* cleaning up and updating tests
* adding empty line between license + code
* bug in tests
* using Enum for strategy and similarity metric
* adding more tests
* adding empty line between license + code
* removing run time params
* PR comments
* PR comments
* fixing
* fixing serialisation
* fixing serialisation tests
* Update haystack/components/rankers/sentence_transformers_diversity.py
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
* Update haystack/components/rankers/sentence_transformers_diversity.py
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
* Update haystack/components/rankers/sentence_transformers_diversity.py
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
* Update haystack/components/rankers/sentence_transformers_diversity.py
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
* Update haystack/components/rankers/sentence_transformers_diversity.py
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
* Update haystack/components/rankers/sentence_transformers_diversity.py
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
* Update haystack/components/rankers/sentence_transformers_diversity.py
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
* fixing tests
* PR comments
* PR comments
* PR comments
* PR comments
---------
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2024-11-22 14:58:45 +00:00
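The ranking strategy named in the commit above is commonly known as Maximal Marginal Relevance (MMR). The following is a plain-Python sketch of the greedy MMR selection loop, assuming cosine similarity over precomputed embedding vectors; the names mmr_order and lambda_threshold are illustrative assumptions and this is not the ranker's actual implementation.

```python
import math
from typing import List, Optional, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mmr_order(
    query: Vector,
    docs: Sequence[Vector],
    lambda_threshold: float = 0.5,
    top_k: Optional[int] = None,
) -> List[int]:
    """Return document indices in greedy MMR order.

    lambda_threshold = 1.0 ranks by relevance only; lower values
    penalize similarity to documents that were already selected.
    """
    relevance = [cosine(query, d) for d in docs]
    remaining = list(range(len(docs)))
    selected: List[int] = []
    limit = len(docs) if top_k is None else min(top_k, len(docs))
    while len(selected) < limit:
        best_idx = remaining[0]
        best_score = -math.inf
        for i in remaining:
            # Redundancy is the highest similarity to anything already picked.
            redundancy = max((cosine(docs[i], docs[j]) for j in selected), default=0.0)
            score = lambda_threshold * relevance[i] - (1.0 - lambda_threshold) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        remaining.remove(best_idx)
    return selected
```

With two identical documents and one distinct document, a low lambda_threshold moves the distinct document ahead of the duplicate, while lambda_threshold = 1.0 keeps the pure relevance order.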
Richard Hudson
a8eeb2024f
feat: Allow unverified OpenAPI calls ( #8562 )
...
* Feed through ssl_verify value to OpenAPI
* Add release note
* Update serialization methods
* Applied black formatting
2024-11-22 15:45:00 +01:00
Amna Mubashar
21906d0558
feat: Add store_full_path to converters (1/3) ( #8566 )
...
* Add store_full_path param to 3 converters
2024-11-22 13:55:08 +01:00
Ulises M
b1353f4f0f
fix: append runtime meta to ChatMessage's extracted meta in AnswerBuilder ( #8544 )
...
* append runtime meta to extracted meta
* add pylint ignore flag to .run()
* explicitly convert reply to string
2024-11-20 20:07:04 +01:00
Silvano Cerza
3ef8c081be
fix: OpenAIChatGenerator and OpenAIGenerator crashing when streaming with usage tracking ( #8558 )
...
* Fix OpenAIGenerator crashing with tracking usage with streaming enabled
* Fix OpenAIChatGenerator crashing with tracking usage with streaming enabled
* Add release notes
* Fix linting
2024-11-20 10:27:22 +01:00
Silvano Cerza
bd77120cf3
Fix DocumentSplitter not splitting by function ( #8549 )
...
* Fix DocumentSplitter not splitting by function
* Make the split_by mapping a constant
2024-11-18 11:54:30 +01:00
Ivo Bellin Salarin
c78545dfc0
feat(openai): be tolerant to exceptions ( #8526 )
...
* feat: be tolerant to exceptions
if ever an error is raised by the OpenAI API, don't fail the entire processing
* fix: missing import, string separator
* Enhance error handling
* Use batched from more_itertools for compatibility with older Python versions
* Fix batching and add test
---------
Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
2024-11-15 10:52:44 +01:00
Sebastian Husch Lee
e45d3329a1
feat: Adding DALLE image generator ( #8448 )
...
* First pass at adding DALLE image generator
* Add missing header
* Fix tests
* Add tests
* Fix mypy
* Make mypy happy
* More unit tests
* Adding release notes
* Add a test for run
* Update haystack/components/generators/openai_dalle.py
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
* Fix pylint
* Update haystack/components/generators/openai_dalle.py
Co-authored-by: Amna Mubashar <amnahkhan.ak@gmail.com>
* Update haystack/components/generators/openai_dalle.py
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
* Update haystack/components/generators/openai_dalle.py
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
* Update haystack/components/generators/openai_dalle.py
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
* Update haystack/components/generators/openai_dalle.py
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
* Update haystack/components/generators/openai_dalle.py
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
---------
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
Co-authored-by: Amna Mubashar <amnahkhan.ak@gmail.com>
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2024-11-14 16:19:49 +01:00
Sriniketh J
a045c0eabb
feat: added split by line to DocumentSplitter ( #8525 )
...
* feat: added split by line to DocumentSplitter
* fix: pr review comments
Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>
---------
Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>
2024-11-14 16:09:01 +01:00
Sebastian Husch Lee
0c11c7b98e
fix: Bring in fix from custom nodes ( #8539 )
...
* Bring in fix from custom nodes
* Add to_dict function and test
* reno
* Fix pylint
2024-11-14 13:00:28 +01:00
Anes Benmerzoug
f5683bc8fa
fix: document joiner division by zero with distribution based rank fusion ( #8520 )
...
* Parametrize document joiner tests with empty lists
* Skip loop in _distribution_based_rank_fusion if document list is empty
* Parametrize test_empty_list with join_mode
* Prevent division by zero in _merge and _reciprocal_rank_fusion
* Add release notes
---------
Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
2024-11-14 11:41:28 +00:00
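The kind of guard described in the commit above can be sketched as follows. This is an assumed, simplified score-averaging merge over plain dictionaries, not the DocumentJoiner code; merge_scores is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List


def merge_scores(rankings: List[Dict[str, float]]) -> Dict[str, float]:
    """Average each document's score over the number of input lists."""
    if not rankings:
        # No input lists: return early instead of dividing by len(rankings) == 0.
        return {}
    totals: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for doc_id, score in ranking.items():
            totals[doc_id] += score / len(rankings)
    return dict(totals)
```

A document missing from one list simply contributes nothing for that list, so its merged score is lower than that of a document present in every list with the same scores.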