5 Commits

Author SHA1 Message Date
Sebastian Husch Lee
85258f0654
fix: Fix types and formatting pipeline test_run.py (#9575)
* Fix types in test_run.py

* Get test_run.py to pass fmt-check

* Add test_run to mypy checks

* Update test folder to pass ruff linting

* Fix merge

* Fix HF tests

* Fix hf test

* Try to fix tests

* Another attempt

* minor fix

* fix SentenceTransformersDiversityRanker

* skip integrations tests due to model unavailable on HF inference

---------

Co-authored-by: anakin87 <stefanofiorucci@gmail.com>
2025-07-03 09:49:09 +02:00
David S. Batista
da60156174
chore: removing unused imports from tests (#9446)
2025-05-26 16:22:51 +00:00
David S. Batista
0f00c1882e
fix: make SentenceSplitter QUOTE_SPANS_RE regex ReDoS-safe (#9338)
* fix: make QUOTE_SPANS_RE regex ReDoS-safe

* Removing the capture of leading non-character on double quotes, allowing quote with new lines, adding tests

* cleaning

* fixing release notes

* changing import

* adding test for Regex Denial of Service (ReDoS)

* reducing the size/time of tests

* Update test/components/preprocessors/test_sentence_tokenizer.py

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

* Update test/components/preprocessors/test_sentence_tokenizer.py

---------

Co-authored-by: Waivey <waivey@proton.me>
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
2025-05-02 15:40:17 +00:00
David S. Batista
c306bee665
fix: adding missing abbreviations files for SentenceSplitter (#8660)
* adding missing abbreviations files for SentenceSplitter

* fixing tests path
2024-12-19 11:08:29 +01:00
David S. Batista
3f77d3ab6c
!feat: unify NLTKDocumentSplitter and DocumentSplitter (#8617)
* wip: initial import

* wip: refactoring

* wip: refactoring tests

* wip: refactoring tests

* making all NLTKSplitter related tests work

* refactoring

* docstrings

* refactoring and removing NLTKDocumentSplitter

* fixing tests for custom sentence tokenizer

* fixing tests for custom sentence tokenizer

* cleaning up

* adding release notes

* reverting some changes

* cleaning up tests

* fixing serialisation and adding tests

* cleaning up

* wip

* renaming and cleaning

* adding NLTK files

* updating docstring

* adding import to init

* Update haystack/components/preprocessors/document_splitter.py

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

* updating tests

* wip

* adding sentence/period change warning

* fixing LICENSE header

* Update haystack/components/preprocessors/document_splitter.py

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

---------

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
2024-12-12 14:22:27 +00:00