10 Commits

Author SHA1 Message Date
Sebastian Husch Lee
baed478f23
fix: Fix split_start_idx and _split_overlap information in DocumentSplitter (#8046)
* Fix bug in DocumentSplitter and expand tests to catch said bug

* Fix split overlap information calc and actually test it

* Add release notes

* Remove comments

* Same fix in SentenceWindowRetrieval

---------

Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
2024-07-24 15:15:36 +02:00
David S. Batista
91f57015c0
feat : adding split_id and split_overlap to DocumentSplitter (#7933)
* wip: adding _split_overlapp

* fixing join issue for _split_overlap

* adding tests

* adding release notes

* cleaning and fixing tests

* making mypy happy

* Update haystack/components/preprocessors/document_splitter.py

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* adding docstrings

---------

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2024-06-27 15:07:43 +02:00
Massimiliano Pippi
3a03fce71c
ci: Add code formatting checks (#7882)
* ruff settings

enable ruff format and re-format outdated files

feat: `EvaluationRunResult` add parameter to specify columns to keep in the comparative `Dataframe`  (#7879)

* adding param to explictily state which cols to keep

* adding param to explictily state which cols to keep

* adding param to explictily state which cols to keep

* updating tests

* adding release notes

* Update haystack/evaluation/eval_run_result.py

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

* Update releasenotes/notes/add-keep-columns-to-EvalRunResult-comparative-be3e15ce45de3e0b.yaml

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

* updating docstring

---------

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

add format-check

fail on format and linting failures

fix string formatting

reformat long lines

fix tests

fix typing

linter

pull from main

* reformat

* lint -> check

* lint -> check
2024-06-18 15:52:46 +00:00
Alessio Cesaretti
d0da31a047
feat: Add split_threshold to DocumentSplitter to avoid excessively short splits (#7721)
* feat: add split_threshold to document splitter to avoid excessively small splits

* Update haystack/components/preprocessors/document_splitter.py

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

* Update haystack/components/preprocessors/document_splitter.py

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

* extend release note

---------

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2024-05-27 14:48:38 +02:00
Massimiliano Pippi
10c675d534
chore: add license header to all modules (#7675)
* add license header to modules
* check license header at linting time
2024-05-09 13:40:36 +00:00
Carlos Fernández
d2c87b2fd9
feat: add page_number to metadata in DocumentSplitter (#7599)
* Add the implementation for page counting used in the v1.25.x branch. It should work as expected in issue #6705.

* Add tests that reflect the desired behabiour. This behabiour is inffered from the one it had on Haystack 1.x
Solve some minor bugs spotted by tests.

* Update docstrings.

* Add reno.

* Update haystack/components/preprocessors/document_splitter.py

Update docstring from suggestion

Co-authored-by: David S. Batista <dsbatista@gmail.com>

* solve suggestion to improve readability

* fragment tests

* Update haystack/components/preprocessors/document_splitter.py

Co-authored-by: David S. Batista <dsbatista@gmail.com>

* Update .gitignore

* Update .gitignore

* Update add-page-number-to-document-splitter-162e9dc7443575f0.yaml

* blackening

---------

Co-authored-by: David S. Batista <dsbatista@gmail.com>
2024-04-29 12:51:18 +02:00
sahusiddharth
a7ac4edd07
feat: added split by page to DocumentSplitter (#6753)
* feat-added-split-by-page-to-DocumentSplitter

* added test case and the suggested changes

* Update document_splitter.py

* Update haystack/components/preprocessors/document_splitter.py

* Update test_document_splitter.py

---------

Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>
2024-01-17 15:36:29 +01:00
Massimiliano Pippi
7c05f37a53
remove unit marker (#6450) 2023-11-29 19:24:25 +01:00
Silvano Cerza
e6637f5ec2 Fix all tests 2023-11-24 14:48:43 +01:00
Massimiliano Pippi
8adb8bbab8
Remove preview folder in test/
---------

Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
2023-11-24 11:52:55 +01:00