12 Commits

Author SHA1 Message Date
Sebastian Husch Lee
19cf220136
feat: integrate two ready-made SuperComponents from haystack-experimental (#9235)
* Add super component decorator

* Add reno

* MultiFileConverter

* Add DocumentPreprocessor

* Add reno

* Add tests and change doc preprocessor to split first then clean

* Remove code from merge

* Add to pydoc and missing test file

* PR comments

* Lint fix

* Fix mypy

* Fix mypy

* Add comment

* PR comments

* Update haystack/components/converters/multi_file_converter.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* Update haystack/components/preprocessors/document_preprocessor.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* Update haystack/components/preprocessors/document_preprocessor.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* Update haystack/components/preprocessors/document_preprocessor.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* Update haystack/components/preprocessors/document_preprocessor.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* Update haystack/components/preprocessors/document_preprocessor.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* Update haystack/components/preprocessors/document_preprocessor.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* Update haystack/components/preprocessors/document_preprocessor.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* Update haystack/components/preprocessors/document_preprocessor.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* Update haystack/components/preprocessors/document_preprocessor.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* Update haystack/components/preprocessors/document_preprocessor.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* Update haystack/components/converters/multi_file_converter.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* PR comments

* PR comment

---------

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2025-04-17 10:02:26 +00:00
David S. Batista
be2d1fb303
feat: adding AutoMergingRetriever and HierarchicalDocumentSplitter (#9067)
* adding Auto-Merging-Retriever

* adding release notes

* updating tests

* adding renamed file

* Update haystack/components/preprocessors/hierarchical_document_splitter.py

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

* Update haystack/components/retrievers/auto_merging_retriever.py

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

* fixing tests and imports

* adding pydoc

* adding to type checking

---------

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
2025-03-19 18:25:23 +00:00
Sebastian Husch Lee
f9e6e481a1
feat: Add new component CSVDocumentSplitter to recursively split CSV documents (#8815)
* CSV Document Splitter

* Add license header

* Add newline

* Add to docs

* Add lineterminator

* Updated csv splitter to allow user to specify to split by row, column or both

* Adding more tests

* Column tests

* Some refactoring to remove incorrect dropna call

* Fix

* More complicated test

* Adding more relevant metadata to match what's provided in our other splitters

* value error tests

* Fix mypy

* Docstring updates

* Add skip_blank_lines=False

* Add to dict test

* More from and to dict tests

* Fixes

* Move dict creation outside of for loop
2025-02-10 18:10:18 +01:00
Sebastian Husch Lee
1785ea622e
feat: Add component CSVDocumentCleaner for removing empty rows and columns (#8816)
* Initial commit for csv cleaner

* Add release notes

* Update lineterminator

* Update releasenotes/notes/csv-document-cleaner-8eca67e884684c56.yaml

Co-authored-by: David S. Batista <dsbatista@gmail.com>

* alphabetize

* Use lazy import

* Some refactoring

* Some refactoring

---------

Co-authored-by: David S. Batista <dsbatista@gmail.com>
2025-02-06 17:56:38 +01:00
David S. Batista
26b80778f5
chore: removing NLTKDocumentSplitter (#8724)
* removing NLTKDocumentSplitter

* adding release notes

* removing pydocs reference
2025-01-15 16:11:51 +00:00
David S. Batista
ec8666545d
docs: adding RecursiveSplitter to pydoc
2025-01-13 11:46:34 +01:00
Daria Fokina
caf465b004
docs: add NLTKSplitter and ZeroShotClassifier to pydocs (#8384)
* Update preprocessors_api.yml

* Update classifiers_api.yml
2024-09-18 15:55:40 +02:00
Silvano Cerza
2a83eccf99
Update docs renderer (#7349)
2024-03-13 12:30:13 +01:00
Tobias Wochinger
a3a21947a4
docs: disable class def rendering (#7329)
2024-03-07 15:54:16 +01:00
Stefano Fiorucci
2580e053ad
fix wrong docs config (#7224)
2024-02-27 16:00:44 +01:00
Stefano Fiorucci
9b1d7926ae
preprocessors: review docstrings (#7219)
2024-02-27 15:51:23 +01:00
Massimiliano Pippi
27d0b28d06
chore: rename categories in the API docs (#6885)
* rename API categories

* fix

* update slugs

* rename files for consistency

* fix category ID

* try getting the right version
2024-02-01 16:47:26 +01:00