10 Commits

Author SHA1 Message Date
Sebastian Husch Lee
f9e6e481a1
feat: Add new component CSVDocumentSplitter to recursively split CSV documents (#8815)
* CSV Document Splitter

* Add license header

* Add newline

* Add to docs

* Add lineterminator

* Updated csv splitter to allow user to specify to split by row, column or both

* Adding more tests

* Column tests

* Some refactoring to remove incorrect dropna call

* Fix

* More complicated test

* Adding more relevant metadata to match whats provided in our other splitters

* value error tests

* Fix mypy

* Docstring updates

* Add skip_blank_lines=False

* Add to dict test

* More from and to dict tests

* Fixes

* Move dict creation outside of for loop
2025-02-10 18:10:18 +01:00
Sebastian Husch Lee
1785ea622e
feat: Add component CSVDocumentCleaner for removing empty rows and columns (#8816)
* Initial commit for csv cleaner

* Add release notes

* Update lineterminator

* Update releasenotes/notes/csv-document-cleaner-8eca67e884684c56.yaml

Co-authored-by: David S. Batista <dsbatista@gmail.com>

* alphabetize

* Use lazy import

* Some refactoring

* Some refactoring

---------

Co-authored-by: David S. Batista <dsbatista@gmail.com>
2025-02-06 17:56:38 +01:00
David S. Batista
26b80778f5
chore: removing NLTKDocumentSplitter (#8724)
* removing NLTKDocumentSplitter

* adding release notes

* removing pydocs reference
2025-01-15 16:11:51 +00:00
David S. Batista
ec8666545d
docs: adding RecursiveSplitter to pydoc 2025-01-13 11:46:34 +01:00
Daria Fokina
caf465b004
docs: add NLTKSplitter and ZeroShotClassifier to pydocs (#8384)
* Update preprocessors_api.yml

* Update classifiers_api.yml
2024-09-18 15:55:40 +02:00
Silvano Cerza
2a83eccf99
Update docs renderer (#7349) 2024-03-13 12:30:13 +01:00
Tobias Wochinger
a3a21947a4
docs: disable class def rendering (#7329) 2024-03-07 15:54:16 +01:00
Stefano Fiorucci
2580e053ad
fix wrong docs config (#7224) 2024-02-27 16:00:44 +01:00
Stefano Fiorucci
9b1d7926ae
preprocessors: review docstrings (#7219) 2024-02-27 15:51:23 +01:00
Massimiliano Pippi
27d0b28d06
chore: rename categories in the API docs (#6885)
* rename API categories

* fix

* update slugs

* rename files for consistency

* fix category ID

* try getting the right version
2024-02-01 16:47:26 +01:00