Sebastian Husch Lee
19cf220136
feat: integrate two ready-made SuperComponents from haystack-experimental (#9235)
...
* Add super component decorator
* Add reno
* MultiFileConverter
* Add DocumentPreprocessor
* Add reno
* Add tests and change doc preprocessor to split first then clean
* Remove code from merge
* Add to pydoc and missing test file
* PR comments
* Lint fix
* Fix mypy
* Fix mypy
* Add comment
* PR comments
* Update haystack/components/converters/multi_file_converter.py
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
* Update haystack/components/preprocessors/document_preprocessor.py
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
* Update haystack/components/preprocessors/document_preprocessor.py
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
* Update haystack/components/preprocessors/document_preprocessor.py
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
* Update haystack/components/preprocessors/document_preprocessor.py
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
* Update haystack/components/preprocessors/document_preprocessor.py
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
* Update haystack/components/preprocessors/document_preprocessor.py
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
* Update haystack/components/preprocessors/document_preprocessor.py
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
* Update haystack/components/preprocessors/document_preprocessor.py
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
* Update haystack/components/preprocessors/document_preprocessor.py
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
* Update haystack/components/preprocessors/document_preprocessor.py
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
* Update haystack/components/converters/multi_file_converter.py
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
* PR comments
* PR comment
---------
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2025-04-17 10:02:26 +00:00
David S. Batista
be2d1fb303
feat: adding AutoMergingRetriever and HierarchicalDocumentSplitter (#9067)
...
* adding Auto-Merging-Retriever
* adding release notes
* updating tests
* adding renamed file
* Update haystack/components/preprocessors/hierarchical_document_splitter.py
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
* Update haystack/components/retrievers/auto_merging_retriever.py
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
* fixing tests and imports
* adding pydoc
* adding to type checking
---------
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
2025-03-19 18:25:23 +00:00
Sebastian Husch Lee
f9e6e481a1
feat: Add new component CSVDocumentSplitter to recursively split CSV documents (#8815)
...
* CSV Document Splitter
* Add license header
* Add newline
* Add to docs
* Add lineterminator
* Updated csv splitter to allow user to specify to split by row, column or both
* Adding more tests
* Column tests
* Some refactoring to remove incorrect dropna call
* Fix
* More complicated test
* Adding more relevant metadata to match what's provided in our other splitters
* value error tests
* Fix mypy
* Docstring updates
* Add skip_blank_lines=False
* Add to dict test
* More from and to dict tests
* Fixes
* Move dict creation outside of for loop
2025-02-10 18:10:18 +01:00
Sebastian Husch Lee
1785ea622e
feat: Add component CSVDocumentCleaner for removing empty rows and columns (#8816)
...
* Initial commit for csv cleaner
* Add release notes
* Update lineterminator
* Update releasenotes/notes/csv-document-cleaner-8eca67e884684c56.yaml
Co-authored-by: David S. Batista <dsbatista@gmail.com>
* alphabetize
* Use lazy import
* Some refactoring
* Some refactoring
---------
Co-authored-by: David S. Batista <dsbatista@gmail.com>
2025-02-06 17:56:38 +01:00
David S. Batista
26b80778f5
chore: removing NLTKDocumentSplitter (#8724)
...
* removing NLTKDocumentSplitter
* adding release notes
* removing pydocs reference
2025-01-15 16:11:51 +00:00
David S. Batista
ec8666545d
docs: adding RecursiveSplitter to pydoc
2025-01-13 11:46:34 +01:00
Daria Fokina
caf465b004
docs: add NLTKSplitter and ZeroShotClassifier to pydocs (#8384)
...
* Update preprocessors_api.yml
* Update classifiers_api.yml
2024-09-18 15:55:40 +02:00
Silvano Cerza
2a83eccf99
Update docs renderer (#7349)
2024-03-13 12:30:13 +01:00
Tobias Wochinger
a3a21947a4
docs: disable class def rendering (#7329)
2024-03-07 15:54:16 +01:00
Stefano Fiorucci
2580e053ad
fix wrong docs config (#7224)
2024-02-27 16:00:44 +01:00
Stefano Fiorucci
9b1d7926ae
preprocessors: review docstrings (#7219)
2024-02-27 15:51:23 +01:00
Massimiliano Pippi
27d0b28d06
chore: rename categories in the API docs (#6885)
...
* rename API categories
* fix
* update slugs
* rename files for consistency
* fix category ID
* try getting the right version
2024-02-01 16:47:26 +01:00