Julian Risch
|
9f3b6512be
|
refactor: Remove reimplementations of default from_dict/to_dict and corresponding tests in 2.0 (#6108)
* whisper transcriber
* remove from/to_dict from builders
* remove from/to_dict from embedders
* remove from/to_dict from fetcher, file_converters
* remove from/to_dict from generators, preprocessors
* remove from/to_dict from ranker, reader
* remove from/to_dict from router, sampler, websearch
* pylint
* reno
* refactor import
* remove unused import
|
2023-10-19 11:17:02 +02:00 |
|
Julian Risch
|
aaee03aee8
|
feat: Add DocumentCleaner 2.0 (#5976)
* remove whitespaces, substrings, regex, empty lines
* remove repeated substrings
* reno
* return empty string as shortest common ngram
* address first half of review feedback
* address second half of review feedback
* mention \f page separator for header/footer removal
* mark example usage as python code
|
2023-10-13 12:39:55 +02:00 |
|