Julian Risch aaee03aee8
feat: Add DocumentCleaner 2.0 (#5976)
* remove whitespaces, substrings, regex, empty lines

* remove repeated substrings

* reno

* return empty string as shortest common ngram

* address first half of review feedback

* address second half of review feedback

* mention \f page separator for header/footer removal

* mention \f page separator for header/footer removal

* mark example usage as python code
2023-10-13 12:39:55 +02:00
..