Daniel Bichuetti | 7c49fffc71
feat: Enable PDFToTextConverter multiprocessing, increase general performance and simplify installation (#4226)
* refactor: isolate PDF converters
* refactor: remove xpdf dependency and fix tests
* refactor: add min. version
* feat: enable multiprocessing and add tests
* fix: remove unused imports
* fix: regression introduced when code was moved
* refactor: use itertools
* fix: mypy complaints
* refactor: double tool support
* refactor: add fallback to xpdf
* refactor: black formatting
* refactor: make superclass signature compatible
* refactor: complete removal of xpdf
* refactor: regroup Haystack imports and fix regression
* refactor: remove original declaration
* docs: fix docstrings
* tests: add [pdf] to [all]
* refactor: remove redundant checks, avoid extra processes
* refactor: add deprecation warning
* refactor: add pytest mark
* tests: change PDF test file
* fix: correct pytest mark
* refactor: deprecate parameter and add new
* tests: change PDF sample
* Add minor language changes to docstrings
* Fix default value in docstrings
* Update test/nodes/test_file_converter.py
Co-authored-by: bogdankostic <bogdankostic@web.de>
* tests: fix page count
* refactor: add imported function
* refactor: change default value
* tests: change parameters and fix typo
* Unify sort_by_position parameter names
---------
Co-authored-by: bogdankostic <bogdankostic@web.de>
Co-authored-by: agnieszka-m <amarzec13@gmail.com>
2023-03-01 22:34:38 +01:00