Sara Zan 01ea4bf21f
Change default encoding for PDFToTextConverter from Latin 1 to UTF-8 (#2420)
* Change default encoding for PDFToTextConverter

* Update Documentation & Code Style

* Improve docstring

* Update Documentation & Code Style

* Add list of ligatures to ignore and add the possibility to modify such list at need

* Add docstring

* Add tests

* Rename parameter

* Update Documentation & Code Style

* Move implementation into the base converter to make mypy happier

* Update Documentation & Code Style

* mypy and pylint

* mypy

* move encoding parameter to init of PDFToTextConverter

* Update Documentation & Code Style

* make utf8 default and fix mypy

* Update Documentation & Code Style

* Update Documentation & Code Style

* remove note on encoding in tutorial8

* Update Documentation & Code Style

* skip OCRConverter and test converter.run

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2022-05-04 17:01:45 +02:00