Sara Zan
01ea4bf21f
Change default encoding for PDFToTextConverter from Latin 1 to UTF-8 (#2420)
* Change default encoding for PDFToTextConverter
* Update Documentation & Code Style
* Improve docstring
* Update Documentation & Code Style
* Add list of ligatures to ignore and add the possibility to modify such list at need
* Add docstring
* Add tests
* Rename parameter
* Update Documentation & Code Style
* Move implementation into the base converter to make mypy happier
* Update Documentation & Code Style
* mypy and pylint
* mypy
* move encoding parameter to init of PDFToTextConverter
* Update Documentation & Code Style
* make utf8 default and fix mypy
* Update Documentation & Code Style
* Update Documentation & Code Style
* remove note on encoding in tutorial8
* Update Documentation & Code Style
* skip OCRConverter and test converter.run
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
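
To illustrate why the default encoding and the ligature list matter, here is a minimal sketch. It is not the converter's actual implementation: `decode_and_clean`, `LIGATURES`, and the parameter names are hypothetical, and mapping each ligature to its plain letters is an assumption about how such a list might be applied.

```python
from typing import Dict, Optional

# Hypothetical ligature table (assumption): maps single-glyph ligatures that
# PDF extraction often emits to their plain-letter equivalents.
LIGATURES: Dict[str, str] = {
    "\ufb00": "ff",
    "\ufb01": "fi",
    "\ufb02": "fl",
    "\ufb03": "ffi",
    "\ufb04": "ffl",
}


def decode_and_clean(
    raw: bytes,
    encoding: str = "UTF-8",
    ligatures: Optional[Dict[str, str]] = None,
) -> str:
    """Decode extracted bytes and replace known ligatures.

    Passing a custom `ligatures` dict overrides the default table;
    passing an empty dict leaves all ligatures untouched.
    """
    table = LIGATURES if ligatures is None else ligatures
    text = raw.decode(encoding, errors="ignore")
    for ligature, replacement in table.items():
        text = text.replace(ligature, replacement)
    return text


# UTF-8 bytes containing a ligature (U+FB03) and an accented character.
raw = "e\ufb03cient caf\u00e9".encode("utf-8")

utf8_text = decode_and_clean(raw)                       # decoded as UTF-8
latin1_text = decode_and_clean(raw, encoding="latin-1")  # decoded as Latin 1
```

Decoding the same bytes as Latin 1 turns every multi-byte character into several unrelated ones, so neither the accented letter nor the ligature survives. That is the kind of garbling a UTF-8 default avoids for UTF-8 input.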
2022-05-04 17:01:45 +02:00