Shahrukh Khan
|
4822536886
|
Add ImageToTextConverter and PDFToTextOCRConverter that utilize OCR (#1349)
* add image.py converter
* add PDFtoImageConverter
* add init to PDFtoImageConverter and classes to __init__
* update imagetotext pipeline
* revert change in base.py in file_conv
* Update base.py
* Update pdf.py
* add ocr file_converter testcase & update dockerfile
* fix tesseract exception message typo
* fix _image_to_text docstring
* add tesseract installation to CI
* add content test for PDF OCR converter
* update PDFToTextOCRConverter constructor docstring
* replace image files with tmp paths for image.py convert
* Update README.md
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
|
2021-09-01 16:42:25 +02:00 |
|