haystack

yujunjun/haystack

mirror of https://github.com/deepset-ai/haystack.git synced 2025-09-03 21:33:40 +00:00

Commit Graph

Author	SHA1	Message	Date
Shahrukh Khan	4822536886	Add ImageToTextConverter and PDFToTextOCRConverter that utilize OCR (#1349)	2021-09-01 16:42:25 +02:00
Lalit Pagaria	e904deefa7	Add Markdown file convertor (#875)	2021-03-23 16:31:26 +01:00
oryx1729	c4607cbd98	Revamp CI (#825)	2021-02-12 13:38:54 +01:00

Full message of 4822536886:

* add image.py converter
* add PDFtoImageConverter
* add init to PDFtoImageConverter and classes to __init__
* update imagetotext pipeline
* update imagetotext pipeline
* update imagetotext pipeline
* update imagetotext pipeline
* update imagetotext pipeline
* update imagetotext pipeline
* update imagetotext pipeline
* revert change in base.py in file_conv
* Update base.py
* Update pdf.py
* add ocr file_converter testcase & update dockerfile
* fix tesseract exception message typo
* fix _image_to_text doctstring
* add tesseract installation to CI
* add tesseract installation to CI
* add content test for PDF OCR converter
* update PDFToTextOCRConverter constructor doctsring
* replace image files with tmp paths for image.py convert
* replace image files with tmp paths for image.py convert
* Update README.md

Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>

3 Commits