OCRmyPDF/tests/spoof/tesseract_big_image_error.py
James R. Barlow 2c24f67deb Rename “tess4” renderer to “sandwich” and make it default in Tess 3.05.01
Tesseract 3.05.01 backported the textonly_pdf=1 option, which allows this
superior PDF renderer to be used prior to 4.00 alpha. This means that
the tess4 name is no longer accurate, so call it “sandwich” because of
its merge-preserve characteristic. Preserve the tess4 name. Fix the
documentation and tests to reflect this.

Make it the default, because it’s better. Unlike the “tesseract” renderer
prior to Tess 3.05.00, it has no problems rendering PDFs that Ghostscript
corrupts, and it produces better output without re-rastering.

Deprecate some old stuff to avoid the test suite growing obscenely
large.
2017-06-13 13:09:12 -07:00
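For illustration, the text-only output that the sandwich renderer relies on is requested by passing `-c textonly_pdf=1` together with the `pdf` config file. The sketch below only builds an argument list and does not run Tesseract. The helper name `sandwich_args` is hypothetical, and the argument order assumes the usual `tesseract imagename outputbase [options...] [configfile...]` convention, not OCRmyPDF's actual invocation.

```python
from typing import List


def sandwich_args(input_image: str, output_base: str, language: str = 'eng') -> List[str]:
    """Build a hypothetical Tesseract command line that writes a text-only PDF.

    Assumes Tesseract >= 3.05.01, where textonly_pdf=1 is available.
    """
    return [
        'tesseract',
        input_image,             # page image to OCR
        output_base,             # Tesseract appends .pdf to this base name
        '-l', language,          # recognition language
        '-c', 'textonly_pdf=1',  # emit only the invisible text layer
        'pdf',                   # config file selecting the PDF renderer
    ]


if __name__ == '__main__':
    print(' '.join(sandwich_args('page.png', 'page_text')))
```

If Tesseract is installed, the list could be passed to `subprocess.run`; the resulting text layer would then be merged onto the original page rather than replacing it.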


#!/usr/bin/env python3
# © 2016 James R. Barlow: github.com/jbarlow83
"""Simulate Tesseract failing when it attempts to process large images."""

import sys

VERSION_STRING = '''tesseract 3.04.00
leptonica-1.72
libjpeg 8d : libpng 1.6.19 : libtiff 4.0.6 : zlib 1.2.5
SPOOFED: return error claiming image too big
'''

# Spoofed error text, mimicking Tesseract's failure on an oversized image
BIG_IMAGE_ERROR = ("Image too large: (33830, 14959)\n"
                   "Error during processing.")


def main():
    # Informational queries succeed, as they would on a working install
    if sys.argv[1] == '--version':
        print(VERSION_STRING, file=sys.stderr)
        sys.exit(0)
    elif sys.argv[1] == '--list-langs':
        print('List of available languages (1):\neng', file=sys.stderr)
        sys.exit(0)
    elif sys.argv[1] == '--print-parameters':
        print('A parameter list would go here', file=sys.stderr)
        sys.exit(0)
    # Any OCR request (hOCR, PDF, or stdout output) fails with the same error
    elif sys.argv[-2] in ('hocr', 'pdf') or sys.argv[-1] == 'stdout':
        print(BIG_IMAGE_ERROR, file=sys.stderr)
        sys.exit(1)
    else:
        print("Spoof doesn't understand arguments", file=sys.stderr)
        print(sys.argv, file=sys.stderr)
        sys.exit(1)


if __name__ == '__main__':
    main()
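A caller that wants to recognize this spoofed failure could scan stderr for the `Image too large: (W, H)` line. The sketch below is hypothetical: `parse_image_too_large` is not part of OCRmyPDF, and it assumes the message format printed by this spoof matches what real Tesseract emits.

```python
import re
from typing import Optional, Tuple

# Matches the line the spoof prints, e.g. "Image too large: (33830, 14959)"
_TOO_LARGE = re.compile(r'Image too large: \((\d+), (\d+)\)')


def parse_image_too_large(stderr: str) -> Optional[Tuple[int, int]]:
    """Return (width, height) if stderr reports an oversized image, else None.

    Hypothetical helper; the message format is taken from the spoof script.
    """
    match = _TOO_LARGE.search(stderr)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == '__main__':
    sample = "Image too large: (33830, 14959)\nError during processing."
    print(parse_image_too_large(sample))  # (33830, 14959)
```

Returning `None` rather than raising keeps the check cheap to apply to every failed Tesseract run before deciding how to report the error.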