Open source libraries and APIs for building custom preprocessing pipelines for labeling, training, or production machine learning workflows.
Updated 2025-06-26 22:27:05 +00:00
Awesome multilingual OCR and document parsing toolkits based on PaddlePaddle (a practical, ultra-lightweight OCR system that supports recognition of 80+ languages, provides data annotation and synthesis tools, and supports training and deployment across server, mobile, embedded, and IoT devices)
Updated 2025-06-26 12:36:33 +00:00
OCRmyPDF adds an OCR text layer to scanned PDF files, making them searchable.
Updated 2025-06-23 12:15:26 +00:00