Toolkit for linearizing PDFs for LLM datasets/training
Updated 2025-06-27 02:57:26 +00:00
Awesome multilingual OCR and document parsing toolkits based on PaddlePaddle (a practical, ultra-lightweight OCR system that recognizes 80+ languages, provides data annotation and synthesis tools, and supports training and deployment on server, mobile, embedded, and IoT devices)
Updated 2025-06-26 12:36:33 +00:00
OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
Updated 2025-06-23 12:15:26 +00:00