diff --git a/README.md b/README.md index 737a000381..25baeb1f36 100644 --- a/README.md +++ b/README.md @@ -39,6 +39,42 @@ PaddleOCR 3.0除了提供优秀的模型库外,还提供好学易用的工具 ## 📣 最新动态 + +🔥🔥2025.06.19: **PaddleOCR 3.0.2** 发布,包含: + +- **功能新增:** + - 模型默认下载源从`BOS`改为`HuggingFace`,同时也支持用户通过更改环境变量`PADDLE_PDX_MODEL_SOURCE`为`BOS`,将模型下载源设置为百度云对象存储BOS。 + - PP-OCRv5、PP-StructureV3、PP-ChatOCRv4等pipeline新增C++、Java、Go、C#、Node.js、PHP 6种语言的服务调用示例。 + - 优化PP-StructureV3产线中版面分区排序算法,对复杂竖版版面排序逻辑进行完善,进一步提升了复杂版面排序效果。 + - 优化模型选择逻辑,当指定语言、未指定模型版本时,自动选择支持该语言的最新版本的模型。 + - 为MKL-DNN缓存大小设置默认上界,防止缓存无限增长。同时,支持用户配置缓存容量。 + - 更新高性能推理默认配置,支持Paddle MKL-DNN加速。优化高性能推理自动配置逻辑,支持更智能的配置选择。 + - 调整默认设备获取逻辑,考虑环境中安装的Paddle框架对计算设备的实际支持情况,使程序行为更符合直觉。 + - 新增PP-OCRv5的Android端示例,[详情](https://paddlepaddle.github.io/PaddleOCR/latest/version3.x/deployment/on_device_deployment.html)。 + +- **Bug修复:** + - 修复PP-StructureV3部分CLI参数不生效的问题。 + - 修复部分情况下`export_paddlex_config_to_yaml`无法正常工作的问题。 + - 修复save_path实际行为与文档描述不符的问题。 + - 修复基础服务化部署在使用MKL-DNN时可能出现的多线程错误。 + - 修复Latex-OCR模型的图像预处理的通道顺序错误。 + - 修复文本识别模块保存可视化图像的通道顺序错误。 + - 修复PP-StructureV3中表格可视化结果通道顺序错误。 + - 修复PP-StructureV3产线中极特殊的情况下,计算overlap_ratio时,变量溢出问题。 + +- **文档优化:** + - 更新文档中对`enable_mkldnn`参数的说明,使其更准确地描述程序的实际行为。 + - 修复文档中对`lang`和`ocr_version`参数描述的错误。 + - 补充通过CLI导出产线配置文件的说明。 + - 修复PP-OCRv5性能数据表格中的列缺失问题。 + - 润色PP-StructureV3在不同配置下的benchmark指标。 + +- **其他:** + - 放松numpy、pandas等依赖的版本限制,恢复对Python 3.12的支持。 + +
+ 历史日志 + 🔥🔥2025.06.05: **PaddleOCR 3.0.1** 发布,包含: - **优化部分模型和模型配置:** @@ -65,6 +101,9 @@ PaddleOCR 3.0除了提供优秀的模型库外,还提供好学易用的工具 2. 💻 原生支持**文心大模型4.5 Turbo**,还兼容 PaddleNLP、Ollama、vLLM 等工具部署的大模型。 3. 🤝 集成 [PP-DocBee2](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/paddlemix/examples/ppdocbee2),支持印刷文字、手写体文字、印章信息、表格、图表等常见的复杂文档信息抽取和理解的能力。 +[更多日志](https://paddlepaddle.github.io/PaddleOCR/latest/update/update.html) + +
## ⚡ 快速开始 ### 1. 在线体验 diff --git a/README_en.md b/README_en.md index 37d0907868..4264af2933 100644 --- a/README_en.md +++ b/README_en.md @@ -44,7 +44,46 @@ In addition to providing an outstanding model library, PaddleOCR 3.0 also offers ## 📣 Recent updates -#### **🔥🔥 2025.06.05: Release of PaddleOCR 3.0.1, includes:** +#### 🔥🔥**2025.06.19: Release of PaddleOCR 3.0.2, includes:** + +- **New Features:** + + - The default model download source has changed from `BOS` to `HuggingFace`. Users can set the environment variable `PADDLE_PDX_MODEL_SOURCE` to `BOS` to switch the download source back to Baidu Object Storage (BOS). + - Added service invocation examples in six languages (C++, Java, Go, C#, Node.js, and PHP) for pipelines such as PP-OCRv5, PP-StructureV3, and PP-ChatOCRv4. + - Improved the layout partition sorting algorithm in the PP-StructureV3 pipeline, refining the sorting logic for complex vertical layouts to further improve ordering on complex layouts. + - Enhanced model selection logic: when a language is specified but a model version is not, the latest model version that supports that language is selected automatically. + - Set a default upper limit on the MKL-DNN cache size to prevent unbounded growth, and allowed users to configure the cache capacity. + - Updated the default configuration for high-performance inference to support Paddle MKL-DNN acceleration, and improved the automatic configuration logic so that it selects configurations more intelligently. + - Adjusted default device selection to take into account which computing devices the installed Paddle framework actually supports, making program behavior more intuitive. + - Added an Android example for PP-OCRv5. [Details](https://paddlepaddle.github.io/PaddleOCR/latest/en/version3.x/deployment/on_device_deployment.html). + +- **Bug Fixes:** + + - Fixed an issue with some CLI parameters in PP-StructureV3 not taking effect. 
+ - Resolved an issue where `export_paddlex_config_to_yaml` would not function correctly in certain cases. + - Fixed a discrepancy between the actual behavior of `save_path` and its documented behavior. + - Fixed potential multithreading errors when using MKL-DNN in basic service deployment. + - Corrected channel order errors in image preprocessing for the Latex-OCR model. + - Fixed channel order errors in saving visualized images within the text recognition module. + - Resolved channel order errors in visualized table results within the PP-StructureV3 pipeline. + - Fixed a variable overflow in the calculation of `overlap_ratio` in rare edge cases in the PP-StructureV3 pipeline. + +- **Documentation Improvements:** + + - Updated the description of the `enable_mkldnn` parameter in the documentation to more accurately reflect the program's actual behavior. + - Fixed errors in the documentation regarding the `lang` and `ocr_version` parameters. + - Added instructions for exporting pipeline configuration files via CLI. + - Fixed missing columns in the performance data table for PP-OCRv5. + - Refined benchmark metrics for PP-StructureV3 across different configurations. + +- **Others:** + + - Relaxed version restrictions on dependencies such as numpy and pandas, restoring support for Python 3.12. + +
+ History Log + +#### **2025.06.05: Release of PaddleOCR 3.0.1, includes:** - **Optimisation of certain models and model configurations:** - Updated the default model configuration for PP-OCRv5, changing both detection and recognition from mobile to server models. To improve default performance in most scenarios, the parameter `limit_side_len` in the configuration has been changed from 736 to 64. @@ -68,20 +107,7 @@ In addition to providing an outstanding model library, PaddleOCR 3.0 also offers 2. 💻 Native support for **ERINE4.5 Turbo**, with compatibility for large-model deployments via PaddleNLP, Ollama, vLLM, and more. 3. 🤝 Integrated [PP-DocBee2](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/paddlemix/examples/ppdocbee2), enabling extraction and understanding of printed text, handwriting, seals, tables, charts, and other common elements in complex documents. -
- The history of updates - - -- 🔥🔥2025.03.07: Release of **PaddleOCR v2.10**, including: - - - **12 new self-developed models:** - - **[Layout Detection series](https://paddlepaddle.github.io/PaddleX/latest/en/module_usage/tutorials/ocr_modules/layout_detection.html)**(3 models): PP-DocLayout-L, M, and S -- capable of detecting 23 common layout types across diverse document formats(papers, reports, exams, books, magazines, contracts, etc.) in English and Chinese. Achieves up to **90.4% mAP@0.5** , and lightweight features can process over 100 pages per second. - - **[Formula Recognition series](https://paddlepaddle.github.io/PaddleX/latest/en/module_usage/tutorials/ocr_modules/formula_recognition.html)**(2 models): PP-FormulaNet-L and S -- supports recognition of 50,000+ LaTeX expressions, handling both printed and handwritten formulas. PP-FormulaNet-L offers **6% higher accuracy** than comparable models; PP-FormulaNet-S is 16x faster while maintaining similar accuracy. - - **[Table Structure Recognition series](https://paddlepaddle.github.io/PaddleX/latest/en/module_usage/tutorials/ocr_modules/table_structure_recognition.html)**(2 models): SLANeXt_wired and SLANeXt_wireless -- newly developed models with **6% accuracy improvement** over SLANet_plus in complex table recognition. - - **[Table Classification](https://paddlepaddle.github.io/PaddleX/latest/en/module_usage/tutorials/ocr_modules/table_classification.html)**(1 model): -PP-LCNet_x1_0_table_cls -- an ultra-lightweight classifier for wired and wireless tables. - -[Learn more](https://paddlepaddle.github.io/PaddleOCR/latest/en/update.html) +[History Log](https://paddlepaddle.github.io/PaddleOCR/latest/en/update/update.html)
diff --git a/docs/update/update.en.md b/docs/update/update.en.md index a42cbe6cb5..e61ad77ad0 100644 --- a/docs/update/update.en.md +++ b/docs/update/update.en.md @@ -6,26 +6,61 @@ hide: --- ### Recently Update +#### **🔥🔥 2025.06.19: Release of PaddleOCR v3.0.2, which includes:** + +- **New Features:** + + - The default model download source has changed from `BOS` to `HuggingFace`. Users can set the environment variable `PADDLE_PDX_MODEL_SOURCE` to `BOS` to switch the download source back to Baidu Object Storage (BOS). + - Added service invocation examples in six languages (C++, Java, Go, C#, Node.js, and PHP) for pipelines such as PP-OCRv5, PP-StructureV3, and PP-ChatOCRv4. + - Improved the layout partition sorting algorithm in the PP-StructureV3 pipeline, refining the sorting logic for complex vertical layouts to further improve ordering on complex layouts. + - Enhanced model selection logic: when a language is specified but a model version is not, the latest model version that supports that language is selected automatically. + - Set a default upper limit on the MKL-DNN cache size to prevent unbounded growth, and allowed users to configure the cache capacity. + - Updated the default configuration for high-performance inference to support Paddle MKL-DNN acceleration, and improved the automatic configuration logic so that it selects configurations more intelligently. + - Adjusted default device selection to take into account which computing devices the installed Paddle framework actually supports, making program behavior more intuitive. + - Added an Android example for PP-OCRv5. [Details](https://paddlepaddle.github.io/PaddleOCR/latest/en/version3.x/deployment/on_device_deployment.html). + +- **Bug Fixes:** + + - Fixed an issue with some CLI parameters in PP-StructureV3 not taking effect. + - Resolved an issue where `export_paddlex_config_to_yaml` would not function correctly in certain cases. 
+ - Fixed a discrepancy between the actual behavior of `save_path` and its documented behavior. + - Fixed potential multithreading errors when using MKL-DNN in basic service deployment. + - Corrected channel order errors in image preprocessing for the Latex-OCR model. + - Fixed channel order errors in saving visualized images within the text recognition module. + - Resolved channel order errors in visualized table results within the PP-StructureV3 pipeline. + - Fixed a variable overflow in the calculation of `overlap_ratio` in rare edge cases in the PP-StructureV3 pipeline. + +- **Documentation Improvements:** + + - Updated the description of the `enable_mkldnn` parameter in the documentation to more accurately reflect the program's actual behavior. + - Fixed errors in the documentation regarding the `lang` and `ocr_version` parameters. + - Added instructions for exporting pipeline configuration files via CLI. + - Fixed missing columns in the performance data table for PP-OCRv5. + - Refined benchmark metrics for the PP-StructureV3 pipeline across different configurations. + +- **Others:** + + - Relaxed version restrictions on dependencies such as numpy and pandas, restoring support for Python 3.12. #### **🔥🔥 2025.06.05: Release of PaddleOCR v3.0.1, which includes:** - **Optimisation of certain models and model configurations:** - - Updated the default model configuration for PP-OCRv5, changing both detection and recognition from mobile to server models. To improve default performance in most scenarios, the parameter `limit_side_len` in the configuration has been changed from 736 to 64. - - Added a new text line orientation classification model `PP-LCNet_x1_0_textline_ori` with an accuracy of 99.42%. The default text line orientation classifier for OCR, PP-StructureV3, and PP-ChatOCRv4 pipelines has been updated to this model. 
- - Optimised the text line orientation classification model `PP-LCNet_x0_25_textline_ori`, improving accuracy by 3.3 percentage points to a current accuracy of 98.85%. + - Updated the default model configuration for PP-OCRv5, changing both detection and recognition from mobile to server models. To improve default performance in most scenarios, the parameter `limit_side_len` in the configuration has been changed from 736 to 64. + - Added a new text line orientation classification model `PP-LCNet_x1_0_textline_ori` with an accuracy of 99.42%. The default text line orientation classifier for OCR, PP-StructureV3, and PP-ChatOCRv4 pipelines has been updated to this model. + - Optimised the text line orientation classification model `PP-LCNet_x0_25_textline_ori`, improving accuracy by 3.3 percentage points to a current accuracy of 98.85%. - **Optimisation of issues present in version 3.0.0:** - - **Improved CLI usage experience:** When using the PaddleOCR CLI without passing any parameters, a usage prompt is now provided. - - **New parameters added:** PP-ChatOCRv3 and PP-StructureV3 now support the `use_textline_orientation` parameter. - - **CPU inference speed optimisation:** All pipeline CPU inferences now enable MKL-DNN by default. - - **Support for C++ inference:** The detection and recognition concatenation part of PP-OCRv5 now supports C++ inference. + - **Improved CLI usage experience:** When using the PaddleOCR CLI without passing any parameters, a usage prompt is now provided. + - **New parameters added:** PP-ChatOCRv3 and PP-StructureV3 now support the `use_textline_orientation` parameter. + - **CPU inference speed optimisation:** All pipeline CPU inferences now enable MKL-DNN by default. + - **Support for C++ inference:** The detection and recognition concatenation part of PP-OCRv5 now supports C++ inference. 
- **Fixes for issues present in version 3.0.0:** - - Fixed an issue where PP-StructureV3 encountered CPU inference errors due to the inability to use MKL-DNN with formula and table recognition models. - - Fixed an issue where GPU environments encountered the error `FatalError: Process abort signal is detected by the operating system` during inference. - - Fixed type hint issues in some Python 3.8 environments. - - Fixed the issue where the method `PPStructureV3.concatenate_markdown_pages` was missing. - - Fixed an issue where specifying both `lang` and `model_name` when instantiating `paddleocr.PaddleOCR` resulted in `model_name` being ineffective. + - Fixed an issue where PP-StructureV3 encountered CPU inference errors due to the inability to use MKL-DNN with formula and table recognition models. + - Fixed an issue where GPU environments encountered the error `FatalError: Process abort signal is detected by the operating system` during inference. + - Fixed type hint issues in some Python 3.8 environments. + - Fixed the issue where the method `PPStructureV3.concatenate_markdown_pages` was missing. + - Fixed an issue where specifying both `lang` and `model_name` when instantiating `paddleocr.PaddleOCR` resulted in `model_name` being ineffective. 
#### **🔥🔥 2025.05.20: PaddleOCR 3.0 Official Release Highlights** diff --git a/docs/update/update.md b/docs/update/update.md index 9f9d60dba8..c856202c7e 100644 --- a/docs/update/update.md +++ b/docs/update/update.md @@ -7,22 +7,54 @@ hide: ### 更新 +#### **🔥🔥2025.06.19: PaddleOCR v3.0.2 版本发布,包含:** + +- **功能新增:** + - 模型默认下载源从`BOS`改为`HuggingFace`,同时也支持用户通过更改环境变量`PADDLE_PDX_MODEL_SOURCE`为`BOS`,将模型下载源设置为百度云对象存储BOS。 + - PP-OCRv5、PP-StructureV3、PP-ChatOCRv4等pipeline新增C++、Java、Go、C#、Node.js、PHP 6种语言的服务调用示例。 + - 优化PP-StructureV3产线中版面分区排序算法,对复杂竖版版面排序逻辑进行完善,进一步提升了复杂版面排序效果。 + - 优化模型选择逻辑,当指定语言、未指定模型版本时,自动选择支持该语言的最新版本的模型。 @timminator + - 为MKL-DNN缓存大小设置默认上界,防止缓存无限增长。同时,支持用户配置缓存容量。@timminator + - 更新高性能推理默认配置,支持Paddle MKL-DNN加速。优化高性能推理自动配置逻辑,支持更智能的配置选择。 + - 调整默认设备获取逻辑,考虑环境中安装的Paddle框架对计算设备的实际支持情况,使程序行为更符合直觉。 + - 新增PP-OCRv5的Android端示例,[详情](https://paddlepaddle.github.io/PaddleOCR/latest/version3.x/deployment/on_device_deployment.html)。 + +- **Bug修复:** + - 修复PP-StructureV3部分CLI参数不生效的问题。 + - 修复部分情况下`export_paddlex_config_to_yaml`无法正常工作的问题。 + - 修复save_path实际行为与文档描述不符的问题。 + - 修复基础服务化部署在使用MKL-DNN时可能出现的多线程错误。 + - 修复Latex-OCR模型的图像预处理的通道顺序错误。 + - 修复文本识别模块保存可视化图像的通道顺序错误。 + - 修复PP-StructureV3中表格可视化结果通道顺序错误。 + - 修复PP-StructureV3产线中极特殊的情况下,计算overlap_ratio时,变量溢出问题。 + +- **文档优化:** + - 更新文档中对`enable_mkldnn`参数的说明,使其更准确地描述程序的实际行为。 + - 修复文档中对`lang`和`ocr_version`参数描述的错误。 + - 补充通过CLI导出产线配置文件的说明。 + - 修复PP-OCRv5性能数据表格中的列缺失问题。 + - 润色PP-StructureV3在不同配置下的benchmark指标。 + +- **其他:** + - 放松numpy、pandas等依赖的版本限制,恢复对Python 3.12的支持。 + #### **🔥🔥2025.06.05: PaddleOCR v3.0.1 版本发布,包含:** - **优化部分模型和模型配置:** - - 更新 PP-OCRv5默认模型配置,检测和识别均由mobile改为server模型。为了改善大多数的场景默认效果,配置中的参数`limit_side_len`由736改为64 - - 新增文本行方向分类`PP-LCNet_x1_0_textline_ori`模型,精度99.42%,OCR、PP-StructureV3、PP-ChatOCRv4产线的默认文本行方向分类器改为该模型 - - 优化文本行方向分类`PP-LCNet_x0_25_textline_ori`模型,精度提升3.3个百分点,当前精度98.85% + - 更新 PP-OCRv5默认模型配置,检测和识别均由mobile改为server模型。为了改善大多数的场景默认效果,配置中的参数`limit_side_len`由736改为64 + - 
新增文本行方向分类`PP-LCNet_x1_0_textline_ori`模型,精度99.42%,OCR、PP-StructureV3、PP-ChatOCRv4产线的默认文本行方向分类器改为该模型 + - 优化文本行方向分类`PP-LCNet_x0_25_textline_ori`模型,精度提升3.3个百分点,当前精度98.85% - **优化3.0.0版本部分存在的问题** - - **优化CLI使用体验:** 当使用PaddleOCR CLI不传入任何参数时,给出用法提示。 - - **新增参数:** PP-ChatOCRv3、PP-StructureV3支持`use_textline_orientation`参数。 - - **CPU推理速度优化:** 所有产线CPU推理默认开启MKL-DNN。 - - **C++推理支持:** PP-OCRv5的检测和识别串联部分支持C++推理 + - **优化CLI使用体验:** 当使用PaddleOCR CLI不传入任何参数时,给出用法提示。 + - **新增参数:** PP-ChatOCRv3、PP-StructureV3支持`use_textline_orientation`参数。 + - **CPU推理速度优化:** 所有产线CPU推理默认开启MKL-DNN。 + - **C++推理支持:** PP-OCRv5的检测和识别串联部分支持C++推理 - **修复3.0.0版本部分存在的问题** - - 修复由于公式识别、表格识别模型无法使用MKL-DNN导致PP-StructureV3在部分cpu推理报错的问题 - - 修复在部分GPU环境中推理报`FatalError: Process abort signal is detected by the operating system`错误的问题 - - 修复部分Python3.8环境的type hint的问题 - - 修复`PPStructureV3.concatenate_markdown_pages`方法不存在的问题。 - - 修复实例化`paddleocr.PaddleOCR`时同时指定`lang`和`model_name`时`model_name`不生效的问题。 + - 修复由于公式识别、表格识别模型无法使用MKL-DNN导致PP-StructureV3在部分cpu推理报错的问题 + - 修复在部分GPU环境中推理报`FatalError: Process abort signal is detected by the operating system`错误的问题 + - 修复部分Python3.8环境的type hint的问题 + - 修复`PPStructureV3.concatenate_markdown_pages`方法不存在的问题。 + - 修复实例化`paddleocr.PaddleOCR`时同时指定`lang`和`model_name`时`model_name`不生效的问题。 #### **🔥🔥2025.05.20: PaddleOCR 3.0 正式发布,包含:**