update for 3.0.2 (#15774)

* update for 3.0.2

* fix typo
cuicheng01 2025-06-19 00:34:34 +08:00 committed by GitHub
parent d7ac87f37d
commit 4602329be9
GPG Key ID: B5690EEEBB952194
4 changed files with 171 additions and 39 deletions

View File

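@ -39,6 +39,42 @@ In addition to providing an outstanding model library, PaddleOCR 3.0 also offers easy-to-learn and easy-to-use tools
## 📣 Recent updates
🔥🔥2025.06.19: Release of **PaddleOCR 3.0.2**, includes:
- **New Features:**
- The default model download source has been changed from `BOS` to `HuggingFace`. Users can set the environment variable `PADDLE_PDX_MODEL_SOURCE` to `BOS` to switch the model download source to Baidu Object Storage (BOS).
- Added service invocation examples in six languages (C++, Java, Go, C#, Node.js, and PHP) for pipelines such as PP-OCRv5, PP-StructureV3, and PP-ChatOCRv4.
- Improved the layout partition sorting algorithm in the PP-StructureV3 pipeline, refining the sorting logic for complex vertical layouts and further improving results on complex layouts.
- Enhanced model selection logic: when a language is specified but a model version is not, the latest model version supporting that language is selected automatically.
- Set a default upper limit for the MKL-DNN cache size to prevent unlimited growth, while also allowing users to configure the cache capacity.
- Updated the default configuration for high-performance inference to support Paddle MKL-DNN acceleration, and optimized the automatic configuration logic for smarter configuration choices.
- Adjusted the logic for obtaining the default device so that it considers which computing devices the installed Paddle framework actually supports, making program behavior more intuitive.
- Added an Android example for PP-OCRv5. [Details](https://paddlepaddle.github.io/PaddleOCR/latest/version3.x/deployment/on_device_deployment.html).
- **Bug Fixes:**
- Fixed an issue where some CLI parameters in PP-StructureV3 did not take effect.
- Fixed an issue where `export_paddlex_config_to_yaml` did not work correctly in certain cases.
- Fixed the discrepancy between the actual behavior of `save_path` and its documentation.
- Fixed potential multithreading errors when using MKL-DNN in basic service deployment.
- Fixed the channel order error in image preprocessing for the Latex-OCR model.
- Fixed the channel order error when the text recognition module saves visualized images.
- Fixed the channel order error in visualized table results in PP-StructureV3.
- Fixed a variable overflow in the calculation of `overlap_ratio` that occurred in rare edge cases in the PP-StructureV3 pipeline.
- **Documentation Improvements:**
- Updated the description of the `enable_mkldnn` parameter in the documentation so that it more accurately reflects the program's actual behavior.
- Fixed errors in the documentation of the `lang` and `ocr_version` parameters.
- Added instructions for exporting pipeline configuration files via the CLI.
- Fixed missing columns in the performance data table for PP-OCRv5.
- Refined the benchmark metrics for PP-StructureV3 under different configurations.
- **Others:**
- Relaxed version restrictions on dependencies such as numpy and pandas, restoring support for Python 3.12.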
<details>
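<summary><strong>History Log</strong></summary>
🔥🔥2025.06.05: Release of **PaddleOCR 3.0.1**, includes:
- **Optimisation of certain models and model configurations:**
@ -65,6 +101,9 @@ In addition to providing an outstanding model library, PaddleOCR 3.0 also offers easy-to-learn and easy-to-use tools
2. 💻 Native support for **ERNIE 4.5 Turbo**, with compatibility for large models deployed via PaddleNLP, Ollama, vLLM, and similar tools.
3. 🤝 Integrated [PP-DocBee2](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/paddlemix/examples/ppdocbee2), enabling extraction and understanding of printed text, handwriting, seals, tables, charts, and other common information in complex documents.
[More logs](https://paddlepaddle.github.io/PaddleOCR/latest/update/update.html)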
</details>
## ⚡ Quick Start
### 1. Online Demo

View File

@ -44,7 +44,46 @@ In addition to providing an outstanding model library, PaddleOCR 3.0 also offers
## 📣 Recent updates
#### **🔥🔥 2025.06.05: Release of PaddleOCR 3.0.1, includes:**
#### 🔥🔥**2025.06.19: Release of PaddleOCR 3.0.2, includes:**
- **New Features:**
- The default model download source has been changed from `BOS` to `HuggingFace`. Users can set the environment variable `PADDLE_PDX_MODEL_SOURCE` to `BOS` to switch the model download source back to Baidu Object Storage (BOS).
- Added service invocation examples for six languages—C++, Java, Go, C#, Node.js, and PHP—for pipelines like PP-OCRv5, PP-StructureV3, and PP-ChatOCRv4.
- Improved the layout partition sorting algorithm in the PP-StructureV3 pipeline, enhancing the sorting logic for complex vertical layouts to deliver better results.
- Enhanced model selection logic: when a language is specified but a model version is not, the system will automatically select the latest model version supporting that language.
- Set a default upper limit for MKL-DNN cache size to prevent unlimited growth, while also allowing users to configure cache capacity.
- Updated default configurations for high-performance inference to support Paddle MKL-DNN acceleration and optimized the logic for automatic configuration selection for smarter choices.
- Adjusted the logic for obtaining the default device to consider the actual support for computing devices by the installed Paddle framework, making program behavior more intuitive.
- Added an Android example for PP-OCRv5. [Details](https://paddlepaddle.github.io/PaddleOCR/latest/en/version3.x/deployment/on_device_deployment.html).
- **Bug Fixes:**
- Fixed an issue with some CLI parameters in PP-StructureV3 not taking effect.
- Resolved an issue where `export_paddlex_config_to_yaml` would not function correctly in certain cases.
- Corrected the discrepancy between the actual behavior of `save_path` and its documentation description.
- Fixed potential multithreading errors when using MKL-DNN in basic service deployment.
- Corrected channel order errors in image preprocessing for the Latex-OCR model.
- Fixed channel order errors in saving visualized images within the text recognition module.
- Resolved channel order errors in visualized table results within the PP-StructureV3 pipeline.
- Fixed a variable overflow in the calculation of `overlap_ratio` that occurred in rare edge cases in the PP-StructureV3 pipeline.
- **Documentation Improvements:**
- Updated the description of the `enable_mkldnn` parameter in the documentation to accurately reflect the program's actual behavior.
- Fixed errors in the documentation regarding the `lang` and `ocr_version` parameters.
- Added instructions for exporting pipeline configuration files via the CLI.
- Fixed missing columns in the performance data table for PP-OCRv5.
- Refined benchmark metrics for PP-StructureV3 across different configurations.
- **Others:**
- Relaxed version restrictions on dependencies like numpy and pandas, restoring support for Python 3.12.
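
The download-source switch listed under New Features can be sketched as follows. The variable name `PADDLE_PDX_MODEL_SOURCE` and the value `BOS` come from the notes above; the helper name `use_model_source` is hypothetical, and the sketch assumes the variable is read when the library initializes, so it is set before any PaddleOCR import.

```python
import os


def use_model_source(source: str = "BOS") -> str:
    """Set the model download source for the current process and return it.

    Hypothetical helper: it only writes the environment variable named in
    the release notes; it does not call any PaddleOCR API.
    """
    os.environ["PADDLE_PDX_MODEL_SOURCE"] = source
    return os.environ["PADDLE_PDX_MODEL_SOURCE"]


# Assumption: call this before importing paddleocr so the setting is
# visible when models are first downloaded.
use_model_source("BOS")
```

Setting the variable in the shell before launching the program should have the same effect for that process.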
<details>
<summary><strong>History Log</strong></summary>
#### **2025.06.05: Release of PaddleOCR 3.0.1, includes:**
- **Optimisation of certain models and model configurations:**
- Updated the default model configuration for PP-OCRv5, changing both detection and recognition from mobile to server models. To improve default performance in most scenarios, the parameter `limit_side_len` in the configuration has been changed from 736 to 64.
@ -68,20 +107,7 @@ In addition to providing an outstanding model library, PaddleOCR 3.0 also offers
2. 💻 Native support for **ERNIE 4.5 Turbo**, with compatibility for large-model deployments via PaddleNLP, Ollama, vLLM, and more.
3. 🤝 Integrated [PP-DocBee2](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/paddlemix/examples/ppdocbee2), enabling extraction and understanding of printed text, handwriting, seals, tables, charts, and other common elements in complex documents.
<details>
<summary><strong>The history of updates </strong></summary>
- 🔥🔥2025.03.07: Release of **PaddleOCR v2.10**, including:
- **12 new self-developed models:**
- **[Layout Detection series](https://paddlepaddle.github.io/PaddleX/latest/en/module_usage/tutorials/ocr_modules/layout_detection.html)**(3 models): PP-DocLayout-L, M, and S -- capable of detecting 23 common layout types across diverse document formats(papers, reports, exams, books, magazines, contracts, etc.) in English and Chinese. Achieves up to **90.4% mAP@0.5** , and lightweight features can process over 100 pages per second.
- **[Formula Recognition series](https://paddlepaddle.github.io/PaddleX/latest/en/module_usage/tutorials/ocr_modules/formula_recognition.html)**(2 models): PP-FormulaNet-L and S -- supports recognition of 50,000+ LaTeX expressions, handling both printed and handwritten formulas. PP-FormulaNet-L offers **6% higher accuracy** than comparable models; PP-FormulaNet-S is 16x faster while maintaining similar accuracy.
- **[Table Structure Recognition series](https://paddlepaddle.github.io/PaddleX/latest/en/module_usage/tutorials/ocr_modules/table_structure_recognition.html)**(2 models): SLANeXt_wired and SLANeXt_wireless -- newly developed models with **6% accuracy improvement** over SLANet_plus in complex table recognition.
- **[Table Classification](https://paddlepaddle.github.io/PaddleX/latest/en/module_usage/tutorials/ocr_modules/table_classification.html)**(1 model):
PP-LCNet_x1_0_table_cls -- an ultra-lightweight classifier for wired and wireless tables.
[Learn more](https://paddlepaddle.github.io/PaddleOCR/latest/en/update.html)
[History Log](https://paddlepaddle.github.io/PaddleOCR/latest/en/update/update.html)
</details>

View File

@ -6,26 +6,61 @@ hide:
---
### Recent Updates
#### **🔥🔥 2025.06.19: Release of PaddleOCR v3.0.2, which includes:**
- **New Features:**
- The default model download source has been changed from `BOS` to `HuggingFace`. Users can set the environment variable `PADDLE_PDX_MODEL_SOURCE` to `BOS` to switch the model download source back to Baidu Object Storage (BOS).
- Added service invocation examples for six languages—C++, Java, Go, C#, Node.js, and PHP—for pipelines like PP-OCRv5, PP-StructureV3, and PP-ChatOCRv4.
- Improved the layout partition sorting algorithm in the PP-StructureV3 pipeline, enhancing the sorting logic for complex vertical layouts to deliver better results.
- Enhanced model selection logic: when a language is specified but a model version is not, the system will automatically select the latest model version supporting that language.
- Set a default upper limit for MKL-DNN cache size to prevent unlimited growth, while also allowing users to configure cache capacity.
- Updated default configurations for high-performance inference to support Paddle MKL-DNN acceleration and optimized the logic for automatic configuration selection for smarter choices.
- Adjusted the logic for obtaining the default device to consider the actual support for computing devices by the installed Paddle framework, making program behavior more intuitive.
- Added an Android example for PP-OCRv5. [Details](https://paddlepaddle.github.io/PaddleOCR/latest/en/version3.x/deployment/on_device_deployment.html).
- **Bug Fixes:**
- Fixed an issue with some CLI parameters in PP-StructureV3 not taking effect.
- Resolved an issue where `export_paddlex_config_to_yaml` would not function correctly in certain cases.
- Corrected the discrepancy between the actual behavior of `save_path` and its documentation description.
- Fixed potential multithreading errors when using MKL-DNN in basic service deployment.
- Corrected channel order errors in image preprocessing for the Latex-OCR model.
- Fixed channel order errors in saving visualized images within the text recognition module.
- Resolved channel order errors in visualized table results within the PP-StructureV3 pipeline.
- Fixed a variable overflow in the calculation of `overlap_ratio` that occurred in rare edge cases in the PP-StructureV3 pipeline.
- **Documentation Improvements:**
- Updated the description of the `enable_mkldnn` parameter in the documentation to accurately reflect the program's actual behavior.
- Fixed errors in the documentation regarding the `lang` and `ocr_version` parameters.
- Added instructions for exporting pipeline configuration files via the CLI.
- Fixed missing columns in the performance data table for PP-OCRv5.
- Refined benchmark metrics for PP-StructureV3 pipeline across different configurations.
- **Others:**
- Relaxed version restrictions on dependencies like numpy and pandas, restoring support for Python 3.12.
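
The model selection rule above (language given, version omitted, so the latest supporting version wins) can be illustrated with a small sketch. The support table `SUPPORTED_LANGS` and the function `select_version` are hypothetical and illustrative only; they are not PaddleOCR's actual data or API.

```python
from typing import Dict, Optional, Set

# Hypothetical support table (version -> language codes); illustrative only.
SUPPORTED_LANGS: Dict[str, Set[str]] = {
    "PP-OCRv3": {"ch", "en", "fr", "de"},
    "PP-OCRv4": {"ch", "en"},
    "PP-OCRv5": {"ch", "en", "japan"},
}


def version_key(name: str) -> int:
    """Extract the numeric suffix, e.g. 'PP-OCRv5' -> 5."""
    return int(name.rsplit("v", 1)[1])


def select_version(lang: str, ocr_version: Optional[str] = None) -> str:
    """Return the requested version, or the newest one that supports `lang`."""
    if ocr_version is not None:
        # An explicit version is honored, but must support the language.
        if lang not in SUPPORTED_LANGS.get(ocr_version, set()):
            raise ValueError(f"{ocr_version} does not support language {lang!r}")
        return ocr_version
    candidates = [v for v, langs in SUPPORTED_LANGS.items() if lang in langs]
    if not candidates:
        raise ValueError(f"no model version supports language {lang!r}")
    return max(candidates, key=version_key)
```

With this table, `select_version("fr")` falls back to the older version because it is the only one listing that language, while `select_version("en")` picks the newest.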
#### **🔥🔥 2025.06.05: Release of PaddleOCR v3.0.1, which includes:**
- **Optimisation of certain models and model configurations:**
- Updated the default model configuration for PP-OCRv5, changing both detection and recognition from mobile to server models. To improve default performance in most scenarios, the parameter `limit_side_len` in the configuration has been changed from 736 to 64.
- Added a new text line orientation classification model `PP-LCNet_x1_0_textline_ori` with an accuracy of 99.42%. The default text line orientation classifier for OCR, PP-StructureV3, and PP-ChatOCRv4 pipelines has been updated to this model.
- Optimised the text line orientation classification model `PP-LCNet_x0_25_textline_ori`, improving accuracy by 3.3 percentage points to a current accuracy of 98.85%.
- **Optimisation of issues present in version 3.0.0:**
- **Improved CLI usage experience:** When using the PaddleOCR CLI without passing any parameters, a usage prompt is now provided.
- **New parameters added:** PP-ChatOCRv3 and PP-StructureV3 now support the `use_textline_orientation` parameter.
- **CPU inference speed optimisation:** CPU inference now enables MKL-DNN by default in all pipelines.
- **Support for C++ inference:** The cascaded detection and recognition stage of PP-OCRv5 now supports C++ inference.
- **Fixes for issues present in version 3.0.0:**
- Fixed an issue where PP-StructureV3 encountered CPU inference errors due to the inability to use MKL-DNN with formula and table recognition models.
- Fixed an issue where GPU environments encountered the error `FatalError: Process abort signal is detected by the operating system` during inference.
- Fixed type hint issues in some Python 3.8 environments.
- Fixed the issue where the method `PPStructureV3.concatenate_markdown_pages` was missing.
- Fixed an issue where specifying both `lang` and `model_name` when instantiating `paddleocr.PaddleOCR` resulted in `model_name` being ineffective.
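
The last fix above concerns precedence between `lang` and `model_name`. A minimal sketch of the intended behavior, assuming an explicit `model_name` should always win over the language-derived default, might look like this. The table `DEFAULT_REC_MODEL` and the function `resolve_model_name` are hypothetical and are not taken from the PaddleOCR source.

```python
from typing import Optional

# Hypothetical language -> default model mapping; names are illustrative.
DEFAULT_REC_MODEL = {
    "ch": "PP-OCRv5_server_rec",
    "en": "en_PP-OCRv4_mobile_rec",
}


def resolve_model_name(lang: Optional[str] = None,
                       model_name: Optional[str] = None) -> str:
    """Pick the model to load; an explicit model_name overrides lang."""
    if model_name is not None:
        # The behavior described by the fix: never ignore an explicit name.
        return model_name
    if lang is not None:
        if lang not in DEFAULT_REC_MODEL:
            raise ValueError(f"unsupported language: {lang!r}")
        return DEFAULT_REC_MODEL[lang]
    # Assumed fallback when neither argument is given.
    return DEFAULT_REC_MODEL["ch"]
```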
#### **🔥🔥 2025.05.20: PaddleOCR 3.0 Official Release Highlights**

View File

@ -7,22 +7,54 @@ hide:
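### Updates
#### **🔥🔥2025.06.19: Release of PaddleOCR v3.0.2, which includes:**
- **New Features:**
- The default model download source has been changed from `BOS` to `HuggingFace`. Users can set the environment variable `PADDLE_PDX_MODEL_SOURCE` to `BOS` to switch the model download source to Baidu Object Storage (BOS).
- Added service invocation examples in six languages (C++, Java, Go, C#, Node.js, and PHP) for pipelines such as PP-OCRv5, PP-StructureV3, and PP-ChatOCRv4.
- Improved the layout partition sorting algorithm in the PP-StructureV3 pipeline, refining the sorting logic for complex vertical layouts and further improving results on complex layouts.
- Enhanced model selection logic: when a language is specified but a model version is not, the latest model version supporting that language is selected automatically. @timminator
- Set a default upper limit for the MKL-DNN cache size to prevent unlimited growth, while also allowing users to configure the cache capacity. @timminator
- Updated the default configuration for high-performance inference to support Paddle MKL-DNN acceleration, and optimized the automatic configuration logic for smarter configuration choices.
- Adjusted the logic for obtaining the default device so that it considers which computing devices the installed Paddle framework actually supports, making program behavior more intuitive.
- Added an Android example for PP-OCRv5. [Details](https://paddlepaddle.github.io/PaddleOCR/latest/version3.x/deployment/on_device_deployment.html).
- **Bug Fixes:**
- Fixed an issue where some CLI parameters in PP-StructureV3 did not take effect.
- Fixed an issue where `export_paddlex_config_to_yaml` did not work correctly in certain cases.
- Fixed the discrepancy between the actual behavior of `save_path` and its documentation.
- Fixed potential multithreading errors when using MKL-DNN in basic service deployment.
- Fixed the channel order error in image preprocessing for the Latex-OCR model.
- Fixed the channel order error when the text recognition module saves visualized images.
- Fixed the channel order error in visualized table results in PP-StructureV3.
- Fixed a variable overflow in the calculation of `overlap_ratio` that occurred in rare edge cases in the PP-StructureV3 pipeline.
- **Documentation Improvements:**
- Updated the description of the `enable_mkldnn` parameter in the documentation so that it more accurately reflects the program's actual behavior.
- Fixed errors in the documentation of the `lang` and `ocr_version` parameters.
- Added instructions for exporting pipeline configuration files via the CLI.
- Fixed missing columns in the performance data table for PP-OCRv5.
- Refined the benchmark metrics for PP-StructureV3 under different configurations.
- **Others:**
- Relaxed version restrictions on dependencies such as numpy and pandas, restoring support for Python 3.12.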
#### **🔥🔥2025.06.05: Release of PaddleOCR v3.0.1, which includes:**
- **Optimisation of certain models and model configurations:**
- Updated the default model configuration for PP-OCRv5, changing both detection and recognition from mobile to server models. To improve default performance in most scenarios, the parameter `limit_side_len` in the configuration has been changed from 736 to 64.
- Added a new text line orientation classification model `PP-LCNet_x1_0_textline_ori` with an accuracy of 99.42%. The default text line orientation classifier for the OCR, PP-StructureV3, and PP-ChatOCRv4 pipelines has been updated to this model.
- Optimised the text line orientation classification model `PP-LCNet_x0_25_textline_ori`, improving accuracy by 3.3 percentage points to a current accuracy of 98.85%.
- **Optimisation of issues present in version 3.0.0:**
- **Improved CLI usage experience:** When the PaddleOCR CLI is used without any parameters, a usage prompt is now shown.
- **New parameters added:** PP-ChatOCRv3 and PP-StructureV3 now support the `use_textline_orientation` parameter.
- **CPU inference speed optimisation:** CPU inference now enables MKL-DNN by default in all pipelines.
- **Support for C++ inference:** The cascaded detection and recognition stage of PP-OCRv5 now supports C++ inference.
- **Fixes for issues present in version 3.0.0:**
- Fixed an issue where PP-StructureV3 raised errors during CPU inference in some cases because the formula and table recognition models could not use MKL-DNN.
- Fixed an issue where inference in some GPU environments reported the error `FatalError: Process abort signal is detected by the operating system`.
- Fixed type hint issues in some Python 3.8 environments.
- Fixed the issue where the method `PPStructureV3.concatenate_markdown_pages` was missing.
- Fixed an issue where specifying both `lang` and `model_name` when instantiating `paddleocr.PaddleOCR` resulted in `model_name` being ineffective.
#### **🔥🔥2025.05.20: Official release of PaddleOCR 3.0, which includes:**