<div align="center">

<p>
<img width="100%" src="./docs/images/Banner.png" alt="PaddleOCR Banner">
</p>

<!-- language -->
English | [简体中文](./README_cn.md) | [繁體中文](./README_tcn.md) | [日本語](./README_ja.md) | [한국어](./README_ko.md) | [Français](./README_fr.md) | [Русский](./README_ru.md) | [Español](./README_es.md) | [العربية](./README_ar.md)

<!-- icon -->

[GitHub](https://github.com/PaddlePaddle/PaddleOCR) | [PyPI](https://pypi.org/project/PaddleOCR/)

[PP-OCRv5 Online Demo](https://aistudio.baidu.com/community/app/91660/webUI) | [PP-StructureV3 Online Demo](https://aistudio.baidu.com/community/app/518494/webUI) | [PP-ChatOCRv4 Online Demo](https://aistudio.baidu.com/community/app/518493/webUI)

</div>

## 🚀 Introduction

Since its initial release, PaddleOCR has gained widespread acclaim across academia, industry, and research communities, thanks to its cutting-edge algorithms and proven performance in real-world applications. It already powers popular open-source projects such as Umi-OCR, OmniParser, MinerU, and RAGFlow, making it the go-to OCR toolkit for developers worldwide.

On May 20, 2025, the PaddlePaddle team unveiled PaddleOCR 3.0, fully compatible with the official release of the **PaddlePaddle 3.0** framework. This update further **boosts text-recognition accuracy**, adds support for **multiple text-type recognition** and **handwriting recognition**, and meets the growing demand from large-model applications for **high-precision parsing of complex documents**. When combined with **ERNIE 4.5 Turbo**, it significantly improves key-information extraction accuracy. PaddleOCR 3.0 also introduces support for Chinese heterogeneous AI accelerators such as **KUNLUNXIN** and **Ascend**. For complete usage documentation, please refer to the [PaddleOCR 3.0 Documentation](https://paddlepaddle.github.io/PaddleOCR/latest/en/index.html).

Three Major New Features in PaddleOCR 3.0:

- Universal-Scene Text Recognition Model [PP-OCRv5](./docs/version3.x/algorithm/PP-OCRv5/PP-OCRv5.en.md): A single model that handles five different text types plus complex handwriting. Overall recognition accuracy is 13 percentage points higher than the previous generation. [Online Demo](https://aistudio.baidu.com/community/app/91660/webUI)

- General Document-Parsing Solution [PP-StructureV3](./docs/version3.x/algorithm/PP-StructureV3/PP-StructureV3.en.md): Delivers high-precision parsing of multi-layout, multi-scene PDFs, outperforming many open- and closed-source solutions on public benchmarks. [Online Demo](https://aistudio.baidu.com/community/app/518494/webUI)

- Intelligent Document-Understanding Solution [PP-ChatOCRv4](./docs/version3.x/algorithm/PP-ChatOCRv4/PP-ChatOCRv4.en.md): Natively powered by ERNIE 4.5 Turbo, achieving key-information extraction accuracy 15 percentage points higher than its predecessor. [Online Demo](https://aistudio.baidu.com/community/app/518493/webUI)

In addition to providing an outstanding model library, PaddleOCR 3.0 also offers user-friendly tools covering model training, inference, and service deployment, so developers can rapidly bring AI applications to production.

<div align="center">
<p>
<img width="100%" src="./docs/images/Arch.png" alt="PaddleOCR Architecture">
</p>
</div>

## 📣 Recent updates

#### **2025.06.26: Release of PaddleOCR 3.0.3**, which includes:

- Bug Fix: Resolved an issue where the `enable_mkldnn` parameter had no effect, restoring the default behavior of using MKL-DNN for CPU inference.

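For reference, a minimal sketch of requesting MKL-DNN explicitly for CPU inference. `enable_mkldnn` is the parameter named in this fix; `device` and `mkldnn_cache_capacity` are parameter names assumed from the PaddleOCR 3.x API reference, the cache value of 10 is only illustrative, and `make_cpu_ocr` is a hypothetical helper. Check the `PaddleOCR` parameter list of your installed version before relying on it.

```python
# Constructor arguments for CPU inference with MKL-DNN (oneDNN) acceleration.
# `device` and `mkldnn_cache_capacity` are assumed parameter names; verify them
# against the PaddleOCR API reference for your installed version.
CPU_MKLDNN_KWARGS = {
    "device": "cpu",
    "enable_mkldnn": True,        # the default behavior restored in 3.0.3
    "mkldnn_cache_capacity": 10,  # illustrative cap on the MKL-DNN cache size
}


def make_cpu_ocr(**overrides):
    """Create a PaddleOCR pipeline for CPU inference (hypothetical helper).

    paddleocr is imported inside the function so this snippet loads without the
    package installed; keyword overrides take precedence over the defaults above.
    """
    from paddleocr import PaddleOCR

    return PaddleOCR(**{**CPU_MKLDNN_KWARGS, **overrides})
```

For example, `make_cpu_ocr(enable_mkldnn=False)` builds the same pipeline without MKL-DNN, which can be useful when comparing CPU latency.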
#### 🔥🔥 **2025.06.19: Release of PaddleOCR 3.0.2**, which includes:

- **New Features:**

    - The default model download source has been changed from `BOS` to `HuggingFace`. Users can set the environment variable `PADDLE_PDX_MODEL_SOURCE` to `BOS` to switch the download source back to Baidu Object Storage (BOS).
    - Added service invocation examples in six languages (C++, Java, Go, C#, Node.js, and PHP) for pipelines such as PP-OCRv5, PP-StructureV3, and PP-ChatOCRv4.
    - Improved the layout region sorting algorithm in the PP-StructureV3 pipeline, enhancing the sorting logic for complex vertical layouts.
    - Enhanced model selection logic: when a language is specified but a model version is not, the latest model version that supports that language is selected automatically.
    - Set a default upper limit on the MKL-DNN cache size to prevent unbounded growth, while still allowing users to configure the cache capacity.
    - Updated the default high-performance inference configurations to support Paddle MKL-DNN acceleration, and improved the logic that automatically selects a configuration.
    - Adjusted how the default device is determined so that it accounts for the computing devices actually supported by the installed Paddle framework, making program behavior more intuitive.
    - Added an Android example for PP-OCRv5. [Details](https://paddlepaddle.github.io/PaddleOCR/latest/en/version3.x/deployment/on_device_deployment.html).

- **Bug Fixes:**

    - Fixed an issue where some CLI parameters of PP-StructureV3 did not take effect.
    - Resolved an issue where `export_paddlex_config_to_yaml` did not work correctly in certain cases.
    - Corrected a discrepancy between the actual behavior of `save_path` and its documented behavior.
    - Fixed potential multithreading errors when using MKL-DNN in basic service deployment.
    - Corrected channel-order errors in image preprocessing for the LaTeX-OCR model.
    - Fixed channel-order errors when saving visualized images in the text recognition module.
    - Resolved channel-order errors in visualized table results in the PP-StructureV3 pipeline.
    - Fixed an overflow in the calculation of `overlap_ratio` under rare edge cases in the PP-StructureV3 pipeline.

- **Documentation Improvements:**

    - Updated the description of the `enable_mkldnn` parameter to accurately reflect the program's actual behavior.
    - Fixed errors in the documentation of the `lang` and `ocr_version` parameters.
    - Added instructions for exporting pipeline configuration files via the CLI.
    - Added the missing columns to the PP-OCRv5 performance data table.
    - Refined the PP-StructureV3 benchmark metrics across different configurations.

- **Others:**

    - Relaxed version restrictions on dependencies like numpy and pandas, restoring support for Python 3.12.

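Two of the 3.0.2 changes are configuration-facing: the download source switch and language-driven model selection. The sketch below relies only on names stated in these notes (the `PADDLE_PDX_MODEL_SOURCE` variable, the `lang` and `ocr_version` parameters, and the `export_paddlex_config_to_yaml` method, whose single path argument is an assumption). `ocr_kwargs_for_language` and `export_ocr_config` are hypothetical helpers, and `"PP-OCRv4"` is an illustrative version string.

```python
import os

# Switch the model download source from the default (HuggingFace) back to
# Baidu Object Storage. Set it before paddleocr is imported and before any
# model is downloaded.
os.environ["PADDLE_PDX_MODEL_SOURCE"] = "BOS"


def ocr_kwargs_for_language(lang, ocr_version=None):
    """Build PaddleOCR constructor arguments for a language.

    When `ocr_version` is None it is left out entirely, so PaddleOCR (3.0.2+)
    selects the latest model version that supports `lang`.
    """
    kwargs = {"lang": lang}
    if ocr_version is not None:
        kwargs["ocr_version"] = ocr_version
    return kwargs


def export_ocr_config(lang, yaml_path="ocr_config.yaml"):
    """Create a pipeline for `lang` and write its effective config to YAML."""
    from paddleocr import PaddleOCR  # imported lazily; requires paddleocr

    ocr = PaddleOCR(**ocr_kwargs_for_language(lang))
    ocr.export_paddlex_config_to_yaml(yaml_path)
    return yaml_path
```

Leaving `ocr_version` out, rather than passing a fixed value, is what lets a future upgrade pick up a newer model for the same language without code changes.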
<details>
<summary><strong>History Log</strong></summary>

#### 🔥🔥 **2025.06.05: Release of PaddleOCR 3.0.1**, which includes:

- **Optimization of certain models and model configurations:**

    - Updated the default model configuration for PP-OCRv5, switching both detection and recognition from mobile to server models. To improve default performance in most scenarios, the `limit_side_len` parameter in the configuration has been changed from 736 to 64.
    - Added a new text line orientation classification model, `PP-LCNet_x1_0_textline_ori`, with an accuracy of 99.42%. The default text line orientation classifier for the OCR, PP-StructureV3, and PP-ChatOCRv4 pipelines has been updated to this model.
    - Optimized the text line orientation classification model `PP-LCNet_x0_25_textline_ori`, improving its accuracy by 3.3 percentage points to 98.85%.

- **Optimizations and fixes for some issues in version 3.0.0, [details](https://paddlepaddle.github.io/PaddleOCR/latest/en/update/update.html)**

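The 3.0.1 defaults can be overridden per pipeline. The sketch below assumes the PaddleOCR 3.x parameter names `text_detection_model_name`, `text_recognition_model_name`, `textline_orientation_model_name`, `text_det_limit_side_len`, and `text_det_limit_type`, plus the mobile model names `PP-OCRv5_mobile_det` and `PP-OCRv5_mobile_rec`; verify them against the API reference. `detection_input_size` is a hypothetical, simplified model of the side-length rule used in DB-style detection preprocessing. It assumes a limit type of `"min"` for the new default of 64 (a limit of 64 only makes sense as a lower bound) and omits any upper clamp the real preprocessing may apply.

```python
# Override the 3.0.1 defaults: lighter mobile models plus the smaller text line
# orientation classifier. Parameter names are assumed from the PaddleOCR 3.x API.
MOBILE_OCR_KWARGS = {
    "text_detection_model_name": "PP-OCRv5_mobile_det",
    "text_recognition_model_name": "PP-OCRv5_mobile_rec",
    "use_textline_orientation": True,
    "textline_orientation_model_name": "PP-LCNet_x0_25_textline_ori",
    # Side-length limit spelled out so it is easy to change (assumed "min" type)
    "text_det_limit_side_len": 64,
    "text_det_limit_type": "min",
}


def detection_input_size(h, w, limit_side_len=64, limit_type="min"):
    """Simplified sketch of how a side-length limit resizes an image for detection.

    "min": upscale so the shorter side reaches limit_side_len (never shrink).
    "max": downscale so the longer side fits within limit_side_len (never enlarge).
    Both sides are then rounded to a multiple of 32, with a floor of 32.
    """
    if limit_type == "min":
        ratio = limit_side_len / min(h, w) if min(h, w) < limit_side_len else 1.0
    elif limit_type == "max":
        ratio = limit_side_len / max(h, w) if max(h, w) > limit_side_len else 1.0
    else:
        raise ValueError(f"unsupported limit_type: {limit_type!r}")
    new_h, new_w = int(h * ratio), int(w * ratio)
    return (max(round(new_h / 32) * 32, 32), max(round(new_w / 32) * 32, 32))


print(detection_input_size(500, 1472))              # (512, 1472): no downscaling
print(detection_input_size(32, 96))                 # (64, 192): small input upscaled
print(detection_input_size(500, 1472, 736, "max"))  # (256, 736): for comparison
```

Under this rule, any image whose shorter side is already at least 64 pixels keeps its size apart from rounding to a multiple of 32, whereas a `"max"` limit of 736 would shrink the same 500×1472 image.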
#### 🔥🔥 **2025.05.20: Official Release of PaddleOCR v3.0**, which includes:

- **PP-OCRv5**: High-Accuracy Text Recognition Model for All Scenarios - Instant Text from Images/PDFs.
    1. 🌐 Single-model support for **five** text types: seamlessly process **Simplified Chinese, Traditional Chinese, Simplified Chinese Pinyin, English**, and **Japanese** within a single model.
    2. ✍️ Improved **handwriting recognition**: significantly better at complex cursive scripts and non-standard handwriting.
    3. 🎯 **13-point accuracy gain** over PP-OCRv4, achieving state-of-the-art performance across a variety of real-world scenarios.

- **PP-StructureV3**: General-Purpose Document Parsing - Unleash SOTA Image/PDF Parsing for Real-World Scenarios!
    1. 🧮 **High-accuracy multi-scene PDF parsing**, leading both open- and closed-source solutions on the OmniDocBench benchmark.
    2. 🧠 Specialized capabilities include **seal recognition**, **chart-to-table conversion**, **table recognition with nested formulas/images**, **vertical text document parsing**, and **complex table structure analysis**.

- **PP-ChatOCRv4**: Intelligent Document Understanding - Extract Key Information, Not Just Text, from Images/PDFs.
    1. 🔥 **15-point accuracy gain** in key-information extraction on PDF/PNG/JPG files over the previous generation.
    2. 💻 Native support for **ERNIE 4.5 Turbo**, with compatibility for large-model deployments via PaddleNLP, Ollama, vLLM, and more.
    3. 🤝 Integrated [PP-DocBee2](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/paddlemix/examples/ppdocbee2), enabling extraction and understanding of printed text, handwriting, seals, tables, charts, and other common elements in complex documents.

[History Log](https://paddlepaddle.github.io/PaddleOCR/latest/en/update/update.html)

</details>

## ⚡ Quick Start

### 1. Run online demo

- [PP-OCRv5 Online Demo](https://aistudio.baidu.com/community/app/91660/webUI)
- [PP-StructureV3 Online Demo](https://aistudio.baidu.com/community/app/518494/webUI)
- [PP-ChatOCRv4 Online Demo](https://aistudio.baidu.com/community/app/518493/webUI)

### 2. Installation

Install PaddlePaddle by following the [Installation Guide](https://www.paddlepaddle.org.cn/en/install/quick?docurl=/documentation/docs/en/develop/install/pip/linux-pip_en.html), then install the PaddleOCR toolkit.

```bash
# Install paddleocr
pip install paddleocr
```

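After installing, a quick way to confirm that both packages are visible to the current interpreter is to query their installed versions. A minimal sketch using only the standard library; `installed_versions` is a hypothetical helper, and users who installed the GPU build `paddlepaddle-gpu` should pass that distribution name instead. PaddlePaddle's own `paddle.utils.run_check()` goes further and verifies that the framework can actually run.

```python
from importlib import metadata


def installed_versions(packages=("paddlepaddle", "paddleocr")):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```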
### 3. Run inference via CLI

```bash
# Run PP-OCRv5 inference
paddleocr ocr -i https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_ocr_002.png --use_doc_orientation_classify False --use_doc_unwarping False --use_textline_orientation False

# Run PP-StructureV3 inference
paddleocr pp_structurev3 -i https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/pp_structure_v3_demo.png --use_doc_orientation_classify False --use_doc_unwarping False

# Obtain a Qianfan API key first, then run PP-ChatOCRv4 inference.
# The key "驾驶室准乘人数" asks for the permitted number of passengers in the driver's cab.
paddleocr pp_chatocrv4_doc -i https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/vehicle_certificate-1.png -k 驾驶室准乘人数 --qianfan_api_key your_api_key --use_doc_orientation_classify False --use_doc_unwarping False

# Show all options for "paddleocr ocr"
paddleocr ocr --help
```

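To script these commands, for example over a folder of images, the CLI can be called from Python. A sketch with hypothetical helpers `build_ocr_command` and `run_ocr`; flag names are passed through unchanged, so use only options listed by `paddleocr ocr --help`.

```python
import subprocess


def build_ocr_command(image, **flags):
    """Build an argument list for `paddleocr ocr`.

    Values are converted with str(), so booleans become "True"/"False",
    matching the CLI examples above.
    """
    cmd = ["paddleocr", "ocr", "-i", str(image)]
    for name, value in flags.items():
        cmd += [f"--{name}", str(value)]
    return cmd


def run_ocr(image, **flags):
    """Run the CLI; raises CalledProcessError if it exits with an error."""
    return subprocess.run(build_ocr_command(image, **flags), check=True)
```

For example, `run_ocr("scan.png", use_doc_unwarping=False)` runs `paddleocr ocr -i scan.png --use_doc_unwarping False`. Passing an argument list instead of a shell string avoids quoting problems with file names that contain spaces.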
### 4. Run inference via API

**4.1 PP-OCRv5 Example**

```python
from paddleocr import PaddleOCR

# Initialize a PaddleOCR instance with document orientation classification,
# document unwarping, and text line orientation classification turned off
ocr = PaddleOCR(
    use_doc_orientation_classify=False,
    use_doc_unwarping=False,
    use_textline_orientation=False,
)

# Run OCR inference on a sample image
result = ocr.predict(
    input="https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_ocr_002.png"
)

# Print the results, save the visualization image, and save the JSON results
for res in result:
    res.print()
    res.save_to_img("output")
    res.save_to_json("output")
```

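The saved JSON can be post-processed without reloading the model. A sketch that keeps only confidently recognized lines; it assumes each result holds parallel `rec_texts` and `rec_scores` lists (optionally wrapped in a top-level `"res"` key), which should be confirmed against a file in your `output` directory. The helpers `confident_lines` and `confident_lines_from_dir` are hypothetical, and the 0.8 threshold is arbitrary.

```python
import json
from pathlib import Path


def confident_lines(result, min_score=0.8):
    """Return (text, score) pairs whose recognition score is at least min_score.

    `result` is one OCR result as a dict; a top-level "res" wrapper is unwrapped
    if present. Assumes parallel "rec_texts" and "rec_scores" lists.
    """
    data = result.get("res", result)
    return [
        (text, score)
        for text, score in zip(data["rec_texts"], data["rec_scores"])
        if score >= min_score
    ]


def confident_lines_from_dir(output_dir="output", min_score=0.8):
    """Apply confident_lines to every JSON file written by save_to_json."""
    lines = []
    for path in sorted(Path(output_dir).glob("*.json")):
        with open(path, encoding="utf-8") as f:
            lines.extend(confident_lines(json.load(f), min_score))
    return lines
```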
<details>
<summary><strong>4.2 PP-StructureV3 Example</strong></summary>

```python
from paddleocr import PPStructureV3

pipeline = PPStructureV3(
    use_doc_orientation_classify=False,
    use_doc_unwarping=False,
)

# Run inference on an image
output = pipeline.predict(
    input="https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/pp_structure_v3_demo.png",
)

# Print the results and save them as JSON and Markdown
for res in output:
    res.print()
    res.save_to_json(save_path="output")
    res.save_to_markdown(save_path="output")
```

</details>

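PP-StructureV3 also targets PDF input, producing one result per page. A sketch of merging those pages into a single Markdown file; it relies on `res.markdown` and `pipeline.concatenate_markdown_pages` as used in the PP-StructureV3 documentation, assumes the latter returns the merged text as a string, and assumes `markdown_images` maps relative paths to PIL images. `pdf_to_markdown` is a hypothetical helper; check the method names against your installed version.

```python
from pathlib import Path


def pdf_to_markdown(pipeline, pdf_path, output_dir="output"):
    """Parse every page of a PDF and write one merged Markdown file.

    `pipeline` is a PPStructureV3 instance. Returns the path of the .md file.
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)

    markdown_pages, page_images = [], []
    for res in pipeline.predict(input=str(pdf_path)):
        md_info = res.markdown
        markdown_pages.append(md_info)
        page_images.append(md_info.get("markdown_images", {}))

    # Merge the per-page Markdown into one document
    markdown_text = pipeline.concatenate_markdown_pages(markdown_pages)
    md_file = out / f"{Path(pdf_path).stem}.md"
    md_file.write_text(markdown_text, encoding="utf-8")

    # Save the images referenced (by relative path) from the Markdown
    for images in page_images:
        for rel_path, image in images.items():
            image_file = out / rel_path
            image_file.parent.mkdir(parents=True, exist_ok=True)
            image.save(image_file)
    return md_file
```

Usage, assuming `pipeline` was created as in the example above: `pdf_to_markdown(pipeline, "report.pdf")` writes `output/report.md` plus any extracted images.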
<details>
<summary><strong>4.3 PP-ChatOCRv4 Example</strong></summary>

```python
from paddleocr import PPChatOCRv4Doc

# Large language model that answers questions about the extracted content
chat_bot_config = {
    "module_name": "chat_bot",
    "model_name": "ernie-3.5-8k",
    "base_url": "https://qianfan.baidubce.com/v2",
    "api_type": "openai",
    "api_key": "api_key",  # replace with your Qianfan API key
}

# Embedding model used to build the retrieval vectors
retriever_config = {
    "module_name": "retriever",
    "model_name": "embedding-v1",
    "base_url": "https://qianfan.baidubce.com/v2",
    "api_type": "qianfan",
    "api_key": "api_key",  # replace with your Qianfan API key
}

pipeline = PPChatOCRv4Doc(
    use_doc_orientation_classify=False,
    use_doc_unwarping=False,
)

# Visual analysis: general OCR, seal recognition, and table recognition
visual_predict_res = pipeline.visual_predict(
    input="https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/vehicle_certificate-1.png",
    use_common_ocr=True,
    use_seal_recognition=True,
    use_table_recognition=True,
)

mllm_predict_info = None
use_mllm = False
# To use a multimodal large model, first start a local MLLM service. See
# https://github.com/PaddlePaddle/PaddleX/blob/release/3.0/docs/pipeline_usage/tutorials/vlm_pipelines/doc_understanding.en.md
# for deployment instructions, then update mllm_chat_bot_config accordingly.
if use_mllm:
    mllm_chat_bot_config = {
        "module_name": "chat_bot",
        "model_name": "PP-DocBee",
        "base_url": "http://127.0.0.1:8080/",  # your local MLLM service URL
        "api_type": "openai",
        "api_key": "api_key",  # your API key
    }

    mllm_predict_res = pipeline.mllm_pred(
        input="https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/vehicle_certificate-1.png",
        key_list=["驾驶室准乘人数"],
        mllm_chat_bot_config=mllm_chat_bot_config,
    )
    mllm_predict_info = mllm_predict_res["mllm_res"]

# Collect the visual information from every result
visual_info_list = []
for res in visual_predict_res:
    visual_info_list.append(res["visual_info"])
    layout_parsing_result = res["layout_parsing_result"]  # available for inspection

# Build the retrieval vectors, then ask for the key information
vector_info = pipeline.build_vector(
    visual_info_list, flag_save_bytes_vector=True, retriever_config=retriever_config
)
chat_result = pipeline.chat(
    key_list=["驾驶室准乘人数"],  # "permitted number of passengers in the driver's cab"
    visual_info=visual_info_list,
    vector_info=vector_info,
    mllm_predict_info=mllm_predict_info,
    chat_bot_config=chat_bot_config,
    retriever_config=retriever_config,
)
print(chat_result)
```

</details>

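The example above hard-codes placeholder keys. A sketch that reads the Qianfan key from an environment variable instead, plus a `chat_bot` config for a self-hosted OpenAI-compatible server, which the release notes suggest is possible for deployments such as vLLM or Ollama. `QIANFAN_API_KEY`, `qianfan_configs`, and `local_chat_bot_config` are names chosen for this sketch, and whether a given server works with `"api_type": "openai"` is an assumption to verify.

```python
import os

QIANFAN_BASE_URL = "https://qianfan.baidubce.com/v2"


def qianfan_configs(api_key=None):
    """Build the chat_bot and retriever configs used in the example above.

    The key falls back to the QIANFAN_API_KEY environment variable
    (a variable name chosen for this sketch).
    """
    key = api_key or os.environ.get("QIANFAN_API_KEY")
    if not key:
        raise ValueError("pass api_key or set QIANFAN_API_KEY")
    chat_bot_config = {
        "module_name": "chat_bot",
        "model_name": "ernie-3.5-8k",
        "base_url": QIANFAN_BASE_URL,
        "api_type": "openai",
        "api_key": key,
    }
    retriever_config = {
        "module_name": "retriever",
        "model_name": "embedding-v1",
        "base_url": QIANFAN_BASE_URL,
        "api_type": "qianfan",
        "api_key": key,
    }
    return chat_bot_config, retriever_config


def local_chat_bot_config(model_name, base_url, api_key="unused"):
    """chat_bot config for a self-hosted OpenAI-compatible server."""
    return {
        "module_name": "chat_bot",
        "model_name": model_name,  # must match the model name the server exposes
        "base_url": base_url,
        "api_type": "openai",
        "api_key": api_key,
    }
```

For a local vLLM server the base URL is typically `http://localhost:8000/v1`; Ollama's OpenAI-compatible endpoint is usually `http://localhost:11434/v1`.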
### 5. Chinese Heterogeneous AI Accelerators

- [Huawei Ascend](https://paddlepaddle.github.io/PaddleOCR/latest/version3.x/other_devices_support/paddlepaddle_install_NPU.html)
- [KUNLUNXIN](https://paddlepaddle.github.io/PaddleOCR/latest/version3.x/other_devices_support/paddlepaddle_install_XPU.html)

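Once PaddlePaddle is installed for an accelerator, the pipeline is pointed at it through a device string. A sketch assuming PaddleOCR's `device` argument and Paddle's `npu`/`xpu` prefixes for Ascend and KUNLUNXIN (matching the NPU/XPU installation guides linked above); `device_string`, `make_ocr_on`, and the accelerator names in the lookup table are hypothetical.

```python
# Hypothetical lookup from accelerator name to Paddle device prefix
ACCELERATOR_PREFIXES = {
    "cpu": "cpu",
    "nvidia": "gpu",
    "ascend": "npu",
    "kunlunxin": "xpu",
}


def device_string(accelerator, card=0):
    """Return e.g. "npu:0" for the first Ascend card, or "cpu" for CPU."""
    prefix = ACCELERATOR_PREFIXES[accelerator.lower()]
    return prefix if prefix == "cpu" else f"{prefix}:{card}"


def make_ocr_on(accelerator, card=0, **kwargs):
    """Create a PaddleOCR pipeline on the chosen device (imports paddleocr lazily).

    Only works with a Paddle build that supports the requested accelerator.
    """
    from paddleocr import PaddleOCR

    return PaddleOCR(device=device_string(accelerator, card), **kwargs)
```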
## ⛰️ Advanced Tutorials

- [PP-OCRv5 Tutorial](https://paddlepaddle.github.io/PaddleOCR/latest/version3.x/pipeline_usage/OCR.html)
- [PP-StructureV3 Tutorial](https://paddlepaddle.github.io/PaddleOCR/latest/version3.x/pipeline_usage/PP-StructureV3.html)
- [PP-ChatOCRv4 Tutorial](https://paddlepaddle.github.io/PaddleOCR/latest/version3.x/pipeline_usage/PP-ChatOCRv4.html)

## 🔄 Quick Overview of Execution Results

<div align="center">
<p>
<img width="100%" src="./docs/images/demo.gif" alt="PP-OCRv5 Demo">
</p>
</div>

<div align="center">
<p>
<img width="100%" src="./docs/images/blue_v3.gif" alt="PP-StructureV3 Demo">
</p>
</div>

## 👩‍👩‍👧‍👦 Community

| PaddlePaddle WeChat official account | Join the tech discussion group |
| :---: | :---: |
| <img src="https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/refs/heads/main/images/paddleocr/README/qrcode_for_paddlepaddle_official_account.jpg" width="150"> | <img src="https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/refs/heads/main/images/paddleocr/README/qr_code_for_the_questionnaire.jpg" width="150"> |

## 😃 Awesome Projects Leveraging PaddleOCR

PaddleOCR wouldn't be where it is today without its incredible community! 💗 A massive thank you to all our longtime partners, new collaborators, and everyone who's poured their passion into PaddleOCR, whether we've named you or not. Your support fuels our fire!

| Project Name | Description |
| ------------ | ----------- |
| [RAGFlow](https://github.com/infiniflow/ragflow) <a href="https://github.com/infiniflow/ragflow"><img src="https://img.shields.io/github/stars/infiniflow/ragflow"></a> | RAG engine based on deep document understanding. |
| [MinerU](https://github.com/opendatalab/MinerU) <a href="https://github.com/opendatalab/MinerU"><img src="https://img.shields.io/github/stars/opendatalab/MinerU"></a> | Multi-type document to Markdown conversion tool. |
| [Umi-OCR](https://github.com/hiroi-sora/Umi-OCR) <a href="https://github.com/hiroi-sora/Umi-OCR"><img src="https://img.shields.io/github/stars/hiroi-sora/Umi-OCR"></a> | Free, open-source, batch offline OCR software. |
| [OmniParser](https://github.com/microsoft/OmniParser) <a href="https://github.com/microsoft/OmniParser"><img src="https://img.shields.io/github/stars/microsoft/OmniParser"></a> | OmniParser: Screen parsing tool for pure vision-based GUI agents. |
| [QAnything](https://github.com/netease-youdao/QAnything) <a href="https://github.com/netease-youdao/QAnything"><img src="https://img.shields.io/github/stars/netease-youdao/QAnything"></a> | Question answering based on anything. |
| [PDF-Extract-Kit](https://github.com/opendatalab/PDF-Extract-Kit) <a href="https://github.com/opendatalab/PDF-Extract-Kit"><img src="https://img.shields.io/github/stars/opendatalab/PDF-Extract-Kit"></a> | A powerful open-source toolkit designed to efficiently extract high-quality content from complex and diverse PDF documents. |
| [Dango-Translator](https://github.com/PantsuDango/Dango-Translator) <a href="https://github.com/PantsuDango/Dango-Translator"><img src="https://img.shields.io/github/stars/PantsuDango/Dango-Translator"></a> | Recognizes text on the screen, translates it, and shows the translation in real time. |
| [Learn more projects](./awesome_projects.md) | [More projects based on PaddleOCR](./awesome_projects.md) |

## 👩‍👩‍👧‍👦 Contributors

<a href="https://github.com/PaddlePaddle/PaddleOCR/graphs/contributors">
  <img src="https://contrib.rocks/image?repo=PaddlePaddle/PaddleOCR&max=400&columns=20" width="800"/>
</a>

## 🌟 Star

[Star History Chart](https://star-history.com/#PaddlePaddle/PaddleOCR&Date)

## 📄 License

This project is released under the [Apache 2.0 license](LICENSE).

## 🎓 Citation

```bibtex
@misc{paddleocr2020,
  title={PaddleOCR, Awesome multilingual OCR toolkits based on PaddlePaddle.},
  author={PaddlePaddle Authors},
  howpublished={\url{https://github.com/PaddlePaddle/PaddleOCR}},
  year={2020}
}
```