[doc] add hardware support (#16725)

* Add hardware support

* Add hardware support

* fix

* update

* update
zhang-prog 2025-10-19 01:12:17 +08:00 committed by GitHub
parent fb259db4ad
commit a034bf87ba
GPG Key ID: B5690EEEBB952194
2 changed files with 70 additions and 8 deletions


@@ -6,6 +6,8 @@ comments: true
PaddleOCR-VL is a SOTA and resource-efficient model tailored for document parsing. Its core component is PaddleOCR-VL-0.9B, a compact yet powerful vision-language model (VLM) that integrates a NaViT-style dynamic resolution visual encoder with the ERNIE-4.5-0.3B language model to enable accurate element recognition. This innovative model efficiently supports 109 languages and excels in recognizing complex elements (e.g., text, tables, formulas, and charts), while maintaining minimal resource consumption. Through comprehensive evaluations on widely used public benchmarks and in-house benchmarks, PaddleOCR-VL achieves SOTA performance in both page-level document parsing and element-level recognition. It significantly outperforms existing solutions, exhibits strong competitiveness against top-tier VLMs, and delivers fast inference speeds. These strengths make it highly suitable for practical deployment in real-world scenarios.
<img src="https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/refs/heads/main/images/paddleocr_vl/metrics/allmetric.png"/>
## 1. Environment Preparation
Install PaddlePaddle and PaddleOCR:
@@ -17,6 +19,35 @@ python -m pip install https://paddle-whl.bj.bcebos.com/nightly/cu126/safetensors
```
> For Windows users, please use WSL or a Docker container.
Running PaddleOCR-VL has the following GPU hardware requirements:
<table border="1">
<thead>
<tr>
<th>Inference Method</th>
<th>GPU Compute Capability</th>
</tr>
</thead>
<tbody>
<tr>
<td>PaddlePaddle</td>
<td>≥ 8.5</td>
</tr>
<tr>
<td>vLLM</td>
<td>≥ 8 (RTX 3060, RTX 5070, A10, A100, ...) <br />
7 ≤ GPU Compute Capability < 8 (T4, V100, ...): supported, but may run into issues such as request timeouts and OOM errors. Not recommended.
</td>
</tr>
<tr>
<td>SGLang</td>
<td>8 ≤ GPU Compute Capability < 12</td>
</tr>
</tbody>
</table>
PaddleOCR-VL does not currently support CPUs or the Arm architecture. Support for more hardware will be added based on actual demand, so stay tuned!
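The table above can be read as a small decision rule. The following is a minimal sketch, not part of PaddleOCR: the function name `backend_support` and the three status strings are hypothetical, and the compute capability is assumed to be passed in as a number such as `8.6` (for example, a value you have looked up for your GPU).

```python
def backend_support(backend: str, compute_capability: float) -> str:
    """Classify a GPU for one inference method, following the table above.

    Hypothetical helper for illustration only. Returns "supported",
    "not recommended", or "unsupported".
    """
    name = backend.lower()
    if name == "paddlepaddle":
        # PaddlePaddle: compute capability >= 8.5
        return "supported" if compute_capability >= 8.5 else "unsupported"
    if name == "vllm":
        # vLLM: >= 8 is supported; 7 <= CC < 8 runs but is not recommended
        if compute_capability >= 8:
            return "supported"
        if compute_capability >= 7:
            return "not recommended"
        return "unsupported"
    if name == "sglang":
        # SGLang: 8 <= CC < 12
        return "supported" if 8 <= compute_capability < 12 else "unsupported"
    raise ValueError(f"unknown backend: {backend!r}")


if __name__ == "__main__":
    for cc in (7.5, 8.6, 12.0):
        for backend in ("PaddlePaddle", "vLLM", "SGLang"):
            print(f"CC {cc}: {backend} -> {backend_support(backend, cc)}")
```

Anything the table does not list is reported as "unsupported" here, which is an assumption of this sketch rather than a statement from the table.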
## 2. Quick Start
PaddleOCR-VL can be used in two ways: through the CLI or through the Python API. The CLI is simpler and suits quick functional verification, while the Python API is more flexible and suits integration into existing projects.
@@ -889,7 +920,7 @@ docker run \
paddlex_genai_server --model_name PaddleOCR-VL-0.9B --host 0.0.0.0 --port 8118 --backend vllm
```
If you are using an NVIDIA 50 series graphics card (Compute Capacity >= 12), you need to install a specific version of FlashAttention before launching the service.
If you are using an NVIDIA 50 series graphics card (Compute Capability >= 12), you need to install a specific version of FlashAttention before launching the service.
```bash
docker run \
@@ -926,16 +957,16 @@ paddleocr install_genai_server_deps <name of the inference acceleration framewor
The currently supported frameworks are named `vllm` and `sglang`, corresponding to vLLM and SGLang, respectively.
If you are using an NVIDIA 50 series graphics card (Compute Capacity >= 12), you need to install a specific version of FlashAttention before launching the service.
If you are using an NVIDIA 50 series graphics card (Compute Capability >= 12), you need to install a specific version of FlashAttention before launching the service.
```bash
python -m pip install flash-attn==2.8.3
```
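The Compute Capability >= 12 condition can be made explicit in a setup script. This is only a sketch under stated assumptions: `flash_attn_install_command` is a hypothetical helper, the compute capability is assumed to be known already, and the pinned version is the one given in the command above.

```python
import sys
from typing import List, Optional

# Version pin taken from the pip command in the text above.
FLASH_ATTN_REQUIREMENT = "flash-attn==2.8.3"


def flash_attn_install_command(compute_capability: float) -> Optional[List[str]]:
    """Return the pip command to run before starting the service, or None.

    Hypothetical helper: only GPUs with compute capability >= 12
    (NVIDIA 50 series) need the pinned FlashAttention version.
    """
    if compute_capability >= 12:
        return [sys.executable, "-m", "pip", "install", FLASH_ATTN_REQUIREMENT]
    return None
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. Returning `None` for older GPUs keeps the extra install step out of environments that do not need it.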
After installation, you can start the service using the `paddleocr genai_server` command:
After installation, you can start the service using the `paddlex_genai_server` command:
```bash
paddleocr genai_server --model_name PaddleOCR-VL-0.9B --backend vllm --port 8118
paddlex_genai_server --model_name PaddleOCR-VL-0.9B --backend vllm --port 8118
```
The parameters supported by this command are as follows:


@@ -6,6 +6,8 @@ comments: true
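PaddleOCR-VL is an advanced, efficient document parsing model designed for recognizing elements in documents. Its core component is PaddleOCR-VL-0.9B, a compact yet powerful vision-language model (VLM) that combines a NaViT-style dynamic-resolution visual encoder with the ERNIE-4.5-0.3B language model to achieve accurate element recognition. The model supports 109 languages and excels at recognizing complex elements such as text, tables, formulas, and charts, while keeping resource consumption very low. In comprehensive evaluations on widely used public benchmarks and in-house benchmarks, PaddleOCR-VL achieves SOTA performance in both page-level document parsing and element-level recognition. It significantly outperforms existing pipeline-based solutions, document-parsing multimodal solutions, and advanced general-purpose multimodal large models, and it offers faster inference. These strengths make it well suited to deployment in real-world scenarios.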
<img src="https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/refs/heads/main/images/paddleocr_vl/metrics/allmetric.png"/>
## 1. Environment Preparation
Install PaddlePaddle and PaddleOCR:
@@ -17,6 +19,35 @@ python -m pip install https://paddle-whl.bj.bcebos.com/nightly/cu126/safetensors
```
> For Windows users, please set up the environment using WSL or Docker.
Running PaddleOCR-VL has the following GPU hardware requirements:
<table border="1">
<thead>
<tr>
<th>Inference Method</th>
<th>GPU Compute Capability</th>
</tr>
</thead>
<tbody>
<tr>
<td>PaddlePaddle</td>
<td>≥ 8.5</td>
</tr>
<tr>
<td>vLLM</td>
<td>≥ 8 (RTX 3060, RTX 5070, A10, A100, ...) <br />
7 ≤ GPU Compute Capability < 8 (T4, V100, ...): supported, but may run into issues such as request timeouts and OOM errors. Not recommended.
</td>
</tr>
<tr>
<td>SGLang</td>
<td>8 ≤ GPU Compute Capability < 12</td>
</tr>
</tbody>
</table>
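PaddleOCR-VL does not currently support CPUs or the Arm architecture. Support for more hardware will be added based on actual demand, so stay tuned!
## 2. Quick Start
PaddleOCR-VL can be used in two ways: through the CLI or through the Python API. The CLI is simpler and suits quick functional verification, while the Python API is more flexible and suits integration into existing projects.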
@@ -927,7 +958,7 @@ docker run \
paddlex_genai_server --model_name PaddleOCR-VL-0.9B --host 0.0.0.0 --port 8118 --backend vllm
```
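If you are using an NVIDIA 50 series graphics card (Compute Capacity >= 12), you need to install the specified version of FlashAttention before starting the service:
If you are using an NVIDIA 50 series graphics card (Compute Capability >= 12), you need to install the specified version of FlashAttention before starting the service: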
```bash
docker run \
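@@ -964,16 +995,16 @@ paddleocr install_genai_server_deps <name of the inference acceleration framework>
The currently supported framework names are `vllm` and `sglang`, corresponding to vLLM and SGLang, respectively.
If you are using an NVIDIA 50 series graphics card (Compute Capacity >= 12), you need to install the specified version of FlashAttention before starting the service:
If you are using an NVIDIA 50 series graphics card (Compute Capability >= 12), you need to install the specified version of FlashAttention before starting the service: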
```bash
python -m pip install flash-attn==2.8.3
```
After installation, you can start the service with the `paddleocr genai_server` command:
After installation, you can start the service with the `paddlex_genai_server` command:
```bash
paddleocr genai_server --model_name PaddleOCR-VL-0.9B --backend vllm --port 8118
paddlex_genai_server --model_name PaddleOCR-VL-0.9B --backend vllm --port 8118
```
The parameters supported by this command are as follows: