[doc] add hardware support (#16725)

* Add hardware support
* Add hardware support
* fix
* update
* update
2025-12-26 22:48:21 +00:00 · 2025-10-19 01:12:17 +08:00 · 2025-10-19 01:12:17 +08:00 · a034bf87ba
commit a034bf87ba
parent fb259db4ad
2 changed files with 70 additions and 8 deletions
--- a/docs/version3.x/pipeline_usage/PaddleOCR-VL.en.md
+++ b/docs/version3.x/pipeline_usage/PaddleOCR-VL.en.md
@@ -6,6 +6,8 @@ comments: true

 PaddleOCR-VL is a SOTA and resource-efficient model tailored for document parsing. Its core component is PaddleOCR-VL-0.9B, a compact yet powerful vision-language model (VLM) that integrates a NaViT-style dynamic resolution visual encoder with the ERNIE-4.5-0.3B language model to enable accurate element recognition. This innovative model efficiently supports 109 languages and excels in recognizing complex elements (e.g., text, tables, formulas, and charts), while maintaining minimal resource consumption. Through comprehensive evaluations on widely used public benchmarks and in-house benchmarks, PaddleOCR-VL achieves SOTA performance in both page-level document parsing and element-level recognition. It significantly outperforms existing solutions, exhibits strong competitiveness against top-tier VLMs, and delivers fast inference speeds. These strengths make it highly suitable for practical deployment in real-world scenarios.

+<img src="https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/refs/heads/main/images/paddleocr_vl/metrics/allmetric.png"/>
+
 ## 1. Environment Preparation

 Install PaddlePaddle and PaddleOCR:
@@ -17,6 +19,35 @@ python -m pip install https://paddle-whl.bj.bcebos.com/nightly/cu126/safetensors
 ```
 > For Windows users, please use WSL or a Docker container.

+Running PaddleOCR-VL requires a GPU that meets the following requirements:
+
+<table border="1">
+<thead>
+  <tr>
+    <th>Inference Method</th>
+    <th>GPU Compute Capability</th>
+  </tr>
+</thead>
+<tbody>
+  <tr>
+    <td>PaddlePaddle</td>
+    <td>≥ 8.5</td>
+  </tr>
+  <tr>
+    <td>vLLM</td>
+    <td>≥ 8 (RTX 3060, RTX 5070, A10, A100, ...) <br />  
+    7 ≤ GPU Compute Capability < 8 (T4, V100, ...): supported, but may run into issues such as request timeouts and OOM errors. Not recommended.
+    </td>
+  </tr>
+  <tr>
+    <td>SGLang</td>
+    <td>8 ≤ GPU Compute Capability < 12</td>
+  </tr>
+</tbody>
+</table>
+
+PaddleOCR-VL does not currently support CPUs or the Arm architecture. Support for more hardware will be added based on actual demand. Stay tuned!
+
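
The requirements table added in this hunk can be read as a small decision rule. The sketch below is only an illustration of that rule: the function name `backend_support` and the status strings are hypothetical, and the compute capability is assumed to be passed as a decimal number such as `8.6`. The thresholds are taken directly from the table.

```python
from typing import Dict


def backend_support(compute_capability: float) -> Dict[str, str]:
    """Return a support status per inference method for a given GPU.

    Hypothetical helper: thresholds mirror the table in this diff,
    the name and status strings are illustrative only.
    """
    cc = compute_capability
    status: Dict[str, str] = {}

    # PaddlePaddle: compute capability >= 8.5
    status["PaddlePaddle"] = "supported" if cc >= 8.5 else "unsupported"

    # vLLM: >= 8 is supported; 7 <= cc < 8 runs but is not recommended
    if cc >= 8:
        status["vLLM"] = "supported"
    elif cc >= 7:
        status["vLLM"] = "not recommended"
    else:
        status["vLLM"] = "unsupported"

    # SGLang: 8 <= cc < 12
    status["SGLang"] = "supported" if 8 <= cc < 12 else "unsupported"
    return status


if __name__ == "__main__":
    # Example values: 7.5 (e.g. T4), 8.0 (e.g. A100), 12.0 (e.g. RTX 5070)
    for cc in (7.5, 8.0, 12.0):
        print(cc, backend_support(cc))
```

Under this reading, a GPU with compute capability 12 is accepted by PaddlePaddle and vLLM but falls outside the SGLang range, and a GPU with compute capability 8.0 is accepted by vLLM and SGLang but not by PaddlePaddle.
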
 ## 2. Quick Start

 PaddleOCR-VL supports two usage methods: CLI command line and Python API. The CLI command line method is simpler and suitable for quickly verifying functionality, while the Python API method is more flexible and suitable for integration into existing projects.
@@ -889,7 +920,7 @@ docker run \
    paddlex_genai_server --model_name PaddleOCR-VL-0.9B --host 0.0.0.0 --port 8118 --backend vllm
 ```

-If you are using an NVIDIA 50 series graphics card (Compute Capacity >= 12), you need to install a specific version of FlashAttention before launching the service.
+If you are using an NVIDIA 50 series graphics card (Compute Capability >= 12), you need to install a specific version of FlashAttention before launching the service.

 ```bash
 docker run \
@@ -926,16 +957,16 @@ paddleocr install_genai_server_deps <name of the inference acceleration framework>

 The currently supported frameworks are named `vllm` and `sglang`, corresponding to vLLM and SGLang, respectively.

-If you are using an NVIDIA 50 series graphics card (Compute Capacity >= 12), you need to install a specific version of FlashAttention before launching the service.
+If you are using an NVIDIA 50 series graphics card (Compute Capability >= 12), you need to install a specific version of FlashAttention before launching the service.

 ```bash
 python -m pip install flash-attn==2.8.3
 ```

-After installation, you can start the service using the `paddleocr genai_server` command:
+After installation, you can start the service using the `paddlex_genai_server` command:

 ```bash
-paddleocr genai_server --model_name PaddleOCR-VL-0.9B --backend vllm --port 8118
+paddlex_genai_server --model_name PaddleOCR-VL-0.9B --backend vllm --port 8118
 ```

 The parameters supported by this command are as follows:
--- a/docs/version3.x/pipeline_usage/PaddleOCR-VL.md
+++ b/docs/version3.x/pipeline_usage/PaddleOCR-VL.md
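@@ -6,6 +6,8 @@ comments: true

 PaddleOCR-VL is an advanced, efficient document parsing model designed for recognizing elements in documents. Its core component is PaddleOCR-VL-0.9B, a compact yet powerful vision-language model (VLM) that combines a NaViT-style dynamic-resolution visual encoder with the ERNIE-4.5-0.3B language model to achieve accurate element recognition. The model supports 109 languages and excels at recognizing complex elements (such as text, tables, formulas, and charts) while keeping resource consumption very low. In comprehensive evaluations on widely used public benchmarks and in-house benchmarks, PaddleOCR-VL achieves SOTA performance in both page-level document parsing and element-level recognition. It significantly outperforms existing pipeline-based solutions, multimodal document-parsing solutions, and advanced general-purpose multimodal large models, while also offering faster inference. These strengths make it well suited to deployment in real-world scenarios.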

+<img src="https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/refs/heads/main/images/paddleocr_vl/metrics/allmetric.png"/>
+
 ## 1. Environment Preparation

 Install PaddlePaddle and PaddleOCR:
@@ -17,6 +19,35 @@ python -m pip install https://paddle-whl.bj.bcebos.com/nightly/cu126/safetensors
 ```
 > For Windows users, please use WSL or a Docker container to set up the environment.

+Running PaddleOCR-VL requires a GPU that meets the following requirements:
+
+<table border="1">
+<thead>
+  <tr>
+    <th>Inference Method</th>
+    <th>GPU Compute Capability</th>
+  </tr>
+</thead>
+<tbody>
+  <tr>
+    <td>PaddlePaddle</td>
+    <td>≥ 8.5</td>
+  </tr>
+  <tr>
+    <td>vLLM</td>
+    <td>≥ 8 (RTX 3060, RTX 5070, A10, A100, ...) <br />  
+    7 ≤ GPU Compute Capability < 8 (T4, V100, ...): supported, but may run into issues such as request timeouts and OOM errors. Not recommended.
+    </td>
+  </tr>
+  <tr>
+    <td>SGLang</td>
+    <td>8 ≤ GPU Compute Capability < 12</td>
+  </tr>
+</tbody>
+</table>
+
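+PaddleOCR-VL does not currently support CPUs or the Arm architecture. Support for more hardware will be added based on actual demand. Stay tuned!
+
 ## 2. Quick Start

 PaddleOCR-VL supports two usage methods: the CLI and the Python API. The CLI is simpler and suited to quickly verifying functionality, while the Python API is more flexible and suited to integration into existing projects.
@@ -927,7 +958,7 @@ docker run \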
    paddlex_genai_server --model_name PaddleOCR-VL-0.9B --host 0.0.0.0 --port 8118 --backend vllm
 ```

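-If you are using an NVIDIA 50 series graphics card (Compute Capacity >= 12), you need to install a specific version of FlashAttention before launching the service:
+If you are using an NVIDIA 50 series graphics card (Compute Capability >= 12), you need to install a specific version of FlashAttention before launching the service: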

 ```bash
 docker run \
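@@ -964,16 +995,16 @@ paddleocr install_genai_server_deps <name of the inference acceleration framework>

 The currently supported framework names are `vllm` and `sglang`, corresponding to vLLM and SGLang, respectively.

-If you are using an NVIDIA 50 series graphics card (Compute Capacity >= 12), you need to install a specific version of FlashAttention before launching the service:
+If you are using an NVIDIA 50 series graphics card (Compute Capability >= 12), you need to install a specific version of FlashAttention before launching the service: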

 ```bash
 python -m pip install flash-attn==2.8.3
 ```

-After installation, you can start the service using the `paddleocr genai_server` command:
+After installation, you can start the service using the `paddlex_genai_server` command:

 ```bash
-paddleocr genai_server --model_name PaddleOCR-VL-0.9B --backend vllm --port 8118
+paddlex_genai_server --model_name PaddleOCR-VL-0.9B --backend vllm --port 8118
 ```

 The parameters supported by this command are as follows: