[Feat] MCP draft version for OCRv5 and StructureV3 (#15604)

* Add MCP OCR server draft version

* update code review

* structure can return images

* refine code and code review

* fix images return logic

* refactor structure for abstract layer

* Fix bugs and enhance code

* Use string literal for output mode

* update images logic for service

* update readme and config example

* update readme and config example

* Fix bugs and add

* refine structure image logic, now can show positions in texts

* update readme file based on code review

* update readme file

* update readme file

* update readme

* update readme

* Polish doc

* add en readme

* Refactor docs and update installation guide

---------

Co-authored-by: Bobholamovic <mhlin425@whu.edu.cn>
Yiiii0 2025-06-13 18:36:44 +08:00 committed by GitHub
parent 8e7994992b
commit 3ce3dc56fa
10 changed files with 1342 additions and 0 deletions


@@ -0,0 +1,194 @@
# PaddleOCR MCP Server
[![PaddleOCR](https://img.shields.io/badge/OCR-PaddleOCR-orange)](https://github.com/PaddlePaddle/PaddleOCR)
[![FastMCP](https://img.shields.io/badge/Built%20with-FastMCP%20v2-blue)](https://gofastmcp.com)
This project provides a lightweight [Model Context Protocol (MCP)](https://modelcontextprotocol.io/introduction) server designed to integrate the powerful capabilities of PaddleOCR into a compatible MCP Host.
### Key Features
- **Currently Supported Pipelines**
- **OCR**: Performs text detection and recognition on images and PDF files.
- **PP-StructureV3**: Recognizes and extracts text blocks, titles, paragraphs, images, tables, and other layout elements from an image or PDF file, converting the input into a Markdown document.
- **Supports the following working modes**:
- **Local**: Runs the PaddleOCR pipeline directly on your machine using the installed Python library.
- **AI Studio**: Calls cloud services provided by the Paddle AI Studio community.
- **Self-hosted**: Calls a PaddleOCR service that you deploy yourself (serving).
### Table of Contents
- [1. Installation](#1-installation)
- [2. Quick Start](#2-quick-start)
- [3. Configuration](#3-configuration)
- [3.1. MCP Host Configuration](#31-mcp-host-configuration)
- [3.2. Working Modes Explained](#32-working-modes-explained)
- [Mode 1: AI Studio Service (`aistudio`)](#mode-1-ai-studio-service-aistudio)
- [Mode 2: Local Python Library (`local`)](#mode-2-local-python-library-local)
- [Mode 3: Self-hosted Service (`self_hosted`)](#mode-3-self-hosted-service-self_hosted)
- [4. Parameter Reference](#4-parameter-reference)
- [5. Configuration Examples](#5-configuration-examples)
- [5.1 AI Studio Service Configuration](#51-ai-studio-service-configuration)
- [5.2 Local Python Library Configuration](#52-local-python-library-configuration)
- [5.3 Self-hosted Service Configuration](#53-self-hosted-service-configuration)
## 1. Installation
```bash
# Install the wheel
pip install https://paddle-model-ecology.bj.bcebos.com/paddlex/PaddleX3.0/mcp/paddleocr_mcp/releases/v0.1.0/paddleocr_mcp-0.1.0-py3-none-any.whl
# Or, install from source
# git clone https://github.com/PaddlePaddle/PaddleOCR.git
# pip install -e mcp_server
```
Some [working modes](#32-working-modes-explained) may require additional dependencies.
## 2. Quick Start
This section guides you through a quick setup using **Claude Desktop** as the MCP Host and the **AI Studio** mode. This mode is recommended for new users as it does not require complex local dependencies. Please refer to [3. Configuration](#3-configuration) for other working modes and more configuration options.
1. **Prepare the AI Studio Service**
- Visit the [Paddle AI Studio community](https://aistudio.baidu.com/pipeline/mine) and log in.
- In the "PaddleX Pipeline" section under "More" on the left, navigate to [Create Pipeline] - [OCR] - [General OCR] - [Deploy Directly] - [Text Recognition Module, select PP-OCRv5_server_rec] - [Start Deployment].
- Once deployed, obtain your **Service Base URL** (e.g., `https://xxxxxx.aistudio-hub.baidu.com`).
- Get your **Access Token** from [this page](https://aistudio.baidu.com/index/accessToken).
2. **Locate the MCP Configuration File** - For details, refer to the [Official MCP Documentation](https://modelcontextprotocol.io/quickstart/user).
- **macOS**: `~/Library/Application Support/Claude/claude_desktop_config.json`
- **Windows**: `%APPDATA%\Claude\claude_desktop_config.json`
- **Linux**: `~/.config/Claude/claude_desktop_config.json`
3. **Add MCP Server Configuration**
Open the `claude_desktop_config.json` file and add the configuration by referring to [5.1 AI Studio Service Configuration](#51-ai-studio-service-configuration).
**Note**:
- Do not leak your **Access Token**.
- If `paddleocr_mcp` is not in your system's `PATH`, set `command` to the absolute path of the executable.
4. **Restart the MCP Host**
Restart Claude Desktop. The new `paddleocr-ocr` tool should now be available in the application.
## 3. Configuration
### 3.1. MCP Host Configuration
In the Host's configuration file (e.g., `claude_desktop_config.json`), you need to define how to start the tool server. Key fields are:
- `command`: `paddleocr_mcp` (if the executable is in your `PATH`) or an absolute path.
- `args`: Configurable command-line arguments, e.g., `["--verbose"]`. See [4. Parameter Reference](#4-parameter-reference).
- `env`: Configurable environment variables. See [4. Parameter Reference](#4-parameter-reference).
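If the executable is not on the `PATH` that the MCP Host sees, a quick way to find the absolute path to use as the `command` value is a stdlib one-liner (a sketch; run it in the environment where you installed the wheel):

```python
# Locate the installed `paddleocr_mcp` executable so its absolute path can
# be used as the `command` value in the MCP Host configuration.
import shutil

print(shutil.which("paddleocr_mcp") or "paddleocr_mcp not found on PATH")
```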
### 3.2. Working Modes Explained
You can configure the MCP server to run in different modes based on your needs.
#### Mode 1: AI Studio Service (`aistudio`)
This mode calls services from the [Paddle AI Studio community](https://aistudio.baidu.com/pipeline/mine).
- **Use Case**: Ideal for quickly trying out features, validating solutions, and for no-code development scenarios.
- **Procedure**: Please refer to [2. Quick Start](#2-quick-start).
- In addition to using the platform's preset model solutions, you can also train and deploy custom models on the platform.
#### Mode 2: Local Python Library (`local`)
This mode runs the pipeline directly on your local machine using the installed `paddleocr` inference package, so it places certain demands on your local environment and hardware.
- **Use Case**: Suitable for offline usage and scenarios with strict data privacy requirements.
- **Procedure**:
1. Refer to the [PaddleOCR Installation Guide](../installation.en.md) to install the *PaddlePaddle framework* and *PaddleOCR*. **It is strongly recommended to install them in a separate virtual environment** to avoid dependency conflicts.
2. Refer to [5.2 Local Python Library Configuration](#52-local-python-library-configuration) to modify the `claude_desktop_config.json` file.
#### Mode 3: Self-hosted Service (`self_hosted`)
This mode calls a PaddleOCR inference service that you have deployed yourself. This corresponds to the **Serving** solutions provided by PaddleX.
- **Use Case**: Offers the advantages of service-oriented deployment and high flexibility, making it well-suited for production environments, especially for scenarios requiring custom service configurations.
- **Procedure**:
1. Refer to the [PaddleOCR Installation Guide](../installation.en.md) to install the *PaddlePaddle framework* and *PaddleOCR*.
2. Refer to the [PaddleOCR Serving Deployment Guide](./serving.en.md) to run the server.
3. Refer to [5.3 Self-hosted Service Configuration](#53-self-hosted-service-configuration) to modify the `claude_desktop_config.json` file.
4. Set your service address in `PADDLEOCR_MCP_SERVER_URL` (e.g., `"http://127.0.0.1:8080"`).
## 4. Parameter Reference
You can control the server's behavior via environment variables or command-line arguments.
| Environment Variable | Command-line Argument | Type | Description | Options | Default |
|:---|:---|:---|:---|:---|:---|
| `PADDLEOCR_MCP_PIPELINE` | `--pipeline` | `str` | The pipeline to run | `"OCR"`, `"PP-StructureV3"` | `"OCR"` |
| `PADDLEOCR_MCP_PPOCR_SOURCE` | `--ppocr_source` | `str` | The source of PaddleOCR capabilities | `"local"`, `"aistudio"`, `"self_hosted"` | `"local"` |
| `PADDLEOCR_MCP_SERVER_URL` | `--server_url` | `str` | Base URL of the underlying service (required for `aistudio` or `self_hosted` mode) | - | `None` |
| `PADDLEOCR_MCP_AISTUDIO_ACCESS_TOKEN` | `--aistudio_access_token` | `str` | AI Studio authentication token (required for `aistudio` mode) | - | `None` |
| `PADDLEOCR_MCP_TIMEOUT` | `--timeout` | `int` | Request timeout for the underlying service (in seconds) | - | `30` |
| `PADDLEOCR_MCP_DEVICE` | `--device` | `str` | Specify the device for inference (only effective in `local` mode) | - | `None` |
| `PADDLEOCR_MCP_PIPELINE_CONFIG` | `--pipeline_config` | `str` | Path to the PaddleX pipeline configuration file (only effective in `local` mode) | - | `None` |
| - | `--http` | `bool` | Use HTTP transport instead of stdio (for remote deployment and multiple clients) | - | `False` |
| - | `--host` | `str` | Host address for HTTP mode | - | `"127.0.0.1"` |
| - | `--port` | `int` | Port for HTTP mode | - | `8000` |
| - | `--verbose` | `bool` | Enable verbose logging for debugging | - | `False` |
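For example, when the server is started with `--http`, any MCP client can connect over streamable HTTP. The sketch below uses FastMCP's `Client`; the `/mcp` path, the port, and the sample arguments are assumptions to adapt to your deployment:

```python
# Minimal sketch: connect to an MCP server started with
#   paddleocr_mcp --http --port 8000
# Assumes FastMCP v2's `Client` and the default streamable-HTTP path `/mcp`.
import asyncio

from fastmcp import Client


async def main() -> None:
    async with Client("http://127.0.0.1:8000/mcp") as client:
        tools = await client.list_tools()
        print("Available tools:", [tool.name for tool in tools])
        # Hypothetical example input; use any file path, URL, or Base64 string.
        result = await client.call_tool(
            tools[0].name,
            {"input_data": "/path/to/document.png", "output_mode": "simple"},
        )
        print(result)


asyncio.run(main())
```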
## 5. Configuration Examples
Below are complete configuration examples for different working modes. You can copy and modify them as needed.
### 5.1 AI Studio Service Configuration
```json
{
  "mcpServers": {
    "paddleocr-ocr": {
      "command": "paddleocr_mcp",
      "args": [],
      "env": {
        "PADDLEOCR_MCP_PIPELINE": "OCR",
        "PADDLEOCR_MCP_PPOCR_SOURCE": "aistudio",
        "PADDLEOCR_MCP_SERVER_URL": "<your-server-url>",
        "PADDLEOCR_MCP_AISTUDIO_ACCESS_TOKEN": "<your-access-token>"
      }
    }
  }
}
```
**Note**:
- Replace `<your-server-url>` with your AI Studio **Service Base URL**, e.g., `https://xxxxx.aistudio-hub.baidu.com`. Do not include endpoint paths (like `/ocr`).
- Replace `<your-access-token>` with your **Access Token**.
### 5.2 Local Python Library Configuration
```json
{
  "mcpServers": {
    "paddleocr-ocr": {
      "command": "paddleocr_mcp",
      "args": [],
      "env": {
        "PADDLEOCR_MCP_PIPELINE": "OCR",
        "PADDLEOCR_MCP_PPOCR_SOURCE": "local"
      }
    }
  }
}
```
**Note**:
- `PADDLEOCR_MCP_PIPELINE_CONFIG` is optional. If not set, the default pipeline configuration is used. To adjust settings, such as changing models, refer to the [PaddleOCR and PaddleX documentation](../paddleocr_and_paddlex.en.md), export a pipeline configuration file, and set `PADDLEOCR_MCP_PIPELINE_CONFIG` to its absolute path.
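As a reference, a minimal sketch of that export step; it assumes the `export_paddlex_config_to_yaml` helper described in the linked documentation:

```python
# Sketch: export the default OCR pipeline configuration to a YAML file
# that PADDLEOCR_MCP_PIPELINE_CONFIG can point to (assumes the
# `export_paddlex_config_to_yaml` helper from the PaddleOCR/PaddleX docs).
from paddleocr import PaddleOCR

pipeline = PaddleOCR()
pipeline.export_paddlex_config_to_yaml("ocr_config.yaml")
# Then set PADDLEOCR_MCP_PIPELINE_CONFIG to the absolute path of ocr_config.yaml.
```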
### 5.3 Self-hosted Service Configuration
```json
{
  "mcpServers": {
    "paddleocr-ocr": {
      "command": "paddleocr_mcp",
      "args": [],
      "env": {
        "PADDLEOCR_MCP_PIPELINE": "OCR",
        "PADDLEOCR_MCP_PPOCR_SOURCE": "self_hosted",
        "PADDLEOCR_MCP_SERVER_URL": "<your-server-url>"
      }
    }
  }
}
```
**Note**:
- Replace `<your-server-url>` with the base URL of your underlying service (e.g., `http://127.0.0.1:8080`).


@@ -0,0 +1,194 @@
# PaddleOCR MCP Server
[![PaddleOCR](https://img.shields.io/badge/OCR-PaddleOCR-orange)](https://github.com/PaddlePaddle/PaddleOCR)
[![FastMCP](https://img.shields.io/badge/Built%20with-FastMCP%20v2-blue)](https://gofastmcp.com)
This project provides a lightweight [Model Context Protocol (MCP)](https://modelcontextprotocol.io/introduction) server designed to integrate the powerful capabilities of PaddleOCR into a compatible MCP Host.
### Key Features
- **Currently Supported Tools**
  - **OCR**: Performs text detection and recognition on images and PDF files.
  - **PP-StructureV3**: Recognizes and extracts text blocks, titles, paragraphs, images, tables, and other layout elements from an image or PDF file, converting the input into a Markdown document.
- **Supported Working Modes**:
  - **Local Python library**: Runs the PaddleOCR pipeline directly on your machine.
  - **AI Studio service**: Calls cloud services provided by the Paddle AI Studio community.
  - **Self-hosted service**: Calls a PaddleOCR service that you deploy yourself.
### Table of Contents
- [1. Installation](#1-installation)
- [2. Quick Start](#2-quick-start)
- [3. Configuration](#3-configuration)
  - [3.1. MCP Host Configuration](#31-mcp-host-configuration)
  - [3.2. Working Modes Explained](#32-working-modes-explained)
    - [Mode 1: AI Studio Service (`aistudio`)](#mode-1-ai-studio-service-aistudio)
    - [Mode 2: Local Python Library (`local`)](#mode-2-local-python-library-local)
    - [Mode 3: Self-hosted Service (`self_hosted`)](#mode-3-self-hosted-service-self_hosted)
- [4. Parameter Reference](#4-parameter-reference)
- [5. Configuration Examples](#5-configuration-examples)
  - [5.1 AI Studio Service Configuration](#51-ai-studio-service-configuration)
  - [5.2 Local Python Library Configuration](#52-local-python-library-configuration)
  - [5.3 Self-hosted Service Configuration](#53-self-hosted-service-configuration)
## 1. Installation
```bash
# Install the wheel
pip install https://paddle-model-ecology.bj.bcebos.com/paddlex/PaddleX3.0/mcp/paddleocr_mcp/releases/v0.1.0/paddleocr_mcp-0.1.0-py3-none-any.whl
# Or, install from source
# git clone https://github.com/PaddlePaddle/PaddleOCR.git
# pip install -e mcp_server
```
Some [working modes](#32-working-modes-explained) may require additional dependencies.
## 2. Quick Start
This section guides you through a quick setup using **Claude Desktop** as the MCP Host and the **AI Studio service** working mode. This mode does not require complex local dependencies and is recommended for new users. Please refer to [3. Configuration](#3-configuration) for other working modes and more configuration options.
1. **Prepare the AI Studio Service**
   - Visit the [Paddle AI Studio community](https://aistudio.baidu.com/pipeline/mine) and log in.
   - In the "PaddleX Pipeline" section under "More" on the left, navigate to [Create Pipeline] - [OCR] - [General OCR] - [Deploy Directly] - [Text Recognition Module, select PP-OCRv5_server_rec] - [Start Deployment].
   - Once deployed, obtain your **Service Base URL** (e.g., `https://xxxxxx.aistudio-hub.baidu.com`).
   - Get your **Access Token** from [this page](https://aistudio.baidu.com/index/accessToken).
2. **Locate the MCP Configuration File** - For details, refer to the [Official MCP Documentation](https://modelcontextprotocol.io/quickstart/user).
   - **macOS**: `~/Library/Application Support/Claude/claude_desktop_config.json`
   - **Windows**: `%APPDATA%\Claude\claude_desktop_config.json`
   - **Linux**: `~/.config/Claude/claude_desktop_config.json`
3. **Add MCP Server Configuration**
   Open the `claude_desktop_config.json` file and fill in the configuration by referring to [5.1 AI Studio Service Configuration](#51-ai-studio-service-configuration).
   **Note**:
   - Do not leak your **Access Token**.
   - If `paddleocr_mcp` cannot be found in your system's `PATH`, set `command` to the absolute path of the executable.
4. **Restart the MCP Host**
   Restart Claude Desktop. The new `paddleocr-ocr` tool should now be available in the application.
## 3. Configuration
### 3.1. MCP Host Configuration
In the Host's configuration file (e.g., `claude_desktop_config.json`), you need to define how to start the tool server. Key fields are:
- `command`: `paddleocr_mcp` (if the executable can be found in your `PATH`) or an absolute path.
- `args`: Configurable command-line arguments, e.g., `["--verbose"]`. See [4. Parameter Reference](#4-parameter-reference).
- `env`: Configurable environment variables. See [4. Parameter Reference](#4-parameter-reference).
### 3.2. Working Modes Explained
You can configure the MCP server to run in different working modes based on your needs.
#### Mode 1: AI Studio Service (`aistudio`)
This mode calls services from the [Paddle AI Studio community](https://aistudio.baidu.com/pipeline/mine).
- **Use Case**: Ideal for quickly trying out features, validating solutions, and for no-code development scenarios.
- **Procedure**: Please refer to [2. Quick Start](#2-quick-start).
- In addition to using the platform's preset model solutions, you can also train and deploy custom models on the platform.
#### Mode 2: Local Python Library (`local`)
This mode runs the models directly on your local machine and places certain demands on your local environment and hardware.
- **Use Case**: Suitable for offline usage and scenarios with strict data privacy requirements.
- **Procedure**:
    1. Refer to the [PaddleOCR Installation Guide](../installation.md) to install the *PaddlePaddle framework* and *PaddleOCR*. **It is strongly recommended to install them in a separate virtual environment** to avoid dependency conflicts.
    2. Refer to [5.2 Local Python Library Configuration](#52-local-python-library-configuration) to modify the `claude_desktop_config.json` file.
#### Mode 3: Self-hosted Service (`self_hosted`)
This mode calls a PaddleOCR inference service that you have deployed yourself.
- **Use Case**: Offers the advantages of service-oriented deployment and high flexibility, making it well-suited for production environments, especially for scenarios requiring custom service configurations.
- **Procedure**:
    1. Refer to the [PaddleOCR Installation Guide](../installation.md) to install the *PaddlePaddle framework* and *PaddleOCR*.
    2. Refer to the [PaddleOCR Serving Deployment Guide](./serving.md) to run the server.
    3. Refer to [5.3 Self-hosted Service Configuration](#53-self-hosted-service-configuration) to modify the `claude_desktop_config.json` file.
    4. Set your service address in `PADDLEOCR_MCP_SERVER_URL` (e.g., `"http://127.0.0.1:8000"`).
## 4. Parameter Reference
You can control the server's behavior via environment variables or command-line arguments.
| Environment Variable | Command-line Argument | Type | Description | Options | Default |
|:---|:---|:---|:---|:---|:---|
| `PADDLEOCR_MCP_PIPELINE` | `--pipeline` | `str` | The pipeline to run | `"OCR"`, `"PP-StructureV3"` | `"OCR"` |
| `PADDLEOCR_MCP_PPOCR_SOURCE` | `--ppocr_source` | `str` | The source of PaddleOCR capabilities | `"local"`, `"aistudio"`, `"self_hosted"` | `"local"` |
| `PADDLEOCR_MCP_SERVER_URL` | `--server_url` | `str` | Base URL of the underlying service (required for `aistudio` or `self_hosted` mode) | - | `None` |
| `PADDLEOCR_MCP_AISTUDIO_ACCESS_TOKEN` | `--aistudio_access_token` | `str` | AI Studio authentication token (required for `aistudio` mode) | - | `None` |
| `PADDLEOCR_MCP_TIMEOUT` | `--timeout` | `int` | Request timeout for the underlying service (in seconds) | - | `30` |
| `PADDLEOCR_MCP_DEVICE` | `--device` | `str` | Specify the device for inference (only effective in `local` mode) | - | `None` |
| `PADDLEOCR_MCP_PIPELINE_CONFIG` | `--pipeline_config` | `str` | Path to the PaddleOCR pipeline configuration file (only effective in `local` mode) | - | `None` |
| - | `--http` | `bool` | Use HTTP transport instead of stdio (for remote deployment and multiple clients) | - | `False` |
| - | `--host` | `str` | Host address for HTTP mode | - | `"127.0.0.1"` |
| - | `--port` | `int` | Port for HTTP mode | - | `8000` |
| - | `--verbose` | `bool` | Enable verbose logging for debugging | - | `False` |
## 5. Configuration Examples
Below are complete configuration examples for the different working modes. You can copy and modify them as needed.
### 5.1 AI Studio Service Configuration
```json
{
  "mcpServers": {
    "paddleocr-ocr": {
      "command": "paddleocr_mcp",
      "args": [],
      "env": {
        "PADDLEOCR_MCP_PIPELINE": "OCR",
        "PADDLEOCR_MCP_PPOCR_SOURCE": "aistudio",
        "PADDLEOCR_MCP_SERVER_URL": "<your-server-url>",
        "PADDLEOCR_MCP_AISTUDIO_ACCESS_TOKEN": "<your-access-token>"
      }
    }
  }
}
```
**Note**:
- Replace `<your-server-url>` with the **Service Base URL** of your AI Studio service, e.g., `https://xxxxx.aistudio-hub.baidu.com`. Do not include endpoint paths (like `/ocr`).
- Replace `<your-access-token>` with your **Access Token**.
### 5.2 Local Python Library Configuration
```json
{
  "mcpServers": {
    "paddleocr-ocr": {
      "command": "paddleocr_mcp",
      "args": [],
      "env": {
        "PADDLEOCR_MCP_PIPELINE": "OCR",
        "PADDLEOCR_MCP_PPOCR_SOURCE": "local"
      }
    }
  }
}
```
**Note**:
- `PADDLEOCR_MCP_PIPELINE_CONFIG` is optional. If not set, the default pipeline configuration is used. To adjust settings, such as changing models, refer to the [PaddleOCR documentation](../paddleocr_and_paddlex.md), export a pipeline configuration file, and set `PADDLEOCR_MCP_PIPELINE_CONFIG` to its absolute path.
### 5.3 Self-hosted Service Configuration
```json
{
  "mcpServers": {
    "paddleocr-ocr": {
      "command": "paddleocr_mcp",
      "args": [],
      "env": {
        "PADDLEOCR_MCP_PIPELINE": "OCR",
        "PADDLEOCR_MCP_PPOCR_SOURCE": "self_hosted",
        "PADDLEOCR_MCP_SERVER_URL": "<your-server-url>"
      }
    }
  }
}
```
**Note**:
- Replace `<your-server-url>` with the base URL of your underlying service (e.g., `http://127.0.0.1:8000`).

mcp_server/README.md

@@ -0,0 +1,5 @@
# PaddleOCR MCP Server
中文 | [English](./README_en.md)
See the [documentation](../docs/version3.x/deployment/mcp_server.md).

mcp_server/README_en.md

@@ -0,0 +1,5 @@
# PaddleOCR MCP Server
[中文](./README.md) | English
Please refer to the [documentation](../docs/version3.x/deployment/mcp_server.en.md).


@@ -0,0 +1,15 @@
# Copyright (c) 2025 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
__version__ = "0.1.0"


@@ -0,0 +1,187 @@
#!/usr/bin/env python3
# Copyright (c) 2025 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse
import contextlib
import os
import sys
from typing import AsyncIterator, Dict
from fastmcp import FastMCP
from .pipelines import create_pipeline_handler
def _parse_args() -> argparse.Namespace:
"""Parse command line arguments."""
parser = argparse.ArgumentParser(
description="PaddleOCR MCP server - Supports local library, AI Studio service, and self-hosted servers."
)
parser.add_argument(
"--pipeline",
choices=["OCR", "PP-StructureV3"],
default=os.getenv("PADDLEOCR_MCP_PIPELINE", "OCR"),
help="Pipeline name.",
)
parser.add_argument(
"--ppocr_source",
choices=["local", "aistudio", "self_hosted"],
default=os.getenv("PADDLEOCR_MCP_PPOCR_SOURCE", "local"),
help="Source of PaddleOCR functionality: local (local library), aistudio (AI Studio service), self_hosted (self-hosted server).",
)
parser.add_argument(
"--http",
action="store_true",
help="Use HTTP transport instead of STDIO (suitable for remote deployment and multiple clients).",
)
parser.add_argument(
"--host",
default="127.0.0.1",
help="Host address for HTTP mode (default: 127.0.0.1).",
)
parser.add_argument(
"--port",
type=int,
default=8000,
help="Port for HTTP mode (default: 8000).",
)
parser.add_argument(
"--verbose", action="store_true", help="Enable verbose logging for debugging."
)
# Local mode configuration
parser.add_argument(
"--pipeline_config",
default=os.getenv("PADDLEOCR_MCP_PIPELINE_CONFIG"),
help="PaddleOCR pipeline configuration file path (for local mode).",
)
parser.add_argument(
"--device",
default=os.getenv("PADDLEOCR_MCP_DEVICE"),
help="Device to run inference on.",
)
# Service mode configuration
parser.add_argument(
"--server_url",
default=os.getenv("PADDLEOCR_MCP_SERVER_URL"),
help="Base URL of the underlying server (required in service mode).",
)
parser.add_argument(
"--aistudio_access_token",
default=os.getenv("PADDLEOCR_MCP_AISTUDIO_ACCESS_TOKEN"),
help="AI Studio access token (required for AI Studio).",
)
parser.add_argument(
"--timeout",
type=int,
default=int(os.getenv("PADDLEOCR_MCP_TIMEOUT", "30")),
help="API request timeout in seconds for the underlying server.",
)
args = parser.parse_args()
return args
def _validate_args(args: argparse.Namespace) -> None:
"""Validate command line arguments."""
if not args.http and (args.host != "127.0.0.1" or args.port != 8000):
print(
"Host and port arguments are only valid when using HTTP transport (see: `--http`).",
file=sys.stderr,
)
sys.exit(2)
if args.ppocr_source in ["aistudio", "self_hosted"]:
if not args.server_url:
print("Error: The server base URL is required.", file=sys.stderr)
print(
"Please either set `--server_url` or set the environment variable "
"`PADDLEOCR_MCP_SERVER_URL`.",
file=sys.stderr,
)
sys.exit(2)
if args.ppocr_source == "aistudio" and not args.aistudio_access_token:
print("Error: The AI Studio access token is required.", file=sys.stderr)
print(
"Please either set `--aistudio_access_token` or set the environment variable "
"`PADDLEOCR_MCP_AISTUDIO_ACCESS_TOKEN`.",
file=sys.stderr,
)
sys.exit(2)
def main() -> None:
"""Main entry point."""
args = _parse_args()
_validate_args(args)
try:
pipeline_handler = create_pipeline_handler(
args.pipeline,
args.ppocr_source,
pipeline_config=args.pipeline_config,
device=args.device,
server_url=args.server_url,
aistudio_access_token=args.aistudio_access_token,
timeout=args.timeout,
)
except Exception as e:
print(f"Failed to create the pipeline handler: {e}", file=sys.stderr)
if args.verbose:
import traceback
traceback.print_exc(file=sys.stderr)
sys.exit(1)
@contextlib.asynccontextmanager
async def _lifespan(mcp: FastMCP) -> AsyncIterator[Dict]:
async with pipeline_handler:
yield {}
try:
server_name = f"PaddleOCR {args.pipeline} MCP server"
mcp = FastMCP(
name=server_name,
lifespan=_lifespan,
log_level="INFO" if args.verbose else "WARNING",
)
pipeline_handler.register_tools(mcp)
if args.http:
mcp.run(
transport="streamable-http",
host=args.host,
port=args.port,
)
else:
mcp.run()
except Exception as e:
print(f"Failed to start the server: {e}", file=sys.stderr)
if args.verbose:
import traceback
traceback.print_exc(file=sys.stderr)
sys.exit(1)
if __name__ == "__main__":
main()


@@ -0,0 +1,721 @@
# Copyright (c) 2025 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import abc
import asyncio
import base64
import io
import json
import mimetypes
import re
from pathlib import Path
from queue import Queue
from threading import Thread
from typing import Any, Callable, Dict, List, Optional, Type, Union
from urllib.parse import urlparse
import httpx
import magic
import numpy as np
from fastmcp import Context, FastMCP
from mcp.types import ImageContent, TextContent
from PIL import Image as PILImage
from typing_extensions import Literal, Self, assert_never
try:
from paddleocr import PaddleOCR, PPStructureV3
LOCAL_OCR_AVAILABLE = True
except ImportError:
LOCAL_OCR_AVAILABLE = False
OutputMode = Literal["simple", "detailed"]
def _is_file_path(s: str) -> bool:
    # `PurePath(s)` accepts almost any string, so test for an existing file
    # to avoid misclassifying URLs or Base64 payloads as file paths.
    try:
        return Path(s).is_file()
    except (OSError, ValueError):
        return False
def _is_base64(s: str) -> bool:
    # Accept both bare Base64 payloads and `data:` URLs.
    if s.startswith("data:"):
        s = s.split(",", 1)[-1]
    pattern = r"^[A-Za-z0-9+/]+={0,2}$"
    return bool(re.fullmatch(pattern, s))
def _is_url(s: str) -> bool:
if not (s.startswith("http://") or s.startswith("https://")):
return False
result = urlparse(s)
return all([result.scheme, result.netloc]) and result.scheme in ("http", "https")
def _infer_file_type_from_url(url: str) -> str:
url_parts = urlparse(url)
filename = url_parts.path.split("/")[-1]
file_type = mimetypes.guess_type(filename)[0]
if not file_type:
return "UNKNOWN"
if file_type.startswith("image/"):
return "IMAGE"
elif file_type == "application/pdf":
return "PDF"
return "UNKNOWN"
def _infer_file_type_from_bytes(data: bytes) -> str:
mime = magic.from_buffer(data, mime=True)
if mime.startswith("image/"):
return "IMAGE"
elif mime == "application/pdf":
return "PDF"
return "UNKNOWN"
class _EngineWrapper:
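    """Run a synchronous inference engine on a dedicated worker thread.

    Calls submitted via `call` are serialized through a queue and executed by
    the worker thread; results are handed back to the event loop through
    futures, so blocking `predict` calls do not stall the async MCP server.
    """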
def __init__(self, engine: Any) -> None:
self._engine = engine
self._queue: Queue = Queue()
self._closed = False
self._loop = asyncio.get_running_loop()
self._thread = Thread(target=self._worker, daemon=False)
self._thread.start()
@property
def engine(self) -> Any:
return self._engine
async def call(self, func: Callable, *args: Any, **kwargs: Any) -> Any:
if self._closed:
raise RuntimeError("Engine wrapper has already been closed")
fut = self._loop.create_future()
self._queue.put((func, args, kwargs, fut))
return await fut
    async def close(self) -> None:
        if not self._closed:
            # Mark closed before joining so subsequent `call`s are rejected.
            self._closed = True
            self._queue.put(None)
            await self._loop.run_in_executor(None, self._thread.join)
def _worker(self) -> None:
while not self._closed:
item = self._queue.get()
if item is None:
break
func, args, kwargs, fut = item
try:
result = func(*args, **kwargs)
self._loop.call_soon_threadsafe(fut.set_result, result)
except Exception as e:
self._loop.call_soon_threadsafe(fut.set_exception, e)
finally:
self._queue.task_done()
class PipelineHandler(abc.ABC):
"""Abstract base class for pipeline handlers."""
def __init__(
self,
pipeline: str,
ppocr_source: str,
pipeline_config: Optional[str],
device: Optional[str],
server_url: Optional[str],
aistudio_access_token: Optional[str],
timeout: Optional[int],
) -> None:
"""Initialize the pipeline handler.
Args:
pipeline: Pipeline name.
ppocr_source: Source of PaddleOCR functionality.
pipeline_config: Path to pipeline configuration.
device: Device to run inference on.
server_url: Base URL for service mode.
aistudio_access_token: AI Studio access token.
timeout: Timeout in seconds.
"""
self._pipeline = pipeline
if ppocr_source == "local":
self._mode = "local"
elif ppocr_source in ("aistudio", "self_hosted"):
self._mode = "service"
else:
raise ValueError(f"Unknown PaddleOCR source {repr(ppocr_source)}")
self._ppocr_source = ppocr_source
self._pipeline_config = pipeline_config
self._device = device
self._server_url = server_url
self._aistudio_access_token = aistudio_access_token
self._timeout = timeout or 30 # Default timeout of 30 seconds
if self._mode == "local":
if not LOCAL_OCR_AVAILABLE:
raise RuntimeError("PaddleOCR is not locally available")
self._engine = self._create_local_engine()
self._status: Literal["initialized", "started", "stopped"] = "initialized"
async def start(self) -> None:
if self._status == "initialized":
if self._mode == "local":
self._engine_wrapper = _EngineWrapper(self._engine)
self._status = "started"
elif self._status == "started":
pass
elif self._status == "stopped":
raise RuntimeError("Pipeline handler has already been stopped")
else:
assert_never(self._status)
async def stop(self) -> None:
if self._status == "initialized":
raise RuntimeError("Pipeline handler has not been started")
elif self._status == "started":
if self._mode == "local":
await self._engine_wrapper.close()
self._status = "stopped"
elif self._status == "stopped":
pass
else:
assert_never(self._status)
async def __aenter__(self) -> Self:
await self.start()
return self
async def __aexit__(
self,
exc_type: Any,
exc_val: Any,
exc_tb: Any,
) -> None:
await self.stop()
@abc.abstractmethod
def register_tools(self, mcp: FastMCP) -> None:
"""Register tools with the MCP server.
Args:
mcp: The `FastMCP` instance.
"""
raise NotImplementedError
@abc.abstractmethod
def _create_local_engine(self) -> Any:
"""Create the local OCR engine.
Returns:
The OCR engine instance.
"""
raise NotImplementedError
class SimpleInferencePipelineHandler(PipelineHandler):
"""Base class for simple inference pipeline handlers."""
async def process(
self, input_data: str, output_mode: OutputMode, ctx: Context, **kwargs: Any
) -> Union[str, List[Union[TextContent, ImageContent]]]:
"""Process input data through the pipeline.
Args:
input_data: Input data (file path, URL, or Base64).
output_mode: Output mode ("simple" or "detailed").
ctx: MCP context.
**kwargs: Additional pipeline-specific arguments.
Returns:
Processed result in the requested output format.
"""
try:
await ctx.info(
f"Starting {self._pipeline} processing (source: {self._ppocr_source})"
)
if self._mode == "local":
processed_input = self._process_input_for_local(input_data)
raw_result = await self._predict_with_local_engine(
processed_input, ctx, **kwargs
)
result = self._parse_local_result(raw_result, ctx)
else:
processed_input, file_type = self._process_input_for_service(input_data)
raw_result = await self._call_service(
processed_input, file_type, ctx, **kwargs
)
result = await self._parse_service_result(raw_result, ctx)
await self._log_completion_stats(result, ctx)
return self._format_output(result, output_mode == "detailed", ctx)
except Exception as e:
await ctx.error(f"{self._pipeline} processing failed: {str(e)}")
return self._handle_error(str(e), output_mode)
def _process_input_for_local(self, input_data: str) -> Union[str, np.ndarray]:
if _is_file_path(input_data) or _is_url(input_data):
return input_data
elif _is_base64(input_data):
if input_data.startswith("data:"):
base64_data = input_data.split(",", 1)[1]
else:
base64_data = input_data
try:
image_bytes = base64.b64decode(base64_data)
image_pil = PILImage.open(io.BytesIO(image_bytes))
image_arr = np.array(image_pil.convert("RGB"))
# Convert RGB to BGR
return np.ascontiguousarray(image_arr[..., ::-1])
except Exception as e:
raise ValueError(f"Failed to decode Base64 image: {e}")
else:
raise ValueError("Invalid input data format")
def _process_input_for_service(self, input_data: str) -> tuple[str, str]:
if _is_file_path(input_data):
try:
with open(input_data, "rb") as f:
bytes_ = f.read()
input_data = base64.b64encode(bytes_).decode("ascii")
file_type = _infer_file_type_from_bytes(bytes_)
except Exception as e:
raise ValueError(f"Failed to read file: {e}")
elif _is_url(input_data):
file_type = _infer_file_type_from_url(input_data)
elif _is_base64(input_data):
try:
if input_data.startswith("data:"):
base64_data = input_data.split(",", 1)[1]
else:
base64_data = input_data
bytes_ = base64.b64decode(base64_data)
file_type = _infer_file_type_from_bytes(bytes_)
except Exception as e:
raise ValueError(f"Failed to decode Base64 data: {e}")
else:
raise ValueError("Invalid input data format")
return input_data, file_type
async def _call_service(
self, processed_input: str, file_type: str, ctx: Context, **kwargs: Any
) -> Dict[str, Any]:
if not self._server_url:
raise RuntimeError("Server URL not configured")
endpoint = self._get_service_endpoint()
url = f"{self._server_url.rstrip('/')}/{endpoint.lstrip('/')}"
payload = self._prepare_service_payload(processed_input, file_type, **kwargs)
headers = {"Content-Type": "application/json"}
if self._ppocr_source == "aistudio":
if not self._aistudio_access_token:
raise RuntimeError("Missing AI Studio access token")
headers["Authorization"] = f"token {self._aistudio_access_token}"
try:
async with httpx.AsyncClient(timeout=self._timeout) as client:
response = await client.post(url, json=payload, headers=headers)
response.raise_for_status()
return response.json()
except httpx.HTTPError as e:
raise RuntimeError(f"Service call failed: {str(e)}")
except json.JSONDecodeError as e:
raise RuntimeError(f"Invalid service response: {str(e)}")
def _prepare_service_payload(
self, processed_input: str, file_type: str, **kwargs: Any
) -> Dict[str, Any]:
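        # Serving API convention: fileType 1 = image, 0 = PDF (UNKNOWN falls back to 0).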
api_file_type = 1 if file_type == "IMAGE" else 0
payload = {"file": processed_input, "fileType": api_file_type, **kwargs}
return payload
def _handle_error(
self, error_msg: str, output_mode: OutputMode
) -> Union[str, List[Union[TextContent, ImageContent]]]:
if output_mode == "detailed":
return [TextContent(type="text", text=f"Error: {error_msg}")]
return f"Error: {error_msg}"
@abc.abstractmethod
def _get_service_endpoint(self) -> str:
"""Get the service endpoint.
Returns:
Service endpoint path.
"""
raise NotImplementedError
@abc.abstractmethod
def _parse_local_result(self, local_result: Dict, ctx: Context) -> Dict[str, Any]:
"""Parse raw result from local engine into a unified format.
Args:
local_result: Raw result from local engine.
ctx: MCP context.
Returns:
Parsed result in unified format.
"""
raise NotImplementedError
@abc.abstractmethod
async def _parse_service_result(
self, service_result: Dict[str, Any], ctx: Context
) -> Dict[str, Any]:
"""Parse raw result from the service into a unified format.
Args:
service_result: Raw result from the service.
ctx: MCP context.
Returns:
Parsed result in unified format.
"""
raise NotImplementedError
@abc.abstractmethod
async def _log_completion_stats(self, result: Dict[str, Any], ctx: Context) -> None:
"""Log statistics after processing completion.
Args:
result: Processing result.
ctx: MCP context.
"""
raise NotImplementedError
@abc.abstractmethod
def _format_output(
self, result: Dict[str, Any], detailed: bool, ctx: Context
) -> Union[str, List[Union[TextContent, ImageContent]]]:
"""Format output into simple or detailed format.
Args:
result: Processing result.
detailed: Whether to use detailed format.
ctx: MCP context.
Returns:
Formatted output in requested format.
"""
raise NotImplementedError
async def _predict_with_local_engine(
self, processed_input: Union[str, np.ndarray], ctx: Context, **kwargs: Any
) -> Dict:
if not hasattr(self, "_engine_wrapper"):
raise RuntimeError("Engine wrapper has not been initialized")
return await self._engine_wrapper.call(
self._engine_wrapper.engine.predict, processed_input, **kwargs
)
class OCRHandler(SimpleInferencePipelineHandler):
def register_tools(self, mcp: FastMCP) -> None:
@mcp.tool()
async def _ocr(
input_data: str,
output_mode: OutputMode,
ctx: Context,
) -> Union[str, List[Union[TextContent, ImageContent]]]:
"""Extract text from images and PDFs.
Args:
input_data: File path, URL, or Base64 data.
output_mode: "simple" for clean text, "detailed" for JSON with positioning.
"""
return await self.process(input_data, output_mode, ctx)
def _create_local_engine(self) -> Any:
return PaddleOCR(
paddlex_config=self._pipeline_config,
device=self._device,
enable_mkldnn=False,
)
def _get_service_endpoint(self) -> str:
return "ocr"
def _parse_local_result(self, local_result: Dict, ctx: Context) -> Dict:
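        # `predict` returns one result per page/image; this draft parses only the first.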
result = local_result[0]
texts = result["rec_texts"]
scores = result["rec_scores"]
boxes = result["rec_boxes"]
# Direct assembly
clean_texts, confidences, blocks = [], [], []
for i, text in enumerate(texts):
if text and text.strip():
conf = scores[i] if i < len(scores) else 0
clean_texts.append(text.strip())
confidences.append(conf)
block = {
"text": text.strip(),
"confidence": round(conf, 3),
"bbox": boxes[i].tolist(),
}
blocks.append(block)
return {
"text": "\n".join(clean_texts),
"confidence": sum(confidences) / len(confidences) if confidences else 0,
"blocks": blocks,
}
async def _parse_service_result(self, service_result: Dict, ctx: Context) -> Dict:
result_data = service_result.get("result", service_result)
ocr_results = result_data.get("ocrResults")
# Direct extraction and assembly
all_texts, all_confidences, blocks = [], [], []
for ocr_result in ocr_results:
pruned = ocr_result["prunedResult"]
texts = pruned["rec_texts"]
scores = pruned["rec_scores"]
boxes = pruned["rec_boxes"]
for i, text in enumerate(texts):
if text and text.strip():
conf = scores[i] if i < len(scores) else 0
all_texts.append(text.strip())
all_confidences.append(conf)
block = {
"text": text.strip(),
"confidence": round(conf, 3),
"bbox": boxes[i],
}
blocks.append(block)
return {
"text": "\n".join(all_texts),
"confidence": (
sum(all_confidences) / len(all_confidences) if all_confidences else 0
),
"blocks": blocks,
}
async def _log_completion_stats(self, result: Dict, ctx: Context) -> None:
text_length = len(result["text"])
block_count = len(result["blocks"])
await ctx.info(
f"OCR completed: {text_length} characters, {block_count} text blocks"
)
def _format_output(
self, result: Dict, detailed: bool, ctx: Context
) -> Union[str, List[Union[TextContent, ImageContent]]]:
if not result["text"].strip():
return (
"❌ No text detected"
if not detailed
else json.dumps({"error": "No text detected"}, ensure_ascii=False)
)
if detailed:
# L2: Return all data
return json.dumps(result, ensure_ascii=False, indent=2)
else:
# L1: Core text + key statistics
confidence = result["confidence"]
block_count = len(result["blocks"])
output = result["text"]
if confidence > 0:
output += f"\n\n📊 Confidence: {(confidence * 100):.1f}% | {block_count} text blocks"
return output
class PPStructureV3Handler(SimpleInferencePipelineHandler):
def register_tools(self, mcp: FastMCP) -> None:
@mcp.tool()
async def _pp_structurev3(
input_data: str,
output_mode: OutputMode,
ctx: Context,
) -> Union[str, List[Union[TextContent, ImageContent]]]:
"""Document layout analysis.
Args:
input_data: File path, URL, or Base64 data.
output_mode: "simple" for markdown text, "detailed" for JSON with metadata + prunedResult.
Returns:
- Simple: Markdown text + images (if available)
- Detailed: prunedResult/local detailed info + markdown text + images
"""
return await self.process(input_data, output_mode, ctx)
def _create_local_engine(self) -> Any:
return PPStructureV3(paddlex_config=self._pipeline_config, device=self._device)
def _get_service_endpoint(self) -> str:
return "layout-parsing"
def _parse_local_result(self, local_result: Dict, ctx: Context) -> Dict:
markdown_parts = []
detailed_results = []
# TODO return images
for result in local_result:
text = result.markdown["markdown_texts"]
markdown_parts.append(text)
detailed_results.append(result)
return {
# TODO: Page concatenation can be done better via `pipeline.concatenate_markdown_pages`
"markdown": "\n".join(markdown_parts),
"pages": len(local_result),
"images_mapping": {},
"detailed_results": detailed_results,
}
async def _parse_service_result(self, service_result: Dict, ctx: Context) -> Dict:
result_data = service_result.get("result", service_result)
layout_results = result_data.get("layoutParsingResults")
if not layout_results:
return {
"markdown": "",
"pages": 0,
"images_mapping": {},
"detailed_results": [],
}
        # Simplified: directly extract only the fields we need
markdown_parts = []
all_images_mapping = {}
detailed_results = []
for res in layout_results:
            # Extract the Markdown text
            markdown_parts.append(res["markdown"]["text"])
            # Extract the embedded images
            all_images_mapping.update(res["markdown"]["images"])
            # Keep prunedResult for the detailed (L2) output
            detailed_results.append(res["prunedResult"])
return {
"markdown": "\n".join(markdown_parts),
"pages": len(layout_results), # 简化为页数
"images_mapping": all_images_mapping,
"detailed_results": detailed_results,
}
async def _log_completion_stats(self, result: Dict, ctx: Context) -> None:
        page_count = result["pages"]  # now a count rather than a list
await ctx.info(f"Structure analysis completed: {page_count} pages")
def _format_output(
self, result: Dict, detailed: bool, ctx: Context
) -> Union[str, List[Union[TextContent, ImageContent]]]:
if not result["markdown"].strip():
return (
"❌ No document content detected"
if not detailed
else json.dumps({"error": "No content detected"}, ensure_ascii=False)
)
markdown_text = result["markdown"]
images_mapping = result.get("images_mapping", {})
if detailed:
            # L2 (detailed): unified detailed results plus the mixed Markdown content
content_list = []
if "detailed_results" in result and result["detailed_results"]:
for detailed_result in result["detailed_results"]:
content_list.append(
TextContent(
type="text",
text=json.dumps(
detailed_result,
ensure_ascii=False,
indent=2,
default=str,
),
)
)
            # Append the mixed Markdown content
content_list.extend(
self._parse_markdown_with_images(markdown_text, images_mapping)
)
return content_list
else:
            # L1 (simple): mixed content containing only Markdown text and images
return self._parse_markdown_with_images(markdown_text, images_mapping)
def _parse_markdown_with_images(
self, markdown_text: str, images_mapping: Dict[str, str]
) -> List[Union[TextContent, ImageContent]]:
"""解析markdown文本返回文字和图片的混合列表"""
if not images_mapping:
# 没有图片,直接返回文本
return [TextContent(type="text", text=markdown_text)]
content_list = []
img_pattern = r'<img[^>]+src="([^"]+)"[^>]*>'
last_pos = 0
for match in re.finditer(img_pattern, markdown_text):
            # Append the text that precedes this image
text_before = markdown_text[last_pos : match.start()]
if text_before.strip():
content_list.append(TextContent(type="text", text=text_before))
            # Append the image itself
img_src = match.group(1)
if img_src in images_mapping:
content_list.append(
ImageContent(
type="image",
data=images_mapping[img_src],
mimeType="image/jpeg",
)
)
last_pos = match.end()
        # Append any remaining text
remaining_text = markdown_text[last_pos:]
if remaining_text.strip():
content_list.append(TextContent(type="text", text=remaining_text))
return content_list or [TextContent(type="text", text=markdown_text)]
_PIPELINE_HANDLERS: Dict[str, Type[PipelineHandler]] = {
"OCR": OCRHandler,
"PP-StructureV3": PPStructureV3Handler,
}
def create_pipeline_handler(
pipeline: str, /, *args: Any, **kwargs: Any
) -> PipelineHandler:
if pipeline in _PIPELINE_HANDLERS:
cls = _PIPELINE_HANDLERS[pipeline]
return cls(pipeline, *args, **kwargs)
else:
raise ValueError(f"Unknown pipeline {repr(pipeline)}")
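For reference, a hedged sketch of how this factory is used, mirroring the call in `__main__.main()` (all argument values below are placeholders):

```python
# Illustrative sketch only: construct a handler for the local OCR pipeline
# the same way `main()` in `__main__.py` does.
from paddleocr_mcp.pipelines import create_pipeline_handler

handler = create_pipeline_handler(
    "OCR",                       # pipeline name, passed positionally
    "local",                     # ppocr_source: "local", "aistudio", or "self_hosted"
    pipeline_config=None,        # use the default pipeline configuration
    device=None,                 # let PaddleOCR choose the device
    server_url=None,             # only needed for service modes
    aistudio_access_token=None,  # only needed for "aistudio"
    timeout=30,
)
```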

mcp_server/pyproject.toml

@@ -0,0 +1,20 @@
[build-system]
requires = ["setuptools>=69"]
build-backend = "setuptools.build_meta"
[project]
name = "paddleocr_mcp"
version = "0.1.0"
requires-python = ">=3.10"
dependencies = [
"mcp>=1.5.0",
"fastmcp>=2.0.0",
"httpx>=0.24.0",
"numpy>=1.24.0",
"pillow>=9.0.0",
"python-magic>=0.4.24",
"typing-extensions>=4.0.0",
]
[project.scripts]
paddleocr_mcp = "paddleocr_mcp.__main__:main"


@@ -278,6 +278,7 @@ nav:
- 端侧部署: version3.x/deployment/on_device_deployment.md
- 服务化部署: version3.x/deployment/serving.md
- 基于Python或C++预测引擎推理: version3.x/deployment/python_and_cpp_infer.md
- MCP 服务器: version3.x/deployment/mcp_server.md
- 模块列表:
- 模块概述: version3.x/module_usage/module_overview.md
- 文档图像方向分类模块: version3.x/module_usage/doc_img_orientation_classification.md