[Feat] MCP draft version for OCRv5 and StructureV3 (#15604)

* Add MCP OCR server draft version

* update code review

* structure can return images

* refine code and code review

* fix images return logic

* refactor structure for abstract layer

* Fix bugs and enhance code

* Use string literal for output mode

* update images logic for service

* update readme and config example

* update readme and config example

* Fix bugs and add

* refine structure image logic, now can show positions in texts

* update readme file based on code review

* update readme file

* update readme file

* update readme

* update readme

* Polish doc

* add en readme

* Refactor docs and update installation guide

---------

Co-authored-by: Bobholamovic <mhlin425@whu.edu.cn>
Yiiii0 2025-06-13 18:36:44 +08:00 committed by GitHub
parent 8e7994992b
commit 3ce3dc56fa
10 changed files with 1342 additions and 0 deletions


@@ -0,0 +1,194 @@
# PaddleOCR MCP Server
[![PaddleOCR](https://img.shields.io/badge/OCR-PaddleOCR-orange)](https://github.com/PaddlePaddle/PaddleOCR)
[![FastMCP](https://img.shields.io/badge/Built%20with-FastMCP%20v2-blue)](https://gofastmcp.com)
This project provides a lightweight [Model Context Protocol (MCP)](https://modelcontextprotocol.io/introduction) server designed to integrate the powerful capabilities of PaddleOCR into a compatible MCP Host.
### Key Features
- **Currently Supported Pipelines**
- **OCR**: Performs text detection and recognition on images and PDF files.
- **PP-StructureV3**: Recognizes and extracts text blocks, titles, paragraphs, images, tables, and other layout elements from an image or PDF file, converting the input into a Markdown document.
- **Supports the following working modes**:
- **Local**: Runs the PaddleOCR pipeline directly on your machine using the installed Python library.
- **AI Studio**: Calls cloud services provided by the Paddle AI Studio community.
- **Self-hosted**: Calls a PaddleOCR service that you deploy yourself (serving).
### Table of Contents
- [1. Installation](#1-installation)
- [2. Quick Start](#2-quick-start)
- [3. Configuration](#3-configuration)
- [3.1. MCP Host Configuration](#31-mcp-host-configuration)
- [3.2. Working Modes Explained](#32-working-modes-explained)
- [Mode 1: AI Studio Service (`aistudio`)](#mode-1-ai-studio-service-aistudio)
- [Mode 2: Local Python Library (`local`)](#mode-2-local-python-library-local)
- [Mode 3: Self-hosted Service (`self_hosted`)](#mode-3-self-hosted-service-self_hosted)
- [4. Parameter Reference](#4-parameter-reference)
- [5. Configuration Examples](#5-configuration-examples)
- [5.1 AI Studio Service Configuration](#51-ai-studio-service-configuration)
- [5.2 Local Python Library Configuration](#52-local-python-library-configuration)
- [5.3 Self-hosted Service Configuration](#53-self-hosted-service-configuration)
## 1. Installation
```bash
# Install the wheel
pip install https://paddle-model-ecology.bj.bcebos.com/paddlex/PaddleX3.0/mcp/paddleocr_mcp/releases/v0.1.0/paddleocr_mcp-0.1.0-py3-none-any.whl
# Or, install from source
# git clone https://github.com/PaddlePaddle/PaddleOCR.git
# pip install -e mcp_server
```
Some [working modes](#32-working-modes-explained) may require additional dependencies.
## 2. Quick Start
This section guides you through a quick setup using **Claude Desktop** as the MCP Host and the **AI Studio** mode. This mode is recommended for new users as it does not require complex local dependencies. Please refer to [3. Configuration](#3-configuration) for other working modes and more configuration options.
1. **Prepare the AI Studio Service**
- Visit the [Paddle AI Studio community](https://aistudio.baidu.com/pipeline/mine) and log in.
- In the "PaddleX Pipeline" section under "More" on the left, navigate to [Create Pipeline] - [OCR] - [General OCR] - [Deploy Directly] - [Text Recognition Module, select PP-OCRv5_server_rec] - [Start Deployment].
- Once deployed, obtain your **Service Base URL** (e.g., `https://xxxxxx.aistudio-hub.baidu.com`).
- Get your **Access Token** from [this page](https://aistudio.baidu.com/index/accessToken).
2. **Locate the MCP Configuration File** - For details, refer to the [Official MCP Documentation](https://modelcontextprotocol.io/quickstart/user).
- **macOS**: `~/Library/Application Support/Claude/claude_desktop_config.json`
- **Windows**: `%APPDATA%\Claude\claude_desktop_config.json`
- **Linux**: `~/.config/Claude/claude_desktop_config.json`
3. **Add MCP Server Configuration**
Open the `claude_desktop_config.json` file and add the configuration by referring to [5.1 AI Studio Service Configuration](#51-ai-studio-service-configuration).
**Note**:
- Do not leak your **Access Token**.
- If `paddleocr_mcp` is not in your system's `PATH`, set `command` to the absolute path of the executable.
4. **Restart the MCP Host**
Restart Claude Desktop. The new `paddleocr-ocr` tool should now be available in the application.
## 3. Configuration
### 3.1. MCP Host Configuration
In the Host's configuration file (e.g., `claude_desktop_config.json`), you need to define how to start the tool server. Key fields are:
- `command`: `paddleocr_mcp` (if the executable is in your `PATH`) or an absolute path.
- `args`: Configurable command-line arguments, e.g., `["--verbose"]`. See [4. Parameter Reference](#4-parameter-reference).
- `env`: Configurable environment variables. See [4. Parameter Reference](#4-parameter-reference).
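If the executable is not on the `PATH` that the MCP Host sees, a quick way to find the absolute path to use as the `command` value is a stdlib one-liner (a sketch; run it in the environment where you installed the wheel):

```python
# Locate the installed `paddleocr_mcp` executable so its absolute path can
# be used as the `command` value in the MCP Host configuration.
import shutil

print(shutil.which("paddleocr_mcp") or "paddleocr_mcp not found on PATH")
```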
### 3.2. Working Modes Explained
You can configure the MCP server to run in different modes based on your needs.
#### Mode 1: AI Studio Service (`aistudio`)
This mode calls services from the [Paddle AI Studio community](https://aistudio.baidu.com/pipeline/mine).
- **Use Case**: Ideal for quickly trying out features, validating solutions, and for no-code development scenarios.
- **Procedure**: Please refer to [2. Quick Start](#2-quick-start).
- In addition to using the platform's preset model solutions, you can also train and deploy custom models on the platform.
#### Mode 2: Local Python Library (`local`)
This mode runs the pipeline directly on your local machine using the installed `paddleocr` inference package, so it places certain demands on your local environment and hardware.
- **Use Case**: Suitable for offline usage and scenarios with strict data privacy requirements.
- **Procedure**:
1. Refer to the [PaddleOCR Installation Guide](../installation.en.md) to install the *PaddlePaddle framework* and *PaddleOCR*. **It is strongly recommended to install them in a separate virtual environment** to avoid dependency conflicts.
2. Refer to [5.2 Local Python Library Configuration](#52-local-python-library-configuration) to modify the `claude_desktop_config.json` file.
#### Mode 3: Self-hosted Service (`self_hosted`)
This mode calls a PaddleOCR inference service that you have deployed yourself. This corresponds to the **Serving** solutions provided by PaddleX.
- **Use Case**: Offers the advantages of service-oriented deployment and high flexibility, making it well-suited for production environments, especially for scenarios requiring custom service configurations.
- **Procedure**:
1. Refer to the [PaddleOCR Installation Guide](../installation.en.md) to install the *PaddlePaddle framework* and *PaddleOCR*.
2. Refer to the [PaddleOCR Serving Deployment Guide](./serving.en.md) to run the server.
3. Refer to [5.3 Self-hosted Service Configuration](#53-self-hosted-service-configuration) to modify the `claude_desktop_config.json` file.
4. Set your service address in `PADDLEOCR_MCP_SERVER_URL` (e.g., `"http://127.0.0.1:8080"`).
## 4. Parameter Reference
You can control the server's behavior via environment variables or command-line arguments.
| Environment Variable | Command-line Argument | Type | Description | Options | Default |
|:---|:---|:---|:---|:---|:---|
| `PADDLEOCR_MCP_PIPELINE` | `--pipeline` | `str` | The pipeline to run | `"OCR"`, `"PP-StructureV3"` | `"OCR"` |
| `PADDLEOCR_MCP_PPOCR_SOURCE` | `--ppocr_source` | `str` | The source of PaddleOCR capabilities | `"local"`, `"aistudio"`, `"self_hosted"` | `"local"` |
| `PADDLEOCR_MCP_SERVER_URL` | `--server_url` | `str` | Base URL of the underlying service (required for `aistudio` or `self_hosted` mode) | - | `None` |
| `PADDLEOCR_MCP_AISTUDIO_ACCESS_TOKEN` | `--aistudio_access_token` | `str` | AI Studio authentication token (required for `aistudio` mode) | - | `None` |
| `PADDLEOCR_MCP_TIMEOUT` | `--timeout` | `int` | Request timeout for the underlying service (in seconds) | - | `30` |
| `PADDLEOCR_MCP_DEVICE` | `--device` | `str` | Specify the device for inference (only effective in `local` mode) | - | `None` |
| `PADDLEOCR_MCP_PIPELINE_CONFIG` | `--pipeline_config` | `str` | Path to the PaddleX pipeline configuration file (only effective in `local` mode) | - | `None` |
| - | `--http` | `bool` | Use HTTP transport instead of stdio (for remote deployment and multiple clients) | - | `False` |
| - | `--host` | `str` | Host address for HTTP mode | - | `"127.0.0.1"` |
| - | `--port` | `int` | Port for HTTP mode | - | `8000` |
| - | `--verbose` | `bool` | Enable verbose logging for debugging | - | `False` |
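For example, when the server is started with `--http`, any MCP client can connect over streamable HTTP. The sketch below uses FastMCP's `Client`; the `/mcp` path, the port, and the sample arguments are assumptions to adapt to your deployment:

```python
# Minimal sketch: connect to an MCP server started with
#   paddleocr_mcp --http --port 8000
# Assumes FastMCP v2's `Client` and the default streamable-HTTP path `/mcp`.
import asyncio

from fastmcp import Client


async def main() -> None:
    async with Client("http://127.0.0.1:8000/mcp") as client:
        tools = await client.list_tools()
        print("Available tools:", [tool.name for tool in tools])
        # Hypothetical example input; use any file path, URL, or Base64 string.
        result = await client.call_tool(
            tools[0].name,
            {"input_data": "/path/to/document.png", "output_mode": "simple"},
        )
        print(result)


asyncio.run(main())
```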
## 5. Configuration Examples
Below are complete configuration examples for different working modes. You can copy and modify them as needed.
### 5.1 AI Studio Service Configuration
```json
{
  "mcpServers": {
    "paddleocr-ocr": {
      "command": "paddleocr_mcp",
      "args": [],
      "env": {
        "PADDLEOCR_MCP_PIPELINE": "OCR",
        "PADDLEOCR_MCP_PPOCR_SOURCE": "aistudio",
        "PADDLEOCR_MCP_SERVER_URL": "<your-server-url>",
        "PADDLEOCR_MCP_AISTUDIO_ACCESS_TOKEN": "<your-access-token>"
      }
    }
  }
}
```
**Note**:
- Replace `<your-server-url>` with your AI Studio **Service Base URL**, e.g., `https://xxxxx.aistudio-hub.baidu.com`. Do not include endpoint paths (like `/ocr`).
- Replace `<your-access-token>` with your **Access Token**.
### 5.2 Local Python Library Configuration
```json
{
  "mcpServers": {
    "paddleocr-ocr": {
      "command": "paddleocr_mcp",
      "args": [],
      "env": {
        "PADDLEOCR_MCP_PIPELINE": "OCR",
        "PADDLEOCR_MCP_PPOCR_SOURCE": "local"
      }
    }
  }
}
```
**Note**:
- `PADDLEOCR_MCP_PIPELINE_CONFIG` is optional. If not set, the default pipeline configuration is used. To adjust settings, such as changing models, refer to the [PaddleOCR and PaddleX documentation](../paddleocr_and_paddlex.en.md), export a pipeline configuration file, and set `PADDLEOCR_MCP_PIPELINE_CONFIG` to its absolute path.
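As a reference, a minimal sketch of that export step; it assumes the `export_paddlex_config_to_yaml` helper described in the linked documentation:

```python
# Sketch: export the default OCR pipeline configuration to a YAML file
# that PADDLEOCR_MCP_PIPELINE_CONFIG can point to (assumes the
# `export_paddlex_config_to_yaml` helper from the PaddleOCR/PaddleX docs).
from paddleocr import PaddleOCR

pipeline = PaddleOCR()
pipeline.export_paddlex_config_to_yaml("ocr_config.yaml")
# Then set PADDLEOCR_MCP_PIPELINE_CONFIG to the absolute path of ocr_config.yaml.
```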
### 5.3 Self-hosted Service Configuration
```json
{
  "mcpServers": {
    "paddleocr-ocr": {
      "command": "paddleocr_mcp",
      "args": [],
      "env": {
        "PADDLEOCR_MCP_PIPELINE": "OCR",
        "PADDLEOCR_MCP_PPOCR_SOURCE": "self_hosted",
        "PADDLEOCR_MCP_SERVER_URL": "<your-server-url>"
      }
    }
  }
}
```
**Note**:
- Replace `<your-server-url>` with the base URL of your underlying service (e.g., `http://127.0.0.1:8080`).


@@ -0,0 +1,194 @@
# PaddleOCR MCP Server
[![PaddleOCR](https://img.shields.io/badge/OCR-PaddleOCR-orange)](https://github.com/PaddlePaddle/PaddleOCR)
[![FastMCP](https://img.shields.io/badge/Built%20with-FastMCP%20v2-blue)](https://gofastmcp.com)
This project provides a lightweight [Model Context Protocol (MCP)](https://modelcontextprotocol.io/introduction) server designed to integrate the powerful capabilities of PaddleOCR into a compatible MCP Host.
### Key Features
- **Currently Supported Tools**
  - **OCR**: Performs text detection and recognition on images and PDF files.
  - **PP-StructureV3**: Recognizes and extracts text blocks, titles, paragraphs, images, tables, and other layout elements from an image or PDF file, converting the input into a Markdown document.
- **Supported Working Modes**:
  - **Local Python library**: Runs the PaddleOCR pipeline directly on your machine.
  - **AI Studio service**: Calls cloud services provided by the Paddle AI Studio community.
  - **Self-hosted service**: Calls a PaddleOCR service that you deploy yourself.
### Table of Contents
- [1. Installation](#1-installation)
- [2. Quick Start](#2-quick-start)
- [3. Configuration](#3-configuration)
  - [3.1. MCP Host Configuration](#31-mcp-host-configuration)
  - [3.2. Working Modes Explained](#32-working-modes-explained)
    - [Mode 1: AI Studio Service (`aistudio`)](#mode-1-ai-studio-service-aistudio)
    - [Mode 2: Local Python Library (`local`)](#mode-2-local-python-library-local)
    - [Mode 3: Self-hosted Service (`self_hosted`)](#mode-3-self-hosted-service-self_hosted)
- [4. Parameter Reference](#4-parameter-reference)
- [5. Configuration Examples](#5-configuration-examples)
  - [5.1 AI Studio Service Configuration](#51-ai-studio-service-configuration)
  - [5.2 Local Python Library Configuration](#52-local-python-library-configuration)
  - [5.3 Self-hosted Service Configuration](#53-self-hosted-service-configuration)
## 1. Installation
```bash
# Install the wheel
pip install https://paddle-model-ecology.bj.bcebos.com/paddlex/PaddleX3.0/mcp/paddleocr_mcp/releases/v0.1.0/paddleocr_mcp-0.1.0-py3-none-any.whl
# Or, install from source
# git clone https://github.com/PaddlePaddle/PaddleOCR.git
# pip install -e mcp_server
```
Some [working modes](#32-working-modes-explained) may require additional dependencies.
## 2. Quick Start
This section guides you through a quick setup using **Claude Desktop** as the MCP Host and the **AI Studio service** working mode. This mode does not require complex local dependencies and is recommended for new users. Please refer to [3. Configuration](#3-configuration) for other working modes and more configuration options.
1. **Prepare the AI Studio Service**
   - Visit the [Paddle AI Studio community](https://aistudio.baidu.com/pipeline/mine) and log in.
   - In the "PaddleX Pipeline" section under "More" on the left, navigate to [Create Pipeline] - [OCR] - [General OCR] - [Deploy Directly] - [Text Recognition Module, select PP-OCRv5_server_rec] - [Start Deployment].
   - Once deployed, obtain your **Service Base URL** (e.g., `https://xxxxxx.aistudio-hub.baidu.com`).
   - Get your **Access Token** from [this page](https://aistudio.baidu.com/index/accessToken).
2. **Locate the MCP Configuration File** - For details, refer to the [Official MCP Documentation](https://modelcontextprotocol.io/quickstart/user).
   - **macOS**: `~/Library/Application Support/Claude/claude_desktop_config.json`
   - **Windows**: `%APPDATA%\Claude\claude_desktop_config.json`
   - **Linux**: `~/.config/Claude/claude_desktop_config.json`
3. **Add MCP Server Configuration**
   Open the `claude_desktop_config.json` file and fill in the configuration by referring to [5.1 AI Studio Service Configuration](#51-ai-studio-service-configuration).
   **Note**:
   - Do not leak your **Access Token**.
   - If `paddleocr_mcp` cannot be found in your system's `PATH`, set `command` to the absolute path of the executable.
4. **Restart the MCP Host**
   Restart Claude Desktop. The new `paddleocr-ocr` tool should now be available in the application.
## 3. Configuration
### 3.1. MCP Host Configuration
In the Host's configuration file (e.g., `claude_desktop_config.json`), you need to define how to start the tool server. Key fields are:
- `command`: `paddleocr_mcp` (if the executable can be found in your `PATH`) or an absolute path.
- `args`: Configurable command-line arguments, e.g., `["--verbose"]`. See [4. Parameter Reference](#4-parameter-reference).
- `env`: Configurable environment variables. See [4. Parameter Reference](#4-parameter-reference).
### 3.2. Working Modes Explained
You can configure the MCP server to run in different working modes based on your needs.
#### Mode 1: AI Studio Service (`aistudio`)
This mode calls services from the [Paddle AI Studio community](https://aistudio.baidu.com/pipeline/mine).
- **Use Case**: Ideal for quickly trying out features, validating solutions, and for no-code development scenarios.
- **Procedure**: Please refer to [2. Quick Start](#2-quick-start).
- In addition to using the platform's preset model solutions, you can also train and deploy custom models on the platform.
#### Mode 2: Local Python Library (`local`)
This mode runs the models directly on your local machine and places certain demands on your local environment and hardware.
- **Use Case**: Suitable for offline usage and scenarios with strict data privacy requirements.
- **Procedure**:
    1. Refer to the [PaddleOCR Installation Guide](../installation.md) to install the *PaddlePaddle framework* and *PaddleOCR*. **It is strongly recommended to install them in a separate virtual environment** to avoid dependency conflicts.
    2. Refer to [5.2 Local Python Library Configuration](#52-local-python-library-configuration) to modify the `claude_desktop_config.json` file.
#### Mode 3: Self-hosted Service (`self_hosted`)
This mode calls a PaddleOCR inference service that you have deployed yourself.
- **Use Case**: Offers the advantages of service-oriented deployment and high flexibility, making it well-suited for production environments, especially for scenarios requiring custom service configurations.
- **Procedure**:
    1. Refer to the [PaddleOCR Installation Guide](../installation.md) to install the *PaddlePaddle framework* and *PaddleOCR*.
    2. Refer to the [PaddleOCR Serving Deployment Guide](./serving.md) to run the server.
    3. Refer to [5.3 Self-hosted Service Configuration](#53-self-hosted-service-configuration) to modify the `claude_desktop_config.json` file.
    4. Set your service address in `PADDLEOCR_MCP_SERVER_URL` (e.g., `"http://127.0.0.1:8000"`).
## 4. Parameter Reference
You can control the server's behavior via environment variables or command-line arguments.
| Environment Variable | Command-line Argument | Type | Description | Options | Default |
|:---|:---|:---|:---|:---|:---|
| `PADDLEOCR_MCP_PIPELINE` | `--pipeline` | `str` | The pipeline to run | `"OCR"`, `"PP-StructureV3"` | `"OCR"` |
| `PADDLEOCR_MCP_PPOCR_SOURCE` | `--ppocr_source` | `str` | The source of PaddleOCR capabilities | `"local"`, `"aistudio"`, `"self_hosted"` | `"local"` |
| `PADDLEOCR_MCP_SERVER_URL` | `--server_url` | `str` | Base URL of the underlying service (required for `aistudio` or `self_hosted` mode) | - | `None` |
| `PADDLEOCR_MCP_AISTUDIO_ACCESS_TOKEN` | `--aistudio_access_token` | `str` | AI Studio authentication token (required for `aistudio` mode) | - | `None` |
| `PADDLEOCR_MCP_TIMEOUT` | `--timeout` | `int` | Request timeout for the underlying service (in seconds) | - | `30` |
| `PADDLEOCR_MCP_DEVICE` | `--device` | `str` | Specify the device for inference (only effective in `local` mode) | - | `None` |
| `PADDLEOCR_MCP_PIPELINE_CONFIG` | `--pipeline_config` | `str` | Path to the PaddleOCR pipeline configuration file (only effective in `local` mode) | - | `None` |
| - | `--http` | `bool` | Use HTTP transport instead of stdio (for remote deployment and multiple clients) | - | `False` |
| - | `--host` | `str` | Host address for HTTP mode | - | `"127.0.0.1"` |
| - | `--port` | `int` | Port for HTTP mode | - | `8000` |
| - | `--verbose` | `bool` | Enable verbose logging for debugging | - | `False` |
## 5. Configuration Examples
Below are complete configuration examples for the different working modes. You can copy and modify them as needed.
### 5.1 AI Studio Service Configuration
```json
{
  "mcpServers": {
    "paddleocr-ocr": {
      "command": "paddleocr_mcp",
      "args": [],
      "env": {
        "PADDLEOCR_MCP_PIPELINE": "OCR",
        "PADDLEOCR_MCP_PPOCR_SOURCE": "aistudio",
        "PADDLEOCR_MCP_SERVER_URL": "<your-server-url>",
        "PADDLEOCR_MCP_AISTUDIO_ACCESS_TOKEN": "<your-access-token>"
      }
    }
  }
}
```
**Note**:
- Replace `<your-server-url>` with the **Service Base URL** of your AI Studio service, e.g., `https://xxxxx.aistudio-hub.baidu.com`. Do not include endpoint paths (like `/ocr`).
- Replace `<your-access-token>` with your **Access Token**.
### 5.2 Local Python Library Configuration
```json
{
  "mcpServers": {
    "paddleocr-ocr": {
      "command": "paddleocr_mcp",
      "args": [],
      "env": {
        "PADDLEOCR_MCP_PIPELINE": "OCR",
        "PADDLEOCR_MCP_PPOCR_SOURCE": "local"
      }
    }
  }
}
```
**Note**:
- `PADDLEOCR_MCP_PIPELINE_CONFIG` is optional. If not set, the default pipeline configuration is used. To adjust settings, such as changing models, refer to the [PaddleOCR documentation](../paddleocr_and_paddlex.md), export a pipeline configuration file, and set `PADDLEOCR_MCP_PIPELINE_CONFIG` to its absolute path.
### 5.3 Self-hosted Service Configuration
```json
{
  "mcpServers": {
    "paddleocr-ocr": {
      "command": "paddleocr_mcp",
      "args": [],
      "env": {
        "PADDLEOCR_MCP_PIPELINE": "OCR",
        "PADDLEOCR_MCP_PPOCR_SOURCE": "self_hosted",
        "PADDLEOCR_MCP_SERVER_URL": "<your-server-url>"
      }
    }
  }
}
```
**Note**:
- Replace `<your-server-url>` with the base URL of your underlying service (e.g., `http://127.0.0.1:8000`).

mcp_server/README.md

@@ -0,0 +1,5 @@
# PaddleOCR MCP Server
中文 | [English](./README_en.md)
See the [documentation](../docs/version3.x/deployment/mcp_server.md).

mcp_server/README_en.md

@@ -0,0 +1,5 @@
# PaddleOCR MCP Server
[中文](./README.md) | English
Please refer to the [documentation](../docs/version3.x/deployment/mcp_server.en.md).


@@ -0,0 +1,15 @@
# Copyright (c) 2025 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
__version__ = "0.1.0"


@@ -0,0 +1,187 @@
#!/usr/bin/env python3
# Copyright (c) 2025 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse
import contextlib
import os
import sys
from typing import AsyncIterator, Dict
from fastmcp import FastMCP
from .pipelines import create_pipeline_handler
def _parse_args() -> argparse.Namespace:
"""Parse command line arguments."""
parser = argparse.ArgumentParser(
description="PaddleOCR MCP server - Supports local library, AI Studio service, and self-hosted servers."
)
parser.add_argument(
"--pipeline",
choices=["OCR", "PP-StructureV3"],
default=os.getenv("PADDLEOCR_MCP_PIPELINE", "OCR"),
help="Pipeline name.",
)
parser.add_argument(
"--ppocr_source",
choices=["local", "aistudio", "self_hosted"],
default=os.getenv("PADDLEOCR_MCP_PPOCR_SOURCE", "local"),
help="Source of PaddleOCR functionality: local (local library), aistudio (AI Studio service), self_hosted (self-hosted server).",
)
parser.add_argument(
"--http",
action="store_true",
help="Use HTTP transport instead of STDIO (suitable for remote deployment and multiple clients).",
)
parser.add_argument(
"--host",
default="127.0.0.1",
help="Host address for HTTP mode (default: 127.0.0.1).",
)
parser.add_argument(
"--port",
type=int,
default=8000,
help="Port for HTTP mode (default: 8000).",
)
parser.add_argument(
"--verbose", action="store_true", help="Enable verbose logging for debugging."
)
# Local mode configuration
parser.add_argument(
"--pipeline_config",
default=os.getenv("PADDLEOCR_MCP_PIPELINE_CONFIG"),
help="PaddleOCR pipeline configuration file path (for local mode).",
)
parser.add_argument(
"--device",
default=os.getenv("PADDLEOCR_MCP_DEVICE"),
help="Device to run inference on.",
)
# Service mode configuration
parser.add_argument(
"--server_url",
default=os.getenv("PADDLEOCR_MCP_SERVER_URL"),
help="Base URL of the underlying server (required in service mode).",
)
parser.add_argument(
"--aistudio_access_token",
default=os.getenv("PADDLEOCR_MCP_AISTUDIO_ACCESS_TOKEN"),
help="AI Studio access token (required for AI Studio).",
)
parser.add_argument(
"--timeout",
type=int,
default=int(os.getenv("PADDLEOCR_MCP_TIMEOUT", "30")),
help="API request timeout in seconds for the underlying server.",
)
args = parser.parse_args()
return args
def _validate_args(args: argparse.Namespace) -> None:
"""Validate command line arguments."""
if not args.http and (args.host != "127.0.0.1" or args.port != 8000):
print(
"Host and port arguments are only valid when using HTTP transport (see: `--http`).",
file=sys.stderr,
)
sys.exit(2)
if args.ppocr_source in ["aistudio", "self_hosted"]:
if not args.server_url:
print("Error: The server base URL is required.", file=sys.stderr)
print(
"Please either set `--server_url` or set the environment variable "
"`PADDLEOCR_MCP_SERVER_URL`.",
file=sys.stderr,
)
sys.exit(2)
if args.ppocr_source == "aistudio" and not args.aistudio_access_token:
print("Error: The AI Studio access token is required.", file=sys.stderr)
print(
"Please either set `--aistudio_access_token` or set the environment variable "
"`PADDLEOCR_MCP_AISTUDIO_ACCESS_TOKEN`.",
file=sys.stderr,
)
sys.exit(2)
def main() -> None:
"""Main entry point."""
args = _parse_args()
_validate_args(args)
try:
pipeline_handler = create_pipeline_handler(
args.pipeline,
args.ppocr_source,
pipeline_config=args.pipeline_config,
device=args.device,
server_url=args.server_url,
aistudio_access_token=args.aistudio_access_token,
timeout=args.timeout,
)
except Exception as e:
print(f"Failed to create the pipeline handler: {e}", file=sys.stderr)
if args.verbose:
import traceback
traceback.print_exc(file=sys.stderr)
sys.exit(1)
@contextlib.asynccontextmanager
async def _lifespan(mcp: FastMCP) -> AsyncIterator[Dict]:
async with pipeline_handler:
yield {}
try:
server_name = f"PaddleOCR {args.pipeline} MCP server"
mcp = FastMCP(
name=server_name,
lifespan=_lifespan,
log_level="INFO" if args.verbose else "WARNING",
)
pipeline_handler.register_tools(mcp)
if args.http:
mcp.run(
transport="streamable-http",
host=args.host,
port=args.port,
)
else:
mcp.run()
except Exception as e:
print(f"Failed to start the server: {e}", file=sys.stderr)
if args.verbose:
import traceback
traceback.print_exc(file=sys.stderr)
sys.exit(1)
if __name__ == "__main__":
main()


@@ -0,0 +1,721 @@
# Copyright (c) 2025 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import abc
import asyncio
import base64
import io
import json
import mimetypes
import re
from pathlib import Path
from queue import Queue
from threading import Thread
from typing import Any, Callable, Dict, List, Optional, Type, Union
from urllib.parse import urlparse
import httpx
import magic
import numpy as np
from fastmcp import Context, FastMCP
from mcp.types import ImageContent, TextContent
from PIL import Image as PILImage
from typing_extensions import Literal, Self, assert_never
try:
from paddleocr import PaddleOCR, PPStructureV3
LOCAL_OCR_AVAILABLE = True
except ImportError:
LOCAL_OCR_AVAILABLE = False
OutputMode = Literal["simple", "detailed"]
def _is_file_path(s: str) -> bool:
    # `PurePath(s)` accepts almost any string, so test for an existing file
    # to avoid misclassifying URLs or Base64 payloads as file paths.
    try:
        return Path(s).is_file()
    except (OSError, ValueError):
        return False
def _is_base64(s: str) -> bool:
    # Accept both bare Base64 payloads and `data:` URLs.
    if s.startswith("data:"):
        s = s.split(",", 1)[-1]
    pattern = r"^[A-Za-z0-9+/]+={0,2}$"
    return bool(re.fullmatch(pattern, s))
def _is_url(s: str) -> bool:
if not (s.startswith("http://") or s.startswith("https://")):
return False
result = urlparse(s)
return all([result.scheme, result.netloc]) and result.scheme in ("http", "https")
def _infer_file_type_from_url(url: str) -> str:
url_parts = urlparse(url)
filename = url_parts.path.split("/")[-1]
file_type = mimetypes.guess_type(filename)[0]
if not file_type:
return "UNKNOWN"
if file_type.startswith("image/"):
return "IMAGE"
elif file_type == "application/pdf":
return "PDF"
return "UNKNOWN"
def _infer_file_type_from_bytes(data: bytes) -> str:
mime = magic.from_buffer(data, mime=True)
if mime.startswith("image/"):
return "IMAGE"
elif mime == "application/pdf":
return "PDF"
return "UNKNOWN"
class _EngineWrapper:
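    """Run a synchronous inference engine on a dedicated worker thread.

    Calls submitted via `call` are serialized through a queue and executed by
    the worker thread; results are handed back to the event loop through
    futures, so blocking `predict` calls do not stall the async MCP server.
    """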
def __init__(self, engine: Any) -> None:
self._engine = engine
self._queue: Queue = Queue()
self._closed = False
self._loop = asyncio.get_running_loop()
self._thread = Thread(target=self._worker, daemon=False)
self._thread.start()
@property
def engine(self) -> Any:
return self._engine
async def call(self, func: Callable, *args: Any, **kwargs: Any) -> Any:
if self._closed:
raise RuntimeError("Engine wrapper has already been closed")
fut = self._loop.create_future()
self._queue.put((func, args, kwargs, fut))
return await fut
    async def close(self) -> None:
        if not self._closed:
            # Mark closed before joining so subsequent `call`s are rejected.
            self._closed = True
            self._queue.put(None)
            await self._loop.run_in_executor(None, self._thread.join)
def _worker(self) -> None:
while not self._closed:
item = self._queue.get()
if item is None:
break
func, args, kwargs, fut = item
try:
result = func(*args, **kwargs)
self._loop.call_soon_threadsafe(fut.set_result, result)
except Exception as e:
self._loop.call_soon_threadsafe(fut.set_exception, e)
finally:
self._queue.task_done()
class PipelineHandler(abc.ABC):
"""Abstract base class for pipeline handlers."""
def __init__(
self,
pipeline: str,
ppocr_source: str,
pipeline_config: Optional[str],
device: Optional[str],
server_url: Optional[str],
aistudio_access_token: Optional[str],
timeout: Optional[int],
) -> None:
"""Initialize the pipeline handler.
Args:
pipeline: Pipeline name.
ppocr_source: Source of PaddleOCR functionality.
pipeline_config: Path to pipeline configuration.
device: Device to run inference on.
server_url: Base URL for service mode.
aistudio_access_token: AI Studio access token.
timeout: Timeout in seconds.
"""
self._pipeline = pipeline
if ppocr_source == "local":
self._mode = "local"
elif ppocr_source in ("aistudio", "self_hosted"):
self._mode = "service"
else:
raise ValueError(f"Unknown PaddleOCR source {repr(ppocr_source)}")
self._ppocr_source = ppocr_source
self._pipeline_config = pipeline_config
self._device = device
self._server_url = server_url
self._aistudio_access_token = aistudio_access_token
self._timeout = timeout or 30 # Default timeout of 30 seconds
if self._mode == "local":
if not LOCAL_OCR_AVAILABLE:
raise RuntimeError("PaddleOCR is not locally available")
self._engine = self._create_local_engine()
self._status: Literal["initialized", "started", "stopped"] = "initialized"
async def start(self) -> None:
if self._status == "initialized":
if self._mode == "local":
self._engine_wrapper = _EngineWrapper(self._engine)
self._status = "started"
elif self._status == "started":
pass
elif self._status == "stopped":
raise RuntimeError("Pipeline handler has already been stopped")
else:
assert_never(self._status)
async def stop(self) -> None:
if self._status == "initialized":
raise RuntimeError("Pipeline handler has not been started")
elif self._status == "started":
if self._mode == "local":
await self._engine_wrapper.close()
self._status = "stopped"
elif self._status == "stopped":
pass
else:
assert_never(self._status)
async def __aenter__(self) -> Self:
await self.start()
return self
async def __aexit__(
self,
exc_type: Any,
exc_val: Any,
exc_tb: Any,
) -> None:
await self.stop()
@abc.abstractmethod
def register_tools(self, mcp: FastMCP) -> None:
"""Register tools with the MCP server.
Args:
mcp: The `FastMCP` instance.
"""
raise NotImplementedError
@abc.abstractmethod
def _create_local_engine(self) -> Any:
"""Create the local OCR engine.
Returns:
The OCR engine instance.
"""
raise NotImplementedError
class SimpleInferencePipelineHandler(PipelineHandler):
"""Base class for simple inference pipeline handlers."""
async def process(
self, input_data: str, output_mode: OutputMode, ctx: Context, **kwargs: Any
) -> Union[str, List[Union[TextContent, ImageContent]]]:
"""Process input data through the pipeline.
Args:
input_data: Input data (file path, URL, or Base64).
output_mode: Output mode ("simple" or "detailed").
ctx: MCP context.
**kwargs: Additional pipeline-specific arguments.
Returns:
Processed result in the requested output format.
"""
try:
await ctx.info(
f"Starting {self._pipeline} processing (source: {self._ppocr_source})"
)
if self._mode == "local":
processed_input = self._process_input_for_local(input_data)
raw_result = await self._predict_with_local_engine(
processed_input, ctx, **kwargs
)
result = self._parse_local_result(raw_result, ctx)
else:
processed_input, file_type = self._process_input_for_service(input_data)
raw_result = await self._call_service(
processed_input, file_type, ctx, **kwargs
)
result = await self._parse_service_result(raw_result, ctx)
await self._log_completion_stats(result, ctx)
return self._format_output(result, output_mode == "detailed", ctx)
except Exception as e:
await ctx.error(f"{self._pipeline} processing failed: {str(e)}")
return self._handle_error(str(e), output_mode)
def _process_input_for_local(self, input_data: str) -> Union[str, np.ndarray]:
if _is_file_path(input_data) or _is_url(input_data):
return input_data
elif _is_base64(input_data):
if input_data.startswith("data:"):
base64_data = input_data.split(",", 1)[1]
else:
base64_data = input_data
try:
image_bytes = base64.b64decode(base64_data)
image_pil = PILImage.open(io.BytesIO(image_bytes))
image_arr = np.array(image_pil.convert("RGB"))
# Convert RGB to BGR
return np.ascontiguousarray(image_arr[..., ::-1])
except Exception as e:
raise ValueError(f"Failed to decode Base64 image: {e}")
else:
raise ValueError("Invalid input data format")
def _process_input_for_service(self, input_data: str) -> tuple[str, str]:
if _is_file_path(input_data):
try:
with open(input_data, "rb") as f:
bytes_ = f.read()
input_data = base64.b64encode(bytes_).decode("ascii")
file_type = _infer_file_type_from_bytes(bytes_)
except Exception as e:
raise ValueError(f"Failed to read file: {e}")
elif _is_url(input_data):
file_type = _infer_file_type_from_url(input_data)
elif _is_base64(input_data):
try:
if input_data.startswith("data:"):
base64_data = input_data.split(",", 1)[1]
else:
base64_data = input_data
bytes_ = base64.b64decode(base64_data)
file_type = _infer_file_type_from_bytes(bytes_)
except Exception as e:
raise ValueError(f"Failed to decode Base64 data: {e}")
else:
raise ValueError("Invalid input data format")
return input_data, file_type
async def _call_service(
self, processed_input: str, file_type: str, ctx: Context, **kwargs: Any
) -> Dict[str, Any]:
if not self._server_url:
raise RuntimeError("Server URL not configured")
endpoint = self._get_service_endpoint()
url = f"{self._server_url.rstrip('/')}/{endpoint.lstrip('/')}"
payload = self._prepare_service_payload(processed_input, file_type, **kwargs)
headers = {"Content-Type": "application/json"}
if self._ppocr_source == "aistudio":
if not self._aistudio_access_token:
raise RuntimeError("Missing AI Studio access token")
headers["Authorization"] = f"token {self._aistudio_access_token}"
try:
async with httpx.AsyncClient(timeout=self._timeout) as client:
response = await client.post(url, json=payload, headers=headers)
response.raise_for_status()
return response.json()
except httpx.HTTPError as e:
raise RuntimeError(f"Service call failed: {str(e)}")
except json.JSONDecodeError as e:
raise RuntimeError(f"Invalid service response: {str(e)}")
def _prepare_service_payload(
self, processed_input: str, file_type: str, **kwargs: Any
) -> Dict[str, Any]:
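        # Serving API convention: fileType 1 = image, 0 = PDF (UNKNOWN falls back to 0).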
api_file_type = 1 if file_type == "IMAGE" else 0
payload = {"file": processed_input, "fileType": api_file_type, **kwargs}
return payload
def _handle_error(
self, error_msg: str, output_mode: OutputMode
) -> Union[str, List[Union[TextContent, ImageContent]]]:
if output_mode == "detailed":
return [TextContent(type="text", text=f"Error: {error_msg}")]
return f"Error: {error_msg}"
@abc.abstractmethod
def _get_service_endpoint(self) -> str:
"""Get the service endpoint.
Returns:
Service endpoint path.
"""
raise NotImplementedError
@abc.abstractmethod
def _parse_local_result(self, local_result: Dict, ctx: Context) -> Dict[str, Any]:
"""Parse raw result from local engine into a unified format.
Args:
local_result: Raw result from local engine.
ctx: MCP context.
Returns:
Parsed result in unified format.
"""
raise NotImplementedError
@abc.abstractmethod
async def _parse_service_result(
self, service_result: Dict[str, Any], ctx: Context
) -> Dict[str, Any]:
"""Parse raw result from the service into a unified format.
Args:
service_result: Raw result from the service.
ctx: MCP context.
Returns:
Parsed result in unified format.
"""
raise NotImplementedError
@abc.abstractmethod
async def _log_completion_stats(self, result: Dict[str, Any], ctx: Context) -> None:
"""Log statistics after processing completion.
Args:
result: Processing result.
ctx: MCP context.
"""
raise NotImplementedError
@abc.abstractmethod
def _format_output(
self, result: Dict[str, Any], detailed: bool, ctx: Context
) -> Union[str, List[Union[TextContent, ImageContent]]]:
"""Format output into simple or detailed format.
Args:
result: Processing result.
detailed: Whether to use detailed format.
ctx: MCP context.
Returns:
Formatted output in requested format.
"""
raise NotImplementedError
async def _predict_with_local_engine(
self, processed_input: Union[str, np.ndarray], ctx: Context, **kwargs: Any
) -> Dict:
if not hasattr(self, "_engine_wrapper"):
raise RuntimeError("Engine wrapper has not been initialized")
return await self._engine_wrapper.call(
self._engine_wrapper.engine.predict, processed_input, **kwargs
)
class OCRHandler(SimpleInferencePipelineHandler):
def register_tools(self, mcp: FastMCP) -> None:
@mcp.tool()
async def _ocr(
input_data: str,
output_mode: OutputMode,
ctx: Context,
) -> Union[str, List[Union[TextContent, ImageContent]]]:
"""Extract text from images and PDFs.
Args:
input_data: File path, URL, or Base64 data.
output_mode: "simple" for clean text, "detailed" for JSON with positioning.
"""
return await self.process(input_data, output_mode, ctx)
def _create_local_engine(self) -> Any:
return PaddleOCR(
paddlex_config=self._pipeline_config,
device=self._device,
enable_mkldnn=False,
)
def _get_service_endpoint(self) -> str:
return "ocr"
def _parse_local_result(self, local_result: Dict, ctx: Context) -> Dict:
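        # `predict` returns one result per page/image; this draft parses only the first.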
result = local_result[0]
texts = result["rec_texts"]
scores = result["rec_scores"]
boxes = result["rec_boxes"]
# Direct assembly
clean_texts, confidences, blocks = [], [], []
for i, text in enumerate(texts):
if text and text.strip():
conf = scores[i] if i < len(scores) else 0
clean_texts.append(text.strip())
confidences.append(conf)
block = {
"text": text.strip(),
"confidence": round(conf, 3),
"bbox": boxes[i].tolist(),
}
blocks.append(block)
return {
"text": "\n".join(clean_texts),
"confidence": sum(confidences) / len(confidences) if confidences else 0,
"blocks": blocks,
}
async def _parse_service_result(self, service_result: Dict, ctx: Context) -> Dict:
result_data = service_result.get("result", service_result)
ocr_results = result_data.get("ocrResults")
# Direct extraction and assembly
all_texts, all_confidences, blocks = [], [], []
for ocr_result in ocr_results:
pruned = ocr_result["prunedResult"]
texts = pruned["rec_texts"]
scores = pruned["rec_scores"]
boxes = pruned["rec_boxes"]
for i, text in enumerate(texts):
if text and text.strip():
conf = scores[i] if i < len(scores) else 0
all_texts.append(text.strip())
all_confidences.append(conf)
block = {
"text": text.strip(),
"confidence": round(conf, 3),
"bbox": boxes[i],
}
blocks.append(block)
return {
"text": "\n".join(all_texts),
"confidence": (
sum(all_confidences) / len(all_confidences) if all_confidences else 0
),
"blocks": blocks,
}
async def _log_completion_stats(self, result: Dict, ctx: Context) -> None:
text_length = len(result["text"])
block_count = len(result["blocks"])
await ctx.info(
f"OCR completed: {text_length} characters, {block_count} text blocks"
)
def _format_output(
self, result: Dict, detailed: bool, ctx: Context
) -> Union[str, List[Union[TextContent, ImageContent]]]:
if not result["text"].strip():
return (
"❌ No text detected"
if not detailed
else json.dumps({"error": "No text detected"}, ensure_ascii=False)
)
if detailed:
# L2: Return all data
return json.dumps(result, ensure_ascii=False, indent=2)
else:
# L1: Core text + key statistics
confidence = result["confidence"]
block_count = len(result["blocks"])
output = result["text"]
if confidence > 0:
output += f"\n\n📊 Confidence: {(confidence * 100):.1f}% | {block_count} text blocks"
return output
class PPStructureV3Handler(SimpleInferencePipelineHandler):
def register_tools(self, mcp: FastMCP) -> None:
@mcp.tool()
async def _pp_structurev3(
input_data: str,
output_mode: OutputMode,
ctx: Context,
) -> Union[str, List[Union[TextContent, ImageContent]]]:
"""Document layout analysis.
Args:
input_data: File path, URL, or Base64 data.
output_mode: "simple" for markdown text, "detailed" for JSON with metadata + prunedResult.
Returns:
- Simple: Markdown text + images (if available)
- Detailed: prunedResult/local detailed info + markdown text + images
"""
return await self.process(input_data, output_mode, ctx)
def _create_local_engine(self) -> Any:
return PPStructureV3(paddlex_config=self._pipeline_config, device=self._device)
def _get_service_endpoint(self) -> str:
return "layout-parsing"
def _parse_local_result(self, local_result: Dict, ctx: Context) -> Dict:
markdown_parts = []
detailed_results = []
# TODO return images
for result in local_result:
text = result.markdown["markdown_texts"]
markdown_parts.append(text)
detailed_results.append(result)
return {
# TODO: Page concatenation can be done better via `pipeline.concatenate_markdown_pages`
"markdown": "\n".join(markdown_parts),
"pages": len(local_result),
"images_mapping": {},
"detailed_results": detailed_results,
}
async def _parse_service_result(self, service_result: Dict, ctx: Context) -> Dict:
result_data = service_result.get("result", service_result)
layout_results = result_data.get("layoutParsingResults")
if not layout_results:
return {
"markdown": "",
"pages": 0,
"images_mapping": {},
"detailed_results": [],
}
        # Simplified: directly extract only the fields we need
markdown_parts = []
all_images_mapping = {}
detailed_results = []
for res in layout_results:
            # Extract the Markdown text
            markdown_parts.append(res["markdown"]["text"])
            # Extract the embedded images
            all_images_mapping.update(res["markdown"]["images"])
            # Keep prunedResult for the detailed (L2) output
            detailed_results.append(res["prunedResult"])
return {
"markdown": "\n".join(markdown_parts),
"pages": len(layout_results), # 简化为页数
"images_mapping": all_images_mapping,
"detailed_results": detailed_results,
}
async def _log_completion_stats(self, result: Dict, ctx: Context) -> None:
        page_count = result["pages"]  # now a count rather than a list
await ctx.info(f"Structure analysis completed: {page_count} pages")
def _format_output(
self, result: Dict, detailed: bool, ctx: Context
) -> Union[str, List[Union[TextContent, ImageContent]]]:
if not result["markdown"].strip():
return (
"❌ No document content detected"
if not detailed
else json.dumps({"error": "No content detected"}, ensure_ascii=False)
)
markdown_text = result["markdown"]
images_mapping = result.get("images_mapping", {})
if detailed:
            # L2 (detailed): unified detailed results plus the mixed Markdown content
content_list = []
if "detailed_results" in result and result["detailed_results"]:
for detailed_result in result["detailed_results"]:
content_list.append(
TextContent(
type="text",
text=json.dumps(
detailed_result,
ensure_ascii=False,
indent=2,
default=str,
),
)
)
            # Append the mixed Markdown content
content_list.extend(
self._parse_markdown_with_images(markdown_text, images_mapping)
)
return content_list
else:
            # L1 (simple): mixed content containing only Markdown text and images
return self._parse_markdown_with_images(markdown_text, images_mapping)
def _parse_markdown_with_images(
self, markdown_text: str, images_mapping: Dict[str, str]
) -> List[Union[TextContent, ImageContent]]:
"""解析markdown文本返回文字和图片的混合列表"""
if not images_mapping:
# 没有图片,直接返回文本
return [TextContent(type="text", text=markdown_text)]
content_list = []
img_pattern = r'<img[^>]+src="([^"]+)"[^>]*>'
last_pos = 0
for match in re.finditer(img_pattern, markdown_text):
            # Append the text that precedes this image
text_before = markdown_text[last_pos : match.start()]
if text_before.strip():
content_list.append(TextContent(type="text", text=text_before))
            # Append the image itself
img_src = match.group(1)
if img_src in images_mapping:
content_list.append(
ImageContent(
type="image",
data=images_mapping[img_src],
mimeType="image/jpeg",
)
)
last_pos = match.end()
        # Append any remaining text
remaining_text = markdown_text[last_pos:]
if remaining_text.strip():
content_list.append(TextContent(type="text", text=remaining_text))
return content_list or [TextContent(type="text", text=markdown_text)]
_PIPELINE_HANDLERS: Dict[str, Type[PipelineHandler]] = {
"OCR": OCRHandler,
"PP-StructureV3": PPStructureV3Handler,
}
def create_pipeline_handler(
pipeline: str, /, *args: Any, **kwargs: Any
) -> PipelineHandler:
if pipeline in _PIPELINE_HANDLERS:
cls = _PIPELINE_HANDLERS[pipeline]
return cls(pipeline, *args, **kwargs)
else:
raise ValueError(f"Unknown pipeline {repr(pipeline)}")
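For reference, a hedged sketch of how this factory is used, mirroring the call in `__main__.main()` (all argument values below are placeholders):

```python
# Illustrative sketch only: construct a handler for the local OCR pipeline
# the same way `main()` in `__main__.py` does.
from paddleocr_mcp.pipelines import create_pipeline_handler

handler = create_pipeline_handler(
    "OCR",                       # pipeline name, passed positionally
    "local",                     # ppocr_source: "local", "aistudio", or "self_hosted"
    pipeline_config=None,        # use the default pipeline configuration
    device=None,                 # let PaddleOCR choose the device
    server_url=None,             # only needed for service modes
    aistudio_access_token=None,  # only needed for "aistudio"
    timeout=30,
)
```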

mcp_server/pyproject.toml

@@ -0,0 +1,20 @@
[build-system]
requires = ["setuptools>=69"]
build-backend = "setuptools.build_meta"
[project]
name = "paddleocr_mcp"
version = "0.1.0"
requires-python = ">=3.10"
dependencies = [
"mcp>=1.5.0",
"fastmcp>=2.0.0",
"httpx>=0.24.0",
"numpy>=1.24.0",
"pillow>=9.0.0",
"python-magic>=0.4.24",
"typing-extensions>=4.0.0",
]
[project.scripts]
paddleocr_mcp = "paddleocr_mcp.__main__:main"


@@ -278,6 +278,7 @@ nav:
- 端侧部署: version3.x/deployment/on_device_deployment.md
- 服务化部署: version3.x/deployment/serving.md
- 基于Python或C++预测引擎推理: version3.x/deployment/python_and_cpp_infer.md
- MCP 服务器: version3.x/deployment/mcp_server.md
- 模块列表:
- 模块概述: version3.x/module_usage/module_overview.md
- 文档图像方向分类模块: version3.x/module_usage/doc_img_orientation_classification.md