Mirror of https://github.com/PaddlePaddle/PaddleOCR.git, synced 2025-06-26 21:24:27 +00:00
[Feat] Mcp draft version for ocrv5 and structurev3 (#15604)
* Add MCP OCR server draft version
* update code review
* structure can return images
* refine code and code review
* fix images return logic
* refactor structure for abstract layer
* Fix bugs and enhance code
* Use string literal for output mode
* update images logic for service
* update readme and config example
* update readme and config example
* Fix bugs and add
* refine structure image logic, now can show positions in texts
* update readme file based on code review
* update readme file
* update readme file
* update readme
* update readme
* Polish doc
* add en readme
* Refactor docs and update installation guide

---------

Co-authored-by: Bobholamovic <mhlin425@whu.edu.cn>
This commit is contained in: parent 8e7994992b, commit 3ce3dc56fa

docs/version3.x/deployment/mcp_server.en.md (new file, 194 lines)
@@ -0,0 +1,194 @@

# PaddleOCR MCP Server

[PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR)
[FastMCP](https://gofastmcp.com)

This project provides a lightweight [Model Context Protocol (MCP)](https://modelcontextprotocol.io/introduction) server designed to integrate the powerful capabilities of PaddleOCR into a compatible MCP Host.

### Key Features

- **Currently Supported Pipelines**
    - **OCR**: Performs text detection and recognition on images and PDF files.
    - **PP-StructureV3**: Recognizes and extracts text blocks, titles, paragraphs, images, tables, and other layout elements from an image or PDF file, converting the input into a Markdown document.
- **Supports the following working modes**:
    - **Local**: Runs the PaddleOCR pipeline directly on your machine using the installed Python library.
    - **AI Studio**: Calls cloud services provided by the Paddle AI Studio community.
    - **Self-hosted**: Calls a PaddleOCR service that you deploy yourself (serving).

### Table of Contents

- [1. Installation](#1-installation)
- [2. Quick Start](#2-quick-start)
- [3. Configuration](#3-configuration)
    - [3.1. MCP Host Configuration](#31-mcp-host-configuration)
    - [3.2. Working Modes Explained](#32-working-modes-explained)
        - [Mode 1: AI Studio Service (`aistudio`)](#mode-1-ai-studio-service-aistudio)
        - [Mode 2: Local Python Library (`local`)](#mode-2-local-python-library-local)
        - [Mode 3: Self-hosted Service (`self_hosted`)](#mode-3-self-hosted-service-self_hosted)
- [4. Parameter Reference](#4-parameter-reference)
- [5. Configuration Examples](#5-configuration-examples)
    - [5.1 AI Studio Service Configuration](#51-ai-studio-service-configuration)
    - [5.2 Local Python Library Configuration](#52-local-python-library-configuration)
    - [5.3 Self-hosted Service Configuration](#53-self-hosted-service-configuration)

## 1. Installation

```bash
# Install the wheel
pip install https://paddle-model-ecology.bj.bcebos.com/paddlex/PaddleX3.0/mcp/paddleocr_mcp/releases/v0.1.0/paddleocr_mcp-0.1.0-py3-none-any.whl

# Or, install from source
# git clone https://github.com/PaddlePaddle/PaddleOCR.git
# pip install -e mcp_server
```

Some [working modes](#32-working-modes-explained) may require additional dependencies.

## 2. Quick Start

This section guides you through a quick setup using **Claude Desktop** as the MCP Host and the **AI Studio** mode. This mode is recommended for new users because it does not require complex local dependencies. See [3. Configuration](#3-configuration) for the other working modes and more configuration options.

1. **Prepare the AI Studio Service**
    - Visit the [Paddle AI Studio community](https://aistudio.baidu.com/pipeline/mine) and log in.
    - In the "PaddleX Pipeline" section under "More" on the left, navigate to [Create Pipeline] - [OCR] - [General OCR] - [Deploy Directly] - [Text Recognition Module, select PP-OCRv5_server_rec] - [Start Deployment].
    - Once deployed, obtain your **Service Base URL** (e.g., `https://xxxxxx.aistudio-hub.baidu.com`).
    - Get your **Access Token** from [this page](https://aistudio.baidu.com/index/accessToken).

2. **Locate the MCP Configuration File** - For details, refer to the [official MCP documentation](https://modelcontextprotocol.io/quickstart/user).
    - **macOS**: `~/Library/Application Support/Claude/claude_desktop_config.json`
    - **Windows**: `%APPDATA%\Claude\claude_desktop_config.json`
    - **Linux**: `~/.config/Claude/claude_desktop_config.json`

3. **Add MCP Server Configuration**

    Open the `claude_desktop_config.json` file and add the configuration by referring to [5.1 AI Studio Service Configuration](#51-ai-studio-service-configuration).

    **Note**:
    - Do not leak your **Access Token**.
    - If `paddleocr_mcp` is not on your system's `PATH`, set `command` to the absolute path of the executable (see the sketch after this list).

4. **Restart the MCP Host**

    Restart Claude Desktop. The new `paddleocr-ocr` tool should now be available in the application.
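
If you are unsure where the `paddleocr_mcp` executable lives, a minimal sketch using only the Python standard library can locate it; the printed path is what you would put in the `command` field:

```python
# Print the absolute path of the paddleocr_mcp console script, if it is on PATH.
import shutil

path = shutil.which("paddleocr_mcp")
print(path or "paddleocr_mcp was not found on PATH; use an absolute path")
```
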
## 3. Configuration

### 3.1. MCP Host Configuration

In the Host's configuration file (e.g., `claude_desktop_config.json`), you define how the tool server is started. The key fields are:

- `command`: `paddleocr_mcp` (if the executable is on your `PATH`) or an absolute path to it.
- `args`: Command-line arguments, e.g., `["--verbose"]`. See [4. Parameter Reference](#4-parameter-reference).
- `env`: Environment variables. See [4. Parameter Reference](#4-parameter-reference).

### 3.2. Working Modes Explained

You can configure the MCP server to run in different modes depending on your needs.

#### Mode 1: AI Studio Service (`aistudio`)

This mode calls services from the [Paddle AI Studio community](https://aistudio.baidu.com/pipeline/mine).

- **Use Case**: Ideal for quickly trying out features, validating solutions, and no-code development scenarios.
- **Procedure**: See [2. Quick Start](#2-quick-start).
- In addition to using the platform's preset model solutions, you can also train and deploy custom models on the platform.

#### Mode 2: Local Python Library (`local`)

This mode runs the models directly on your machine and therefore places certain demands on the local environment and hardware. It relies on the installed `paddleocr` inference package.

- **Use Case**: Suitable for offline usage and scenarios with strict data-privacy requirements.
- **Procedure**:
    1. Refer to the [PaddleOCR Installation Guide](../installation.en.md) to install the *PaddlePaddle framework* and *PaddleOCR*. **It is strongly recommended to install them in a separate virtual environment** to avoid dependency conflicts. A quick import check is sketched after this list.
    2. Refer to [5.2 Local Python Library Configuration](#52-local-python-library-configuration) to modify the `claude_desktop_config.json` file.
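
The server probes for the local inference package at startup (see the availability check in `pipelines.py`); you can run the same check by hand to confirm your environment is ready:

```python
# The same availability probe that pipelines.py runs at import time:
# local mode requires both pipeline classes from the paddleocr package.
try:
    from paddleocr import PaddleOCR, PPStructureV3  # noqa: F401

    print("Local mode ready: paddleocr is importable.")
except ImportError as e:
    print(f"Local mode unavailable: {e}")
```
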
#### Mode 3: Self-hosted Service (`self_hosted`)

This mode calls a PaddleOCR inference service that you deploy yourself, corresponding to the **serving** solutions provided by PaddleX.

- **Use Case**: Combines the advantages of service-oriented deployment with high flexibility, making it well suited to production environments, especially scenarios requiring custom service configurations.
- **Procedure**:
    1. Refer to the [PaddleOCR Installation Guide](../installation.en.md) to install the *PaddlePaddle framework* and *PaddleOCR*.
    2. Refer to the [PaddleOCR Serving Deployment Guide](./serving.en.md) to run the server.
    3. Refer to [5.3 Self-hosted Service Configuration](#53-self-hosted-service-configuration) to modify the `claude_desktop_config.json` file.
    4. Set your service address in `PADDLEOCR_MCP_SERVER_URL` (e.g., `"http://127.0.0.1:8080"`). A connectivity check is sketched after this list.
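
To confirm the underlying service is reachable before wiring it into the Host, you can issue the same kind of request the MCP server sends: a JSON POST to the pipeline endpoint (`ocr` for the OCR pipeline, per `_call_service` and `_get_service_endpoint` in `pipelines.py`). A minimal sketch, assuming a hypothetical local image `sample.png` and the default serving port:

```python
# Smoke-test a self-hosted OCR service with the payload shape used by the
# MCP server: {"file": <Base64 or URL>, "fileType": 1 for images, 0 for PDFs}.
import base64

import httpx

server_url = "http://127.0.0.1:8080"  # adjust to your deployment
with open("sample.png", "rb") as f:  # hypothetical test image
    file_b64 = base64.b64encode(f.read()).decode("ascii")

resp = httpx.post(
    f"{server_url}/ocr",
    json={"file": file_b64, "fileType": 1},
    timeout=30,
)
resp.raise_for_status()
# Response shape as consumed by _parse_service_result in pipelines.py.
print(resp.json()["result"]["ocrResults"][0]["prunedResult"]["rec_texts"])
```
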
## 4. Parameter Reference

You can control the server's behavior via environment variables or command-line arguments.

| Environment Variable | Command-line Argument | Type | Description | Options | Default |
|:---|:---|:---|:---|:---|:---|
| `PADDLEOCR_MCP_PIPELINE` | `--pipeline` | `str` | The pipeline to run | `"OCR"`, `"PP-StructureV3"` | `"OCR"` |
| `PADDLEOCR_MCP_PPOCR_SOURCE` | `--ppocr_source` | `str` | The source of PaddleOCR capabilities | `"local"`, `"aistudio"`, `"self_hosted"` | `"local"` |
| `PADDLEOCR_MCP_SERVER_URL` | `--server_url` | `str` | Base URL of the underlying service (required in `aistudio` and `self_hosted` modes) | - | `None` |
| `PADDLEOCR_MCP_AISTUDIO_ACCESS_TOKEN` | `--aistudio_access_token` | `str` | AI Studio authentication token (required in `aistudio` mode) | - | `None` |
| `PADDLEOCR_MCP_TIMEOUT` | `--timeout` | `int` | Request timeout for the underlying service (in seconds) | - | `30` |
| `PADDLEOCR_MCP_DEVICE` | `--device` | `str` | Device to run inference on (only effective in `local` mode) | - | `None` |
| `PADDLEOCR_MCP_PIPELINE_CONFIG` | `--pipeline_config` | `str` | Path to the PaddleX pipeline configuration file (only effective in `local` mode) | - | `None` |
| - | `--http` | `bool` | Use HTTP transport instead of stdio (for remote deployment and multiple clients) | - | `False` |
| - | `--host` | `str` | Host address for HTTP mode | - | `"127.0.0.1"` |
| - | `--port` | `int` | Port for HTTP mode | - | `8000` |
| - | `--verbose` | `bool` | Enable verbose logging for debugging | - | `False` |
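
When the server runs with `--http`, any MCP client that speaks the streamable HTTP transport can connect. A minimal sketch using the `fastmcp` client API; the `/mcp` path and the tool name `_ocr` (taken from the tool function name in `pipelines.py`) are assumptions, so check your installed FastMCP version and the server's tool listing:

```python
# Connect to a paddleocr_mcp instance started with: paddleocr_mcp --http
import asyncio

from fastmcp import Client  # assumed: fastmcp 2.x client API


async def main() -> None:
    # "/mcp" is FastMCP's usual streamable-HTTP path; adjust if yours differs.
    async with Client("http://127.0.0.1:8000/mcp") as client:
        tools = await client.list_tools()
        print("Tools:", [t.name for t in tools])
        # "_ocr" is assumed from the function name; confirm against the listing.
        result = await client.call_tool(
            "_ocr", {"input_data": "sample.png", "output_mode": "simple"}
        )
        print(result)


asyncio.run(main())
```
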
## 5. Configuration Examples

Below are complete configuration examples for the different working modes. Copy and modify them as needed.

### 5.1 AI Studio Service Configuration

```json
{
  "mcpServers": {
    "paddleocr-ocr": {
      "command": "paddleocr_mcp",
      "args": [],
      "env": {
        "PADDLEOCR_MCP_PIPELINE": "OCR",
        "PADDLEOCR_MCP_PPOCR_SOURCE": "aistudio",
        "PADDLEOCR_MCP_SERVER_URL": "<your-server-url>",
        "PADDLEOCR_MCP_AISTUDIO_ACCESS_TOKEN": "<your-access-token>"
      }
    }
  }
}
```

**Note**:

- Replace `<your-server-url>` with your AI Studio **Service Base URL**, e.g., `https://xxxxx.aistudio-hub.baidu.com`. Do not include endpoint paths (such as `/ocr`).
- Replace `<your-access-token>` with your **Access Token**.

### 5.2 Local Python Library Configuration

```json
{
  "mcpServers": {
    "paddleocr-ocr": {
      "command": "paddleocr_mcp",
      "args": [],
      "env": {
        "PADDLEOCR_MCP_PIPELINE": "OCR",
        "PADDLEOCR_MCP_PPOCR_SOURCE": "local"
      }
    }
  }
}
```

**Note**:

- `PADDLEOCR_MCP_PIPELINE_CONFIG` is optional. If it is not set, the default pipeline configuration is used. To adjust settings, such as changing models, refer to the [PaddleOCR and PaddleX documentation](../paddleocr_and_paddlex.en.md), export a pipeline configuration file, and set `PADDLEOCR_MCP_PIPELINE_CONFIG` to its absolute path.

### 5.3 Self-hosted Service Configuration

```json
{
  "mcpServers": {
    "paddleocr-ocr": {
      "command": "paddleocr_mcp",
      "args": [],
      "env": {
        "PADDLEOCR_MCP_PIPELINE": "OCR",
        "PADDLEOCR_MCP_PPOCR_SOURCE": "self_hosted",
        "PADDLEOCR_MCP_SERVER_URL": "<your-server-url>"
      }
    }
  }
}
```

**Note**:

- Replace `<your-server-url>` with the base URL of your underlying service (e.g., `http://127.0.0.1:8080`).

docs/version3.x/deployment/mcp_server.md (new file, 194 lines; translated from Chinese)
@@ -0,0 +1,194 @@

# PaddleOCR MCP Server

[PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR)
[FastMCP](https://gofastmcp.com)

This project provides a lightweight [Model Context Protocol (MCP)](https://modelcontextprotocol.io/introduction) server designed to integrate the powerful capabilities of PaddleOCR into a compatible MCP Host.

### Key Features

- **Currently supported tools**
    - **OCR**: Performs text detection and recognition on images and PDF files.
    - **PP-StructureV3**: Recognizes and extracts text blocks, titles, paragraphs, images, tables, and other layout elements from an image or PDF file, converting the input into a Markdown document.
- **Supported working modes**
    - **Local Python library**: Runs the PaddleOCR pipeline directly on your machine.
    - **AI Studio community service**: Calls cloud services provided by the Paddle AI Studio community.
    - **Self-hosted service**: Calls a PaddleOCR service deployed by the user.

### Table of Contents

- [1. Installation](#1-安装)
- [2. Quick Start](#2-快速开始)
- [3. Configuration](#3-配置说明)
    - [3.1. MCP Host Configuration](#31-mcp-host-配置)
    - [3.2. Working Modes Explained](#32-工作模式详解)
        - [Mode 1: Service Hosted on AI Studio](#模式一托管在星河社区的服务-aistudio)
        - [Mode 2: Local Python Library](#模式二本地-python-库-local)
        - [Mode 3: Self-hosted Service](#模式三自托管服务-self_hosted)
- [4. Parameter Reference](#4-参数参考)
- [5. Configuration Examples](#5-配置示例)
    - [5.1 AI Studio Service Configuration](#51-星河社区服务配置)
    - [5.2 Local Python Library Configuration](#52-本地-python-库配置)
    - [5.3 Self-hosted Service Configuration](#53-自托管服务配置)

## 1. Installation

```bash
# Install the wheel
pip install https://paddle-model-ecology.bj.bcebos.com/paddlex/PaddleX3.0/mcp/paddleocr_mcp/releases/v0.1.0/paddleocr_mcp-0.1.0-py3-none-any.whl

# Or, install from source
# git clone https://github.com/PaddlePaddle/PaddleOCR.git
# pip install -e mcp_server
```

Some [working modes](#32-工作模式详解) may require additional dependencies.

## 2. Quick Start

This section uses **Claude Desktop** as the MCP Host and the **AI Studio service** working mode to walk you through a quick setup. This mode requires no complex local dependencies and is recommended for new users. See [3. Configuration](#3-配置说明) for the other working modes and more configuration options.

1. **Prepare the AI Studio service**
    - Visit the [Paddle AI Studio community](https://aistudio.baidu.com/pipeline/mine) and log in.
    - In the "PaddleX Pipeline" section under "More" on the left, navigate to [Create Pipeline] - [OCR] - [General OCR] - [Deploy Directly] - [Text Recognition Module, select PP-OCRv5_server_rec] - [Start Deployment].
    - Once deployed, obtain your **Service Base URL** (e.g., `https://xxxxxx.aistudio-hub.baidu.com`).
    - Get your **Access Token** from [this page](https://aistudio.baidu.com/index/accessToken).

2. **Locate the MCP configuration file** - For details, refer to the [official MCP documentation](https://modelcontextprotocol.io/quickstart/user).
    - **macOS**: `~/Library/Application Support/Claude/claude_desktop_config.json`
    - **Windows**: `%APPDATA%\Claude\claude_desktop_config.json`
    - **Linux**: `~/.config/Claude/claude_desktop_config.json`

3. **Add the MCP server configuration**

    Open the `claude_desktop_config.json` file and fill in the configuration by referring to [5.1 AI Studio Service Configuration](#51-星河社区服务配置).

    **Note**:
    - Do not leak your **Access Token**.
    - If `paddleocr_mcp` cannot be found on the system `PATH`, set `command` to the absolute path of the executable.

4. **Restart the MCP Host**

    Restart Claude Desktop. The new `paddleocr-ocr` tool should now be available in the application.

## 3. Configuration

### 3.1. MCP Host Configuration

In the Host's configuration file (e.g., `claude_desktop_config.json`), you define how the tool server is started. The key fields are:

- `command`: `paddleocr_mcp` (if the executable can be found on `PATH`) or an absolute path.
- `args`: Command-line arguments, e.g., `["--verbose"]`. See [4. Parameter Reference](#4-参数参考).
- `env`: Environment variables. See [4. Parameter Reference](#4-参数参考).

### 3.2. Working Modes Explained

You can configure the MCP server to run in different working modes depending on your needs.

#### Mode 1: Service Hosted on AI Studio (`aistudio`)

This mode calls services from the [Paddle AI Studio community](https://aistudio.baidu.com/pipeline/mine).

- **Use case**: Well suited to quickly trying out features and validating solutions, as well as no-code development scenarios.
- **Procedure**: See [2. Quick Start](#2-快速开始).
- In addition to using the platform's preset model solutions, you can also train and deploy custom models on the platform.

#### Mode 2: Local Python Library (`local`)

This mode runs the models directly on your machine and therefore places certain demands on the local environment and hardware.

- **Use case**: Offline usage and scenarios with strict data-privacy requirements.
- **Procedure**:
    1. Refer to the [PaddleOCR installation documentation](../installation.md) to install the *PaddlePaddle framework* and *PaddleOCR*. To avoid dependency conflicts, **it is strongly recommended to install them in a separate virtual environment**.
    2. Refer to the [configuration example](#52-本地-python-库配置) to modify the `claude_desktop_config.json` file.

#### Mode 3: Self-hosted Service (`self_hosted`)

This mode calls a PaddleOCR inference service that you deploy yourself.

- **Use case**: Combines the advantages of service-oriented deployment with high flexibility, making it well suited to production environments, especially scenarios requiring custom service configurations.
- **Procedure**:
    1. Refer to the [PaddleOCR installation documentation](../installation.md) to install the *PaddlePaddle framework* and *PaddleOCR*.
    2. Refer to the [PaddleOCR serving deployment documentation](./serving.md) to run the server.
    3. Refer to the [configuration example](#53-自托管服务配置) to modify the `claude_desktop_config.json` file.
    4. Set your service address in `PADDLEOCR_MCP_SERVER_URL` (e.g., `"http://127.0.0.1:8000"`).
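
Whichever mode you use, the tools accept a file path, a URL, or Base64-encoded data as `input_data` (see the tool signatures in `pipelines.py`; plain Base64 and `data:` URLs are both handled). A minimal sketch, assuming a hypothetical local image `sample.png`, for producing a Base64 payload:

```python
# Encode a local image as Base64 so it can be passed to the MCP tools
# as input_data.
import base64

with open("sample.png", "rb") as f:  # hypothetical test image
    b64 = base64.b64encode(f.read()).decode("ascii")

print(b64[:60], "...")  # pass the full string as input_data
```
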
## 4. Parameter Reference

You can control the server's behavior via environment variables or command-line arguments.

| Environment Variable | Command-line Argument | Type | Description | Options | Default |
|:---------|:-----------|:-----|:-----|:-------|:-------|
| `PADDLEOCR_MCP_PIPELINE` | `--pipeline` | `str` | The pipeline to run | `"OCR"`, `"PP-StructureV3"` | `"OCR"` |
| `PADDLEOCR_MCP_PPOCR_SOURCE` | `--ppocr_source` | `str` | The source of PaddleOCR capabilities | `"local"`, `"aistudio"`, `"self_hosted"` | `"local"` |
| `PADDLEOCR_MCP_SERVER_URL` | `--server_url` | `str` | Base URL of the underlying service (required in `aistudio` and `self_hosted` modes) | - | `None` |
| `PADDLEOCR_MCP_AISTUDIO_ACCESS_TOKEN` | `--aistudio_access_token` | `str` | AI Studio authentication token (required in `aistudio` mode) | - | `None` |
| `PADDLEOCR_MCP_TIMEOUT` | `--timeout` | `int` | Request timeout for the underlying service (in seconds) | - | `30` |
| `PADDLEOCR_MCP_DEVICE` | `--device` | `str` | Device to run inference on (only effective in `local` mode) | - | `None` |
| `PADDLEOCR_MCP_PIPELINE_CONFIG` | `--pipeline_config` | `str` | Path to the pipeline configuration file (only effective in `local` mode) | - | `None` |
| - | `--http` | `bool` | Use HTTP transport instead of stdio (for remote deployment and multiple clients) | - | `False` |
| - | `--host` | `str` | Host address for HTTP mode | - | `"127.0.0.1"` |
| - | `--port` | `int` | Port for HTTP mode | - | `8000` |
| - | `--verbose` | `bool` | Enable verbose logging for debugging | - | `False` |

## 5. Configuration Examples

Below are complete configuration examples for the different working modes. Copy and modify them as needed.

### 5.1 AI Studio Service Configuration

```json
{
  "mcpServers": {
    "paddleocr-ocr": {
      "command": "paddleocr_mcp",
      "args": [],
      "env": {
        "PADDLEOCR_MCP_PIPELINE": "OCR",
        "PADDLEOCR_MCP_PPOCR_SOURCE": "aistudio",
        "PADDLEOCR_MCP_SERVER_URL": "<your-server-url>",
        "PADDLEOCR_MCP_AISTUDIO_ACCESS_TOKEN": "<your-access-token>"
      }
    }
  }
}
```

**Note**:

- Replace `<your-server-url>` with the **Service Base URL** of your AI Studio service, e.g., `https://xxxxx.aistudio-hub.baidu.com`. Do not include endpoint paths (such as `/ocr`).
- Replace `<your-access-token>` with your **Access Token**.

### 5.2 Local Python Library Configuration

```json
{
  "mcpServers": {
    "paddleocr-ocr": {
      "command": "paddleocr_mcp",
      "args": [],
      "env": {
        "PADDLEOCR_MCP_PIPELINE": "OCR",
        "PADDLEOCR_MCP_PPOCR_SOURCE": "local"
      }
    }
  }
}
```

**Note**:

- `PADDLEOCR_MCP_PIPELINE_CONFIG` is optional; if it is not set, the default pipeline configuration is used. To adjust settings, such as changing models, refer to the [PaddleOCR documentation](../paddleocr_and_paddlex.md), export a pipeline configuration file, and set `PADDLEOCR_MCP_PIPELINE_CONFIG` to its absolute path.

### 5.3 Self-hosted Service Configuration

```json
{
  "mcpServers": {
    "paddleocr-ocr": {
      "command": "paddleocr_mcp",
      "args": [],
      "env": {
        "PADDLEOCR_MCP_PIPELINE": "OCR",
        "PADDLEOCR_MCP_PPOCR_SOURCE": "self_hosted",
        "PADDLEOCR_MCP_SERVER_URL": "<your-server-url>"
      }
    }
  }
}
```

**Note**:

- Replace `<your-server-url>` with the base URL of the underlying service (e.g., `http://127.0.0.1:8000`).

mcp_server/README.md (new file, 5 lines; translated from Chinese)
@@ -0,0 +1,5 @@

# PaddleOCR MCP Server

中文 | [English](./README_en.md)

See the [documentation](../docs/version3.x/deployment/mcp_server.md).

mcp_server/README_en.md (new file, 5 lines)
@@ -0,0 +1,5 @@

# PaddleOCR MCP Server

[中文](./README.md) | English

Please refer to the [documentation](../docs/version3.x/deployment/mcp_server.en.md).

mcp_server/paddleocr_mcp/__init__.py (new file, 15 lines)
@@ -0,0 +1,15 @@

# Copyright (c) 2025 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

__version__ = "0.1.0"

mcp_server/paddleocr_mcp/__main__.py (new file, 187 lines)
@@ -0,0 +1,187 @@

#!/usr/bin/env python3

# Copyright (c) 2025 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import argparse
import contextlib
import os
import sys
from typing import AsyncIterator, Dict

from fastmcp import FastMCP

from .pipelines import create_pipeline_handler


def _parse_args() -> argparse.Namespace:
    """Parse command line arguments."""
    parser = argparse.ArgumentParser(
        description="PaddleOCR MCP server - Supports local library, AI Studio service, and self-hosted servers."
    )

    parser.add_argument(
        "--pipeline",
        choices=["OCR", "PP-StructureV3"],
        default=os.getenv("PADDLEOCR_MCP_PIPELINE", "OCR"),
        help="Pipeline name.",
    )
    parser.add_argument(
        "--ppocr_source",
        choices=["local", "aistudio", "self_hosted"],
        default=os.getenv("PADDLEOCR_MCP_PPOCR_SOURCE", "local"),
        help="Source of PaddleOCR functionality: local (local library), aistudio (AI Studio service), self_hosted (self-hosted server).",
    )
    parser.add_argument(
        "--http",
        action="store_true",
        help="Use HTTP transport instead of STDIO (suitable for remote deployment and multiple clients).",
    )
    parser.add_argument(
        "--host",
        default="127.0.0.1",
        help="Host address for HTTP mode (default: 127.0.0.1).",
    )
    parser.add_argument(
        "--port",
        type=int,
        default=8000,
        help="Port for HTTP mode (default: 8000).",
    )
    parser.add_argument(
        "--verbose", action="store_true", help="Enable verbose logging for debugging."
    )

    # Local mode configuration
    parser.add_argument(
        "--pipeline_config",
        default=os.getenv("PADDLEOCR_MCP_PIPELINE_CONFIG"),
        help="PaddleOCR pipeline configuration file path (for local mode).",
    )
    parser.add_argument(
        "--device",
        default=os.getenv("PADDLEOCR_MCP_DEVICE"),
        help="Device to run inference on.",
    )

    # Service mode configuration
    parser.add_argument(
        "--server_url",
        default=os.getenv("PADDLEOCR_MCP_SERVER_URL"),
        help="Base URL of the underlying server (required in service mode).",
    )
    parser.add_argument(
        "--aistudio_access_token",
        default=os.getenv("PADDLEOCR_MCP_AISTUDIO_ACCESS_TOKEN"),
        help="AI Studio access token (required for AI Studio).",
    )
    parser.add_argument(
        "--timeout",
        type=int,
        default=int(os.getenv("PADDLEOCR_MCP_TIMEOUT", "30")),
        help="API request timeout in seconds for the underlying server.",
    )

    args = parser.parse_args()
    return args


def _validate_args(args: argparse.Namespace) -> None:
    """Validate command line arguments."""
    if not args.http and (args.host != "127.0.0.1" or args.port != 8000):
        print(
            "Host and port arguments are only valid when using HTTP transport (see: `--http`).",
            file=sys.stderr,
        )
        sys.exit(2)

    if args.ppocr_source in ["aistudio", "self_hosted"]:
        if not args.server_url:
            print("Error: The server base URL is required.", file=sys.stderr)
            print(
                "Please either set `--server_url` or set the environment variable "
                "`PADDLEOCR_MCP_SERVER_URL`.",
                file=sys.stderr,
            )
            sys.exit(2)

        if args.ppocr_source == "aistudio" and not args.aistudio_access_token:
            print("Error: The AI Studio access token is required.", file=sys.stderr)
            print(
                "Please either set `--aistudio_access_token` or set the environment variable "
                "`PADDLEOCR_MCP_AISTUDIO_ACCESS_TOKEN`.",
                file=sys.stderr,
            )
            sys.exit(2)


def main() -> None:
    """Main entry point."""
    args = _parse_args()

    _validate_args(args)

    try:
        pipeline_handler = create_pipeline_handler(
            args.pipeline,
            args.ppocr_source,
            pipeline_config=args.pipeline_config,
            device=args.device,
            server_url=args.server_url,
            aistudio_access_token=args.aistudio_access_token,
            timeout=args.timeout,
        )
    except Exception as e:
        print(f"Failed to create the pipeline handler: {e}", file=sys.stderr)
        if args.verbose:
            import traceback

            traceback.print_exc(file=sys.stderr)
        sys.exit(1)

    @contextlib.asynccontextmanager
    async def _lifespan(mcp: FastMCP) -> AsyncIterator[Dict]:
        async with pipeline_handler:
            yield {}

    try:
        server_name = f"PaddleOCR {args.pipeline} MCP server"
        mcp = FastMCP(
            name=server_name,
            lifespan=_lifespan,
            log_level="INFO" if args.verbose else "WARNING",
        )

        pipeline_handler.register_tools(mcp)

        if args.http:
            mcp.run(
                transport="streamable-http",
                host=args.host,
                port=args.port,
            )
        else:
            mcp.run()

    except Exception as e:
        print(f"Failed to start the server: {e}", file=sys.stderr)
        if args.verbose:
            import traceback

            traceback.print_exc(file=sys.stderr)
        sys.exit(1)


if __name__ == "__main__":
    main()

mcp_server/paddleocr_mcp/pipelines.py (new file, 721 lines)
@@ -0,0 +1,721 @@

# Copyright (c) 2025 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import abc
import asyncio
import base64
import io
import json
import mimetypes
import re
from pathlib import PurePath
from queue import Queue
from threading import Thread
from typing import Any, Callable, Dict, List, Optional, Type, Union
from urllib.parse import urlparse

import httpx
import magic
import numpy as np
from fastmcp import Context, FastMCP
from mcp.types import ImageContent, TextContent
from PIL import Image as PILImage
from typing_extensions import Literal, Self, assert_never

try:
    from paddleocr import PaddleOCR, PPStructureV3

    LOCAL_OCR_AVAILABLE = True
except ImportError:
    LOCAL_OCR_AVAILABLE = False


OutputMode = Literal["simple", "detailed"]


def _is_file_path(s: str) -> bool:
    try:
        PurePath(s)
        return True
    except Exception:
        return False


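# NOTE: a loose heuristic; any string over the Base64 alphabet matches, and
# _is_file_path above accepts nearly any string, so the order in which callers
# try these checks decides how ambiguous inputs are classified.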
def _is_base64(s: str) -> bool:
    pattern = r"^[A-Za-z0-9+/]+={0,2}$"
    return bool(re.fullmatch(pattern, s))


def _is_url(s: str) -> bool:
    if not (s.startswith("http://") or s.startswith("https://")):
        return False
    result = urlparse(s)
    return all([result.scheme, result.netloc]) and result.scheme in ("http", "https")


def _infer_file_type_from_url(url: str) -> str:
    url_parts = urlparse(url)
    filename = url_parts.path.split("/")[-1]
    file_type = mimetypes.guess_type(filename)[0]
    if not file_type:
        return "UNKNOWN"
    if file_type.startswith("image/"):
        return "IMAGE"
    elif file_type == "application/pdf":
        return "PDF"
    return "UNKNOWN"


def _infer_file_type_from_bytes(data: bytes) -> str:
    mime = magic.from_buffer(data, mime=True)
    if mime.startswith("image/"):
        return "IMAGE"
    elif mime == "application/pdf":
        return "PDF"
    return "UNKNOWN"


class _EngineWrapper:
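    """Runs a blocking inference engine on a dedicated worker thread.

    Calls are queued to the worker, and results are delivered back to the
    asyncio event loop through futures, so inference never blocks the loop.
    """
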
    def __init__(self, engine: Any) -> None:
        self._engine = engine
        self._queue: Queue = Queue()
        self._closed = False
        self._loop = asyncio.get_running_loop()
        self._thread = Thread(target=self._worker, daemon=False)
        self._thread.start()

    @property
    def engine(self) -> Any:
        return self._engine

    async def call(self, func: Callable, *args: Any, **kwargs: Any) -> Any:
        if self._closed:
            raise RuntimeError("Engine wrapper has already been closed")
        fut = self._loop.create_future()
        self._queue.put((func, args, kwargs, fut))
        return await fut

    async def close(self) -> None:
        if not self._closed:
            # Mark as closed before signaling the worker so that no new calls
            # are accepted while the thread drains pending work and exits.
            self._closed = True
            self._queue.put(None)
            await self._loop.run_in_executor(None, self._thread.join)

    def _worker(self) -> None:
        # Drain queued work until the None sentinel put by close() arrives.
        while True:
            item = self._queue.get()
            if item is None:
                break
            func, args, kwargs, fut = item
            try:
                result = func(*args, **kwargs)
                self._loop.call_soon_threadsafe(fut.set_result, result)
            except Exception as e:
                self._loop.call_soon_threadsafe(fut.set_exception, e)
            finally:
                self._queue.task_done()


class PipelineHandler(abc.ABC):
    """Abstract base class for pipeline handlers."""

    def __init__(
        self,
        pipeline: str,
        ppocr_source: str,
        pipeline_config: Optional[str],
        device: Optional[str],
        server_url: Optional[str],
        aistudio_access_token: Optional[str],
        timeout: Optional[int],
    ) -> None:
        """Initialize the pipeline handler.

        Args:
            pipeline: Pipeline name.
            ppocr_source: Source of PaddleOCR functionality.
            pipeline_config: Path to pipeline configuration.
            device: Device to run inference on.
            server_url: Base URL for service mode.
            aistudio_access_token: AI Studio access token.
            timeout: Timeout in seconds.
        """
        self._pipeline = pipeline
        if ppocr_source == "local":
            self._mode = "local"
        elif ppocr_source in ("aistudio", "self_hosted"):
            self._mode = "service"
        else:
            raise ValueError(f"Unknown PaddleOCR source {repr(ppocr_source)}")
        self._ppocr_source = ppocr_source
        self._pipeline_config = pipeline_config
        self._device = device
        self._server_url = server_url
        self._aistudio_access_token = aistudio_access_token
        self._timeout = timeout or 30  # Default timeout of 30 seconds

        if self._mode == "local":
            if not LOCAL_OCR_AVAILABLE:
                raise RuntimeError("PaddleOCR is not locally available")
            self._engine = self._create_local_engine()

        self._status: Literal["initialized", "started", "stopped"] = "initialized"

    async def start(self) -> None:
        if self._status == "initialized":
            if self._mode == "local":
                self._engine_wrapper = _EngineWrapper(self._engine)
            self._status = "started"
        elif self._status == "started":
            pass
        elif self._status == "stopped":
            raise RuntimeError("Pipeline handler has already been stopped")
        else:
            assert_never(self._status)

    async def stop(self) -> None:
        if self._status == "initialized":
            raise RuntimeError("Pipeline handler has not been started")
        elif self._status == "started":
            if self._mode == "local":
                await self._engine_wrapper.close()
            self._status = "stopped"
        elif self._status == "stopped":
            pass
        else:
            assert_never(self._status)

    async def __aenter__(self) -> Self:
        await self.start()
        return self

    async def __aexit__(
        self,
        exc_type: Any,
        exc_val: Any,
        exc_tb: Any,
    ) -> None:
        await self.stop()

    @abc.abstractmethod
    def register_tools(self, mcp: FastMCP) -> None:
        """Register tools with the MCP server.

        Args:
            mcp: The `FastMCP` instance.
        """
        raise NotImplementedError

    @abc.abstractmethod
    def _create_local_engine(self) -> Any:
        """Create the local OCR engine.

        Returns:
            The OCR engine instance.
        """
        raise NotImplementedError


class SimpleInferencePipelineHandler(PipelineHandler):
    """Base class for simple inference pipeline handlers."""

    async def process(
        self, input_data: str, output_mode: OutputMode, ctx: Context, **kwargs: Any
    ) -> Union[str, List[Union[TextContent, ImageContent]]]:
        """Process input data through the pipeline.

        Args:
            input_data: Input data (file path, URL, or Base64).
            output_mode: Output mode ("simple" or "detailed").
            ctx: MCP context.
            **kwargs: Additional pipeline-specific arguments.

        Returns:
            Processed result in the requested output format.
        """
        try:
            await ctx.info(
                f"Starting {self._pipeline} processing (source: {self._ppocr_source})"
            )

            if self._mode == "local":
                processed_input = self._process_input_for_local(input_data)
                raw_result = await self._predict_with_local_engine(
                    processed_input, ctx, **kwargs
                )
                result = self._parse_local_result(raw_result, ctx)
            else:
                processed_input, file_type = self._process_input_for_service(input_data)
                raw_result = await self._call_service(
                    processed_input, file_type, ctx, **kwargs
                )
                result = await self._parse_service_result(raw_result, ctx)

            await self._log_completion_stats(result, ctx)
            return self._format_output(result, output_mode == "detailed", ctx)

        except Exception as e:
            await ctx.error(f"{self._pipeline} processing failed: {str(e)}")
            return self._handle_error(str(e), output_mode)

    def _process_input_for_local(self, input_data: str) -> Union[str, np.ndarray]:
        if _is_file_path(input_data) or _is_url(input_data):
            return input_data
        elif _is_base64(input_data):
            if input_data.startswith("data:"):
                base64_data = input_data.split(",", 1)[1]
            else:
                base64_data = input_data
            try:
                image_bytes = base64.b64decode(base64_data)
                image_pil = PILImage.open(io.BytesIO(image_bytes))
                image_arr = np.array(image_pil.convert("RGB"))
                # Convert RGB to BGR
                return np.ascontiguousarray(image_arr[..., ::-1])
            except Exception as e:
                raise ValueError(f"Failed to decode Base64 image: {e}")
        else:
            raise ValueError("Invalid input data format")

    def _process_input_for_service(self, input_data: str) -> tuple[str, str]:
        if _is_file_path(input_data):
            try:
                with open(input_data, "rb") as f:
                    bytes_ = f.read()
                input_data = base64.b64encode(bytes_).decode("ascii")
                file_type = _infer_file_type_from_bytes(bytes_)
            except Exception as e:
                raise ValueError(f"Failed to read file: {e}")
        elif _is_url(input_data):
            file_type = _infer_file_type_from_url(input_data)
        elif _is_base64(input_data):
            try:
                if input_data.startswith("data:"):
                    base64_data = input_data.split(",", 1)[1]
                else:
                    base64_data = input_data
                bytes_ = base64.b64decode(base64_data)
                file_type = _infer_file_type_from_bytes(bytes_)
            except Exception as e:
                raise ValueError(f"Failed to decode Base64 data: {e}")
        else:
            raise ValueError("Invalid input data format")

        return input_data, file_type

    async def _call_service(
        self, processed_input: str, file_type: str, ctx: Context, **kwargs: Any
    ) -> Dict[str, Any]:
        if not self._server_url:
            raise RuntimeError("Server URL not configured")

        endpoint = self._get_service_endpoint()
        url = f"{self._server_url.rstrip('/')}/{endpoint.lstrip('/')}"

        payload = self._prepare_service_payload(processed_input, file_type, **kwargs)
        headers = {"Content-Type": "application/json"}

        if self._ppocr_source == "aistudio":
            if not self._aistudio_access_token:
                raise RuntimeError("Missing AI Studio access token")
            headers["Authorization"] = f"token {self._aistudio_access_token}"

        try:
            async with httpx.AsyncClient(timeout=self._timeout) as client:
                response = await client.post(url, json=payload, headers=headers)
                response.raise_for_status()
                return response.json()
        except httpx.HTTPError as e:
            raise RuntimeError(f"Service call failed: {str(e)}")
        except json.JSONDecodeError as e:
            raise RuntimeError(f"Invalid service response: {str(e)}")

    def _prepare_service_payload(
        self, processed_input: str, file_type: str, **kwargs: Any
    ) -> Dict[str, Any]:
        api_file_type = 1 if file_type == "IMAGE" else 0
        payload = {"file": processed_input, "fileType": api_file_type, **kwargs}
        return payload

    def _handle_error(
        self, error_msg: str, output_mode: OutputMode
    ) -> Union[str, List[Union[TextContent, ImageContent]]]:
        if output_mode == "detailed":
            return [TextContent(type="text", text=f"Error: {error_msg}")]
        return f"Error: {error_msg}"

    @abc.abstractmethod
    def _get_service_endpoint(self) -> str:
        """Get the service endpoint.

        Returns:
            Service endpoint path.
        """
        raise NotImplementedError

    @abc.abstractmethod
    def _parse_local_result(self, local_result: Dict, ctx: Context) -> Dict[str, Any]:
        """Parse raw result from local engine into a unified format.

        Args:
            local_result: Raw result from local engine.
            ctx: MCP context.

        Returns:
            Parsed result in unified format.
        """
        raise NotImplementedError

    @abc.abstractmethod
    async def _parse_service_result(
        self, service_result: Dict[str, Any], ctx: Context
    ) -> Dict[str, Any]:
        """Parse raw result from the service into a unified format.

        Args:
            service_result: Raw result from the service.
            ctx: MCP context.

        Returns:
            Parsed result in unified format.
        """
        raise NotImplementedError

    @abc.abstractmethod
    async def _log_completion_stats(self, result: Dict[str, Any], ctx: Context) -> None:
        """Log statistics after processing completion.

        Args:
            result: Processing result.
            ctx: MCP context.
        """
        raise NotImplementedError

    @abc.abstractmethod
    def _format_output(
        self, result: Dict[str, Any], detailed: bool, ctx: Context
    ) -> Union[str, List[Union[TextContent, ImageContent]]]:
        """Format output into simple or detailed format.

        Args:
            result: Processing result.
            detailed: Whether to use detailed format.
            ctx: MCP context.

        Returns:
            Formatted output in requested format.
        """
        raise NotImplementedError

    async def _predict_with_local_engine(
        self, processed_input: Union[str, np.ndarray], ctx: Context, **kwargs: Any
    ) -> Dict:
        if not hasattr(self, "_engine_wrapper"):
            raise RuntimeError("Engine wrapper has not been initialized")
        return await self._engine_wrapper.call(
            self._engine_wrapper.engine.predict, processed_input, **kwargs
        )


class OCRHandler(SimpleInferencePipelineHandler):
    def register_tools(self, mcp: FastMCP) -> None:
        @mcp.tool()
        async def _ocr(
            input_data: str,
            output_mode: OutputMode,
            ctx: Context,
        ) -> Union[str, List[Union[TextContent, ImageContent]]]:
            """Extract text from images and PDFs.

            Args:
                input_data: File path, URL, or Base64 data.
                output_mode: "simple" for clean text, "detailed" for JSON with positioning.
            """
            return await self.process(input_data, output_mode, ctx)

    def _create_local_engine(self) -> Any:
        return PaddleOCR(
            paddlex_config=self._pipeline_config,
            device=self._device,
            enable_mkldnn=False,
        )

    def _get_service_endpoint(self) -> str:
        return "ocr"

    def _parse_local_result(self, local_result: Dict, ctx: Context) -> Dict:
        result = local_result[0]
        texts = result["rec_texts"]
        scores = result["rec_scores"]
        boxes = result["rec_boxes"]

        # Direct assembly
        clean_texts, confidences, blocks = [], [], []

        for i, text in enumerate(texts):
            if text and text.strip():
                conf = scores[i] if i < len(scores) else 0
                clean_texts.append(text.strip())
                confidences.append(conf)
                block = {
                    "text": text.strip(),
                    "confidence": round(conf, 3),
                    "bbox": boxes[i].tolist(),
                }
                blocks.append(block)

        return {
            "text": "\n".join(clean_texts),
            "confidence": sum(confidences) / len(confidences) if confidences else 0,
            "blocks": blocks,
        }

    async def _parse_service_result(self, service_result: Dict, ctx: Context) -> Dict:
        result_data = service_result.get("result", service_result)
        ocr_results = result_data.get("ocrResults")

        # Direct extraction and assembly
        all_texts, all_confidences, blocks = [], [], []

        for ocr_result in ocr_results:
            pruned = ocr_result["prunedResult"]

            texts = pruned["rec_texts"]
            scores = pruned["rec_scores"]
            boxes = pruned["rec_boxes"]

            for i, text in enumerate(texts):
                if text and text.strip():
                    conf = scores[i] if i < len(scores) else 0
                    all_texts.append(text.strip())
                    all_confidences.append(conf)
                    block = {
                        "text": text.strip(),
                        "confidence": round(conf, 3),
                        "bbox": boxes[i],
                    }
                    blocks.append(block)

        return {
            "text": "\n".join(all_texts),
            "confidence": (
                sum(all_confidences) / len(all_confidences) if all_confidences else 0
            ),
            "blocks": blocks,
        }

    async def _log_completion_stats(self, result: Dict, ctx: Context) -> None:
        text_length = len(result["text"])
        block_count = len(result["blocks"])
        await ctx.info(
            f"OCR completed: {text_length} characters, {block_count} text blocks"
        )

    def _format_output(
        self, result: Dict, detailed: bool, ctx: Context
    ) -> Union[str, List[Union[TextContent, ImageContent]]]:
        if not result["text"].strip():
            return (
                "❌ No text detected"
                if not detailed
                else json.dumps({"error": "No text detected"}, ensure_ascii=False)
            )

        if detailed:
            # L2: Return all data
            return json.dumps(result, ensure_ascii=False, indent=2)
        else:
            # L1: Core text + key statistics
            confidence = result["confidence"]
            block_count = len(result["blocks"])

            output = result["text"]
            if confidence > 0:
                output += f"\n\n📊 Confidence: {(confidence * 100):.1f}% | {block_count} text blocks"

            return output


class PPStructureV3Handler(SimpleInferencePipelineHandler):
    def register_tools(self, mcp: FastMCP) -> None:
        @mcp.tool()
        async def _pp_structurev3(
            input_data: str,
            output_mode: OutputMode,
            ctx: Context,
        ) -> Union[str, List[Union[TextContent, ImageContent]]]:
            """Document layout analysis.

            Args:
                input_data: File path, URL, or Base64 data.
                output_mode: "simple" for markdown text, "detailed" for JSON with metadata + prunedResult.

            Returns:
                - Simple: Markdown text + images (if available)
                - Detailed: prunedResult/local detailed info + markdown text + images
            """
            return await self.process(input_data, output_mode, ctx)

    def _create_local_engine(self) -> Any:
        return PPStructureV3(paddlex_config=self._pipeline_config, device=self._device)

    def _get_service_endpoint(self) -> str:
        return "layout-parsing"

    def _parse_local_result(self, local_result: Dict, ctx: Context) -> Dict:
        markdown_parts = []
        detailed_results = []

        # TODO: return images
        for result in local_result:
            text = result.markdown["markdown_texts"]
            markdown_parts.append(text)
            detailed_results.append(result)

        return {
            # TODO: Page concatenation can be done better via `pipeline.concatenate_markdown_pages`
            "markdown": "\n".join(markdown_parts),
            "pages": len(local_result),
            "images_mapping": {},
            "detailed_results": detailed_results,
        }

    async def _parse_service_result(self, service_result: Dict, ctx: Context) -> Dict:
        result_data = service_result.get("result", service_result)
        layout_results = result_data.get("layoutParsingResults")

        if not layout_results:
            return {
                "markdown": "",
                "pages": 0,
                "images_mapping": {},
                "detailed_results": [],
            }

        # Simplified: extract only the fields we need
        markdown_parts = []
        all_images_mapping = {}
        detailed_results = []

        for res in layout_results:
            # Extract the markdown text
            markdown_parts.append(res["markdown"]["text"])
            # Extract the images
            all_images_mapping.update(res["markdown"]["images"])
            # Keep prunedResult for the detailed (L2) output
            detailed_results.append(res["prunedResult"])

        return {
            "markdown": "\n".join(markdown_parts),
            "pages": len(layout_results),  # Simplified to a page count
            "images_mapping": all_images_mapping,
            "detailed_results": detailed_results,
        }

    async def _log_completion_stats(self, result: Dict, ctx: Context) -> None:
        page_count = result["pages"]  # A count now, not a list
        await ctx.info(f"Structure analysis completed: {page_count} pages")

    def _format_output(
        self, result: Dict, detailed: bool, ctx: Context
    ) -> Union[str, List[Union[TextContent, ImageContent]]]:
        if not result["markdown"].strip():
            return (
                "❌ No document content detected"
                if not detailed
                else json.dumps({"error": "No content detected"}, ensure_ascii=False)
            )

        markdown_text = result["markdown"]
        images_mapping = result.get("images_mapping", {})

        if detailed:
            # L2: Return the unified detailed results plus the mixed markdown content
            content_list = []
            if "detailed_results" in result and result["detailed_results"]:
                for detailed_result in result["detailed_results"]:
                    content_list.append(
                        TextContent(
                            type="text",
                            text=json.dumps(
                                detailed_result,
                                ensure_ascii=False,
                                indent=2,
                                default=str,
                            ),
                        )
                    )

            # Append the mixed markdown content
            content_list.extend(
                self._parse_markdown_with_images(markdown_text, images_mapping)
            )

            return content_list
        else:
            # L1: Simplified mixed-content format containing only markdown and images
            return self._parse_markdown_with_images(markdown_text, images_mapping)

    def _parse_markdown_with_images(
        self, markdown_text: str, images_mapping: Dict[str, str]
    ) -> List[Union[TextContent, ImageContent]]:
        """Parse markdown text into an interleaved list of text and image content."""
        if not images_mapping:
            # No images: return the text directly
            return [TextContent(type="text", text=markdown_text)]

        content_list = []
        img_pattern = r'<img[^>]+src="([^"]+)"[^>]*>'
        last_pos = 0

        for match in re.finditer(img_pattern, markdown_text):
            # Add the text that precedes the image
            text_before = markdown_text[last_pos : match.start()]
            if text_before.strip():
                content_list.append(TextContent(type="text", text=text_before))

            # Add the image
            img_src = match.group(1)
            if img_src in images_mapping:
                content_list.append(
                    ImageContent(
                        type="image",
                        data=images_mapping[img_src],
                        mimeType="image/jpeg",
                    )
                )

            last_pos = match.end()

        # Add any remaining text
        remaining_text = markdown_text[last_pos:]
        if remaining_text.strip():
            content_list.append(TextContent(type="text", text=remaining_text))

        return content_list or [TextContent(type="text", text=markdown_text)]


_PIPELINE_HANDLERS: Dict[str, Type[PipelineHandler]] = {
    "OCR": OCRHandler,
    "PP-StructureV3": PPStructureV3Handler,
}


def create_pipeline_handler(
    pipeline: str, /, *args: Any, **kwargs: Any
) -> PipelineHandler:
    if pipeline in _PIPELINE_HANDLERS:
        cls = _PIPELINE_HANDLERS[pipeline]
        return cls(pipeline, *args, **kwargs)
    else:
        raise ValueError(f"Unknown pipeline {repr(pipeline)}")

mcp_server/paddleocr_mcp/py.typed (new file, empty)

mcp_server/pyproject.toml (new file, 20 lines)
@@ -0,0 +1,20 @@

[build-system]
requires = ["setuptools>=69"]
build-backend = "setuptools.build_meta"

[project]
name = "paddleocr_mcp"
version = "0.1.0"
requires-python = ">=3.10"
dependencies = [
    "mcp>=1.5.0",
    "fastmcp>=2.0.0",
    "httpx>=0.24.0",
    "numpy>=1.24.0",
    "pillow>=9.0.0",
    "python-magic>=0.4.24",
    "typing-extensions>=4.0.0",
]

[project.scripts]
paddleocr_mcp = "paddleocr_mcp.__main__:main"

mkdocs.yml (nav labels translated from Chinese)
@@ -278,6 +278,7 @@ nav:
       - On-device deployment: version3.x/deployment/on_device_deployment.md
       - Serving: version3.x/deployment/serving.md
       - Inference with the Python or C++ prediction engine: version3.x/deployment/python_and_cpp_infer.md
+      - MCP server: version3.x/deployment/mcp_server.md
     - Module list:
       - Module overview: version3.x/module_usage/module_overview.md
       - Document image orientation classification module: version3.x/module_usage/doc_img_orientation_classification.md