
# PaddleOCR MCP Server
This project provides a lightweight Model Context Protocol (MCP) server designed to integrate the powerful capabilities of PaddleOCR into a compatible MCP Host.
## Key Features
- Currently supported pipelines:
  - OCR: Performs text detection and recognition on images and PDF files.
  - PP-StructureV3: Recognizes and extracts text blocks, titles, paragraphs, images, tables, and other layout elements from an image or PDF file, converting the input into a Markdown document.
- Supported working modes:
  - Local: Runs the PaddleOCR pipeline directly on your machine using the installed Python library.
  - AI Studio: Calls cloud services provided by the Paddle AI Studio community.
  - Self-hosted: Calls a PaddleOCR service that you deploy yourself (serving).
## 1. Installation
```bash
# Install the wheel
pip install https://paddle-model-ecology.bj.bcebos.com/paddlex/PaddleX3.0/mcp/paddleocr_mcp/releases/v0.1.0/paddleocr_mcp-0.1.0-py3-none-any.whl

# Or, install from source
# git clone https://github.com/PaddlePaddle/PaddleOCR.git
# pip install -e mcp_server
```
Some working modes may require additional dependencies.
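As a quick sanity check after installation (assuming the CLI exposes a standard `--help` flag — this flag is not listed in the parameter reference below, so treat it as an assumption), you can confirm the executable is on your `PATH`:

```bash
# Print the server's command-line options to verify the install
paddleocr_mcp --help
```

If the command is not found, either add the installation's script directory to your `PATH` or use the absolute path to the executable, as noted in the configuration sections below.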
## 2. Quick Start
This section guides you through a quick setup using Claude Desktop as the MCP Host and the AI Studio mode. This mode is recommended for new users as it does not require complex local dependencies. Please refer to 3. Configuration for other working modes and more configuration options.
1. Prepare the AI Studio Service
   - Visit the Paddle AI Studio community and log in.
   - In the "PaddleX Pipeline" section under "More" on the left, navigate to [Create Pipeline] - [OCR] - [General OCR] - [Deploy Directly] - [Text Recognition Module, select PP-OCRv5_server_rec] - [Start Deployment].
   - Once deployed, obtain your Service Base URL (e.g., `https://xxxxxx.aistudio-hub.baidu.com`).
   - Get your Access Token from this page.
2. Locate the MCP Configuration File
   For details, refer to the Official MCP Documentation.
   - macOS: `~/Library/Application Support/Claude/claude_desktop_config.json`
   - Windows: `%APPDATA%\Claude\claude_desktop_config.json`
   - Linux: `~/.config/Claude/claude_desktop_config.json`
3. Add MCP Server Configuration
   Open the `claude_desktop_config.json` file and add the configuration by referring to 5.1 AI Studio Service Configuration.
   Note:
   - Do not leak your Access Token.
   - If `paddleocr_mcp` is not in your system's `PATH`, set `command` to the absolute path of the executable.
4. Restart the MCP Host
   Restart Claude Desktop. The new `paddleocr-ocr` tool should now be available in the application.
## 3. Configuration
### 3.1. MCP Host Configuration
In the Host's configuration file (e.g., `claude_desktop_config.json`), you need to define how to start the tool server. Key fields are:
- `command`: `paddleocr_mcp` (if the executable is in your `PATH`) or an absolute path.
- `args`: Configurable command-line arguments, e.g., `["--verbose"]`. See 4. Parameter Reference.
- `env`: Configurable environment variables. See 4. Parameter Reference.
### 3.2. Working Modes Explained
You can configure the MCP server to run in different modes based on your needs.
#### Mode 1: AI Studio Service (`aistudio`)
This mode calls services from the Paddle AI Studio community.
- Use Case: Ideal for quickly trying out features, validating solutions, and for no-code development scenarios.
- Procedure: Please refer to 2. Quick Start.
- In addition to using the platform's preset model solutions, you can also train and deploy custom models on the platform.
#### Mode 2: Local Python Library (`local`)
This mode runs the models directly on your local machine, so it places certain demands on the local environment and hardware. It relies on the installed `paddleocr` inference package.
- Use Case: Suitable for offline usage and scenarios with strict data privacy requirements.
- Procedure:
- Refer to the PaddleOCR Installation Guide to install the PaddlePaddle framework and PaddleOCR. It is strongly recommended to install them in a separate virtual environment to avoid dependency conflicts.
  - Refer to 5.2 Local Python Library Configuration to modify the `claude_desktop_config.json` file.
#### Mode 3: Self-hosted Service (`self_hosted`)
This mode calls a PaddleOCR inference service that you have deployed yourself. This corresponds to the Serving solutions provided by PaddleX.
- Use Case: Offers the advantages of service-oriented deployment and high flexibility, making it well-suited for production environments, especially for scenarios requiring custom service configurations.
- Procedure:
- Refer to the PaddleOCR Installation Guide to install the PaddlePaddle framework and PaddleOCR.
- Refer to the PaddleOCR Serving Deployment Guide to run the server.
  - Refer to 5.3 Self-hosted Service Configuration to modify the `claude_desktop_config.json` file.
  - Set your service address in `PADDLEOCR_MCP_SERVER_URL` (e.g., `"http://127.0.0.1:8080"`).
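The steps above can be sketched as follows. The serving command is described in the PaddleOCR Serving Deployment Guide; the `paddlex --serve` invocation and the port below are assumptions used here only to illustrate the flow, so check the guide for the exact command:

```bash
# Start a local inference service for the OCR pipeline
# (exact command per the PaddleX serving documentation)
paddlex --serve --pipeline OCR --port 8080

# Point the MCP server at it using the variable from section 4
export PADDLEOCR_MCP_SERVER_URL="http://127.0.0.1:8080"
```

With the service running, the MCP server forwards inference requests to this URL instead of loading models itself.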
## 4. Parameter Reference
You can control the server's behavior via environment variables or command-line arguments.
| Environment Variable | Command-line Argument | Type | Description | Options | Default |
|---|---|---|---|---|---|
| `PADDLEOCR_MCP_PIPELINE` | `--pipeline` | `str` | The pipeline to run | `"OCR"`, `"PP-StructureV3"` | `"OCR"` |
| `PADDLEOCR_MCP_PPOCR_SOURCE` | `--ppocr_source` | `str` | The source of PaddleOCR capabilities | `"local"`, `"aistudio"`, `"self_hosted"` | `"local"` |
| `PADDLEOCR_MCP_SERVER_URL` | `--server_url` | `str` | Base URL of the underlying service (required for `aistudio` or `self_hosted` mode) | - | `None` |
| `PADDLEOCR_MCP_AISTUDIO_ACCESS_TOKEN` | `--aistudio_access_token` | `str` | AI Studio authentication token (required for `aistudio` mode) | - | `None` |
| `PADDLEOCR_MCP_TIMEOUT` | `--timeout` | `int` | Request timeout for the underlying service (in seconds) | - | `30` |
| `PADDLEOCR_MCP_DEVICE` | `--device` | `str` | Device to use for inference (only effective in `local` mode) | - | `None` |
| `PADDLEOCR_MCP_PIPELINE_CONFIG` | `--pipeline_config` | `str` | Path to the PaddleX pipeline configuration file (only effective in `local` mode) | - | `None` |
| - | `--http` | `bool` | Use HTTP transport instead of stdio (for remote deployment and multiple clients) | - | `False` |
| - | `--host` | `str` | Host address for HTTP mode | - | `"127.0.0.1"` |
| - | `--port` | `int` | Port for HTTP mode | - | `8080` |
| - | `--verbose` | `bool` | Enable verbose logging for debugging | - | `False` |
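The flags above can be combined on one command line, for example to expose the server over HTTP instead of stdio. This is a sketch: the host, port, and backend URL below are illustrative values, not defaults you must use:

```bash
# Serve the PP-StructureV3 pipeline, backed by a self-hosted service,
# over HTTP on all interfaces with verbose logging
paddleocr_mcp \
  --pipeline PP-StructureV3 \
  --ppocr_source self_hosted \
  --server_url http://127.0.0.1:8080 \
  --http --host 0.0.0.0 --port 3000 \
  --verbose
```

In stdio mode (the default), these options would instead go in the `args` and `env` fields of the MCP Host configuration, as in the examples below.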
## 5. Configuration Examples
Below are complete configuration examples for different working modes. You can copy and modify them as needed.
### 5.1 AI Studio Service Configuration
```json
{
  "mcpServers": {
    "paddleocr-ocr": {
      "command": "paddleocr_mcp",
      "args": [],
      "env": {
        "PADDLEOCR_MCP_PIPELINE": "OCR",
        "PADDLEOCR_MCP_PPOCR_SOURCE": "aistudio",
        "PADDLEOCR_MCP_SERVER_URL": "<your-server-url>",
        "PADDLEOCR_MCP_AISTUDIO_ACCESS_TOKEN": "<your-access-token>"
      }
    }
  }
}
```
Note:
- Replace `<your-server-url>` with your AI Studio Service Base URL, e.g., `https://xxxxx.aistudio-hub.baidu.com`. Do not include endpoint paths (like `/ocr`).
- Replace `<your-access-token>` with your Access Token.
### 5.2 Local Python Library Configuration
```json
{
  "mcpServers": {
    "paddleocr-ocr": {
      "command": "paddleocr_mcp",
      "args": [],
      "env": {
        "PADDLEOCR_MCP_PIPELINE": "OCR",
        "PADDLEOCR_MCP_PPOCR_SOURCE": "local"
      }
    }
  }
}
```
Note: `PADDLEOCR_MCP_PIPELINE_CONFIG` is optional. If not set, the default pipeline configuration is used. To adjust settings, such as changing models, refer to the PaddleOCR and PaddleX documentation, export a pipeline configuration file, and set `PADDLEOCR_MCP_PIPELINE_CONFIG` to its absolute path.
### 5.3 Self-hosted Service Configuration
```json
{
  "mcpServers": {
    "paddleocr-ocr": {
      "command": "paddleocr_mcp",
      "args": [],
      "env": {
        "PADDLEOCR_MCP_PIPELINE": "OCR",
        "PADDLEOCR_MCP_PPOCR_SOURCE": "self_hosted",
        "PADDLEOCR_MCP_SERVER_URL": "<your-server-url>"
      }
    }
  }
}
```
Note:
- Replace `<your-server-url>` with the base URL of your underlying service (e.g., `http://127.0.0.1:8080`).
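To confirm that the underlying service is reachable before wiring it into the MCP Host, you can call it directly. The request shape below follows the PaddleX serving API as best understood here and should be treated as an assumption (verify field names against the PaddleOCR Serving Deployment Guide); note that the MCP server appends the endpoint path (e.g., `/ocr`) itself, which is why the config takes only the base URL:

```bash
# Direct smoke test against the self-hosted OCR endpoint.
# "file" may be a URL or Base64-encoded content; the example URL is a placeholder.
curl -s -X POST "http://127.0.0.1:8080/ocr" \
  -H "Content-Type: application/json" \
  -d '{"file": "https://example.com/sample.png", "fileType": 1}'
```

A JSON response (rather than a connection error) indicates the service is up and the base URL is correct.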