---
comments: true
---

# Seal Text Detection Module Tutorial

## I. Overview

The seal text detection module typically outputs multi-point bounding boxes around text regions, which are then passed as inputs to the distortion correction and text recognition modules for subsequent processing to identify the textual content of the seal. Recognizing seal text is an integral part of document processing and finds applications in various scenarios such as contract comparison, inventory access auditing, and invoice reimbursement verification.

The seal text detection module serves as a subtask within OCR (Optical Character Recognition), responsible for locating and marking the regions containing seal text within an image. The performance of this module directly impacts the accuracy and efficiency of the entire seal text OCR system.

## II. Supported Model List

| Model Name | Model Download Link | Hmean(%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Size (M) | Description |
|---|---|---|---|---|---|---|
| PP-OCRv4_server_seal_det | Inference Model/Training Model | 98.21 | 74.75 / 67.72 | 382.55 / 382.55 | 109 M | The server-side seal text detection model of PP-OCRv4 boasts higher accuracy and is suitable for deployment on better-equipped servers. |
| PP-OCRv4_mobile_seal_det | Inference Model/Training Model | 96.47 | 7.82 / 3.09 | 48.28 / 23.97 | 4.6 M | The mobile-side seal text detection model of PP-OCRv4, on the other hand, offers greater efficiency and is suitable for deployment on end devices. |

| Mode | GPU Configuration | CPU Configuration | Acceleration Technology Combination |
|---|---|---|---|
| Normal Mode | FP32 Precision / No TRT Acceleration | FP32 Precision / 8 Threads | PaddleInference |
| High-Performance Mode | Optimal combination of pre-selected precision types and acceleration strategies | FP32 Precision / 8 Threads | Pre-selected optimal backend (Paddle/OpenVINO/TRT, etc.) |

Parameters for model instantiation:

| Parameter | Description | Type | Default |
|---|---|---|---|
| `model_name` | Name of the model. Any supported seal text detection model can be used, such as `PP-OCRv4_mobile_seal_det`. | `str` | `PP-OCRv4_mobile_seal_det` |
| `model_dir` | Model storage path. | `str` | `None` |
| `device` | Device(s) to use for inference. Examples: `cpu`, `gpu`, `npu`, `gpu:0`, `gpu:0,1`. If multiple devices are specified, inference will be performed in parallel. Note that parallel inference is not always supported. By default, GPU 0 will be used if available; otherwise, the CPU will be used. | `str` | `None` |
| `enable_hpi` | Whether to use high-performance inference. | `bool` | `False` |
| `use_tensorrt` | Whether to use the Paddle Inference TensorRT subgraph engine. For Paddle with CUDA version 11.8, the compatible TensorRT version is 8.x (x>=6), and it is recommended to install TensorRT 8.6.1.6. For Paddle with CUDA version 12.6, the compatible TensorRT version is 10.x (x>=5), and it is recommended to install TensorRT 10.5.0.18. | `bool` | `False` |
| `min_subgraph_size` | Minimum subgraph size for TensorRT when using the Paddle Inference TensorRT subgraph engine. | `int` | `3` |
| `precision` | Precision for TensorRT when using the Paddle Inference TensorRT subgraph engine. Options: `fp32`, `fp16`, etc. | `str` | `fp32` |
| `enable_mkldnn` | Whether to enable MKL-DNN acceleration for inference. If MKL-DNN is unavailable or the model does not support it, acceleration will not be used even if this flag is set. | `bool` | `True` |
| `cpu_threads` | Number of threads to use for inference on CPUs. | `int` | `10` |
| `limit_side_len` | Limit on the side length of the input image for detection, given as an `int`. If set to `None`, the default value from the official PaddleOCR model configuration will be used. | `int` / `None` | `None` |
| `limit_type` | Type of image side length limitation. `"min"` ensures the shortest side of the image is no less than `limit_side_len`; `"max"` ensures the longest side is no greater than `limit_side_len`. If set to `None`, the default value from the official PaddleOCR model configuration will be used. | `str` / `None` | `None` |
| `thresh` | Pixel score threshold. Pixels in the output probability map with scores greater than this threshold are considered text pixels. Accepts any float value greater than 0. If set to `None`, the default value from the official PaddleOCR model configuration will be used. | `float` / `None` | `None` |
| `box_thresh` | If the average score of all pixels inside the bounding box is greater than this threshold, the result is considered a text region. Accepts any float value greater than 0. If set to `None`, the default value from the official PaddleOCR model configuration will be used. | `float` / `None` | `None` |
| `unclip_ratio` | Expansion ratio for the Vatti clipping algorithm, used to expand the text region. Accepts any float value greater than 0. If set to `None`, the default value from the official PaddleOCR model configuration will be used. | `float` / `None` | `None` |
| `input_shape` | Input image size for the model in the format `(C, H, W)`. If set to `None`, the model's default size will be used. | `tuple` / `None` | `None` |
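
To make the interaction between `limit_side_len` and `limit_type` concrete, the following is a minimal sketch in plain Python. The function name `limited_size` is hypothetical, and the logic only follows the parameter descriptions above. The actual module may apply further adjustments (for example, rounding sides to a multiple of the network stride), which are not modeled here.

```python
def limited_size(height, width, limit_side_len, limit_type):
    """Return the (height, width) an image would be scaled to.

    Illustrative only: mirrors the documented meaning of the two
    parameters, not the module's internal implementation.
    """
    if limit_type == "min":
        # Upscale only when the shortest side is below the limit.
        side = min(height, width)
        ratio = limit_side_len / side if side < limit_side_len else 1.0
    elif limit_type == "max":
        # Downscale only when the longest side exceeds the limit.
        side = max(height, width)
        ratio = limit_side_len / side if side > limit_side_len else 1.0
    else:
        raise ValueError(f"unsupported limit_type: {limit_type!r}")
    # The aspect ratio is preserved because both sides share one ratio.
    return int(round(height * ratio)), int(round(width * ratio))


if __name__ == "__main__":
    print(limited_size(500, 1000, 736, "min"))
    print(limited_size(2000, 1000, 960, "max"))
```

With `"min"`, small images are enlarged so that fine seal strokes remain detectable; with `"max"`, large images are shrunk to bound inference cost.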

Parameters for prediction:

| Parameter | Description | Type | Default |
|---|---|---|---|
| `input` | Input data to be predicted. Required. Supports multiple input types. | `Python Var` / `str` / `list` | |
| `batch_size` | Batch size, positive integer. | `int` | `1` |
| `limit_side_len` | Limit on the side length of the input image for detection, given as an `int`. If set to `None`, the parameter value initialized by the model will be used by default. | `int` / `None` | `None` |
| `limit_type` | Type of image side length limitation. `"min"` ensures the shortest side of the image is no less than `limit_side_len`; `"max"` ensures the longest side is no greater than `limit_side_len`. If set to `None`, the parameter value initialized by the model will be used by default. | `str` / `None` | `None` |
| `thresh` | Pixel score threshold. Pixels in the output probability map with scores greater than this threshold are considered text pixels. Accepts any float value greater than 0. If set to `None`, the parameter value initialized by the model will be used by default. | `float` / `None` | `None` |
| `box_thresh` | If the average score of all pixels inside the bounding box is greater than this threshold, the result is considered a text region. Accepts any float value greater than 0. If set to `None`, the parameter value initialized by the model will be used by default. | `float` / `None` | `None` |
| `unclip_ratio` | Expansion ratio for the Vatti clipping algorithm, used to expand the text region. Accepts any float value greater than 0. If set to `None`, the parameter value initialized by the model will be used by default. | `float` / `None` | `None` |
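
The roles of `thresh`, `box_thresh`, and `unclip_ratio` can be illustrated with a small, self-contained sketch. All function names below are hypothetical. The box is simplified to an axis-aligned rectangle (the module itself outputs multi-point boxes), and the expansion distance `area * unclip_ratio / perimeter` is the formula commonly used in DB-style post-processing, assumed here rather than confirmed for this module.

```python
import math


def binarize(prob_map, thresh):
    """Mark pixels whose score is greater than `thresh` as text (1)."""
    return [[1 if score > thresh else 0 for score in row] for row in prob_map]


def box_mean_score(prob_map, top, left, bottom, right):
    """Average score inside a rectangle (bottom/right are exclusive)."""
    scores = [prob_map[r][c]
              for r in range(top, bottom)
              for c in range(left, right)]
    return sum(scores) / len(scores)


def is_text_region(prob_map, box, box_thresh):
    """Keep a candidate box only if its mean score exceeds `box_thresh`."""
    return box_mean_score(prob_map, *box) > box_thresh


def unclip_distance(polygon, unclip_ratio):
    """Offset distance used to expand a polygon (assumed DB-style formula)."""
    n = len(polygon)
    # Shoelace formula for the polygon area.
    area = abs(sum(polygon[i][0] * polygon[(i + 1) % n][1]
                   - polygon[(i + 1) % n][0] * polygon[i][1]
                   for i in range(n))) / 2
    perimeter = sum(math.dist(polygon[i], polygon[(i + 1) % n])
                    for i in range(n))
    return area * unclip_ratio / perimeter


if __name__ == "__main__":
    prob_map = [[0.1, 0.9], [0.8, 0.2]]  # toy 2x2 probability map
    print(binarize(prob_map, thresh=0.3))
    print(is_text_region(prob_map, (0, 1, 1, 2), box_thresh=0.6))
    print(unclip_distance([(0, 0), (10, 0), (10, 10), (0, 10)], 0.5))
```

Lowering `thresh` or `box_thresh` keeps more candidate regions at the cost of more false positives, while a larger `unclip_ratio` produces looser boxes around each detected region.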

Methods of the prediction result:

| Method | Method Description | Parameter | Parameter Type | Parameter Description | Default Value |
|---|---|---|---|---|---|
| `print()` | Print the result to the terminal | `format_json` | `bool` | Whether to format the output content using JSON indentation | `True` |
| | | `indent` | `int` | Indentation level used to pretty-print the output JSON data and make it more readable. Only effective when `format_json` is `True` | `4` |
| | | `ensure_ascii` | `bool` | Whether to escape non-ASCII characters as Unicode escape sequences. When set to `True`, all non-ASCII characters are escaped; `False` retains the original characters. Only effective when `format_json` is `True` | `False` |
| `save_to_json()` | Save the result as a file in JSON format | `save_path` | `str` | The file path for saving. When it is a directory, the saved file name will be consistent with the input file name | `None` |
| | | `indent` | `int` | Indentation level used to pretty-print the output JSON data and make it more readable. Only effective when `format_json` is `True` | `4` |
| | | `ensure_ascii` | `bool` | Whether to escape non-ASCII characters as Unicode escape sequences. When set to `True`, all non-ASCII characters are escaped; `False` retains the original characters. Only effective when `format_json` is `True` | `False` |
| `save_to_img()` | Save the result as a file in image format | `save_path` | `str` | The file path for saving. When it is a directory, the saved file name will be consistent with the input file name | `None` |
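
The effect of `indent` and `ensure_ascii` can be previewed with Python's standard `json` module. This sketch assumes the two parameters behave like the same-named arguments of `json.dumps`; the `result` dictionary and its keys are hypothetical stand-ins, not the module's real output.

```python
import json

# Hypothetical payload; the real keys come from the module's output.
result = {"input_path": "seal.png", "note": "印章"}

# indent=4 spreads the JSON over several lines with 4-space indentation.
pretty = json.dumps(result, indent=4, ensure_ascii=False)

# ensure_ascii=True replaces non-ASCII characters with \uXXXX escapes.
escaped = json.dumps(result, indent=4, ensure_ascii=True)

# Without indent, the JSON is emitted on a single line.
compact = json.dumps(result, ensure_ascii=False)

print(pretty)
print(escaped)
print(compact)
```

Keeping `ensure_ascii=False` is usually preferable for seal text, since the recognized characters stay readable in the terminal and in saved files.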

| Attribute | Attribute Description |
|---|---|
| `json` | Get the prediction result in JSON format |
| `img` | Get the visualization image in dict format |
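
Putting the tables together, an end-to-end call might look like the sketch below. It assumes that PaddleOCR 3.x is installed and exposes this module as `paddleocr.SealTextDetection` with a `predict()` method; neither name is stated in this section, so verify them against your installed version. The helper `run_seal_detection` and the file paths are hypothetical.

```python
def run_seal_detection(image_path, output_dir="./output/"):
    """Detect seal text in one image and save the results.

    Assumption: `SealTextDetection` is the entry point provided by
    PaddleOCR 3.x. The import is kept inside the function so that this
    file can be loaded even where PaddleOCR is not installed.
    """
    from paddleocr import SealTextDetection  # assumed class name

    # Instantiate with a model name taken from the supported model list.
    model = SealTextDetection(model_name="PP-OCRv4_mobile_seal_det")

    collected = []
    # Each yielded item is a result object for one input image.
    for res in model.predict(image_path, batch_size=1):
        res.print()                            # print the result to the terminal
        res.save_to_img(save_path=output_dir)  # save the visualization
        res.save_to_json(save_path=output_dir) # save the result as JSON
        collected.append(res.json)             # prediction result via the `json` attribute
    return collected


if __name__ == "__main__":
    # Hypothetical input file; replace with a real image path.
    run_seal_detection("seal_text_det.png")
```

Because `save_path` is a directory here, the saved file names follow the input file name, as described in the methods table.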