---
comments: true
---

# Document Image Orientation Classification Module Tutorial

## 1. Overview

The Document Image Orientation Classification Module is primarily designed to distinguish the orientation of document images and correct them through post-processing. During processes such as document scanning or ID photo capturing, the device might be rotated to achieve clearer images, resulting in images with various orientations. Standard OCR pipelines may not handle these images effectively. By leveraging image classification techniques, the orientation of documents or IDs containing text regions can be pre-determined and adjusted, thereby improving the accuracy of OCR processing.

## 2. Supported Models List

| Model | Model Download Links | Top-1 Acc (%) | GPU Inference Time (ms)<br/>[Normal Mode / High-Performance Mode] | CPU Inference Time (ms)<br/>[Normal Mode / High-Performance Mode] | Model Size (MB) | Description |
|---|---|---|---|---|---|---|
| PP-LCNet_x1_0_doc_ori | Inference Model/Pretrained Model | 99.06 | 2.31 / 0.43 | 3.37 / 1.27 | 7 | A document image classification model based on PP-LCNet_x1_0, with four categories: 0°, 90°, 180°, and 270°. |

| Mode | GPU Configuration | CPU Configuration | Acceleration Technology Combination |
|---|---|---|---|
| Normal Mode | FP32 Precision / No TRT Acceleration | FP32 Precision / 8 Threads | PaddleInference |
| High-Performance Mode | Optimal combination of precision type and acceleration strategy | FP32 Precision / 8 Threads | Optimal backend selected (Paddle/OpenVINO/TRT, etc.) |
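
The post-processing correction mentioned in the overview can be sketched as follows. This is a minimal illustration, not the module's implementation. It assumes the predicted label is one of the four class angles from the model table and that the label gives the clockwise rotation already present in the image, so the correction rotates counterclockwise by the same angle. The helper names `rotate_ccw_90`, `rotate_cw_90`, and `correct_orientation` are hypothetical, and the direction convention is an assumption to verify against your own data.

```python
def rotate_ccw_90(image):
    """Rotate a 2D grid (list of rows) 90 degrees counterclockwise."""
    # Transpose the grid, then reverse the order of the rows.
    return [list(row) for row in zip(*image)][::-1]


def rotate_cw_90(image):
    """Rotate a 2D grid 90 degrees clockwise (used here to simulate a rotated scan)."""
    # Reverse the order of the rows, then transpose.
    return [list(row) for row in zip(*image[::-1])]


def correct_orientation(image, predicted_angle):
    """Undo a clockwise rotation of `predicted_angle` degrees (assumed convention)."""
    if predicted_angle not in (0, 90, 180, 270):
        raise ValueError(f"unsupported angle: {predicted_angle}")
    for _ in range(predicted_angle // 90):
        image = rotate_ccw_90(image)
    return image


# A tiny 2x3 "image" stands in for real pixel data.
upright = [[1, 2, 3],
           [4, 5, 6]]
rotated = rotate_cw_90(upright)              # simulate a scan rotated by 90 degrees
restored = correct_orientation(rotated, 90)  # apply the correction
```

With a real image library the same idea applies: map the predicted class to a multiple of 90 degrees and rotate in the opposite direction before passing the image to OCR.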

The parameters for instantiating the model are as follows:

| Parameter | Description | Type | Default |
|---|---|---|---|
| `model_name` | Model name. | `str` | `PP-LCNet_x1_0_doc_ori` |
| `model_dir` | Model storage path. | `str` | `None` |
| `device` | Device(s) to use for inference. Examples: `cpu`, `gpu`, `npu`, `gpu:0`, `gpu:0,1`. If multiple devices are specified, inference will be performed in parallel. Note that parallel inference is not always supported. By default, GPU 0 will be used if available; otherwise, the CPU will be used. | `str` | `None` |
| `enable_hpi` | Whether to enable high-performance inference. | `bool` | `False` |
| `use_tensorrt` | Whether to use the Paddle Inference TensorRT subgraph engine. For Paddle with CUDA version 11.8, the compatible TensorRT version is 8.x (x>=6), and it is recommended to install TensorRT 8.6.1.6. For Paddle with CUDA version 12.6, the compatible TensorRT version is 10.x (x>=5), and it is recommended to install TensorRT 10.5.0.18. | `bool` | `False` |
| `min_subgraph_size` | Minimum subgraph size for TensorRT when using the Paddle Inference TensorRT subgraph engine. | `int` | `3` |
| `precision` | Precision for TensorRT when using the Paddle Inference TensorRT subgraph engine. Options: `fp32`, `fp16`, etc. | `str` | `fp32` |
| `enable_mkldnn` | Whether to enable MKL-DNN acceleration for inference. If MKL-DNN is unavailable or the model does not support it, acceleration will not be used even if this flag is set. | `bool` | `True` |
| `cpu_threads` | Number of threads to use for inference on CPUs. | `int` | `10` |
| `top_k` | The top-k value for prediction results. If not specified, the default value in the official PaddleOCR model configuration is used. For example, if the value is 5, the top 5 categories and their corresponding classification probabilities will be returned. | `int` | `None` |
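
To make the `device` string format concrete, here is a hedged sketch of how such a value could be split into a device type and a list of device IDs. The function `parse_device` is hypothetical and written purely for illustration. It is not PaddleOCR's own parser, and it only covers the example forms listed in the table.

```python
def parse_device(device):
    """Split a device string such as 'gpu:0,1' into ('gpu', [0, 1]).

    A string without a colon, such as 'cpu', yields an empty ID list.
    """
    device_type, sep, ids = device.strip().partition(":")
    if not device_type:
        raise ValueError(f"missing device type in {device!r}")
    if not sep:
        # No explicit IDs were given, e.g. 'cpu' or 'gpu'.
        return device_type, []
    try:
        device_ids = [int(part) for part in ids.split(",")]
    except ValueError:
        raise ValueError(f"invalid device IDs in {device!r}") from None
    return device_type, device_ids


# The example values from the parameter table.
for value in ("cpu", "gpu", "npu", "gpu:0", "gpu:0,1"):
    print(value, "->", parse_device(value))
```

A list with more than one ID corresponds to the parallel-inference case described for `device`, which, as the table notes, is not always supported.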

The parameters accepted at prediction time are as follows:

| Parameter | Description | Type | Default |
|---|---|---|---|
| `input` | Input data to be predicted. Required. Supports multiple input types. | `Python Var` \| `str` \| `list` | |
| `batch_size` | Batch size; must be a positive integer. | `int` | `1` |
| `top_k` | The top-k value for prediction results. If not specified, the value provided when the model was instantiated will be used; if it was not specified at instantiation either, the default value in the official PaddleOCR model configuration is used. | `int` | `None` |
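
The fallback order described for `top_k` (prediction call, then instantiation, then the model configuration) can be sketched as below. The functions `resolve_top_k` and `top_k_classes`, the configuration default of `1`, and the probability values are all assumptions made for illustration; they are not taken from PaddleOCR.

```python
def resolve_top_k(call_value=None, init_value=None, config_default=1):
    """Return the first top-k value that was explicitly set.

    Order: value passed at prediction time, value passed at instantiation,
    then the configuration default (assumed to be 1 in this sketch).
    """
    for candidate in (call_value, init_value):
        if candidate is not None:
            return candidate
    return config_default


def top_k_classes(probabilities, k):
    """Return the k (label, probability) pairs with the highest probability."""
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


# Made-up class probabilities for the four orientation classes.
probabilities = {"0": 0.03, "90": 0.01, "180": 0.94, "270": 0.02}

k = resolve_top_k(call_value=None, init_value=2)
best = top_k_classes(probabilities, k)
```

Here `k` falls back to the instantiation value because nothing was passed at prediction time, and `best` holds the two most probable orientation classes.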

Each prediction result supports the following methods:

| Method | Method Description | Parameter | Parameter Type | Parameter Description | Default Value |
|---|---|---|---|---|---|
| `print()` | Print the result to the terminal. | `format_json` | `bool` | Whether to format the output content using JSON indentation. | `True` |
| | | `indent` | `int` | Indentation level used to make the output JSON data more readable. Only valid when `format_json` is `True`. | `4` |
| | | `ensure_ascii` | `bool` | Whether to escape non-ASCII characters as Unicode escape sequences. When set to `True`, all non-ASCII characters are escaped; when set to `False`, the original characters are retained. Only valid when `format_json` is `True`. | `False` |
| `save_to_json()` | Save the result as a JSON file. | `save_path` | `str` | The file path to save to. When it is a directory, the saved file name is consistent with the input file's name. | `None` |
| | | `indent` | `int` | Indentation level used to make the output JSON data more readable. Only valid when `format_json` is `True`. | `4` |
| | | `ensure_ascii` | `bool` | Whether to escape non-ASCII characters as Unicode escape sequences. When set to `True`, all non-ASCII characters are escaped; when set to `False`, the original characters are retained. Only valid when `format_json` is `True`. | `False` |
| `save_to_img()` | Save the result as an image file. | `save_path` | `str` | The file path to save to. When it is a directory, the saved file name is consistent with the input file's name. | `None` |
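
The `format_json`, `indent`, and `ensure_ascii` options correspond closely to arguments of Python's standard `json.dumps`. The sketch below uses only the standard library to show their effect. The helper `format_result` and the sample dictionary are hypothetical, and whether the result object calls `json.dumps` internally is an assumption.

```python
import json


def format_result(data, format_json=True, indent=4, ensure_ascii=False):
    """Render a result dictionary as text, mimicking the documented options."""
    if not format_json:
        # Without JSON formatting, fall back to the plain Python representation.
        return str(data)
    return json.dumps(data, indent=indent, ensure_ascii=ensure_ascii)


# Made-up result content; the degree sign is a non-ASCII character.
sample = {"angle": "180", "title": "Rotated 180°"}

readable = format_result(sample)                     # keeps the degree sign as-is
escaped = format_result(sample, ensure_ascii=True)   # escapes it as \u00b0
compact = format_result(sample, indent=2)            # two-space indentation

print(readable)
print(escaped)
```

Keeping `ensure_ascii=False` is usually the more readable choice for documents containing non-Latin text, while `ensure_ascii=True` can be safer when the output must pass through ASCII-only tooling.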

Prediction results expose the following attributes:

| Attribute | Description |
|---|---|
| `json` | Get the prediction result in JSON format. |
| `img` | Get the visualization image in `dict` format. |