| Parameter | Description | Type | Default |
|---|---|---|---|
| `input` | Data to be predicted. Required. Accepts the local path of an image or PDF file, for example `/root/data/img.jpg`; a URL of an image or PDF file; or a local directory containing the images to be predicted, for example `/root/data/`. Prediction on directories containing PDF files is currently not supported; a PDF file must be given by its specific file path. | `str` | |
| `save_path` | Path where the inference result file will be saved. If not set, the inference results will not be saved locally. | `str` | |
| `layout_detection_model_name` | Name of the layout area detection and ranking model. If not set, the default model of the production line will be used. | `str` | |
| `layout_detection_model_dir` | Directory path of the layout area detection and ranking model. If not set, the official model will be downloaded. | `str` | |
| `layout_threshold` | Score threshold for the layout model. Any floating-point number between `0` and `1`. If not set, the parameter value initialized by the production line will be used, which is `0.5` by default. | `float` | |
| `layout_nms` | Whether to use post-processing NMS for layout detection. If not set, the parameter value initialized by the production line will be used, with a default initialization of `True`. | `bool` | |
| `layout_unclip_ratio` | Expansion coefficient for the detection boxes of the layout area detection model. Any floating-point number greater than `0`. If not set, the parameter value initialized by the production line will be used. | `float` | |
| `layout_merge_bboxes_mode` | Merging mode for the detection boxes output by the model in layout detection, for example `large`. | `str` | |
| `vl_rec_model_name` | Name of the multimodal recognition model. If not set, the default model of the production line will be used. | `str` | |
| `vl_rec_model_dir` | Directory path of the multimodal recognition model. If not set, the official model will be downloaded. | `str` | |
| `vl_rec_backend` | Inference backend used by the multimodal recognition model. | `str` | |
| `vl_rec_server_url` | Server URL to use when the multimodal recognition model runs through an inference service. | `str` | |
| `vl_rec_max_concurrency` | Maximum number of concurrent requests when the multimodal recognition model runs through an inference service. | `str` | |
| `doc_orientation_classify_model_name` | Name of the document orientation classification model. If not set, the default model of the production line will be used. | `str` | |
| `doc_orientation_classify_model_dir` | Directory path of the document orientation classification model. If not set, the official model will be downloaded. | `str` | |
| `doc_unwarping_model_name` | Name of the text image rectification model. If not set, the default model of the production line will be used. | `str` | |
| `doc_unwarping_model_dir` | Directory path of the text image rectification model. If not set, the official model will be downloaded. | `str` | |
| `use_doc_orientation_classify` | Whether to load and use the document orientation classification module. If not set, the parameter value initialized by the production line will be used, with a default initialization of `False`. | `bool` | |
| `use_doc_unwarping` | Whether to load and use the text image rectification module. If not set, the parameter value initialized by the production line will be used, with a default initialization of `False`. | `bool` | |
| `use_layout_detection` | Whether to load and use the layout area detection and ranking module. If not set, the parameter value initialized by the production line will be used, with a default initialization of `True`. | `bool` | |
| `use_chart_recognition` | Whether to load and use the chart parsing module. If not set, the parameter value initialized by the production line will be used, with a default initialization of `False`. | `bool` | |
| `format_block_content` | Whether to format the content in `block_content` as Markdown. If not set, the parameter value initialized by the production line will be used, with a default initialization of `False`. | `bool` | |
| `use_queues` | Whether to enable internal queues. When set to `True`, data loading (such as rendering PDF pages as images), layout detection model processing, and VLM inference run asynchronously in separate threads, with data passed through queues, which improves efficiency. This is particularly effective for PDF documents with many pages or directories containing many images or PDF files. | `bool` | |
| `prompt_label` | Prompt type setting for the VL model. Takes effect only when `use_layout_detection=False`. | `str` | |
| `repetition_penalty` | Repetition penalty parameter used for VL model sampling. | `float` | |
| `temperature` | Temperature parameter used for VL model sampling. | `float` | |
| `top_p` | Top-p parameter used for VL model sampling. | `float` | |
| `min_pixels` | Minimum number of pixels allowed when the VL model preprocesses images. | `int` | |
| `max_pixels` | Maximum number of pixels allowed when the VL model preprocesses images. | `int` | |
| `device` | Device used for inference. A specific card number can be specified. | `str` | |
| `enable_hpi` | Whether to enable high-performance inference. | `bool` | `False` |
| `use_tensorrt` | Whether to enable the TensorRT subgraph engine of Paddle Inference. If the model does not support acceleration via TensorRT, acceleration will not be used even if this flag is set. For PaddlePaddle with CUDA 11.8, the compatible TensorRT version is 8.x (x>=6); installing TensorRT 8.6.1.6 is recommended. | `bool` | `False` |
| `precision` | Computational precision, such as `fp32` or `fp16`. | `str` | `fp32` |
| `enable_mkldnn` | Whether to enable MKL-DNN accelerated inference. If MKL-DNN is not available or the model does not support acceleration via MKL-DNN, acceleration will not be used even if this flag is set. | `bool` | `True` |
| `mkldnn_cache_capacity` | MKL-DNN cache capacity. | `int` | `10` |
| `cpu_threads` | Number of threads used for inference on the CPU. | `int` | `8` |
| `paddlex_config` | File path of the PaddleX production line configuration. | `str` | |

{'res': {'input_path': 'paddleocr_vl_demo.png', 'page_index': None, 'model_settings': {'use_doc_preprocessor': False, 'use_layout_detection': True, 'use_chart_recognition': False, 'format_block_content': False}, 'layout_det_res': {'input_path': None, 'page_index': None, 'boxes': [{'cls_id': 6, 'label': 'doc_title', 'score': 0.9636914134025574, 'coordinate': [np.float32(131.31366), np.float32(36.450516), np.float32(1384.522), np.float32(127.984665)]}, {'cls_id': 22, 'label': 'text', 'score': 0.9281806349754333, 'coordinate': [np.float32(585.39465), np.float32(158.438), np.float32(930.2184), np.float32(182.57469)]}, {'cls_id': 22, 'label': 'text', 'score': 0.9840355515480042, 'coordinate': [np.float32(9.023666), np.float32(200.86115), np.float32(361.41583), np.float32(343.8828)]}, {'cls_id': 14, 'label': 'image', 'score': 0.9871416091918945, 'coordinate': [np.float32(775.50574), np.float32(200.66502), np.float32(1503.3807), np.float32(684.9304)]}, {'cls_id': 22, 'label': 'text', 'score': 0.9801855087280273, 'coordinate': [np.float32(9.532196), np.float32(344.90594), np.float32(361.4413), np.float32(440.8244)]}, {'cls_id': 17, 'label': 'paragraph_title', 'score': 0.9708921313285828, 'coordinate': [np.float32(28.040405), np.float32(455.87976), np.float32(341.7215), np.float32(520.7117)]}, {'cls_id': 24, 'label': 'vision_footnote', 'score': 0.9002962708473206, 'coordinate': [np.float32(809.0692), np.float32(703.70044), np.float32(1488.3016), np.float32(750.5238)]}, {'cls_id': 22, 'label': 'text', 'score': 0.9825374484062195, 'coordinate': [np.float32(8.896561), np.float32(536.54895), np.float32(361.05237), np.float32(655.8058)]}, {'cls_id': 22, 'label': 'text', 'score': 0.9822263717651367, 'coordinate': [np.float32(8.971573), np.float32(657.4949), np.float32(362.01715), np.float32(774.625)]}, {'cls_id': 22, 'label': 'text', 'score': 0.9767460823059082, 'coordinate': [np.float32(9.407074), np.float32(776.5216), np.float32(361.31067), np.float32(846.82874)]}, {'cls_id': 
22, 'label': 'text', 'score': 0.9868153929710388, 'coordinate': [np.float32(8.669495), np.float32(848.2543), np.float32(361.64703), np.float32(1062.8568)]}, {'cls_id': 22, 'label': 'text', 'score': 0.9826608300209045, 'coordinate': [np.float32(8.8025055), np.float32(1063.8615), np.float32(361.46588), np.float32(1182.8524)]}, {'cls_id': 22, 'label': 'text', 'score': 0.982555627822876, 'coordinate': [np.float32(8.820602), np.float32(1184.4663), np.float32(361.66394), np.float32(1302.4507)]}, {'cls_id': 22, 'label': 'text', 'score': 0.9584776759147644, 'coordinate': [np.float32(9.170288), np.float32(1304.2161), np.float32(361.48898), np.float32(1351.7483)]}, {'cls_id': 22, 'label': 'text', 'score': 0.9782056212425232, 'coordinate': [np.float32(389.1618), np.float32(200.38202), np.float32(742.7591), np.float32(295.65146)]}, {'cls_id': 22, 'label': 'text', 'score': 0.9844875931739807, 'coordinate': [np.float32(388.73303), np.float32(297.18463), np.float32(744.00024), np.float32(441.3034)]}, {'cls_id': 17, 'label': 'paragraph_title', 'score': 0.9680547714233398, 'coordinate': [np.float32(409.39468), np.float32(455.89386), np.float32(721.7174), np.float32(520.9387)]}, {'cls_id': 22, 'label': 'text', 'score': 0.9741666913032532, 'coordinate': [np.float32(389.71606), np.float32(536.8138), np.float32(742.7112), np.float32(608.00165)]}, {'cls_id': 22, 'label': 'text', 'score': 0.9840384721755981, 'coordinate': [np.float32(389.30988), np.float32(609.39636), np.float32(743.09247), np.float32(750.3231)]}, {'cls_id': 22, 'label': 'text', 'score': 0.9845995306968689, 'coordinate': [np.float32(389.13272), np.float32(751.7772), np.float32(743.058), np.float32(894.8815)]}, {'cls_id': 22, 'label': 'text', 'score': 0.984852135181427, 'coordinate': [np.float32(388.83267), np.float32(896.0371), np.float32(743.58215), np.float32(1038.7345)]}, {'cls_id': 22, 'label': 'text', 'score': 0.9804865717887878, 'coordinate': [np.float32(389.08478), np.float32(1039.9119), np.float32(742.7585), 
np.float32(1134.4897)]}, {'cls_id': 22, 'label': 'text', 'score': 0.986461341381073, 'coordinate': [np.float32(388.52643), np.float32(1135.8137), np.float32(743.451), np.float32(1352.0085)]}, {'cls_id': 22, 'label': 'text', 'score': 0.9869391918182373, 'coordinate': [np.float32(769.8341), np.float32(775.66235), np.float32(1124.9813), np.float32(1063.207)]}, {'cls_id': 22, 'label': 'text', 'score': 0.9822869896888733, 'coordinate': [np.float32(770.30383), np.float32(1063.938), np.float32(1124.8295), np.float32(1184.2192)]}, {'cls_id': 17, 'label': 'paragraph_title', 'score': 0.9689218997955322, 'coordinate': [np.float32(791.3042), np.float32(1199.3169), np.float32(1104.4521), np.float32(1264.6985)]}, {'cls_id': 22, 'label': 'text', 'score': 0.9713128209114075, 'coordinate': [np.float32(770.4253), np.float32(1279.6072), np.float32(1124.6917), np.float32(1351.8672)]}, {'cls_id': 22, 'label': 'text', 'score': 0.9236552119255066, 'coordinate': [np.float32(1153.9058), np.float32(775.5814), np.float32(1334.0654), np.float32(798.1581)]}, {'cls_id': 22, 'label': 'text', 'score': 0.9857938885688782, 'coordinate': [np.float32(1151.5197), np.float32(799.28015), np.float32(1506.3619), np.float32(991.1156)]}, {'cls_id': 22, 'label': 'text', 'score': 0.9820687174797058, 'coordinate': [np.float32(1151.5686), np.float32(991.91095), np.float32(1506.6023), np.float32(1110.8875)]}, {'cls_id': 22, 'label': 'text', 'score': 0.9866049885749817, 'coordinate': [np.float32(1151.6919), np.float32(1112.1301), np.float32(1507.1611), np.float32(1351.9504)]}]}}}
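
The printed result is a nested dictionary, so the layout boxes can be inspected with ordinary Python. The sketch below is a minimal illustration, assuming the result is already available as a plain `dict` with the structure shown above. The sample holds three of the boxes from that output with coordinates rounded, and `summarize_layout` is a hypothetical helper, not part of PaddleOCR.

```python
from collections import Counter

# Abbreviated copy of the printed result: three of the boxes shown above,
# with coordinates rounded to plain floats for readability.
result = {
    "res": {
        "input_path": "paddleocr_vl_demo.png",
        "layout_det_res": {
            "boxes": [
                {"cls_id": 6, "label": "doc_title", "score": 0.9636914134025574,
                 "coordinate": [131.31, 36.45, 1384.52, 127.98]},
                {"cls_id": 22, "label": "text", "score": 0.9281806349754333,
                 "coordinate": [585.39, 158.44, 930.22, 182.57]},
                {"cls_id": 14, "label": "image", "score": 0.9871416091918945,
                 "coordinate": [775.51, 200.67, 1503.38, 684.93]},
            ]
        },
    }
}


def summarize_layout(res, min_score=0.0):
    """Count layout boxes per label, keeping only boxes with score >= min_score."""
    boxes = res["res"]["layout_det_res"]["boxes"]
    kept = [box for box in boxes if box["score"] >= min_score]
    return Counter(box["label"] for box in kept)


print(summarize_layout(result))
print(summarize_layout(result, min_score=0.95))
```

With `min_score=0.95` the `text` box (score about 0.928) is dropped, which mirrors what a stricter score threshold does to low-confidence detections.
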

| Parameter | Parameter Description | Parameter Type | Default Value |
|---|---|---|---|
| `layout_detection_model_name` | Name of the layout area detection and ranking model. If set to `None`, the default model of the production line will be used. | `str\|None` | `None` |
| `layout_detection_model_dir` | Directory path of the layout area detection and ranking model. If set to `None`, the official model will be downloaded. | `str\|None` | `None` |
| `layout_threshold` | Score threshold for the layout model. If set to `None`, the parameter value initialized by the production line will be used. | `float\|dict\|None` | `None` |
| `layout_nms` | Whether to use post-processing NMS for layout detection. If set to `None`, the parameter value initialized by the production line will be used. | `bool\|None` | `None` |
| `layout_unclip_ratio` | Expansion coefficient for the detection boxes of the layout area detection model. If set to `None`, the parameter value initialized by the production line will be used. | `float\|Tuple[float,float]\|dict\|None` | `None` |
| `layout_merge_bboxes_mode` | Merging mode for the detection boxes output by the model in layout detection. If set to `None`, the parameter value initialized by the production line will be used. | `str\|dict\|None` | `None` |
| `vl_rec_model_name` | Name of the multimodal recognition model. If set to `None`, the default model of the production line will be used. | `str\|None` | `None` |
| `vl_rec_model_dir` | Directory path of the multimodal recognition model. If set to `None`, the official model will be downloaded. | `str\|None` | `None` |
| `vl_rec_backend` | Inference backend used by the multimodal recognition model. | `str\|None` | `None` |
| `vl_rec_server_url` | Server URL to use when the multimodal recognition model runs through an inference service. | `str\|None` | `None` |
| `vl_rec_max_concurrency` | Maximum number of concurrent requests when the multimodal recognition model runs through an inference service. | `str\|None` | `None` |
| `doc_orientation_classify_model_name` | Name of the document orientation classification model. If set to `None`, the default model of the production line will be used. | `str\|None` | `None` |
| `doc_orientation_classify_model_dir` | Directory path of the document orientation classification model. If set to `None`, the official model will be downloaded. | `str\|None` | `None` |
| `doc_unwarping_model_name` | Name of the text image rectification model. If set to `None`, the default model of the production line will be used. | `str\|None` | `None` |
| `doc_unwarping_model_dir` | Directory path of the text image rectification model. If set to `None`, the official model will be downloaded. | `str\|None` | `None` |
| `use_doc_orientation_classify` | Whether to load and use the document orientation classification module. If set to `None`, the parameter value initialized by the production line will be used, which is `False` by default. | `bool\|None` | `None` |
| `use_doc_unwarping` | Whether to load and use the text image rectification module. If set to `None`, the parameter value initialized by the production line will be used, which is `False` by default. | `bool\|None` | `None` |
| `use_layout_detection` | Whether to load and use the layout area detection and sorting module. If set to `None`, the parameter value initialized by the production line will be used, which is `True` by default. | `bool\|None` | `None` |
| `use_chart_recognition` | Whether to load and use the chart parsing module. If set to `None`, the parameter value initialized by the production line will be used, which is `False` by default. | `bool\|None` | `None` |
| `format_block_content` | Whether to format the content in `block_content` as Markdown. If set to `None`, the parameter value initialized by the production line will be used, which is `False` by default. | `bool\|None` | `None` |
| `device` | Device used for inference. A specific card number can be specified. | `str\|None` | `None` |
| `enable_hpi` | Whether to enable high-performance inference. | `bool` | `False` |
| `use_tensorrt` | Whether to enable the TensorRT subgraph engine of Paddle Inference. If the model does not support acceleration via TensorRT, acceleration will not be used even if this flag is set. For PaddlePaddle with CUDA 11.8, the compatible TensorRT version is 8.x (x>=6); installing TensorRT 8.6.1.6 is recommended. | `bool` | `False` |
| `precision` | Computational precision, such as `fp32` or `fp16`. | `str` | `"fp32"` |
| `enable_mkldnn` | Whether to enable MKL-DNN accelerated inference. If MKL-DNN is not available or the model does not support acceleration via MKL-DNN, acceleration will not be used even if this flag is set. | `bool` | `True` |
| `mkldnn_cache_capacity` | MKL-DNN cache capacity. | `int` | `10` |
| `cpu_threads` | Number of threads used for inference on the CPU. | `int` | `8` |
| `paddlex_config` | Path to the PaddleX production line configuration file. | `str\|None` | `None` |
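
The `layout_unclip_ratio` parameter accepts either a single `float` or a `Tuple[float,float]`. One plausible reading, assumed here rather than confirmed by this document, is that the ratio scales each detection box about its center, with a tuple giving separate width and height factors. The sketch below illustrates that reading only; `expand_box` is a hypothetical helper and not the library's implementation.

```python
from typing import Sequence, Tuple, Union

Ratio = Union[float, Tuple[float, float]]


def expand_box(box: Sequence[float], ratio: Ratio) -> Tuple[float, float, float, float]:
    """Scale an (x_min, y_min, x_max, y_max) box about its center.

    Assumption: a single float scales both sides equally, and a tuple is
    interpreted as (width_ratio, height_ratio).
    """
    if isinstance(ratio, (int, float)):
        ratio_w = ratio_h = float(ratio)
    else:
        ratio_w, ratio_h = ratio
    if ratio_w <= 0 or ratio_h <= 0:
        raise ValueError("expansion ratios must be greater than 0")

    x_min, y_min, x_max, y_max = box
    center_x = (x_min + x_max) / 2
    center_y = (y_min + y_max) / 2
    half_w = (x_max - x_min) * ratio_w / 2
    half_h = (y_max - y_min) * ratio_h / 2
    return (center_x - half_w, center_y - half_h, center_x + half_w, center_y + half_h)


print(expand_box((10.0, 20.0, 30.0, 60.0), 1.0))         # box is unchanged
print(expand_box((10.0, 20.0, 30.0, 60.0), (1.5, 1.0)))  # wider, same height
```
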

Call the `predict()` method of the PaddleOCR-VL production line object to run inference; it returns a list of results. The production line also provides a `predict_iter()` method. The two accept the same parameters and return the same results; the difference is that `predict_iter()` returns a generator, which yields prediction results one at a time. This suits large datasets or cases where memory should be conserved. Choose either method according to your needs. The parameters of the `predict()` method are described below:

| Parameter | Parameter Description | Parameter Type | Default Value |
|---|---|---|---|
| `input` | Data to be predicted, supporting multiple input types. Required. | `Python Var\|str\|list` | |
| `use_doc_orientation_classify` | Whether to use the document orientation classification module during inference. `None` means the instantiation parameter is used; otherwise, this parameter takes precedence. | `bool\|None` | `None` |
| `use_doc_unwarping` | Whether to use the text image rectification module during inference. `None` means the instantiation parameter is used; otherwise, this parameter takes precedence. | `bool\|None` | `None` |
| `use_layout_detection` | Whether to use the layout region detection and sorting module during inference. `None` means the instantiation parameter is used; otherwise, this parameter takes precedence. | `bool\|None` | `None` |
| `use_chart_recognition` | Whether to use the chart parsing module during inference. `None` means the instantiation parameter is used; otherwise, this parameter takes precedence. | `bool\|None` | `None` |
| `layout_threshold` | Same meaning as the instantiation parameter. `None` means the instantiation parameter is used; otherwise, this parameter takes precedence. | `float\|dict\|None` | `None` |
| `layout_nms` | Same meaning as the instantiation parameter. `None` means the instantiation parameter is used; otherwise, this parameter takes precedence. | `bool\|None` | `None` |
| `layout_unclip_ratio` | Same meaning as the instantiation parameter. `None` means the instantiation parameter is used; otherwise, this parameter takes precedence. | `float\|Tuple[float,float]\|dict\|None` | `None` |
| `layout_merge_bboxes_mode` | Same meaning as the instantiation parameter. `None` means the instantiation parameter is used; otherwise, this parameter takes precedence. | `str\|dict\|None` | `None` |
| `use_queues` | Whether to enable internal queues. When set to `True`, data loading (such as rendering PDF pages as images), layout detection model processing, and VLM inference run asynchronously in separate threads, with data passed through queues, which improves efficiency. This is particularly effective for PDF documents with many pages or directories containing many images or PDF files. | `bool\|None` | `None` |
| `prompt_label` | Prompt type setting for the VL model. Takes effect only when `use_layout_detection=False`. | `str\|None` | `None` |
| `format_block_content` | Same meaning as the instantiation parameter. `None` means the instantiation parameter is used; otherwise, this parameter takes precedence. | `bool\|None` | `None` |
| `repetition_penalty` | Repetition penalty parameter used for VL model sampling. | `float\|None` | `None` |
| `temperature` | Temperature parameter used for VL model sampling. | `float\|None` | `None` |
| `top_p` | Top-p parameter used for VL model sampling. | `float\|None` | `None` |
| `min_pixels` | Minimum number of pixels allowed when the VL model preprocesses images. | `int\|None` | `None` |
| `max_pixels` | Maximum number of pixels allowed when the VL model preprocesses images. | `int\|None` | `None` |
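
Two behaviours described above lend themselves to a small illustration: a call-time value of `None` falling back to the instantiation value, and `predict()` returning a list while `predict_iter()` returns a generator. The sketch below uses a hypothetical `ToyPipeline` class that runs no model and is not the PaddleOCR-VL API; it only mimics those two behaviours for one parameter, and it takes a list of paths for simplicity.

```python
from typing import Any, Dict, Iterator, List, Optional


class ToyPipeline:
    """Hypothetical stand-in for a pipeline object; it runs no model."""

    def __init__(self, use_layout_detection: Optional[bool] = None):
        # None at instantiation falls back to the documented default (True).
        self.use_layout_detection = (
            True if use_layout_detection is None else use_layout_detection
        )

    def predict_iter(
        self, inputs: List[str], use_layout_detection: Optional[bool] = None
    ) -> Iterator[Dict[str, Any]]:
        # None at call time means "use the instantiation value";
        # any other value takes precedence for this call only.
        effective = (
            self.use_layout_detection
            if use_layout_detection is None
            else use_layout_detection
        )
        for path in inputs:
            # Results are produced one at a time, on demand.
            yield {"input_path": path, "use_layout_detection": effective}

    def predict(
        self, inputs: List[str], use_layout_detection: Optional[bool] = None
    ) -> List[Dict[str, Any]]:
        # Same parameters and results, but collected into a list up front.
        return list(self.predict_iter(inputs, use_layout_detection))


pipeline = ToyPipeline()
pages = ["page_1.png", "page_2.png"]

all_results = pipeline.predict(pages)
print(all_results[0]["use_layout_detection"])

for result in pipeline.predict_iter(pages, use_layout_detection=False):
    print(result["input_path"], result["use_layout_detection"])
```

Because the generator yields one result at a time, only the current result needs to be held in memory, which is the reason `predict_iter()` is suggested for large inputs.
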

| Method | Method Description | Parameter | Parameter Type | Parameter Description | Default Value |
|---|---|---|---|---|---|
| `print()` | Print results to the terminal | `format_json` | `bool` | Whether to format the output content using `JSON` indentation. | `True` |
| | | `indent` | `int` | Indentation level used to beautify the output `JSON` data and make it more readable. Only valid when `format_json` is `True`. | `4` |
| | | `ensure_ascii` | `bool` | Whether non-`ASCII` characters are escaped as `Unicode`. When set to `True`, all non-`ASCII` characters are escaped; `False` retains the original characters. Only valid when `format_json` is `True`. | `False` |
| `save_to_json()` | Save the results as a file in `json` format | `save_path` | `str` | File path for saving. When it is a directory, the saved file name is consistent with the input file name. | `None` |
| | | `indent` | `int` | Indentation level used to beautify the output `JSON` data and make it more readable. Only valid when `format_json` is `True`. | `4` |
| | | `ensure_ascii` | `bool` | Whether non-`ASCII` characters are escaped as `Unicode`. When set to `True`, all non-`ASCII` characters are escaped; `False` retains the original characters. Only valid when `format_json` is `True`. | `False` |
| `save_to_img()` | Save the visualized images of each intermediate module in `png` format | `save_path` | `str` | File path for saving; a directory or a file path is supported. | `None` |
| `save_to_markdown()` | Save each page of an image or PDF file as a separate file in `markdown` format | `save_path` | `str` | File path for saving. When it is a directory, the saved file name is consistent with the input file name. | `None` |
| | | `pretty` | `bool` | Whether to beautify the `markdown` output, for example by centering charts, so that it renders more cleanly. | `True` |
| | | `show_formula_number` | `bool` | Whether to retain formula numbers in `markdown`. When set to `True`, all formula numbers are retained; `False` retains only the formulas. | `False` |
| `save_to_html()` | Save the tables in the file as files in `html` format | `save_path` | `str` | File path for saving; a directory or a file path is supported. | `None` |
| `save_to_xlsx()` | Save the tables in the file as files in `xlsx` format | `save_path` | `str` | File path for saving; a directory or a file path is supported. | `None` |
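
The `indent` and `ensure_ascii` parameters share their names and apparent meaning with options of Python's standard `json.dumps`. The sketch below uses only the standard library to show the effect of each setting. It assumes, without confirmation from this document, that the result object serializes in a comparable way; the `dump` helper and the sample data are hypothetical.

```python
import json

sample = {"label": "text", "content": "文档标题"}


def dump(data, format_json=True, indent=4, ensure_ascii=False):
    """Mimic the documented options; indent and ensure_ascii apply only when format_json is True."""
    if not format_json:
        return str(data)
    return json.dumps(data, indent=indent, ensure_ascii=ensure_ascii)


print(dump(sample))                     # indented, original characters kept
print(dump(sample, ensure_ascii=True))  # non-ASCII characters escaped as \uXXXX
print(dump(sample, format_json=False))  # plain, unformatted representation
```

Keeping `ensure_ascii=False` is usually the more readable choice for documents containing non-ASCII text, while `True` produces output that is safe for ASCII-only consumers.
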

| Attribute | Attribute Description |
|---|---|
| `json` | Obtain the prediction result in `json` format |
| `img` | Obtain the visualized image in `dict` format |
| `markdown` | Obtain the markdown result in `dict` format |