---
comments: true
---

# Formula Recognition Module Tutorial

## I. Overview

The formula recognition module is a key component of an OCR (Optical Character Recognition) system, responsible for converting mathematical formulas in images into editable text or computer-readable formats. The performance of this module directly affects the accuracy and efficiency of the entire OCR system. The module typically outputs LaTeX or MathML code of the mathematical formulas, which is then passed as input to a text understanding module for further processing.

## II. Supported Model List
| Model | Model Download Link | En-BLEU(%) | Zh-BLEU(%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (MB) | Introduction |
|---|---|---|---|---|---|---|---|
| UniMERNet | Inference Model/Training Model | 85.91 | 43.50 | 1311.84 / 1311.84 | - / 8288.07 | 1530 | UniMERNet is a formula recognition model developed by Shanghai AI Lab. It uses Donut Swin as the encoder and MBartDecoder as the decoder. The model is trained on a dataset of one million samples, including simple, complex, scanned, and handwritten formulas, significantly improving the recognition accuracy of real-world formulas. |
| PP-FormulaNet-S | Inference Model/Training Model | 87.00 | 45.71 | 182.25 / 182.25 | - / 254.39 | 224 | PP-FormulaNet is an advanced formula recognition model developed by the Baidu PaddlePaddle Vision Team. The PP-FormulaNet-S version uses PP-HGNetV2-B4 as its backbone network. Through parallel masking and model distillation techniques, it significantly improves inference speed while maintaining high recognition accuracy, making it suitable for applications requiring fast inference. The PP-FormulaNet-L version, on the other hand, uses Vary_VIT_B as its backbone network and is trained on a large-scale formula dataset, showing significant improvements in recognizing complex formulas compared to PP-FormulaNet-S. |
| PP-FormulaNet-L | Inference Model/Training Model | 90.36 | 45.78 | 1482.03 / 1482.03 | - / 3131.54 | 695 | |
| PP-FormulaNet_plus-S | Inference Model/Training Model | 88.71 | 53.32 | 179.20 / 179.20 | - / 260.99 | 248 | PP-FormulaNet_plus is an enhanced version of the formula recognition model developed by the Baidu PaddlePaddle Vision Team, building upon the original PP-FormulaNet. Compared to the original version, PP-FormulaNet_plus utilizes a more diverse formula dataset during training, including sources such as Chinese dissertations, professional books, textbooks, exam papers, and mathematics journals. This expansion significantly improves the model's recognition capabilities. Among the models, PP-FormulaNet_plus-M and PP-FormulaNet_plus-L add support for Chinese formulas and increase the maximum number of predicted tokens per formula from 1,024 to 2,560, greatly enhancing the recognition performance for complex formulas, while PP-FormulaNet_plus-S focuses on improving the recognition of English formulas. With these improvements, the PP-FormulaNet_plus series performs exceptionally well in handling complex and diverse formula recognition tasks. |
| PP-FormulaNet_plus-M | Inference Model/Training Model | 91.45 | 89.76 | 1040.27 / 1040.27 | - / 1615.80 | 592 | |
| PP-FormulaNet_plus-L | Inference Model/Training Model | 92.22 | 90.64 | 1476.07 / 1476.07 | - / 3125.58 | 698 | |
| LaTeX_OCR_rec | Inference Model/Training Model | 74.55 | 39.96 | 1088.89 / 1088.89 | - / - | 99 | LaTeX-OCR is a formula recognition algorithm based on an autoregressive large model. It uses Hybrid ViT as the backbone network and a transformer as the decoder, significantly improving the accuracy of formula recognition. |
The inference modes referenced in the table above are configured as follows:

| Mode | GPU Configuration | CPU Configuration | Acceleration Technique Combination |
|---|---|---|---|
| Normal Mode | FP32 precision / No TRT acceleration | FP32 precision / 8 threads | PaddleInference |
| High-Performance Mode | Optimal combination of predefined precision type and acceleration strategy | FP32 precision / 8 threads | Optimal predefined backend (Paddle/OpenVINO/TRT, etc.) |
The following parameters are available when instantiating the formula recognition model:

| Parameter | Description | Type | Default |
|---|---|---|---|
| `model_name` | Model name. If set to `None`, `PP-FormulaNet_plus-M` will be used. | `str\|None` | `None` |
| `model_dir` | Model storage path. | `str\|None` | `None` |
| `device` | Device for inference. For example: `"cpu"`, `"gpu"`, `"npu"`, `"gpu:0"`, `"gpu:0,1"`. If multiple devices are specified, parallel inference will be performed. By default, GPU 0 is used if available; otherwise, the CPU is used. | `str\|None` | `None` |
| `enable_hpi` | Whether to enable high-performance inference. | `bool` | `False` |
| `use_tensorrt` | Whether to use the Paddle Inference TensorRT subgraph engine. If the model does not support acceleration through TensorRT, setting this flag will not enable acceleration. For Paddle with CUDA version 11.8, the compatible TensorRT version is 8.x (x>=6), and it is recommended to install TensorRT 8.6.1.6. | `bool` | `False` |
| `precision` | Computation precision when using the TensorRT subgraph engine in Paddle Inference. Options: `"fp32"`, `"fp16"`. | `str` | `"fp32"` |
| `enable_mkldnn` | Whether to enable MKL-DNN acceleration for inference. If MKL-DNN is unavailable or the model does not support it, acceleration will not be used even if this flag is set. | `bool` | `True` |
| `mkldnn_cache_capacity` | MKL-DNN cache capacity. | `int` | `10` |
| `cpu_threads` | Number of threads to use for inference on CPUs. | `int` | `10` |
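The snippet below is a minimal sketch of how these initialization parameters might be used. It assumes the module is exposed as a `FormulaRecognition` class importable from `paddleocr`; the class name and import path are assumptions, not something stated in the table above.

```python
# Minimal sketch; `FormulaRecognition` and its import path are assumed, not taken from this page.
from paddleocr import FormulaRecognition

# All arguments are optional. With model_name=None, PP-FormulaNet_plus-M is used by default.
model = FormulaRecognition(
    model_name="PP-FormulaNet_plus-M",  # any model name from the table in Section II
    device="gpu:0",                     # e.g. "cpu", "gpu", "npu", "gpu:0,1"
    enable_hpi=False,                   # set to True to use high-performance inference
    cpu_threads=10,                     # only relevant when running on CPU
)
```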
The following parameters are available when calling the prediction method:

| Parameter | Description | Type | Default |
|---|---|---|---|
| `input` | Input data to be predicted. Required. Supports multiple input types: a `Python Var` (e.g., `numpy.ndarray` image data), a `str` (e.g., a local image file path, a URL, or a directory), or a `list` of the above. | `Python Var\|str\|list` | |
| `batch_size` | Batch size, a positive integer. | `int` | `1` |
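Continuing the sketch above (same assumed class and instance), a prediction call with these parameters might look like the following; the input file name is a placeholder:

```python
# "formula.png" is a placeholder path; per the table above, a numpy.ndarray,
# URL, directory, or a list of such inputs would also be accepted.
output = model.predict(input="formula.png", batch_size=1)
```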
The prediction results can be printed or saved through the following methods:

| Method | Description | Parameter | Type | Details | Default |
|---|---|---|---|---|---|
| `print()` | Print the result to the terminal | `format_json` | `bool` | Whether to format the output using JSON indentation | `True` |
| | | `indent` | `int` | Indentation level used to beautify the JSON output; only effective when `format_json` is `True` | `4` |
| | | `ensure_ascii` | `bool` | Controls whether non-ASCII characters are escaped to Unicode. If set to `True`, all non-ASCII characters are escaped; if `False`, original characters are kept. Only effective when `format_json` is `True` | `False` |
| `save_to_json()` | Save the result as a JSON-formatted file | `save_path` | `str` | Path to save the file. If it is a directory, the saved file name will match the input file type | `None` |
| | | `indent` | `int` | Indentation level used to beautify the JSON output; only effective when `format_json` is `True` | `4` |
| | | `ensure_ascii` | `bool` | Controls whether non-ASCII characters are escaped to Unicode. If set to `True`, all non-ASCII characters are escaped; if `False`, original characters are kept. Only effective when `format_json` is `True` | `False` |
| `save_to_img()` | Save the result as an image file | `save_path` | `str` | Path to save the file. If it is a directory, the saved file name will match the input file type | `None` |
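Putting these methods together, iterating over the prediction results and saving them might look like the following, still under the assumptions of the earlier sketches (the `./output/` directory is a placeholder):

```python
for res in output:
    # Print the result to the terminal as indented JSON, keeping non-ASCII characters.
    res.print(format_json=True, indent=4, ensure_ascii=False)
    # Save the prediction as a JSON file and the visualization as an image.
    res.save_to_json(save_path="./output/")
    res.save_to_img(save_path="./output/")
```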
In addition, each prediction result provides the following attributes:

| Attribute | Description |
|---|---|
| `json` | Get the prediction result in `json` format |
| `img` | Get the visualized image in `dict` format |
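For programmatic access, the two attributes can be read directly from each result object, for example (a sketch under the same assumptions as above):

```python
for res in output:
    data = res.json   # prediction result as a dict (e.g., the recognized formula)
    images = res.img  # visualizations as a dict of images
    print(data)
    print(list(images.keys()))
```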