Latexocr paddle (#13401)

* commit_test
* modified: configs/rec/rec_latex_ocr.yml deleted: ppocr/modeling/backbones/rec_resnetv2.py
* ntuple_solve
* style
* style
* style
* style
* style
* style
* style
* style
* style
* style
* delete comment
* cla_email
2025-11-03 03:09:16 +00:00 · 2024-07-22 11:50:23 +08:00 · 2024-07-22 11:50:23 +08:00 · cf26f2330e
commit cf26f2330e
parent c556b9083e
34 changed files with 4442 additions and 1 deletions
--- a/configs/rec/rec_latex_ocr.yml
+++ b/configs/rec/rec_latex_ocr.yml
@ -0,0 +1,126 @@
 Global:
  use_gpu: True
  epoch_num: 500
  log_smooth_window: 20
  print_batch_step: 100
  save_model_dir: ./output/rec/latex_ocr/
  save_epoch_step: 5
  max_seq_len: 512
  # evaluation runs every 60000 iterations (22 epochs at batch_size = 56)
  eval_batch_step: [0, 60000]
  cal_metric_during_train: True
  pretrained_model:
  checkpoints:
  save_inference_dir:
  use_visualdl: False
  infer_img: doc/datasets/pme_demo/0000013.png
  infer_mode: False
  use_space_char: False
  rec_char_dict_path:  ppocr/utils/dict/latex_ocr_tokenizer.json
  save_res_path: ./output/rec/predicts_latexocr.txt
 Optimizer:
  name: AdamW
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Const
    learning_rate: 0.0001
 Architecture:
  model_type: rec
  algorithm: LaTeXOCR
  in_channels: 1
  Transform:
  Backbone:
    name: HybridTransformer
    img_size: [192, 672]
    patch_size: 16
    num_classes: 0
    embed_dim: 256
    depth: 4
    num_heads: 8
    input_channel: 1
    is_predict: False
    is_export: False
  Head:
    name: LaTeXOCRHead
    pad_value: 0
    is_export: False
    decoder_args:
      attn_on_attn: True
      cross_attend: True
      ff_glu: True
      rel_pos_bias: False
      use_scalenorm: False
 Loss:
  name: LaTeXOCRLoss
 PostProcess:
  name: LaTeXOCRDecode
  rec_char_dict_path: ppocr/utils/dict/latex_ocr_tokenizer.json
 Metric:
  name: LaTeXOCRMetric
  main_indicator:  exp_rate
  cal_blue_score: False
 Train:
  dataset:
    name: LaTeXOCRDataSet
    data: ./train_data/LaTeXOCR/latexocr_train.pkl
    min_dimensions: [32, 32]
    max_dimensions: [672, 192]
    batch_size_per_pair: 56
    keep_smaller_batches: False
    transforms:
      - DecodeImage:
          channel_first: False
      - MinMaxResize:
          min_dimensions: [32, 32]
          max_dimensions: [672, 192]        
      - LatexTrainTransform:
          bitmap_prob: .04
      - NormalizeImage:
          mean: [0.7931, 0.7931, 0.7931]
          std: [0.1738, 0.1738, 0.1738]
          order: 'hwc'
      - LatexImageFormat:
      - KeepKeys:
          keep_keys: ['image']
  loader:
    shuffle: True
    batch_size_per_card: 1
    drop_last: False
    num_workers: 0
    collate_fn: LaTeXOCRCollator
 Eval:
  dataset:
    name: LaTeXOCRDataSet
    data: ./train_data/LaTeXOCR/latexocr_val.pkl
    min_dimensions: [32, 32]
    max_dimensions: [672, 192]
    batch_size_per_pair: 10
    keep_smaller_batches: True
    transforms:
      - DecodeImage:
          channel_first: False
      - MinMaxResize:
          min_dimensions: [32, 32]
          max_dimensions: [672, 192]  
      - LatexTestTransform:
      - NormalizeImage:
          mean: [0.7931, 0.7931, 0.7931]
          std: [0.1738, 0.1738, 0.1738]
          order: 'hwc'
      - LatexImageFormat:
      - KeepKeys:
          keep_keys: ['image']
  loader:
    shuffle: False
    drop_last: False
    batch_size_per_card: 1
    num_workers: 0
    collate_fn: LaTeXOCRCollator
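As a quick sanity check on the shapes in this config, the sketch below computes the patch grid for the largest allowed input. It assumes `img_size` is ordered `[height, width]` while `max_dimensions` is ordered `[width, height]`, which is how the two values line up numerically here; `patch_grid` is a hypothetical helper, not part of PaddleOCR.

```python
def patch_grid(img_size, patch_size):
    """Return (rows, cols, total) of non-overlapping patches for an [H, W] image."""
    height, width = img_size
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be multiples of patch_size")
    rows, cols = height // patch_size, width // patch_size
    return rows, cols, rows * cols


img_size = [192, 672]        # Backbone.img_size from the config, assumed [H, W]
max_dimensions = [672, 192]  # dataset max_dimensions, assumed [W, H]
assert img_size == max_dimensions[::-1]
print(patch_grid(img_size, 16))  # (12, 42, 504)
```

If either setting is changed, the other presumably needs the mirrored value so that the dataset never yields an image larger than the backbone expects.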
--- a/doc/datasets/pme_demo/0000013.png
+++ b/doc/datasets/pme_demo/0000013.png
--- a/doc/datasets/pme_demo/0000295.png
+++ b/doc/datasets/pme_demo/0000295.png
--- a/doc/datasets/pme_demo/0000562.png
+++ b/doc/datasets/pme_demo/0000562.png
--- a/doc/doc_ch/algorithm_overview.md
+++ b/doc/doc_ch/algorithm_overview.md
@ -137,6 +137,7 @@ PaddleOCR将**持续新增**支持OCR领域前沿算法与模型，**欢迎广
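 Supported formula recognition algorithms (click the link for the tutorial):
 - [x]  [CAN](./algorithm_rec_can.md)
 - [x]  [LaTeX-OCR](./algorithm_rec_latex_ocr.md)
 On the CROHME handwritten formula dataset, the algorithm performs as follows: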
@ -144,6 +145,13 @@ PaddleOCR将**持续新增**支持OCR领域前沿算法与模型，**欢迎广
 | ----- | ----- | ----- | ----- | ----- |
 |CAN|DenseNet|[rec_d28_can.yml](../../configs/rec/rec_d28_can.yml)|51.72%|[trained model](https://paddleocr.bj.bcebos.com/contribution/rec_d28_can_train.tar)|
 On the LaTeX-OCR printed formula dataset, the algorithm performs as follows:
 | Model     | Backbone   | Config | BLEU score  | normed edit distance  |  ExpRate  | Download link |
 |-----------|------------| ----- |:-----------:|:---------------------:|:---------:| ----- |
 | LaTeX-OCR | Hybrid ViT |[rec_latex_ocr.yml](../../configs/rec/rec_latex_ocr.yml)|   0.8821    |        0.0823         |  40.01%   |[trained model](https://paddleocr.bj.bcebos.com/contribution/rec_latex_ocr_train.tar)|
 <a name="2"></a>
 ## 2. End-to-end OCR Algorithms
--- a/doc/doc_ch/algorithm_rec_latex_ocr.md
+++ b/doc/doc_ch/algorithm_rec_latex_ocr.md
@ -0,0 +1,171 @@
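 # Printed Mathematical Formula Recognition Algorithm: LaTeX-OCR
 - [1. Introduction](#1)
 - [2. Environment](#2)
 - [3. Model Training, Evaluation, and Prediction](#3)
    - [3.1 Pickle Label File Generation](#3-1)
    - [3.2 Training](#3-2)
    - [3.3 Evaluation](#3-3)
    - [3.4 Prediction](#3-4)
 - [4. Inference and Deployment](#4)
    - [4.1 Python Inference](#4-1)
    - [4.2 C++ Inference](#4-2)
    - [4.3 Serving](#4-3)
    - [4.4 More Inference Deployment](#4-4)
 - [5. FAQ](#5)
 <a name="1"></a>
 ## 1. Introduction
 Original project: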
 > [https://github.com/lukas-blecher/LaTeX-OCR](https://github.com/lukas-blecher/LaTeX-OCR)
 <a name="model"></a>
 `LaTeX-OCR` is trained on the [`LaTeX-OCR printed formula dataset`](https://drive.google.com/drive/folders/13CA4vAmOmD_I_dSbvLp-Lf0s6KiaNfuO); its accuracy on the corresponding test set is as follows:
 | Model     | Backbone   | Config | BLEU score  | normed edit distance  |  ExpRate  | Download link |
 |-----------|------------| ----- |:-----------:|:---------------------:|:---------:| ----- |
 | LaTeX-OCR | Hybrid ViT |[rec_latex_ocr.yml](../../configs/rec/rec_latex_ocr.yml)|   0.8821    |        0.0823         |  40.01%   |[trained model](https://paddleocr.bj.bcebos.com/contribution/rec_latex_ocr_train.tar)|
 <a name="2"></a>
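 ## 2. Environment
 Please refer to ["Environment Preparation"](./environment.md) to configure the PaddleOCR environment, and to ["Project Clone"](./clone.md) to clone the project code.
 <a name="3"></a>
 ## 3. Model Training, Evaluation, and Prediction
 <a name="3-1"></a>
 ### 3.1 Pickle Label File Generation
 Download formulae.zip and math.txt from [Google Drive](https://drive.google.com/drive/folders/13CA4vAmOmD_I_dSbvLp-Lf0s6KiaNfuO), then use the following commands to generate the pickle label files.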
 ```shell
 # Create the LaTeX-OCR dataset directory
 mkdir -p train_data/LaTeXOCR
 # Unzip formulae.zip and copy math.txt
 unzip -d train_data/LaTeXOCR path/formulae.zip
 cp path/math.txt train_data/LaTeXOCR
 # Convert the original .txt file to .pkl files, which group images by size
 # Training set conversion
 python ppocr/utils/formula_utils/math_txt2pkl.py --image_dir=train_data/LaTeXOCR/train --mathtxt_path=train_data/LaTeXOCR/math.txt --output_dir=train_data/LaTeXOCR/
 # Validation set conversion
 python ppocr/utils/formula_utils/math_txt2pkl.py --image_dir=train_data/LaTeXOCR/val --mathtxt_path=train_data/LaTeXOCR/math.txt --output_dir=train_data/LaTeXOCR/
 # Test set conversion
 python ppocr/utils/formula_utils/math_txt2pkl.py --image_dir=train_data/LaTeXOCR/test --mathtxt_path=train_data/LaTeXOCR/math.txt --output_dir=train_data/LaTeXOCR/
 ```
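 <a name="3-2"></a>
 ### 3.2 Training
 Please refer to the [text recognition training tutorial](./recognition.md). PaddleOCR modularizes its code, so training the `LaTeX-OCR` recognition model only requires **switching the configuration file** to the `LaTeX-OCR` [configuration file](../../configs/rec/rec_latex_ocr.yml).
 #### Start Training
 Once the data is prepared, training can be started with the following commands: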
 ```shell
 # Single-GPU training (default)
 python3 tools/train.py -c configs/rec/rec_latex_ocr.yml
 # Multi-GPU training; specify the GPU ids with the --gpus option
 python3 -m paddle.distributed.launch --gpus '0,1,2,3'  tools/train.py -c configs/rec/rec_latex_ocr.yml
 ```
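 **Note:**
 - By default, evaluation runs once every 22 epochs (60000 iterations). If you change the training batch_size or switch datasets, adjust the evaluation interval when launching training: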
 ```
 python3 tools/train.py -c configs/rec/rec_latex_ocr.yml -o Global.eval_batch_step=[0,{length_of_dataset//batch_size*22}]
 ```
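The placeholder `{length_of_dataset//batch_size*22}` in the command above can be evaluated with a few lines of Python. This is a minimal sketch: `eval_step_for` is a hypothetical helper, and the dataset length of 100000 is a made-up number used only for illustration.

```python
def eval_step_for(dataset_len, batch_size, epochs_between_evals=22):
    """Iterations between evaluations, following length_of_dataset // batch_size * 22."""
    iters_per_epoch = dataset_len // batch_size
    return iters_per_epoch * epochs_between_evals


# Made-up example: 100000 training samples with a batch size of 32.
step = eval_step_for(dataset_len=100_000, batch_size=32)
print(f"-o Global.eval_batch_step=[0,{step}]")  # -o Global.eval_batch_step=[0,68750]
```

The printed fragment can be appended to the `tools/train.py` command line in place of the placeholder.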
 <a name="3-3"></a>
 ### 3.3 Evaluation
 You can download the trained [model file](https://paddleocr.bj.bcebos.com/contribution/rec_latex_ocr_train.tar) and evaluate it with the following commands:
 ```shell
 # Set pretrained_model to a local path. If you use a model you trained and saved yourself, change the path and file name to {path/to/weights}/{model_name}.
 # Validation set evaluation
 python3 tools/eval.py -c configs/rec/rec_latex_ocr.yml -o Global.pretrained_model=./rec_latex_ocr_train/best_accuracy.pdparams Metric.cal_blue_score=True
 # Test set evaluation
 python3 tools/eval.py -c configs/rec/rec_latex_ocr.yml -o Global.pretrained_model=./rec_latex_ocr_train/best_accuracy.pdparams Metric.cal_blue_score=True Eval.dataset.data=./train_data/LaTeXOCR/latexocr_test.pkl
 ```
 <a name="3-4"></a>
 ### 3.4 Prediction
 Use the following command to predict a single image:
 ```shell
 # Set pretrained_model to a local path.
 python3 tools/infer_rec.py -c configs/rec/rec_latex_ocr.yml  -o  Architecture.Backbone.is_predict=True Architecture.Backbone.is_export=True Architecture.Head.is_export=True Global.infer_img='./doc/datasets/pme_demo/0000013.png' Global.pretrained_model=./rec_latex_ocr_train/best_accuracy.pdparams
 # To predict all images in a folder, set infer_img to the folder, e.g. Global.infer_img='./doc/datasets/pme_demo/'.
 ```
 <a name="4"></a>
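 ## 4. Inference and Deployment
 <a name="4-1"></a>
 ### 4.1 Python Inference
 First, convert the best model obtained from training into an inference model. Taking the trained model as an example ([model download link](https://paddleocr.bj.bcebos.com/contribution/rec_latex_ocr_train.tar)), the conversion can be done with the following command: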
 ```shell
 # Set pretrained_model to a local path.
 python3 tools/export_model.py -c configs/rec/rec_latex_ocr.yml -o Global.pretrained_model=./rec_latex_ocr_train/best_accuracy.pdparams Global.save_inference_dir=./inference/rec_latex_ocr_infer/ Architecture.Backbone.is_predict=True Architecture.Backbone.is_export=True Architecture.Head.is_export=True
 # The current static-graph model supports a maximum output length of 512
 ```
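 **Note:**
 - If you trained the model on your own dataset and adjusted the dictionary file, check that `rec_char_dict_path` in the configuration file points to the required dictionary file.
 - [Download link for the converted model](https://paddleocr.bj.bcebos.com/contribution/rec_latex_ocr_infer.tar)
 After a successful conversion, the directory contains three files: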
 ```
 /inference/rec_latex_ocr_infer/
    ├── inference.pdiparams         # parameter file of the recognition inference model
    ├── inference.pdiparams.info    # parameter info of the recognition inference model, can be ignored
    └── inference.pdmodel           # program file of the recognition inference model
 ```
 Run the following command to perform model inference:
 ```shell
 python3 tools/infer/predict_rec.py --image_dir='./doc/datasets/pme_demo/0000295.png' --rec_algorithm="LaTeXOCR" --rec_batch_num=1 --rec_model_dir="./inference/rec_latex_ocr_infer/"  --rec_char_dict_path="./ppocr/utils/dict/latex_ocr_tokenizer.json"
 # To predict all images in a folder, set image_dir to the folder, e.g. --image_dir='./doc/datasets/pme_demo/'.
 ```
 &nbsp;
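 ![Sample test image](../datasets/pme_demo/0000295.png)
 After running the command, the prediction result (the recognized text) for the image above is printed to the screen, for example: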
 ```shell
 Predicts of ./doc/datasets/pme_demo/0000295.png:\zeta_{0}(\nu)=-{\frac{\nu\varrho^{-2\nu}}{\pi}}\int_{\mu}^{\infty}d\omega\int_{C_{+}}d z{\frac{2z^{2}}{(z^{2}+\omega^{2})^{\nu+1}}}{\tilde{\Psi}}(\omega;z)e^{i\epsilon z}~~~,
 ```
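 **Note:**
 - The input image must be **black text on a white background**, that is, the formula is black and the background is white.
 - At inference time, set `rec_char_dict_path` to specify the dictionary; if you modified the dictionary, point this parameter to your dictionary file.
 - If you changed the preprocessing, modify the LaTeX-OCR preprocessing in `tools/infer/predict_rec.py` accordingly.
 <a name="4-2"></a>
 ### 4.2 C++ Inference
 Not supported yet, because the C++ pre-processing and post-processing do not support LaTeX-OCR.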
 <a name="4-3"></a>
 ### 4.3 Serving
 Not supported yet
 <a name="4-4"></a>
 ### 4.4 More Inference Deployment
 Not supported yet
 <a name="5"></a>
 ## 5. FAQ
 1. The LaTeX-OCR dataset comes from the [original LaTeX-OCR repo](https://github.com/lukas-blecher/LaTeX-OCR).
--- a/doc/doc_en/algorithm_overview_en.md
+++ b/doc/doc_en/algorithm_overview_en.md
@ -137,6 +137,8 @@ On the TextZoom public dataset, the effect of the algorithm is as follows:
 Supported formula recognition algorithms (Click the link to get the tutorial):
 - [x]  [CAN](./algorithm_rec_can_en.md)
 - [x]  [LaTeX-OCR](./algorithm_rec_latex_ocr_en.md)
 On the CROHME handwritten formula dataset, the effect of the algorithm is as follows:
@ -145,6 +147,13 @@ On the CROHME handwritten formula dataset, the effect of the algorithm is as fol
 |CAN|DenseNet|[rec_d28_can.yml](../../configs/rec/rec_d28_can.yml)|51.72%|[trained model](https://paddleocr.bj.bcebos.com/contribution/rec_d28_can_train.tar)|
 On the LaTeX-OCR printed formula dataset, the effect of the algorithm is as follows:
 | Model       | Backbone |config| BLEU score  | normed edit distance  |  ExpRate  |Download link|
 |-----------|----------| ---- |:-----------:|:---------------------:|:---------:| ----- |
 | LaTeX-OCR | Hybrid ViT |[rec_latex_ocr.yml](../../configs/rec/rec_latex_ocr.yml)|   0.8821    |        0.0823         |  40.01%   |[trained model](https://paddleocr.bj.bcebos.com/contribution/rec_latex_ocr_train.tar)|
 <a name="2"></a>
 ## 2. End-to-end OCR Algorithms
--- a/doc/doc_en/algorithm_rec_latex_ocr_en.md
+++ b/doc/doc_en/algorithm_rec_latex_ocr_en.md
@ -0,0 +1,127 @@
 # LaTeX-OCR
 - [1. Introduction](#1)
 - [2. Environment](#2)
 - [3. Model Training / Evaluation / Prediction](#3)
    - [3.1 Pickle File Generation](#3-1)
    - [3.2 Training](#3-2)
    - [3.3 Evaluation](#3-3)
    - [3.4 Prediction](#3-4)
 - [4. Inference and Deployment](#4)
    - [4.1 Python Inference](#4-1)
    - [4.2 C++ Inference](#4-2)
    - [4.3 Serving](#4-3)
    - [4.4 More](#4-4)
 - [5. FAQ](#5)
 <a name="1"></a>
 ## 1. Introduction
 Original Project:
 > [https://github.com/lukas-blecher/LaTeX-OCR](https://github.com/lukas-blecher/LaTeX-OCR)
 The model is trained on the LaTeX-OCR printed mathematical expression recognition dataset and evaluated on its test set. The reproduced results are as follows:
 | Model       | Backbone |config| BLEU score  | normed edit distance  |  ExpRate  |Download link|
 |-----------|----------| ---- |:-----------:|:---------------------:|:---------:| ----- |
 | LaTeX-OCR | Hybrid ViT |[rec_latex_ocr.yml](../../configs/rec/rec_latex_ocr.yml)|   0.8821    |        0.0823         |  40.01%   |[trained model](https://paddleocr.bj.bcebos.com/contribution/rec_latex_ocr_train.tar)|
 <a name="2"></a>
 ## 2. Environment
 Please refer to ["Environment Preparation"](./environment_en.md) to configure the PaddleOCR environment, and refer to ["Project Clone"](./clone_en.md) to clone the project code.
 <a name="3"></a>
 ## 3. Model Training / Evaluation / Prediction
 Please refer to [Text Recognition Tutorial](./recognition_en.md). PaddleOCR modularizes the code, and training different recognition models only requires **changing the configuration file**.
 <a name="3-1"></a>
 ### 3.1 Pickle File Generation
 Download formulae.zip and math.txt from [Google Drive](https://drive.google.com/drive/folders/13CA4vAmOmD_I_dSbvLp-Lf0s6KiaNfuO), and then use the following commands to generate the pickle files.
 ```shell
 # Create a LaTeX-OCR dataset directory
 mkdir -p train_data/LaTeXOCR
 # Unzip formulae.zip and copy math.txt
 unzip -d train_data/LaTeXOCR path/formulae.zip
 cp path/math.txt train_data/LaTeXOCR
 # Convert the original .txt file to a .pkl file to group images of different scales
 # Training set conversion
 python ppocr/utils/formula_utils/math_txt2pkl.py --image_dir=train_data/LaTeXOCR/train --mathtxt_path=train_data/LaTeXOCR/math.txt --output_dir=train_data/LaTeXOCR/
 # Validation set conversion
 python ppocr/utils/formula_utils/math_txt2pkl.py --image_dir=train_data/LaTeXOCR/val --mathtxt_path=train_data/LaTeXOCR/math.txt --output_dir=train_data/LaTeXOCR/
 # Test set conversion
 python ppocr/utils/formula_utils/math_txt2pkl.py --image_dir=train_data/LaTeXOCR/test --mathtxt_path=train_data/LaTeXOCR/math.txt --output_dir=train_data/LaTeXOCR/
 ```
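To see what the conversion step produces, the sketch below summarizes a generated pickle file. It assumes, based on how `LaTeXOCRDataSet` reads the file, that the pickle holds a dict keyed by `(width, height)` tuples whose values are lists of samples; `summarize_buckets` is a hypothetical helper and the toy entries are made up.

```python
import pickle
import tempfile
from pathlib import Path


def summarize_buckets(pkl_path, min_dims=(32, 32), max_dims=(672, 192)):
    """Count samples per (width, height) bucket that falls inside the size limits."""
    with open(pkl_path, "rb") as f:
        buckets = pickle.load(f)
    return {
        size: len(items)
        for size, items in buckets.items()
        if min_dims[0] <= size[0] <= max_dims[0]
        and min_dims[1] <= size[1] <= max_dims[1]
    }


# Toy file standing in for latexocr_train.pkl; the entry contents are made up.
toy = {(64, 32): ["a.png", "b.png"], (800, 64): ["too_wide.png"]}
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "toy.pkl"
    path.write_bytes(pickle.dumps(toy))
    print(summarize_buckets(path))  # {(64, 32): 2}
```

Buckets outside the `min_dimensions` and `max_dimensions` limits are skipped here in the same way the dataset class filters them at load time.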
 <a name="3-2"></a>
 ### 3.2 Training
 After the data is prepared, training can be started with the following commands:
 ```
 #Single GPU training (Default training method)
 python3 tools/train.py -c configs/rec/rec_latex_ocr.yml
 #Multi GPU training, specify the gpu number through the --gpus parameter
 python3 -m paddle.distributed.launch --gpus '0,1,2,3'  tools/train.py -c configs/rec/rec_latex_ocr.yml
 ```
 <a name="3-3"></a>
 ### 3.3 Evaluation
 ```
 # GPU evaluation
 # Validation set evaluation
 python3 tools/eval.py -c configs/rec/rec_latex_ocr.yml -o Global.pretrained_model=./rec_latex_ocr_train/best_accuracy.pdparams Metric.cal_blue_score=True
 # Test set evaluation
 python3 tools/eval.py -c configs/rec/rec_latex_ocr.yml -o Global.pretrained_model=./rec_latex_ocr_train/best_accuracy.pdparams Metric.cal_blue_score=True Eval.dataset.data=./train_data/LaTeXOCR/latexocr_test.pkl
 ```
 <a name="3-4"></a>
 ### 3.4 Prediction
 ```
 # The configuration file used for prediction must match the one used for training
 python3 tools/infer_rec.py -c configs/rec/rec_latex_ocr.yml  -o  Architecture.Backbone.is_predict=True Architecture.Backbone.is_export=True Architecture.Head.is_export=True Global.infer_img='./doc/datasets/pme_demo/0000013.png' Global.pretrained_model=./rec_latex_ocr_train/best_accuracy.pdparams
 ```
 <a name="4"></a>
 ## 4. Inference and Deployment
 <a name="4-1"></a>
 ### 4.1 Python Inference
 First, convert the model saved during LaTeX-OCR training into an inference model. You can use the following command:
 ```
 python3 tools/export_model.py -c configs/rec/rec_latex_ocr.yml -o Global.pretrained_model=./rec_latex_ocr_train/best_accuracy.pdparams Global.save_inference_dir=./inference/rec_latex_ocr_infer/ Architecture.Backbone.is_predict=True Architecture.Backbone.is_export=True Architecture.Head.is_export=True
 # The default output max length of the model is 512.
 ```
 To run inference with the LaTeX-OCR printed mathematical expression recognition model, execute the following command:
 ```
 python3 tools/infer/predict_rec.py --image_dir='./doc/datasets/pme_demo/0000295.png' --rec_algorithm="LaTeXOCR" --rec_batch_num=1 --rec_model_dir="./inference/rec_latex_ocr_infer/"  --rec_char_dict_path="./ppocr/utils/dict/latex_ocr_tokenizer.json"
 ```
 <a name="4-2"></a>
 ### 4.2 C++ Inference
 Not supported
 <a name="4-3"></a>
 ### 4.3 Serving
 Not supported
 <a name="4-4"></a>
 ### 4.4 More
 Not supported
 <a name="5"></a>
 ## 5. FAQ
--- a/ppocr/data/__init__.py
+++ b/ppocr/data/__init__.py
@ -38,6 +38,7 @@ from ppocr.data.lmdb_dataset import LMDBDataSet, LMDBDataSetSR, LMDBDataSetTable
 from ppocr.data.pgnet_dataset import PGDataSet
 from ppocr.data.pubtab_dataset import PubTabDataSet
 from ppocr.data.multi_scale_sampler import MultiScaleSampler
 from ppocr.data.latexocr_dataset import LaTeXOCRDataSet
 # for PaddleX dataset_type
 TextDetDataset = SimpleDataSet
@ -45,6 +46,7 @@ TextRecDataset = SimpleDataSet
 MSTextRecDataset = MultiScaleDataSet
 PubTabTableRecDataset = PubTabDataSet
 KieDataset = SimpleDataSet
 LaTeXOCRDataSet = LaTeXOCRDataSet
 __all__ = ["build_dataloader", "transform", "create_operators", "set_signal_handlers"]
@ -94,6 +96,7 @@ def build_dataloader(config, mode, device, logger, seed=None):
        "MSTextRecDataset",
        "PubTabTableRecDataset",
        "KieDataset",
        "LaTeXOCRDataSet",
    ]
    module_name = config[mode]["dataset"]["name"]
    assert module_name in support_dict, Exception(
--- a/ppocr/data/collate_fn.py
+++ b/ppocr/data/collate_fn.py
@ -116,3 +116,18 @@ class DyMaskCollator(object):
            label_masks[i][:l] = 1
        return images, image_masks, labels, label_masks
 class LaTeXOCRCollator(object):
    """
    batch: [
        image [batch_size, channel, maxHinbatch, maxWinbatch]
        label [batch_size, maxLabelLen]
        label_mask [batch_size, maxLabelLen]
        ...
    ]
    """
    def __call__(self, batch):
        images, labels, attention_mask = batch[0]
        return images, labels, attention_mask
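The collator only unwraps `batch[0]`, which fits the config above where the loader uses `batch_size_per_card: 1` and the dataset itself groups samples through `batch_size_per_pair`. The sketch below illustrates that reading with plain lists standing in for tensors; the sample values are made up.

```python
class LaTeXOCRCollator(object):
    """Same logic as in the diff: unwrap the single pre-batched item."""

    def __call__(self, batch):
        images, labels, attention_mask = batch[0]
        return images, labels, attention_mask


# One dataset item is assumed to already be a full batch (here: 3 samples).
pre_batched_item = (
    ["img0", "img1", "img2"],            # stands in for the image tensor
    [[1, 5, 2], [1, 6, 2], [1, 7, 2]],   # token ids with BOS=1 and EOS=2
    [[1, 1, 1], [1, 1, 1], [1, 1, 1]],   # attention mask
)
loader_batch = [pre_batched_item]  # what a loader with batch size 1 hands over
images, labels, mask = LaTeXOCRCollator()(loader_batch)
print(len(images), len(labels), len(mask))  # 3 3 3
```

With a loader batch size above 1, every item after the first would be silently dropped by this collator, so the two settings presumably need to stay in sync.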
--- a/ppocr/data/imaug/__init__.py
+++ b/ppocr/data/imaug/__init__.py
@ -61,6 +61,7 @@ from .fce_aug import *
 from .fce_targets import FCENetTargets
 from .ct_process import *
 from .drrg_targets import DRRGTargets
 from .latex_ocr_aug import *
 def transform(data, ops=None):
--- a/ppocr/data/imaug/label_ops.py
+++ b/ppocr/data/imaug/label_ops.py
@ -25,6 +25,8 @@ import json
 import copy
 import random
 from random import sample
 from collections import defaultdict
 from tokenizers import Tokenizer as TokenizerFast
 from ppocr.utils.logging import get_logger
 from ppocr.data.imaug.vqa.augment import order_by_tbyx
@ -1770,3 +1772,106 @@ class CPPDLabelEncode(BaseRecLabelEncode):
        if len(text_list) == 0:
            return None, None, None
        return text_list, text_node_index, text_node_num
 class LatexOCRLabelEncode(object):
    def __init__(
        self,
        rec_char_dict_path,
        **kwargs,
    ):
        self.tokenizer = TokenizerFast.from_file(rec_char_dict_path)
        self.model_input_names = ["input_ids", "token_type_ids", "attention_mask"]
        self.pad_token_id = 0
        self.bos_token_id = 1
        self.eos_token_id = 2
    def _convert_encoding(
        self,
        encoding,
        return_token_type_ids=None,
        return_attention_mask=None,
        return_overflowing_tokens=False,
        return_special_tokens_mask=False,
        return_offsets_mapping=False,
        return_length=False,
        verbose=True,
    ):
        if return_token_type_ids is None:
            return_token_type_ids = "token_type_ids" in self.model_input_names
        if return_attention_mask is None:
            return_attention_mask = "attention_mask" in self.model_input_names
        if return_overflowing_tokens and encoding.overflowing is not None:
            encodings = [encoding] + encoding.overflowing
        else:
            encodings = [encoding]
        encoding_dict = defaultdict(list)
        for e in encodings:
            encoding_dict["input_ids"].append(e.ids)
            if return_token_type_ids:
                encoding_dict["token_type_ids"].append(e.type_ids)
            if return_attention_mask:
                encoding_dict["attention_mask"].append(e.attention_mask)
            if return_special_tokens_mask:
                encoding_dict["special_tokens_mask"].append(e.special_tokens_mask)
            if return_offsets_mapping:
                encoding_dict["offset_mapping"].append(e.offsets)
            if return_length:
                encoding_dict["length"].append(len(e.ids))
        return encoding_dict, encodings
    def encode(
        self,
        text,
        text_pair=None,
        return_token_type_ids=False,
        add_special_tokens=True,
        is_split_into_words=False,
    ):
        batched_input = text
        encodings = self.tokenizer.encode_batch(
            batched_input,
            add_special_tokens=add_special_tokens,
            is_pretokenized=is_split_into_words,
        )
        tokens_and_encodings = [
            self._convert_encoding(
                encoding=encoding,
                return_token_type_ids=False,
                return_attention_mask=None,
                return_overflowing_tokens=False,
                return_special_tokens_mask=False,
                return_offsets_mapping=False,
                return_length=False,
                verbose=True,
            )
            for encoding in encodings
        ]
        sanitized_tokens = {}
        for key in tokens_and_encodings[0][0].keys():
            stack = [e for item, _ in tokens_and_encodings for e in item[key]]
            sanitized_tokens[key] = stack
        return sanitized_tokens
    def __call__(self, eqs):
        topk = self.encode(eqs)
        for k, p in zip(topk, [[self.bos_token_id, self.eos_token_id], [1, 1]]):
            process_seq = [[p[0]] + x + [p[1]] for x in topk[k]]
            max_length = 0
            for seq in process_seq:
                max_length = max(max_length, len(seq))
            labels = np.zeros((len(process_seq), max_length), dtype="int64")
            for idx, seq in enumerate(process_seq):
                l = len(seq)
                labels[idx][:l] = seq
            topk[k] = labels
        return (
            np.array(topk["input_ids"]).astype(np.int64),
            np.array(topk["attention_mask"]).astype(np.int64),
            max_length,
        )
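The padding step in `__call__` can be read in isolation: each token sequence is wrapped in BOS/EOS ids, then right-padded with zeros to the longest sequence, and the attention mask gets the same treatment with ones. The sketch below reproduces that idea with plain lists instead of NumPy and a real tokenizer; `pad_with_special_tokens` is a hypothetical name and the token ids are made up.

```python
def pad_with_special_tokens(seqs, bos=1, eos=2, pad=0):
    """Wrap each sequence in BOS/EOS and right-pad ids and mask to a common length."""
    wrapped = [[bos] + list(s) + [eos] for s in seqs]
    max_len = max(len(s) for s in wrapped)
    ids = [s + [pad] * (max_len - len(s)) for s in wrapped]
    mask = [[1] * len(s) + [0] * (max_len - len(s)) for s in wrapped]
    return ids, mask, max_len


ids, mask, max_len = pad_with_special_tokens([[5, 6, 7], [8]])
print(ids)      # [[1, 5, 6, 7, 2], [1, 8, 2, 0, 0]]
print(mask)     # [[1, 1, 1, 1, 1], [1, 1, 1, 0, 0]]
print(max_len)  # 5
```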
--- a/ppocr/data/imaug/latex_ocr_aug.py
+++ b/ppocr/data/imaug/latex_ocr_aug.py
@ -0,0 +1,179 @@
 # copyright (c) 2024 PaddlePaddle Authors. All Rights Reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 #    http://www.apache.org/licenses/LICENSE-2.0
 #
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 """
 This code is adapted from:
 https://github.com/lukas-blecher/LaTeX-OCR/blob/main/pix2tex/dataset/transforms.py
 """
 from __future__ import absolute_import
 from __future__ import division
 from __future__ import print_function
 from __future__ import unicode_literals
 import math
 import cv2
 import numpy as np
 import albumentations as A
 from PIL import Image
 class LatexTrainTransform:
    def __init__(self, bitmap_prob=0.04, **kwargs):
        # probability of binarizing the image to pure black/white before augmenting
        self.bitmap_prob = bitmap_prob
        self.train_transform = A.Compose(
            [
                A.Compose(
                    [
                        A.ShiftScaleRotate(
                            shift_limit=0,
                            scale_limit=(-0.15, 0),
                            rotate_limit=1,
                            border_mode=0,
                            interpolation=3,
                            value=[255, 255, 255],
                            p=1,
                        ),
                        A.GridDistortion(
                            distort_limit=0.1,
                            border_mode=0,
                            interpolation=3,
                            value=[255, 255, 255],
                            p=0.5,
                        ),
                    ],
                    p=0.15,
                ),
                A.RGBShift(r_shift_limit=15, g_shift_limit=15, b_shift_limit=15, p=0.3),
                A.GaussNoise(10, p=0.2),
                A.RandomBrightnessContrast(0.05, (-0.2, 0), True, p=0.2),
                A.ImageCompression(95, p=0.3),
                A.ToGray(always_apply=True),
            ]
        )
    def __call__(self, data):
        img = data["image"]
        if np.random.random() < self.bitmap_prob:
            img[img != 255] = 0
        img = self.train_transform(image=img)["image"]
        data["image"] = img
        return data
 class LatexTestTransform:
    def __init__(self, **kwargs):
        # evaluation only converts the image to grayscale
        self.test_transform = A.Compose(
            [
                A.ToGray(always_apply=True),
            ]
        )
    def __call__(self, data):
        img = data["image"]
        img = self.test_transform(image=img)["image"]
        data["image"] = img
        return data
 class MinMaxResize:
    def __init__(self, min_dimensions=[32, 32], max_dimensions=[672, 192], **kwargs):
        # [width, height] bounds; images already inside both are passed through unchanged
        self.min_dimensions = min_dimensions
        self.max_dimensions = max_dimensions
    def pad_(self, img, divable=32):
        threshold = 128
        data = np.array(img.convert("LA"))
        if data[..., -1].var() == 0:
            data = (data[..., 0]).astype(np.uint8)
        else:
            data = (255 - data[..., -1]).astype(np.uint8)
        data = (data - data.min()) / (data.max() - data.min()) * 255
        if data.mean() > threshold:
            # To invert the text to white
            gray = 255 * (data < threshold).astype(np.uint8)
        else:
            gray = 255 * (data > threshold).astype(np.uint8)
            data = 255 - data
        coords = cv2.findNonZero(gray)  # Find all non-zero points (text)
        a, b, w, h = cv2.boundingRect(coords)  # Find minimum spanning bounding box
        rect = data[b : b + h, a : a + w]
        im = Image.fromarray(rect).convert("L")
        dims = []
        for x in [w, h]:
            div, mod = divmod(x, divable)
            dims.append(divable * (div + (1 if mod > 0 else 0)))
        padded = Image.new("L", dims, 255)
        padded.paste(im, (0, 0, im.size[0], im.size[1]))
        return padded
    def minmax_size_(self, img, max_dimensions, min_dimensions):
        if max_dimensions is not None:
            ratios = [a / b for a, b in zip(img.size, max_dimensions)]
            if any([r > 1 for r in ratios]):
                size = np.array(img.size) // max(ratios)
                img = img.resize(tuple(size.astype(int)), Image.BILINEAR)
        if min_dimensions is not None:
            # grow each side that is smaller than min_dimensions up to the minimum
            padded_size = [
                max(img_dim, min_dim)
                for img_dim, min_dim in zip(img.size, min_dimensions)
            ]
            if padded_size != list(img.size):  # at least one side was too small
                padded_im = Image.new("L", padded_size, 255)
                padded_im.paste(img, img.getbbox())
                img = padded_im
        return img
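The size arithmetic in `minmax_size_` can be checked without touching any pixels. The sketch below computes only the final canvas size: shrink by the largest overflow ratio if either side exceeds the maximum, then raise any side that is still below the minimum. `target_size` is a hypothetical helper and sizes are `(width, height)` as in PIL.

```python
def target_size(size, max_dims=(672, 192), min_dims=(32, 32)):
    """Return the (width, height) an image would end up with after min/max resizing."""
    w, h = size
    ratios = [w / max_dims[0], h / max_dims[1]]
    if max(ratios) > 1:
        # Same floor division by the largest ratio as in the diff above.
        w, h = int(w // max(ratios)), int(h // max(ratios))
    # Sides below the minimum are padded up, never stretched.
    return max(w, min_dims[0]), max(h, min_dims[1])


print(target_size((1344, 192)))  # (672, 96)
print(target_size((100, 384)))   # (50, 192)
print(target_size((20, 40)))     # (32, 40)
```

Because both sides are divided by the same ratio, the aspect ratio is preserved up to rounding when shrinking.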
    def __call__(self, data):
        img = data["image"]
        h, w = img.shape[:2]
        if (
            self.min_dimensions[0] <= w <= self.max_dimensions[0]
            and self.min_dimensions[1] <= h <= self.max_dimensions[1]
        ):
            return data
        else:
            im = Image.fromarray(np.uint8(img))
            im = self.minmax_size_(
                self.pad_(im), self.max_dimensions, self.min_dimensions
            )
            im = np.array(im)
            im = np.dstack((im, im, im))
            data["image"] = im
            return data
 class LatexImageFormat:
    def __init__(self, **kwargs):
        # stateless transform: nothing to configure
        pass
    def __call__(self, data):
        img = data["image"]
        im_h, im_w = img.shape[:2]
        divide_h = math.ceil(im_h / 16) * 16
        divide_w = math.ceil(im_w / 16) * 16
        img = img[:, :, 0]
        img = np.pad(
            img, ((0, divide_h - im_h), (0, divide_w - im_w)), constant_values=(1, 1)
        )
        img_expanded = img[:, :, np.newaxis].transpose(2, 0, 1)
        data["image"] = img_expanded
        return data
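`LatexImageFormat` pads the bottom and right edges so that both sides become multiples of 16, which matches `patch_size: 16` in the config. The sketch below shows just the rounding arithmetic; `padded_shape` is a hypothetical helper that returns the new shape and the amount of padding added.

```python
import math


def padded_shape(height, width, multiple=16):
    """Round both sides up to the next multiple and report the padding added."""
    new_h = math.ceil(height / multiple) * multiple
    new_w = math.ceil(width / multiple) * multiple
    return new_h, new_w, (new_h - height, new_w - width)


print(padded_shape(50, 100))  # (64, 112, (14, 12))
print(padded_shape(32, 64))   # (32, 64, (0, 0))
```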
--- a/ppocr/data/latexocr_dataset.py
+++ b/ppocr/data/latexocr_dataset.py
@ -0,0 +1,172 @@
 # copyright (c) 2024 PaddlePaddle Authors. All Rights Reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 #    http://www.apache.org/licenses/LICENSE-2.0
 #
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 """
 This code is adapted from:
 https://github.com/lukas-blecher/LaTeX-OCR/blob/main/pix2tex/dataset/dataset.py
 """
 import numpy as np
 import cv2
 import math
 import os
 import json
 import pickle
 import random
 import traceback
 import paddle
 from paddle.io import Dataset
 from .imaug.label_ops import LatexOCRLabelEncode
 from .imaug import transform, create_operators
 class LaTeXOCRDataSet(Dataset):
    def __init__(self, config, mode, logger, seed=None):
        super(LaTeXOCRDataSet, self).__init__()
        self.logger = logger
        self.mode = mode.lower()
        global_config = config["Global"]
        dataset_config = config[mode]["dataset"]
        loader_config = config[mode]["loader"]
        pkl_path = dataset_config.pop("data")
        self.min_dimensions = dataset_config.pop("min_dimensions")
        self.max_dimensions = dataset_config.pop("max_dimensions")
        self.batchsize = dataset_config.pop("batch_size_per_pair")
        self.keep_smaller_batches = dataset_config.pop("keep_smaller_batches")
        self.max_seq_len = global_config.pop("max_seq_len")
        self.rec_char_dict_path = global_config.pop("rec_char_dict_path")
        self.tokenizer = LatexOCRLabelEncode(self.rec_char_dict_path)
        with open(pkl_path, "rb") as file:
            data = pickle.load(file)
        temp = {}
        for k in data:
            if (
                self.min_dimensions[0] <= k[0] <= self.max_dimensions[0]
                and self.min_dimensions[1] <= k[1] <= self.max_dimensions[1]
            ):
                temp[k] = data[k]
        self.data = temp
        self.do_shuffle = loader_config["shuffle"]
        self.seed = seed
        if self.mode == "train" and self.do_shuffle:
            random.seed(self.seed)
        self.pairs = []
        for k in self.data:
            info = np.array(self.data[k], dtype=object)
            p = (
                paddle.randperm(len(info))
                if self.mode == "train" and self.do_shuffle
                else paddle.arange(len(info))
            )
            for i in range(0, len(info), self.batchsize):
                batch = info[p[i : i + self.batchsize]]
                if len(batch.shape) == 1:
                    batch = batch[None, :]
                if len(batch) < self.batchsize and not self.keep_smaller_batches:
                    continue
                self.pairs.append(batch)
        if self.do_shuffle:
            self.pairs = np.random.permutation(np.array(self.pairs, dtype=object))
        else:
            self.pairs = np.array(self.pairs, dtype=object)
        self.size = len(self.pairs)
        self.set_epoch_as_seed(self.seed, dataset_config)
        self.ops = create_operators(dataset_config["transforms"], global_config)
        self.ext_op_transform_idx = dataset_config.get("ext_op_transform_idx", 2)
        self.need_reset = True
    def set_epoch_as_seed(self, seed, dataset_config):
        if self.mode == "train":
            try:
                border_map_id = [
                    index
                    for index, dictionary in enumerate(dataset_config["transforms"])
                    if "MakeBorderMap" in dictionary
                ][0]
                shrink_map_id = [
                    index
                    for index, dictionary in enumerate(dataset_config["transforms"])
                    if "MakeShrinkMap" in dictionary
                ][0]
                dataset_config["transforms"][border_map_id]["MakeBorderMap"][
                    "epoch"
                ] = (seed if seed is not None else 0)
                dataset_config["transforms"][shrink_map_id]["MakeShrinkMap"][
                    "epoch"
                ] = (seed if seed is not None else 0)
            except Exception as E:
                print(E)
                return
    def shuffle_data_random(self):
        random.seed(self.seed)
        random.shuffle(self.data_lines)
        return
    def __getitem__(self, idx):
        batch = self.pairs[idx]
        eqs, ims = batch.T
        try:
            max_width, max_height, max_length = 0, 0, 0
            images_transform = []
            for img_path in ims:
                data = {
                    "img_path": img_path,
                }
                with open(data["img_path"], "rb") as f:
                    img = f.read()
                    data["image"] = img
                    item = transform(data, self.ops)
                    images_transform.append(np.array(item[0]))
            image_concat = np.concatenate(images_transform, axis=0)[:, np.newaxis, :, :]
            images_transform = image_concat.astype(np.float32)
            labels, attention_mask, max_length = self.tokenizer(list(eqs))
            if self.max_seq_len < max_length:
                rnd_idx = (
                    np.random.randint(self.__len__())
                    if self.mode == "train"
                    else (idx + 1) % self.__len__()
                )
                return self.__getitem__(rnd_idx)
            return (images_transform, labels, attention_mask)
        except Exception:
            self.logger.error(
                "When parsing line {}, error happened with msg: {}".format(
                    data["img_path"], traceback.format_exc()
                )
            )
        # On failure, fall back to another batch. During evaluation the fallback
        # index is deterministic so that repeated evaluations give the same results.
        rnd_idx = (
            np.random.randint(self.__len__())
            if self.mode == "train"
            else (idx + 1) % self.__len__()
        )
        return self.__getitem__(rnd_idx)
    def __len__(self):
        return self.size
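Review note: `LaTeXOCRDataSet.__init__` above groups samples by image size before slicing them into batches, so the images in one batch share the same dimensions and can be concatenated without padding. The following is a minimal sketch of that idea, not the dataset's actual code. It assumes the pickle maps a size key to a list of `(equation, image_path)` pairs; the toy data and the helper name `build_batches` are hypothetical, and NumPy's generator stands in for `paddle.randperm`.

```python
import numpy as np


def build_batches(data, batch_size, keep_smaller_batches=True, shuffle=False, seed=0):
    """Group (equation, image_path) pairs by image size, then slice into batches."""
    rng = np.random.default_rng(seed)
    batches = []
    for size_key, samples in data.items():
        info = np.array(samples, dtype=object)  # shape [n, 2]
        order = rng.permutation(len(info)) if shuffle else np.arange(len(info))
        for start in range(0, len(info), batch_size):
            batch = info[order[start : start + batch_size]]
            # Optionally drop the trailing batch of a bucket when it is not full.
            if len(batch) < batch_size and not keep_smaller_batches:
                continue
            batches.append(batch)
    return batches


# Hypothetical toy data: two size buckets holding three and one samples.
toy = {
    (64, 32): [("a", "0.png"), ("b", "1.png"), ("c", "2.png")],
    (128, 32): [("d", "3.png")],
}
batches = build_batches(toy, batch_size=2)
sizes = [len(b) for b in batches]
# Transposing a batch separates equations from image paths, as in __getitem__.
eqs, ims = batches[0].T
```

Each element of `batches` holds samples from a single size bucket, which is why `__getitem__` can stack the decoded images along the batch axis directly.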
--- a/ppocr/losses/__init__.py
+++ b/ppocr/losses/__init__.py
@@ -45,6 +45,7 @@ from .rec_satrn_loss import SATRNLoss
 from .rec_nrtr_loss import NRTRLoss
 from .rec_parseq_loss import ParseQLoss
 from .rec_cppd_loss import CPPDLoss
 from .rec_latexocr_loss import LaTeXOCRLoss
 # cls loss
 from .cls_loss import ClsLoss
@@ -107,6 +108,7 @@ def build_loss(config):
        "NRTRLoss",
        "ParseQLoss",
        "CPPDLoss",
        "LaTeXOCRLoss",
    ]
    config = copy.deepcopy(config)
    module_name = config.pop("name")
--- a/ppocr/losses/rec_latexocr_loss.py
+++ b/ppocr/losses/rec_latexocr_loss.py
@@ -0,0 +1,47 @@
 # Copyright (c) 2024 PaddlePaddle Authors. All Rights Reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 #    http://www.apache.org/licenses/LICENSE-2.0
 #
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 """
 This code is adapted from:
 https://github.com/lucidrains/x-transformers/blob/main/x_transformers/autoregressive_wrapper.py
 """
 import paddle
 import paddle.nn as nn
 import paddle.nn.functional as F
 import numpy as np
 class LaTeXOCRLoss(nn.Layer):
    """
    LaTeXOCR adopts CrossEntropyLoss for network training.
    """
    def __init__(self):
        super(LaTeXOCRLoss, self).__init__()
        self.ignore_index = -100
        self.cross = nn.CrossEntropyLoss(
            reduction="mean", ignore_index=self.ignore_index
        )
    def forward(self, preds, batch):
        word_probs = preds
        labels = batch[1][:, 1:]
        word_loss = self.cross(
            paddle.reshape(word_probs, [-1, word_probs.shape[-1]]),
            paddle.reshape(labels, [-1]),
        )
        loss = word_loss
        return {"loss": loss}
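Review note: `LaTeXOCRLoss.forward` compares the predictions with `batch[1][:, 1:]`, that is, the label sequence with its first token dropped, and averages cross-entropy over positions whose label differs from `ignore_index`. The NumPy sketch below illustrates that computation under stated assumptions: the helper `masked_cross_entropy` and the token ids are hypothetical, and treating the first token as a begin-of-sequence marker is an assumption, not something the diff states.

```python
import numpy as np


def masked_cross_entropy(logits, labels, ignore_index=-100):
    """Mean cross-entropy over positions whose label is not ignore_index."""
    logits = logits.reshape(-1, logits.shape[-1])
    labels = labels.reshape(-1)
    keep = labels != ignore_index
    logits, labels = logits[keep], labels[keep]
    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())


tokens = np.array([[1, 5, 6, 2]])  # hypothetical ids, first token assumed to be BOS
targets = tokens[:, 1:]  # next-token targets, same slicing as the loss above
logits = np.zeros((1, 3, 8))  # uniform scores over 8 hypothetical classes
loss = masked_cross_entropy(logits, targets)
```

With uniform scores over 8 classes the loss equals `log(8)`, a quick sanity check for any implementation of this loss.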
--- a/ppocr/metrics/__init__.py
+++ b/ppocr/metrics/__init__.py
@@ -22,7 +22,7 @@ import copy
 __all__ = ["build_metric"]
 from .det_metric import DetMetric, DetFCEMetric
-from .rec_metric import RecMetric, CNTMetric, CANMetric
+from .rec_metric import RecMetric, CNTMetric, CANMetric, LaTeXOCRMetric
 from .cls_metric import ClsMetric
 from .e2e_metric import E2EMetric
 from .distillation_metric import DistillationMetric
@@ -50,6 +50,7 @@ def build_metric(config):
        "CTMetric",
        "CNTMetric",
        "CANMetric",
        "LaTeXOCRMetric",
    ]
    config = copy.deepcopy(config)
--- a/ppocr/metrics/bleu.py
+++ b/ppocr/metrics/bleu.py
@@ -0,0 +1,240 @@
 # Copyright (c) 2024 PaddlePaddle Authors. All Rights Reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 #    http://www.apache.org/licenses/LICENSE-2.0
 #
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 """
 This code is adapted from:
 https://github.com/tensorflow/nmt/blob/master/nmt/scripts/bleu.py
 """
 import re
 import math
 import collections
 from functools import lru_cache
 def _get_ngrams(segment, max_order):
    """Extracts all n-grams up to a given maximum order from an input segment.
    Args:
      segment: text segment from which n-grams will be extracted.
      max_order: maximum length in tokens of the n-grams returned by this
          method.
    Returns:
      The Counter containing all n-grams up to max_order in segment
      with a count of how many times each n-gram occurred.
    """
    ngram_counts = collections.Counter()
    for order in range(1, max_order + 1):
        for i in range(0, len(segment) - order + 1):
            ngram = tuple(segment[i : i + order])
            ngram_counts[ngram] += 1
    return ngram_counts
 def compute_bleu(reference_corpus, translation_corpus, max_order=4, smooth=False):
    """Computes BLEU score of translated segments against one or more references.
    Args:
      reference_corpus: list of lists of references for each translation. Each
          reference should be tokenized into a list of tokens.
      translation_corpus: list of translations to score. Each translation
          should be tokenized into a list of tokens.
      max_order: Maximum n-gram order to use when computing BLEU score.
      smooth: Whether or not to apply Lin et al. 2004 smoothing.
    Returns:
      6-tuple with the BLEU score, the n-gram precisions, the brevity penalty,
      the translation/reference length ratio, the translation length and the
      reference length.
    """
    matches_by_order = [0] * max_order
    possible_matches_by_order = [0] * max_order
    reference_length = 0
    translation_length = 0
    for references, translation in zip(reference_corpus, translation_corpus):
        reference_length += min(len(r) for r in references)
        translation_length += len(translation)
        merged_ref_ngram_counts = collections.Counter()
        for reference in references:
            merged_ref_ngram_counts |= _get_ngrams(reference, max_order)
        translation_ngram_counts = _get_ngrams(translation, max_order)
        overlap = translation_ngram_counts & merged_ref_ngram_counts
        for ngram in overlap:
            matches_by_order[len(ngram) - 1] += overlap[ngram]
        for order in range(1, max_order + 1):
            possible_matches = len(translation) - order + 1
            if possible_matches > 0:
                possible_matches_by_order[order - 1] += possible_matches
    precisions = [0] * max_order
    for i in range(0, max_order):
        if smooth:
            precisions[i] = (matches_by_order[i] + 1.0) / (
                possible_matches_by_order[i] + 1.0
            )
        else:
            if possible_matches_by_order[i] > 0:
                precisions[i] = (
                    float(matches_by_order[i]) / possible_matches_by_order[i]
                )
            else:
                precisions[i] = 0.0
    if min(precisions) > 0:
        p_log_sum = sum((1.0 / max_order) * math.log(p) for p in precisions)
        geo_mean = math.exp(p_log_sum)
    else:
        geo_mean = 0
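    # Guard against empty corpora, which would otherwise divide by zero.
    ratio = (
        float(translation_length) / reference_length if reference_length > 0 else 0.0
    )
    if ratio > 1.0:
        bp = 1.0
    elif ratio > 0.0:
        bp = math.exp(1 - 1.0 / ratio)
    else:
        bp = 0.0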
    bleu = geo_mean * bp
    return (bleu, precisions, bp, ratio, translation_length, reference_length)
 class BaseTokenizer:
    """A base dummy tokenizer to derive from."""
    def signature(self):
        """
        Returns a signature for the tokenizer.
        :return: signature string
        """
        return "none"
    def __call__(self, line):
        """
        Tokenizes an input line with the tokenizer.
        :param line: a segment to tokenize
        :return: the tokenized line
        """
        return line
 class TokenizerRegexp(BaseTokenizer):
    def signature(self):
        return "re"
    def __init__(self):
        self._re = [
            # language-dependent part (assuming Western languages)
            (re.compile(r"([\{-\~\[-\` -\&\(-\+\:-\@\/])"), r" \1 "),
            # tokenize period and comma unless preceded by a digit
            (re.compile(r"([^0-9])([\.,])"), r"\1 \2 "),
            # tokenize period and comma unless followed by a digit
            (re.compile(r"([\.,])([^0-9])"), r" \1 \2"),
            # tokenize dash when preceded by a digit
            (re.compile(r"([0-9])(-)"), r"\1 \2 "),
            # one space only between words
            # NOTE: Doing this in Python (below) is faster
            # (re.compile(r'\s+'), r' '),
        ]
    @lru_cache(maxsize=2**16)
    def __call__(self, line):
        """Common post-processing tokenizer for `13a` and `zh` tokenizers.
        :param line: a segment to tokenize
        :return: the tokenized line
        """
        for _re, repl in self._re:
            line = _re.sub(repl, line)
        # no leading or trailing spaces, single space within words
        # return ' '.join(line.split())
        # This line is changed with regards to the original tokenizer (seen above) to return individual words
        return line.split()
 class Tokenizer13a(BaseTokenizer):
    def signature(self):
        return "13a"
    def __init__(self):
        self._post_tokenizer = TokenizerRegexp()
    @lru_cache(maxsize=2**16)
    def __call__(self, line):
        """Tokenizes an input line using a relatively minimal tokenization
        that is however equivalent to mteval-v13a, used by WMT.
        :param line: a segment to tokenize
        :return: the tokenized line
        """
        # language-independent part:
        line = line.replace("<skipped>", "")
        line = line.replace("-\n", "")
        line = line.replace("\n", " ")
        if "&" in line:
            line = line.replace("&quot;", '"')
            line = line.replace("&amp;", "&")
            line = line.replace("&lt;", "<")
            line = line.replace("&gt;", ">")
        return self._post_tokenizer(f" {line} ")
 def compute_blue_score(
    predictions, references, tokenizer=Tokenizer13a(), max_order=4, smooth=False
 ):
    # if only one reference is provided make sure we still use list of lists
    if isinstance(references[0], str):
        references = [[ref] for ref in references]
    references = [[tokenizer(r) for r in ref] for ref in references]
    predictions = [tokenizer(p) for p in predictions]
    score = compute_bleu(
        reference_corpus=references,
        translation_corpus=predictions,
        max_order=max_order,
        smooth=smooth,
    )
    (bleu, precisions, bp, ratio, translation_length, reference_length) = score
    return bleu
 def cal_distance(word1, word2):
    m = len(word1)
    n = len(word2)
    if m * n == 0:
        return m + n
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            a = dp[i - 1][j] + 1
            b = dp[i][j - 1] + 1
            c = dp[i - 1][j - 1]
            if word1[i - 1] != word2[j - 1]:
                c += 1
            dp[i][j] = min(a, b, c)
    return dp[m][n]
 def compute_edit_distance(prediction, label):
    prediction = prediction.strip().split(" ")
    label = label.strip().split(" ")
    distance = cal_distance(prediction, label)
    return distance
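Review note: `compute_edit_distance` above measures Levenshtein distance over space-separated tokens rather than characters, so one wrong LaTeX token counts as a single error regardless of its length. A compact sketch of the same recurrence, keeping only two rows of the DP table, is shown below. The function name `token_edit_distance` and the sample strings are hypothetical.

```python
def token_edit_distance(prediction, label):
    """Levenshtein distance over space-separated tokens (row-by-row DP)."""
    a = prediction.strip().split(" ")
    b = label.strip().split(" ")
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, tok_a in enumerate(a, start=1):
        cur = [i]
        for j, tok_b in enumerate(b, start=1):
            cur.append(
                min(
                    prev[j] + 1,  # delete tok_a
                    cur[j - 1] + 1,  # insert tok_b
                    prev[j - 1] + (tok_a != tok_b),  # substitute, or match at no cost
                )
            )
        prev = cur
    return prev[-1]


d_same = token_edit_distance(r"\frac { a } { b }", r"\frac { a } { b }")
d_sub = token_edit_distance(r"\frac { a } { b }", r"\frac { a } { c }")
d_ins = token_edit_distance("x ^ 2", "x ^ { 2 }")
```

Here `d_sub` counts one substituted token and `d_ins` counts the two missing braces, which is the quantity the `exp_rate<=1`, `exp_rate<=2` and `exp_rate<=3` indicators threshold on.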
--- a/ppocr/metrics/rec_metric.py
+++ b/ppocr/metrics/rec_metric.py
@@ -17,6 +17,7 @@ from difflib import SequenceMatcher
 import numpy as np
 import string
 from .bleu import compute_blue_score, compute_edit_distance
 class RecMetric(object):
@@ -177,3 +178,121 @@ class CANMetric(object):
        self.exp_right = []
        self.word_total_length = 0
        self.exp_total_num = 0
 class LaTeXOCRMetric(object):
    def __init__(self, main_indicator="exp_rate", cal_blue_score=False, **kwargs):
        self.main_indicator = main_indicator
        self.cal_blue_score = cal_blue_score
        self.edit_right = []
        self.exp_right = []
        self.blue_right = []
        self.e1_right = []
        self.e2_right = []
        self.e3_right = []
        self.editdistance_total_length = 0
        self.exp_total_num = 0
        self.edit_dist = 0
        self.exp_rate = 0
        if self.cal_blue_score:
            self.blue_score = 0
        self.e1 = 0
        self.e2 = 0
        self.e3 = 0
        self.reset()
        self.epoch_reset()
    def __call__(self, preds, batch, **kwargs):
        for k, v in kwargs.items():
            epoch_reset = v
            if epoch_reset:
                self.epoch_reset()
        word_pred = preds
        word_label = batch
        line_right, e1, e2, e3 = 0, 0, 0, 0
        lev_dist = []
        for labels, prediction in zip(word_label, word_pred):
            if prediction == labels:
                line_right += 1
            distance = compute_edit_distance(prediction, labels)
            lev_dist.append(Levenshtein.normalized_distance(prediction, labels))
            if distance <= 1:
                e1 += 1
            if distance <= 2:
                e2 += 1
            if distance <= 3:
                e3 += 1
        batch_size = len(lev_dist)
        self.edit_dist = sum(lev_dist)  # float
        self.exp_rate = line_right  # float
        if self.cal_blue_score:
            self.blue_score = compute_blue_score(word_pred, word_label)
        self.e1 = e1
        self.e2 = e2
        self.e3 = e3
        exp_length = len(word_label)
        self.edit_right.append(self.edit_dist)
        self.exp_right.append(self.exp_rate)
        if self.cal_blue_score:
            self.blue_right.append(self.blue_score * batch_size)
        self.e1_right.append(self.e1)
        self.e2_right.append(self.e2)
        self.e3_right.append(self.e3)
        self.editdistance_total_length = self.editdistance_total_length + exp_length
        self.exp_total_num = self.exp_total_num + exp_length
    def get_metric(self):
        """
        return {
            'edit distance': 0,
            "blue_score": 0,
            "exp_rate": 0,
        }
        """
        cur_edit_distance = sum(self.edit_right) / self.exp_total_num
        cur_exp_rate = sum(self.exp_right) / self.exp_total_num
        if self.cal_blue_score:
            cur_blue_score = sum(self.blue_right) / self.editdistance_total_length
        cur_exp_1 = sum(self.e1_right) / self.exp_total_num
        cur_exp_2 = sum(self.e2_right) / self.exp_total_num
        cur_exp_3 = sum(self.e3_right) / self.exp_total_num
        self.reset()
        if self.cal_blue_score:
            return {
                "blue_score": cur_blue_score,
                "edit distance": cur_edit_distance,
                "exp_rate": cur_exp_rate,
                "exp_rate<=1 ": cur_exp_1,
                "exp_rate<=2 ": cur_exp_2,
                "exp_rate<=3 ": cur_exp_3,
            }
        else:
            return {
                "edit distance": cur_edit_distance,
                "exp_rate": cur_exp_rate,
                "exp_rate<=1 ": cur_exp_1,
                "exp_rate<=2 ": cur_exp_2,
                "exp_rate<=3 ": cur_exp_3,
            }
    def reset(self):
        self.edit_dist = 0
        self.exp_rate = 0
        if self.cal_blue_score:
            self.blue_score = 0
        self.e1 = 0
        self.e2 = 0
        self.e3 = 0
    def epoch_reset(self):
        self.edit_right = []
        self.exp_right = []
        if self.cal_blue_score:
            self.blue_right = []
        self.e1_right = []
        self.e2_right = []
        self.e3_right = []
        self.editdistance_total_length = 0
        self.exp_total_num = 0
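Review note: `LaTeXOCRMetric` appends per-batch counts to lists in `__call__` and divides their sums by the number of expressions seen in `get_metric`. The sketch below shows that accumulate-then-normalize pattern in isolation. The class `RunningExpRate` is hypothetical, and it simplifies the original by treating an exact match as edit distance 0, whereas the metric above compares the strings directly.

```python
class RunningExpRate:
    """Accumulate exact-match and <=k-token-error counts across batches."""

    def __init__(self, max_errors=(1, 2, 3)):
        self.max_errors = max_errors
        self.reset()

    def reset(self):
        self.total = 0
        self.exact = 0
        self.within = {k: 0 for k in self.max_errors}

    def update(self, distances):
        """distances: one token-level edit distance per expression in the batch."""
        self.total += len(distances)
        self.exact += sum(d == 0 for d in distances)
        for k in self.max_errors:
            self.within[k] += sum(d <= k for d in distances)

    def result(self):
        # Return an empty dict rather than dividing by zero when nothing was seen.
        if self.total == 0:
            return {}
        out = {"exp_rate": self.exact / self.total}
        for k in self.max_errors:
            out[f"exp_rate<={k}"] = self.within[k] / self.total
        return out


m = RunningExpRate()
m.update([0, 1, 4])  # first batch: one exact match, one single-token error
m.update([2])  # second batch
res = m.result()
```

Normalizing by the running total, rather than averaging per-batch rates, keeps the result correct when the last batch is smaller than the others.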
--- a/ppocr/modeling/backbones/__init__.py
+++ b/ppocr/modeling/backbones/__init__.py
@@ -59,6 +59,8 @@ def build_backbone(config, model_type):
        from .rec_vitstr import ViTSTR
        from .rec_resnet_rfl import ResNetRFL
        from .rec_densenet import DenseNet
        from .rec_resnetv2 import ResNetV2
        from .rec_hybridvit import HybridTransformer
        from .rec_shallow_cnn import ShallowCNN
        from .rec_lcnetv3 import PPLCNetV3
        from .rec_hgnet import PPHGNet_small
@@ -89,6 +91,8 @@ def build_backbone(config, model_type):
            "ViT",
            "RepSVTR",
            "SVTRv2",
            "ResNetV2",
            "HybridTransformer",
        ]
    elif model_type == "e2e":
        from .e2e_resnet_vd_pg import ResNet
--- a/ppocr/modeling/backbones/rec_hybridvit.py
+++ b/ppocr/modeling/backbones/rec_hybridvit.py
@@ -0,0 +1,529 @@
 # Copyright (c) 2024 PaddlePaddle Authors. All Rights Reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 #    http://www.apache.org/licenses/LICENSE-2.0
 #
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 """
 This code is adapted from:
 https://github.com/huggingface/pytorch-image-models/blob/main/timm/models/vision_transformer_hybrid.py
 """
 from __future__ import absolute_import
 from __future__ import division
 from __future__ import print_function
 from itertools import repeat
 import collections.abc
 import math
 from functools import partial
 import paddle
 import paddle.nn as nn
 import paddle.nn.functional as F
 from ppocr.modeling.backbones.rec_resnetv2 import (
    ResNetV2,
    StdConv2dSame,
    DropPath,
    get_padding,
 )
 from paddle.nn.initializer import (
    TruncatedNormal,
    Constant,
    Normal,
    KaimingUniform,
    XavierUniform,
 )
 normal_ = Normal(mean=0.0, std=1e-6)
 zeros_ = Constant(value=0.0)
 ones_ = Constant(value=1.0)
 # NOTE: despite its name, kaiming_normal_ is a KaimingUniform initializer.
 kaiming_normal_ = KaimingUniform(nonlinearity="relu")
 trunc_normal_ = TruncatedNormal(std=0.02)
 xavier_uniform_ = XavierUniform()
 def _ntuple(n):
    def parse(x):
        if isinstance(x, collections.abc.Iterable):
            return x
        return tuple(repeat(x, n))
    return parse
 to_1tuple = _ntuple(1)
 to_2tuple = _ntuple(2)
 to_3tuple = _ntuple(3)
 to_4tuple = _ntuple(4)
 to_ntuple = _ntuple
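 class Conv2dAlign(nn.Conv2D):
    """Conv2D wrapper that calls F.conv2d directly with the layer's own weight and bias.
    Unlike the Weight Standardization convolution it is modelled on, forward()
    does not standardize the weight, and eps is stored but unused. For Weight
    Standardization see `Micro-Batch Training with Batch-Channel Normalization
    and Weight Standardization` - https://arxiv.org/abs/1903.10520v2
    """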
    def __init__(
        self,
        in_channel,
        out_channels,
        kernel_size,
        stride=1,
        padding=0,
        dilation=1,
        groups=1,
        bias=True,
        eps=1e-6,
    ):
        super().__init__(
            in_channel,
            out_channels,
            kernel_size,
            stride=stride,
            padding=padding,
            dilation=dilation,
            groups=groups,
            bias_attr=bias,
            weight_attr=True,
        )
        self.eps = eps
    def forward(self, x):
        x = F.conv2d(
            x,
            self.weight,
            self.bias,
            self._stride,
            self._padding,
            self._dilation,
            self._groups,
        )
        return x
 class HybridEmbed(nn.Layer):
    """CNN Feature Map Embedding
    Extract feature map from CNN, flatten, project to embedding dim.
    """
    def __init__(
        self,
        backbone,
        img_size=224,
        patch_size=1,
        feature_size=None,
        in_chans=3,
        embed_dim=768,
    ):
        super().__init__()
        assert isinstance(backbone, nn.Layer)
        img_size = to_2tuple(img_size)
        patch_size = to_2tuple(patch_size)
        self.img_size = img_size
        self.patch_size = patch_size
        self.backbone = backbone
        feature_dim = 1024
        feature_size = (42, 12)
        patch_size = (1, 1)
        assert (
            feature_size[0] % patch_size[0] == 0
            and feature_size[1] % patch_size[1] == 0
        )
        self.grid_size = (
            feature_size[0] // patch_size[0],
            feature_size[1] // patch_size[1],
        )
        self.num_patches = self.grid_size[0] * self.grid_size[1]
        self.proj = nn.Conv2D(
            feature_dim,
            embed_dim,
            kernel_size=patch_size,
            stride=patch_size,
            weight_attr=True,
            bias_attr=True,
        )
    def forward(self, x):
        x = self.backbone(x)
        if isinstance(x, (list, tuple)):
            x = x[-1]  # last feature if backbone outputs list/tuple of features
        x = self.proj(x).flatten(2).transpose([0, 2, 1])
        return x
 class myLinear(nn.Linear):
    def __init__(self, in_channel, out_channels, weight_attr=True, bias_attr=True):
        super().__init__(
            in_channel, out_channels, weight_attr=weight_attr, bias_attr=bias_attr
        )
    def forward(self, x):
        return paddle.matmul(x, self.weight, transpose_y=True) + self.bias
 class Attention(nn.Layer):
    def __init__(self, dim, num_heads=8, qkv_bias=False, attn_drop=0.0, proj_drop=0.0):
        super().__init__()
        self.num_heads = num_heads
        head_dim = dim // num_heads
        self.scale = head_dim**-0.5
        self.qkv = nn.Linear(dim, dim * 3, bias_attr=qkv_bias)
        self.attn_drop = nn.Dropout(attn_drop)
        self.proj = myLinear(dim, dim, weight_attr=True, bias_attr=True)
        self.proj_drop = nn.Dropout(proj_drop)
    def forward(self, x):
        B, N, C = x.shape
        qkv = (
            self.qkv(x)
            .reshape([B, N, 3, self.num_heads, C // self.num_heads])
            .transpose([2, 0, 3, 1, 4])
        )
        q, k, v = qkv.unbind(0)  # split the stacked tensor into q, k and v
        attn = (q @ k.transpose([0, 1, 3, 2])) * self.scale
        attn = F.softmax(attn, axis=-1)
        attn = self.attn_drop(attn)
        x = (attn @ v).transpose([0, 2, 1, 3]).reshape([B, N, C])
        x = self.proj(x)
        x = self.proj_drop(x)
        return x
 class Mlp(nn.Layer):
    """MLP as used in Vision Transformer, MLP-Mixer and related networks"""
    def __init__(
        self,
        in_features,
        hidden_features=None,
        out_features=None,
        act_layer=nn.GELU,
        drop=0.0,
    ):
        super().__init__()
        out_features = out_features or in_features
        hidden_features = hidden_features or in_features
        drop_probs = to_2tuple(drop)
        self.fc1 = nn.Linear(in_features, hidden_features)
        self.act = act_layer()
        self.drop1 = nn.Dropout(drop_probs[0])
        self.fc2 = nn.Linear(hidden_features, out_features)
        self.drop2 = nn.Dropout(drop_probs[1])
    def forward(self, x):
        x = self.fc1(x)
        x = self.act(x)
        x = self.drop1(x)
        x = self.fc2(x)
        x = self.drop2(x)
        return x
 class Block(nn.Layer):
    def __init__(
        self,
        dim,
        num_heads,
        mlp_ratio=4.0,
        qkv_bias=False,
        drop=0.0,
        attn_drop=0.0,
        drop_path=0.0,
        act_layer=nn.GELU,
        norm_layer=nn.LayerNorm,
    ):
        super().__init__()
        self.norm1 = norm_layer(dim)
        self.attn = Attention(
            dim,
            num_heads=num_heads,
            qkv_bias=qkv_bias,
            attn_drop=attn_drop,
            proj_drop=drop,
        )
        # NOTE: drop path for stochastic depth, we shall see if this is better than dropout here
        self.drop_path = DropPath(drop_path) if drop_path > 0.0 else nn.Identity()
        self.norm2 = norm_layer(dim)
        mlp_hidden_dim = int(dim * mlp_ratio)
        self.mlp = Mlp(
            in_features=dim,
            hidden_features=mlp_hidden_dim,
            act_layer=act_layer,
            drop=drop,
        )
    def forward(self, x):
        x = x + self.drop_path(self.attn(self.norm1(x)))
        x = x + self.drop_path(self.mlp(self.norm2(x)))
        return x
 class HybridTransformer(nn.Layer):
    """Implementation of HybridTransformer.
    Args:
      x: input images with shape [N, 1, H, W]
      label: LaTeX-OCR labels with shape [N, L], where L is the max sequence length
      attention_mask: LaTeX-OCR attention mask with shape [N, L], where L is the max sequence length
    Returns:
      The encoded features with shape [N, 1, H//16, W//16]
    """
    def __init__(
        self,
        backbone_layers=[2, 3, 7],
        input_channel=1,
        is_predict=False,
        is_export=False,
        img_size=(224, 224),
        patch_size=16,
        num_classes=1000,
        embed_dim=768,
        depth=12,
        num_heads=12,
        mlp_ratio=4.0,
        qkv_bias=True,
        representation_size=None,
        distilled=False,
        drop_rate=0.0,
        attn_drop_rate=0.0,
        drop_path_rate=0.0,
        embed_layer=None,
        norm_layer=None,
        act_layer=None,
        weight_init="",
        **kwargs,
    ):
        super(HybridTransformer, self).__init__()
        self.num_classes = num_classes
        self.num_features = self.embed_dim = (
            embed_dim  # num_features for consistency with other models
        )
        self.num_tokens = 2 if distilled else 1
        norm_layer = norm_layer or partial(nn.LayerNorm, epsilon=1e-6)
        act_layer = act_layer or nn.GELU
        self.height, self.width = img_size
        self.patch_size = patch_size
        backbone = ResNetV2(
            layers=backbone_layers,
            num_classes=0,
            global_pool="",
            in_chans=input_channel,
            preact=False,
            stem_type="same",
            conv_layer=StdConv2dSame,
            is_export=is_export,
        )
        min_patch_size = 2 ** (len(backbone_layers) + 1)
        self.patch_embed = HybridEmbed(
            img_size=img_size,
            patch_size=patch_size // min_patch_size,
            in_chans=input_channel,
            embed_dim=embed_dim,
            backbone=backbone,
        )
        num_patches = self.patch_embed.num_patches
        self.cls_token = paddle.create_parameter([1, 1, embed_dim], dtype="float32")
        self.dist_token = (
            paddle.create_parameter(
                [1, 1, embed_dim],
                dtype="float32",
            )
            if distilled
            else None
        )
        self.pos_embed = paddle.create_parameter(
            [1, num_patches + self.num_tokens, embed_dim], dtype="float32"
        )
        self.pos_drop = nn.Dropout(p=drop_rate)
        zeros_(self.cls_token)
        if self.dist_token is not None:
            zeros_(self.dist_token)
        zeros_(self.pos_embed)
        dpr = [
            x.item() for x in paddle.linspace(0, drop_path_rate, depth)
        ]  # stochastic depth decay rule
        self.blocks = nn.Sequential(
            *[
                Block(
                    dim=embed_dim,
                    num_heads=num_heads,
                    mlp_ratio=mlp_ratio,
                    qkv_bias=qkv_bias,
                    drop=drop_rate,
                    attn_drop=attn_drop_rate,
                    drop_path=dpr[i],
                    norm_layer=norm_layer,
                    act_layer=act_layer,
                )
                for i in range(depth)
            ]
        )
        self.norm = norm_layer(embed_dim)
        # Representation layer
        if representation_size and not distilled:
            self.num_features = representation_size
            self.pre_logits = nn.Sequential(
                ("fc", nn.Linear(embed_dim, representation_size)), ("act", nn.Tanh())
            )
        else:
            self.pre_logits = nn.Identity()
        # Classifier head(s)
        self.head = (
            nn.Linear(self.num_features, num_classes)
            if num_classes > 0
            else nn.Identity()
        )
        self.head_dist = None
        if distilled:
            self.head_dist = (
                nn.Linear(self.embed_dim, self.num_classes)
                if num_classes > 0
                else nn.Identity()
            )
        self.init_weights(weight_init)
        self.out_channels = embed_dim
        self.is_predict = is_predict
        self.is_export = is_export
    def init_weights(self, mode=""):
        assert mode in ("jax", "jax_nlhb", "nlhb", "")
        head_bias = -math.log(self.num_classes) if "nlhb" in mode else 0.0
        trunc_normal_(self.pos_embed)
        trunc_normal_(self.cls_token)
        self.apply(_init_vit_weights)
    def _init_weights(self, m):
        # this fn left here for compat with downstream users
        _init_vit_weights(m)
    def load_pretrained(self, checkpoint_path, prefix=""):
        raise NotImplementedError
    def no_weight_decay(self):
        return {"pos_embed", "cls_token", "dist_token"}
    def get_classifier(self):
        if self.dist_token is None:
            return self.head
        else:
            return self.head, self.head_dist
    def reset_classifier(self, num_classes, global_pool=""):
        self.num_classes = num_classes
        self.head = (
            nn.Linear(self.embed_dim, num_classes) if num_classes > 0 else nn.Identity()
        )
        if self.num_tokens == 2:
            self.head_dist = (
                nn.Linear(self.embed_dim, self.num_classes)
                if num_classes > 0
                else nn.Identity()
            )
    def forward_features(self, x):
        B, c, h, w = x.shape
        x = self.patch_embed(x)
        cls_tokens = self.cls_token.expand(
            [B, -1, -1]
        )  # stole cls_tokens impl from Phil Wang, thanks
        x = paddle.concat((cls_tokens, x), axis=1)
        h, w = h // self.patch_size, w // self.patch_size
        repeat_tensor = (
            paddle.arange(h) * (self.width // self.patch_size - w)
        ).reshape([-1, 1])
        repeat_tensor = paddle.repeat_interleave(
            repeat_tensor, paddle.to_tensor(w), axis=1
        ).reshape([-1])
        pos_emb_ind = repeat_tensor + paddle.arange(h * w)
        pos_emb_ind = paddle.concat(
            (paddle.zeros([1], dtype="int64"), pos_emb_ind + 1), axis=0
        ).cast(paddle.int64)
        x += self.pos_embed[:, pos_emb_ind]
        x = self.pos_drop(x)
        for blk in self.blocks:
            x = blk(x)
        x = self.norm(x)
        return x
    def forward(self, input_data):
        if self.training:
            x, label, attention_mask = input_data
        else:
            if isinstance(input_data, list):
                x = input_data[0]
            else:
                x = input_data
        x = self.forward_features(x)
        x = self.head(x)
        if self.training:
            return x, label, attention_mask
        else:
            return x
 def _init_vit_weights(
    module: nn.Layer, name: str = "", head_bias: float = 0.0, jax_impl: bool = False
 ):
    """ViT weight initialization
    * When called without name, head_bias, jax_impl args it will behave exactly the same
      as my original init for compatibility with prev hparam / downstream use cases (ie DeiT).
    * When called w/ valid name (module name) and jax_impl=True, will (hopefully) match JAX impl
    """
    if isinstance(module, nn.Linear):
        if name.startswith("head"):
            zeros_(module.weight)
            # Constant already carries the fill value; the initializer takes only the parameter.
            constant_ = Constant(value=head_bias)
            constant_(module.bias)
        elif name.startswith("pre_logits"):
            zeros_(module.bias)
        else:
            if jax_impl:
                xavier_uniform_(module.weight)
                if module.bias is not None:
                    if "mlp" in name:
                        normal_(module.bias)
                    else:
                        zeros_(module.bias)
            else:
                trunc_normal_(module.weight)
                if module.bias is not None:
                    zeros_(module.bias)
    elif jax_impl and isinstance(module, nn.Conv2D):
        # NOTE conv was left to pytorch default in my original init
        if module.bias is not None:
            zeros_(module.bias)
    elif isinstance(module, (nn.LayerNorm, nn.GroupNorm, nn.BatchNorm2D)):
        zeros_(module.bias)
        ones_(module.weight)
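
Note on `forward_features` above: the index arithmetic that selects position embeddings for an image smaller than the maximum size can be hard to read from the tensor ops. The following is a minimal pure-Python sketch of what it appears to compute, not the code in this commit. The function name and the `full_w` parameter (standing in for `self.width // self.patch_size`) are hypothetical.

```python
def pos_embed_indices(h, w, full_w):
    """Return indices into a flattened (rows x full_w) position table.

    h, w   : patch-grid height and width of the current image
    full_w : patch-grid width of the largest supported image (assumed name)
    """
    # Patch (r, c) of the smaller grid sits at r * full_w + c in the full grid.
    # The +1 skips slot 0, which is reserved for the class token.
    return [0] + [r * full_w + c + 1 for r in range(h) for c in range(w)]


# A 2x3 patch grid inside a table that is 4 patches wide.
print(pos_embed_indices(2, 3, 4))
```

Each row of the smaller grid reuses the leading `w` entries of the matching row in the full table, so a single embedding table serves every input size up to the maximum.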
--- a/ppocr/modeling/backbones/rec_resnetv2.py
+++ b/ppocr/modeling/backbones/rec_resnetv2.py
--- a/ppocr/modeling/heads/__init__.py
+++ b/ppocr/modeling/heads/__init__.py
@ -40,6 +40,7 @@ def build_head(config):
    from .rec_visionlan_head import VLHead
    from .rec_rfl_head import RFLHead
    from .rec_can_head import CANHead
    from .rec_latexocr_head import LaTeXOCRHead
    from .rec_satrn_head import SATRNHead
    from .rec_parseq_head import ParseQHead
    from .rec_cppd_head import CPPDHead
@ -81,6 +82,7 @@ def build_head(config):
        "RFLHead",
        "DRRGHead",
        "CANHead",
        "LaTeXOCRHead",
        "SATRNHead",
        "PFHeadLocal",
        "ParseQHead",
--- a/ppocr/modeling/heads/rec_latexocr_head.py
+++ b/ppocr/modeling/heads/rec_latexocr_head.py
--- a/ppocr/postprocess/__init__.py
+++ b/ppocr/postprocess/__init__.py
@ -42,6 +42,7 @@ from .rec_postprocess import (
    SATRNLabelDecode,
    ParseQLabelDecode,
    CPPDLabelDecode,
    LaTeXOCRDecode,
 )
 from .cls_postprocess import ClsPostProcess
 from .pg_postprocess import PGPostProcess
@ -96,6 +97,7 @@ def build_post_process(config, global_config=None):
        "SATRNLabelDecode",
        "ParseQLabelDecode",
        "CPPDLabelDecode",
        "LaTeXOCRDecode",
    ]
    if config["name"] == "PSEPostProcess":
--- a/ppocr/postprocess/rec_postprocess.py
+++ b/ppocr/postprocess/rec_postprocess.py
@ -15,6 +15,7 @@
 import numpy as np
 import paddle
 from paddle.nn import functional as F
 from tokenizers import Tokenizer as TokenizerFast
 import re
@ -1210,3 +1211,53 @@ class CPPDLabelDecode(NRTRLabelDecode):
    def add_special_char(self, dict_character):
        dict_character = ["</s>"] + dict_character
        return dict_character
 class LaTeXOCRDecode(object):
    """Convert between latex-symbol and symbol-index"""
    def __init__(self, rec_char_dict_path, **kwargs):
        super(LaTeXOCRDecode, self).__init__()
        self.tokenizer = TokenizerFast.from_file(rec_char_dict_path)
    def post_process(self, s):
        text_reg = r"(\\(operatorname|mathrm|text|mathbf)\s?\*? {.*?})"
        letter = "[a-zA-Z]"
        noletter = r"[\W_^\d]"
        names = [x[0].replace(" ", "") for x in re.findall(text_reg, s)]
        s = re.sub(text_reg, lambda match: str(names.pop(0)), s)
        news = s
        while True:
            s = news
            news = re.sub(r"(?!\\ )(%s)\s+?(%s)" % (noletter, noletter), r"\1\2", s)
            news = re.sub(r"(?!\\ )(%s)\s+?(%s)" % (noletter, letter), r"\1\2", news)
            news = re.sub(r"(%s)\s+?(%s)" % (letter, noletter), r"\1\2", news)
            if news == s:
                break
        return s
    def decode(self, tokens):
        if len(tokens.shape) == 1:
            tokens = tokens[None, :]
        dec = [self.tokenizer.decode(tok) for tok in tokens]
        dec_str_list = [
            "".join(detok.split(" "))
            .replace("Ġ", " ")
            .replace("[EOS]", "")
            .replace("[BOS]", "")
            .replace("[PAD]", "")
            .strip()
            for detok in dec
        ]
        return [self.post_process(dec_str) for dec_str in dec_str_list]
    def __call__(self, preds, label=None, mode="eval", *args, **kwargs):
        if mode == "train":
            preds_idx = np.array(preds.argmax(axis=2))
            text = self.decode(preds_idx)
        else:
            text = self.decode(np.array(preds))
        if label is None:
            return text
        label = self.decode(np.array(label))
        return text, label
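
Note on `LaTeXOCRDecode.post_process` above: the loop repeatedly removes whitespace next to non-letter characters until the string stops changing, and leaves spaces between letters alone. The sketch below copies only that loop into a standalone function so the behaviour can be tried in isolation. The function name is hypothetical, and the `\operatorname`/`\text` handling is deliberately omitted.

```python
import re


def collapse_latex_spaces(s: str) -> str:
    """Standalone copy of the whitespace-collapsing loop, for illustration only."""
    letter = r"[a-zA-Z]"
    noletter = r"[\W_^\d]"
    news = s
    while True:
        s = news
        # Drop whitespace between two non-letters, e.g. "^ {" -> "^{".
        news = re.sub(r"(?!\\ )(%s)\s+?(%s)" % (noletter, noletter), r"\1\2", s)
        # Drop whitespace between a non-letter and a following letter.
        news = re.sub(r"(?!\\ )(%s)\s+?(%s)" % (noletter, letter), r"\1\2", news)
        # Drop whitespace between a letter and a following non-letter.
        news = re.sub(r"(%s)\s+?(%s)" % (letter, noletter), r"\1\2", news)
        if news == s:  # fixed point reached
            break
    return s


print(collapse_latex_spaces("x ^ { 2 } + y"))
print(collapse_latex_spaces(r"\sin x"))
```

A space between two letters is never removed, which keeps a command such as `\sin` separate from the argument that follows it.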
--- a/ppocr/utils/dict/latex_ocr_tokenizer.json
+++ b/ppocr/utils/dict/latex_ocr_tokenizer.json
--- a/ppocr/utils/formula_utils/math_txt2pkl.py
+++ b/ppocr/utils/formula_utils/math_txt2pkl.py
@ -0,0 +1,70 @@
 # copyright (c) 2024 PaddlePaddle Authors. All Rights Reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 #    http://www.apache.org/licenses/LICENSE-2.0
 #
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 import pickle
 from tqdm import tqdm
 import os
 import cv2
 import imagesize
 from collections import defaultdict
 import glob
 from os.path import join
 import argparse
 def txt2pickle(images, equations, save_dir):
    save_p = os.path.join(save_dir, "latexocr_{}.pkl".format(images.split("/")[-1]))
    min_dimensions = (32, 32)
    max_dimensions = (672, 192)
    max_length = 512
    data = defaultdict(lambda: [])
    if images is not None and equations is not None:
        images_list = [
            path.replace("\\", "/") for path in glob.glob(join(images, "*.png"))
        ]
        indices = [int(os.path.basename(img).split(".")[0]) for img in images_list]
        # Read all equations up front; close the file handle promptly.
        with open(equations, "r") as f:
            eqs = f.read().split("\n")
        for i, im in tqdm(enumerate(images_list), total=len(images_list)):
            width, height = imagesize.get(im)
            if (
                min_dimensions[0] <= width <= max_dimensions[0]
                and min_dimensions[1] <= height <= max_dimensions[1]
            ):
                data[(width, height)].append((eqs[indices[i]], im))
        data = dict(data)
        with open(save_p, "wb") as file:
            pickle.dump(data, file)
 if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--image_dir",
        type=str,
        default=".",
        help="Directory containing the formula images (*.png)",
    )
    parser.add_argument(
        "--mathtxt_path",
        type=str,
        default=".",
        help="Path to the text file of LaTeX equations, one per line",
    )
    parser.add_argument(
        "--output_dir",
        type=str,
        default=".",
        help="Directory in which to write the latexocr_*.pkl file",
    )
    args = parser.parse_args()
    txt2pickle(args.image_dir, args.mathtxt_path, args.output_dir)
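
Note on `txt2pickle` above: the function keeps only images whose size falls inside the allowed range and groups the survivors by exact `(width, height)`. The sketch below isolates that filtering and bucketing step without touching the filesystem. The function name, the record layout, and the sample paths are hypothetical; the size limits are the ones used in the script.

```python
from collections import defaultdict

MIN_DIMENSIONS = (32, 32)    # (width, height), as in txt2pickle
MAX_DIMENSIONS = (672, 192)  # (width, height), as in txt2pickle


def bucket_by_size(samples):
    """Group (equation, path, width, height) records by (width, height).

    Records whose size lies outside the allowed range are dropped.
    """
    buckets = defaultdict(list)
    for equation, path, width, height in samples:
        if (
            MIN_DIMENSIONS[0] <= width <= MAX_DIMENSIONS[0]
            and MIN_DIMENSIONS[1] <= height <= MAX_DIMENSIONS[1]
        ):
            buckets[(width, height)].append((equation, path))
    return dict(buckets)


# Hypothetical records: two share a size, one is too wide and is dropped.
samples = [
    ("x^2", "img/0.png", 64, 32),
    ("y_1", "img/1.png", 64, 32),
    ("a+b", "img/2.png", 800, 32),
]
print(bucket_by_size(samples))
```

Bucketing by exact size presumably lets a data loader form batches of same-sized images without padding.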
--- a/requirements.txt
+++ b/requirements.txt
@ -12,3 +12,6 @@ cython
 Pillow
 pyyaml
 requests
 albumentations==1.4.10
 tokenizers==0.19.1
 imagesize
--- a/tools/eval.py
+++ b/tools/eval.py
@ -105,6 +105,8 @@ def main():
    if "model_type" in config["Architecture"].keys():
        if config["Architecture"]["algorithm"] == "CAN":
            model_type = "can"
        elif config["Architecture"]["algorithm"] == "LaTeXOCR":
            model_type = "latexocr"
        else:
            model_type = config["Architecture"]["model_type"]
    else:
--- a/tools/export_model.py
+++ b/tools/export_model.py
@ -131,6 +131,11 @@ def export_single_model(
            ]
        ]
        model = to_static(model, input_spec=other_shape)
    elif arch_config["algorithm"] == "LaTeXOCR":
        other_shape = [
            paddle.static.InputSpec(shape=[None, 1, None, None], dtype="float32"),
        ]
        model = to_static(model, input_spec=other_shape)
    elif arch_config["algorithm"] in ["LayoutLM", "LayoutLMv2", "LayoutXLM"]:
        input_spec = [
            paddle.static.InputSpec(shape=[None, 512], dtype="int64"),  # input_ids
--- a/tools/infer/predict_rec.py
+++ b/tools/infer/predict_rec.py
@ -133,6 +133,11 @@ class TextRecognizer(object):
                "character_dict_path": args.rec_char_dict_path,
                "use_space_char": args.use_space_char,
            }
        elif self.rec_algorithm == "LaTeXOCR":
            postprocess_params = {
                "name": "LaTeXOCRDecode",
                "rec_char_dict_path": args.rec_char_dict_path,
            }
        elif self.rec_algorithm == "ParseQ":
            postprocess_params = {
                "name": "ParseQLabelDecode",
@ -450,6 +455,90 @@ class TextRecognizer(object):
        return img
    def pad_(self, img, divable=32):
        threshold = 128
        data = np.array(img.convert("LA"))
        if data[..., -1].var() == 0:
            data = (data[..., 0]).astype(np.uint8)
        else:
            data = (255 - data[..., -1]).astype(np.uint8)
        data = (data - data.min()) / (data.max() - data.min()) * 255
        if data.mean() > threshold:
            # To invert the text to white
            gray = 255 * (data < threshold).astype(np.uint8)
        else:
            gray = 255 * (data > threshold).astype(np.uint8)
            data = 255 - data
        coords = cv2.findNonZero(gray)  # Find all non-zero points (text)
        a, b, w, h = cv2.boundingRect(coords)  # Find minimum spanning bounding box
        rect = data[b : b + h, a : a + w]
        im = Image.fromarray(rect).convert("L")
        dims = []
        for x in [w, h]:
            div, mod = divmod(x, divable)
            dims.append(divable * (div + (1 if mod > 0 else 0)))
        padded = Image.new("L", dims, 255)
        padded.paste(im, (0, 0, im.size[0], im.size[1]))
        return padded
    def minmax_size_(
        self,
        img,
        max_dimensions,
        min_dimensions,
    ):
        if max_dimensions is not None:
            ratios = [a / b for a, b in zip(img.size, max_dimensions)]
            if any([r > 1 for r in ratios]):
                size = np.array(img.size) // max(ratios)
                img = img.resize(tuple(size.astype(int)), Image.BILINEAR)
        if min_dimensions is not None:
            # If any image dimension is below min_dimensions, pad the canvas so
            # that every dimension is at least min_dimensions.
            padded_size = [
                max(img_dim, min_dim)
                for img_dim, min_dim in zip(img.size, min_dimensions)
            ]
            if padded_size != list(img.size):  # padding is needed
                padded_im = Image.new("L", padded_size, 255)
                padded_im.paste(img, img.getbbox())
                img = padded_im
        return img
    def norm_img_latexocr(self, img):
        # LaTeXOCR only predicts grayscale images
        shape = (1, 1, 3)
        mean = [0.7931, 0.7931, 0.7931]
        std = [0.1738, 0.1738, 0.1738]
        scale = 1.0 / 255.0  # map pixel values to [0, 1] before applying mean/std
        min_dimensions = [32, 32]
        max_dimensions = [672, 192]
        mean = np.array(mean).reshape(shape).astype("float32")
        std = np.array(std).reshape(shape).astype("float32")
        im_h, im_w = img.shape[:2]
        if (
            min_dimensions[0] <= im_w <= max_dimensions[0]
            and min_dimensions[1] <= im_h <= max_dimensions[1]
        ):
            pass
        else:
            img = Image.fromarray(np.uint8(img))
            img = self.minmax_size_(self.pad_(img), max_dimensions, min_dimensions)
            img = np.array(img)
            im_h, im_w = img.shape[:2]
            img = np.dstack([img, img, img])
        img = (img.astype("float32") * scale - mean) / std
        img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        divide_h = math.ceil(im_h / 16) * 16
        divide_w = math.ceil(im_w / 16) * 16
        img = np.pad(
            img, ((0, divide_h - im_h), (0, divide_w - im_w)), constant_values=(1, 1)
        )
        img = img[:, :, np.newaxis].transpose(2, 0, 1)
        img = img.astype("float32")
        return img
    def __call__(self, img_list):
        img_num = len(img_list)
        # Calculate the aspect ratio of all text bars
@ -552,6 +641,10 @@ class TextRecognizer(object):
                    word_label_list = []
                    norm_img_mask_batch.append(norm_image_mask)
                    word_label_list.append(word_label)
                elif self.rec_algorithm == "LaTeXOCR":
                    norm_img = self.norm_img_latexocr(img_list[indices[ino]])
                    norm_img = norm_img[np.newaxis, :]
                    norm_img_batch.append(norm_img)
                else:
                    norm_img = self.resize_norm_img(
                        img_list[indices[ino]], max_wh_ratio
@ -666,6 +759,29 @@ class TextRecognizer(object):
                    if self.benchmark:
                        self.autolog.times.stamp()
                    preds = outputs
            elif self.rec_algorithm == "LaTeXOCR":
                inputs = [norm_img_batch]
                if self.use_onnx:
                    input_dict = {}
                    input_dict[self.input_tensor.name] = norm_img_batch
                    outputs = self.predictor.run(self.output_tensors, input_dict)
                    preds = outputs
                else:
                    input_names = self.predictor.get_input_names()
                    input_tensor = []
                    for i in range(len(input_names)):
                        input_tensor_i = self.predictor.get_input_handle(input_names[i])
                        input_tensor_i.copy_from_cpu(inputs[i])
                        input_tensor.append(input_tensor_i)
                    self.input_tensor = input_tensor
                    self.predictor.run()
                    outputs = []
                    for output_tensor in self.output_tensors:
                        output = output_tensor.copy_to_cpu()
                        outputs.append(output)
                    if self.benchmark:
                        self.autolog.times.stamp()
                    preds = outputs
            else:
                if self.use_onnx:
                    input_dict = {}
@ -692,6 +808,9 @@ class TextRecognizer(object):
                    wh_ratio_list=wh_ratio_list,
                    max_wh_ratio=max_wh_ratio,
                )
            elif self.postprocess_params["name"] == "LaTeXOCRDecode":
                preds = [p.reshape([-1]) for p in preds]
                rec_result = self.postprocess_op(preds)
            else:
                rec_result = self.postprocess_op(preds)
            for rno in range(len(rec_result)):
--- a/tools/infer_rec.py
+++ b/tools/infer_rec.py
@ -183,6 +183,8 @@ def main():
            elif isinstance(post_result, list) and isinstance(post_result[0], int):
                # for RFLearning CNT branch
                info = str(post_result[0])
            elif config["Architecture"]["algorithm"] == "LaTeXOCR":
                info = str(post_result[0])
            else:
                if len(post_result[0]) >= 2:
                    info = post_result[0][0] + "\t" + str(post_result[0][1])
--- a/tools/program.py
+++ b/tools/program.py
@ -324,6 +324,8 @@ def train(
                        preds = model(batch)
                    elif algorithm in ["CAN"]:
                        preds = model(batch[:3])
                    elif algorithm in ["LaTeXOCR"]:
                        preds = model(batch)
                    else:
                        preds = model(images)
                preds = to_float32(preds)
@ -339,6 +341,8 @@ def train(
                    preds = model(batch)
                elif algorithm in ["CAN"]:
                    preds = model(batch[:3])
                elif algorithm in ["LaTeXOCR"]:
                    preds = model(batch)
                else:
                    preds = model(images)
                loss = loss_class(preds, batch)
@ -360,6 +364,10 @@ def train(
                elif algorithm in ["CAN"]:
                    model_type = "can"
                    eval_class(preds[0], batch[2:], epoch_reset=(idx == 0))
                elif algorithm in ["LaTeXOCR"]:
                    model_type = "latexocr"
                    post_result = post_process_class(preds, batch[1], mode="train")
                    eval_class(post_result[0], post_result[1], epoch_reset=(idx == 0))
                else:
                    if config["Loss"]["name"] in [
                        "MultiLoss",
@ -600,6 +608,8 @@ def eval(
                        preds = model(batch)
                    elif model_type in ["can"]:
                        preds = model(batch[:3])
                    elif model_type in ["latexocr"]:
                        preds = model(batch)
                    elif model_type in ["sr"]:
                        preds = model(batch)
                        sr_img = preds["sr_img"]
@ -614,6 +624,8 @@ def eval(
                    preds = model(batch)
                elif model_type in ["can"]:
                    preds = model(batch[:3])
                elif model_type in ["latexocr"]:
                    preds = model(batch)
                elif model_type in ["sr"]:
                    preds = model(batch)
                    sr_img = preds["sr_img"]
@ -640,6 +652,9 @@ def eval(
                eval_class(preds, batch_numpy)
            elif model_type in ["can"]:
                eval_class(preds[0], batch_numpy[2:], epoch_reset=(idx == 0))
            elif model_type in ["latexocr"]:
                post_result = post_process_class(preds, batch[1], "eval")
                eval_class(post_result[0], post_result[1], epoch_reset=(idx == 0))
            else:
                post_result = post_process_class(preds, batch_numpy[1])
                eval_class(post_result, batch_numpy)
@ -777,6 +792,7 @@ def preprocess(is_train=False):
        "SVTR_HGNet",
        "ParseQ",
        "CPPD",
        "LaTeXOCR",
    ]
    if use_xpu: