	SVTR
- 1. Introduction
- 2. Environment
- 3. Model Training / Evaluation / Prediction
- 4. Inference and Deployment
- 5. FAQ
1. Introduction
Paper:
SVTR: Scene Text Recognition with a Single Visual Model. Yongkun Du, Zhineng Chen, Caiyan Jia, Xiaoting Yin, Tianlun Zheng, Chenxia Li, Yuning Du, and Yu-Gang Jiang. IJCAI, 2022.
The accuracy (%) and model files of SVTR on public scene text recognition datasets are as follows:
- The Chinese dataset comes from the Chinese Benchmark, and the Chinese training and evaluation strategy of SVTR follows the paper.
| Model | IC13 857 | SVT | IIIT5k 3000 | IC15 1811 | SVTP | CUTE80 | Avg_6 | IC15 2077 | IC13 1015 | IC03 867 | IC03 860 | Avg_10 | Chinese scene_test | Download link | 
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SVTR Tiny | 96.85 | 91.34 | 94.53 | 83.99 | 85.43 | 89.24 | 90.87 | 80.55 | 95.37 | 95.27 | 95.70 | 90.13 | 67.90 | English / Chinese | 
| SVTR Small | 95.92 | 93.04 | 95.03 | 84.70 | 87.91 | 92.01 | 91.63 | 82.72 | 94.88 | 96.08 | 96.28 | 91.02 | 69.00 | English / Chinese | 
| SVTR Base | 97.08 | 91.50 | 96.03 | 85.20 | 89.92 | 91.67 | 92.33 | 83.73 | 95.66 | 95.62 | 95.81 | 91.61 | 71.40 | English / - | 
| SVTR Large | 97.20 | 91.65 | 96.30 | 86.58 | 88.37 | 95.14 | 92.82 | 84.54 | 96.35 | 96.54 | 96.74 | 92.24 | 72.10 | English / Chinese | 
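The table does not say how the Avg_6 column is computed. The following is a minimal sketch, assuming it is a mean weighted by test-set size rather than a plain mean of the six columns. The sizes for IC13, IIIT5k and IC15 are taken from the column headers; the sizes for SVT, SVTP and CUTE80 are not given in the table and are assumptions based on commonly reported values. The function name `weighted_average` is hypothetical.

```python
# Accuracy (%) of SVTR Tiny on the six benchmarks, copied from the table above.
accuracy = {
    "IC13 857": 96.85,
    "SVT": 91.34,
    "IIIT5k 3000": 94.53,
    "IC15 1811": 83.99,
    "SVTP": 85.43,
    "CUTE80": 89.24,
}

# Test-set sizes. IC13, IIIT5k and IC15 come from the column headers;
# SVT, SVTP and CUTE80 are assumptions (commonly reported values).
num_samples = {
    "IC13 857": 857,
    "SVT": 647,
    "IIIT5k 3000": 3000,
    "IC15 1811": 1811,
    "SVTP": 645,
    "CUTE80": 288,
}


def weighted_average(acc, sizes):
    """Average the accuracies, weighting each benchmark by its sample count."""
    total = sum(sizes[name] for name in acc)
    return sum(acc[name] * sizes[name] for name in acc) / total


avg_6 = weighted_average(accuracy, num_samples)
print(f"{avg_6:.2f}")  # 90.87, the Avg_6 value listed for SVTR Tiny
```

Under these assumptions the sketch reproduces the SVTR Tiny entry, whereas a plain mean of the six columns gives 90.23. Other rows may differ in the last digit because the per-dataset accuracies in the table are already rounded.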
2. Environment
Please refer to "Environment Preparation" to configure the PaddleOCR environment, and refer to "Project Clone" to clone the project code.
Dataset Preparation
- English dataset download
- Chinese dataset download
3. Model Training / Evaluation / Prediction
Please refer to Text Recognition Tutorial. PaddleOCR modularizes the code, and training different recognition models only requires changing the configuration file.
Training:
Once the data is prepared, start training with one of the following commands:
# Single-GPU training (long training period, not recommended)
python3 tools/train.py -c configs/rec/rec_svtrnet.yml
# Multi-GPU training; select the GPU IDs with the --gpus parameter
python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/rec/rec_svtrnet.yml
Evaluation:
You can download the model files and configuration files provided for SVTR (download link). Taking SVTR-T as an example, evaluate it with the following commands:
# Download the tar archive containing the model files and configuration files of SVTR-T and extract it
wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/rec_svtr_tiny_none_ctc_en_train.tar && tar xf rec_svtr_tiny_none_ctc_en_train.tar
# GPU evaluation
python3 -m paddle.distributed.launch --gpus '0' tools/eval.py -c ./rec_svtr_tiny_none_ctc_en_train/rec_svtr_tiny_6local_6global_stn_en.yml -o Global.pretrained_model=./rec_svtr_tiny_none_ctc_en_train/best_accuracy
Prediction:
python3 tools/infer_rec.py -c ./rec_svtr_tiny_none_ctc_en_train/rec_svtr_tiny_6local_6global_stn_en.yml -o Global.infer_img='./doc/imgs_words_en/word_10.png' Global.pretrained_model=./rec_svtr_tiny_none_ctc_en_train/best_accuracy
4. Inference and Deployment
4.1 Python Inference
First, convert the model saved during SVTR text recognition training into an inference model (model download link). Use the following command for the conversion:
python3 tools/export_model.py -c configs/rec/rec_svtrnet.yml -o Global.pretrained_model=./rec_svtr_tiny_none_ctc_en_train/best_accuracy  Global.save_inference_dir=./inference/rec_svtr_tiny_stn_en
Note:
- If you trained the model on your own dataset and modified the dictionary file, make sure that character_dict_path in the configuration file points to the modified dictionary file.
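If a custom dictionary is needed, it can be generated from the training labels. The following is a minimal sketch, not part of PaddleOCR: it assumes the label file holds one sample per line in the form `image_path<TAB>transcription` and that the dictionary file lists one character per line. The helper name `build_char_dict` and the file names in the usage note are hypothetical.

```python
from pathlib import Path


def build_char_dict(label_file, dict_file):
    """Collect every character used in the transcriptions and write one per line."""
    chars = set()
    for line in Path(label_file).read_text(encoding="utf-8").splitlines():
        if "\t" not in line:
            continue  # skip blank or malformed lines
        _, text = line.split("\t", 1)
        chars.update(text)
    # Assumption: the space character is enabled through the configuration
    # (use_space_char) rather than listed in the dictionary file.
    chars.discard(" ")
    ordered = sorted(chars)
    Path(dict_file).write_text("\n".join(ordered) + "\n", encoding="utf-8")
    return ordered
```

For example, `build_char_dict("train_list.txt", "my_dict.txt")` would write the dictionary, after which character_dict_path in the configuration file should point to `my_dict.txt` for training, export, and inference alike.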
After the conversion is successful, there are three files in the directory:
/inference/rec_svtr_tiny_stn_en/
    ├── inference.pdiparams
    ├── inference.pdiparams.info
    └── inference.pdmodel
To run inference with the SVTR text recognition model, execute the following command:
python3 tools/infer/predict_rec.py --image_dir='./doc/imgs_words_en/word_10.png' --rec_model_dir='./inference/rec_svtr_tiny_stn_en/' --rec_algorithm='SVTR' --rec_image_shape='3,64,256' --rec_char_dict_path='./ppocr/utils/ic15_dict.txt'
After the command runs, the prediction result (recognized text and score) for the image is printed to the screen. An example is as follows:
Predicts of ./doc/imgs_words_en/word_10.png:('pain', 0.9999998807907104)
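When the printed result needs to be consumed by another script, the line can be parsed back into a path, text, and score. The following is a minimal sketch that assumes the output keeps exactly the format shown above; the pattern `PREDICT_LINE` and the helper `parse_prediction` are hypothetical names, not part of PaddleOCR.

```python
import ast
import re

# Matches lines such as: Predicts of <path>:('<text>', <score>)
PREDICT_LINE = re.compile(r"Predicts of (?P<path>.+?):(?P<result>\(.*\))\s*$")


def parse_prediction(line):
    """Return (image_path, text, score), or None if the line has another format."""
    match = PREDICT_LINE.search(line)
    if match is None:
        return None
    # The result part is a Python tuple literal, so it can be evaluated safely.
    text, score = ast.literal_eval(match.group("result"))
    return match.group("path"), text, float(score)


line = "Predicts of ./doc/imgs_words_en/word_10.png:('pain', 0.9999998807907104)"
print(parse_prediction(line))
# ('./doc/imgs_words_en/word_10.png', 'pain', 0.9999998807907104)
```

`ast.literal_eval` is used instead of `eval` so that only literals are accepted from the log line.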
4.2 C++ Inference
Not supported
4.3 Serving
Not supported
4.4 More
Not supported
5. FAQ
- Most of the operators used by SVTR are matrix multiplications, so it has a speed advantage in a GPU environment. On a CPU with mkldnn enabled, however, SVTR has no speed advantage over an optimized convolutional network.
Citation
@inproceedings{Du2022SVTR,
  title     = {SVTR: Scene Text Recognition with a Single Visual Model},
  author    = {Du, Yongkun and Chen, Zhineng and Jia, Caiyan and Yin, Xiaoting and Zheng, Tianlun and Li, Chenxia and Du, Yuning and Jiang, Yu-Gang},
  booktitle = {IJCAI},
  year      = {2022},
  url       = {https://arxiv.org/abs/2205.00159}
}
