Please refer to ["Environment Preparation"](../../ppocr/environment.en.md) to configure the PaddleOCR environment, and refer to ["Project Clone"](../../ppocr/blog/clone.en.md)to clone the project code.
Please refer to [Text Recognition Tutorial](../../ppocr/model_train/recognition.en.md). PaddleOCR modularizes the code, and training different recognition models only requires **changing the configuration file**.
### Training
Specifically, after the data preparation is completed, the training can be started. The training command is as follows:
```bash linenums="1"
# Single GPU training (long training period, not recommended)
You can download the model files and configuration files provided by `SVTR`: [download link](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/rec_svtr_tiny_none_ctc_en_train.tar), take `SVTR-T` as an example, using the following command to evaluate:
```bash linenums="1"
# Download the tar archive containing the model files and configuration files of SVTR-T and extract it
wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/rec_svtr_tiny_none_ctc_en_train.tar && tar xf rec_svtr_tiny_none_ctc_en_train.tar
First, the model saved during the SVTR text recognition training process is converted into an inference model. ( [Model download link](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/rec_svtr_tiny_none_ctc_en_train.tar) ), you can use the following command to convert:
**Note:** If you are training the model on your own dataset and have modified the dictionary file, please pay attention to modify the `character_dict_path` in the configuration file to the modified dictionary file.
After the conversion is successful, there are three files in the directory:
```text linenums="1"
/inference/rec_svtr_tiny_stn_en/
├── inference.pdiparams
├── inference.pdiparams.info
└── inference.pdmodel
```
For SVTR text recognition model inference, the following commands can be executed:
After executing the command, the prediction result (recognized text and score) of the image above is printed to the screen, an example is as follows:
```bash linenums="1"
Predicts of ./doc/imgs_words_en/word_10.png:('pain', 0.9999998807907104)
```
### 4.2 C++ Inference
Not supported
### 4.3 Serving
Not supported
### 4.4 More
Not supported
## 5. FAQ
* 1. Speed situation on CPU and GPU
* Since most of the operators used by `SVTR` are matrix multiplication, in the GPU environment, the speed has an advantage, but in the environment where mkldnn is enabled on the CPU, `SVTR` has no advantage over the optimized convolutional network.
* 2. SVTR model convert to ONNX failed
* Ensure `paddle2onnx` and `onnxruntime` versions are up to date, refer to [SVTR model to onnx step-by-step example](https://github.com/PaddlePaddle/PaddleOCR/issues/7821#issuecomment-) for the convert onnx command. 1271214273).
* 3. SVTR model convert to ONNX is successful but the inference result is incorrect
* The possible reason is that the model parameter `out_char_num` is not set correctly, it should be set to W//4, W//8 or W//12, please refer to [Section 3.3.3 of SVTR, a high-precision Chinese scene text recognition model](https://aistudio.baidu.com/aistudio/) projectdetail/5073182?contributionType=1).
* 4. Optimization of long text recognition
* Refer to [Section 3.3 of SVTR, a high-precision Chinese scene text recognition model](https://aistudio.baidu.com/aistudio/projectdetail/5073182?contributionType=1).
* 5. Notes on the reproduction of the paper results
* Dataset using provided by [ABINet](https://github.com/FangShancheng/ABINet).
* By default, 4 cards of GPUs are used for training, the default Batchsize of a single card is 512, and the total Batchsize is 2048, corresponding to a learning rate of 0.0005. When modifying the Batchsize or changing the number of GPU cards, the learning rate should be modified in equal proportion.
* 6. Exploration Directions for further optimization
* Learning rate adjustment: adjusting to twice the default to keep Batchsize unchanged; or reducing Batchsize to 1/2 the default to keep the learning rate unchanged.
* Data augmentation strategies: optionally `RecConAug` and `RecAug`.
* If STN is not used, `Local` of `mixer` can be replaced by `Conv` and `local_mixer` can all be modified to `[5, 5]`.
* Grid search for optimal `embed_dim`, `depth`, `num_heads` configurations.
* Use the `Post-Normalization strategy`, which is to modify the model configuration `prenorm` to `True`.
## Citation
```bibtex
@article{Du2022SVTR,
title = {SVTR: Scene Text Recognition with a Single Visual Model},
author = {Du, Yongkun and Chen, Zhineng and Jia, Caiyan and Yin, Xiaoting and Zheng, Tianlun and Li, Chenxia and Du, Yuning and Jiang, Yu-Gang},