mirror of https://github.com/PaddlePaddle/PaddleOCR.git, synced 2025-11-04 03:39:22 +00:00
	fix dead link
parent 7fec9ed62c
commit fd1bddf5a2
@@ -19,7 +19,34 @@ The table ocr flow chart is as follows

### 2.1 Train
TBD

In this chapter, we only cover training of the table structure model. For training the [text detection](../../doc/doc_en/detection_en.md) and [text recognition](../../doc/doc_en/recognition_en.md) models, please refer to the corresponding documents.
#### Data preparation

The training data is the public dataset [PubTabNet](https://arxiv.org/abs/1911.10683), which can be downloaded from the official [website](https://github.com/ibm-aur-nlp/PubTabNet). The PubTabNet dataset contains about 500,000 table images, along with annotations in HTML format.
#### Start training

*If you installed the CPU version of Paddle, set the `use_gpu` field in the configuration file to `false`.*
```shell
# single GPU training
python3 tools/train.py -c configs/table/table_mv3.yml
# multi-GPU training
# Set the GPU ID used by the '--gpus' parameter.
python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/table/table_mv3.yml
```
In the above command, `-c` selects `configs/table/table_mv3.yml` as the configuration file used for training.
For a detailed explanation of the configuration file, please refer to [config](../../doc/doc_en/config_en.md).
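As an alternative to editing the YAML file for a CPU-only install, the `-o` option (used for `Global.checkpoints` later in this section) can override a single field on the command line. The sketch below only prints the command instead of running it, so the override can be checked first. It assumes that `use_gpu` sits under the `Global` section of the configuration file; `build_train_cmd` is a hypothetical helper, not part of PaddleOCR.

```shell
# Hypothetical helper: print a training command without executing it.
# $1 is the config path; any further arguments are KEY=VALUE overrides
# that are appended after the -o option.
build_train_cmd() {
    config="$1"
    shift
    cmd="python3 tools/train.py -c ${config}"
    if [ "$#" -gt 0 ]; then
        cmd="${cmd} -o $*"
    fi
    printf '%s\n' "${cmd}"
}

# CPU-only run. Assumption: the field is addressed as Global.use_gpu.
build_train_cmd configs/table/table_mv3.yml Global.use_gpu=false
```

Once the printed line looks right, it can be run as-is.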
#### Load a trained model and continue training

If you want to load a trained model and continue training, set the parameter `Global.checkpoints` to the path of the model to be loaded.
```shell
python3 tools/train.py -c configs/table/table_mv3.yml -o Global.checkpoints=./your/trained/model
```
**Note**: `Global.checkpoints` takes priority over `Global.pretrain_weights`: when both parameters are specified, the model specified by `Global.checkpoints` is loaded first. If the model path specified by `Global.checkpoints` is wrong, the one specified by `Global.pretrain_weights` is loaded instead.
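The selection order in the note can be illustrated with a small shell function. This is only a sketch of the rule as stated here, not PaddleOCR's loading code. `select_weights` is a hypothetical name, and the check for a `<path>.pdparams` file is an assumption about how a valid checkpoint path is recognized.

```shell
# Sketch of the priority rule from the note (not PaddleOCR's own code).
# $1: value of Global.checkpoints, $2: value of Global.pretrain_weights.
# Assumption: a checkpoint path is valid when "<path>.pdparams" exists.
select_weights() {
    checkpoints="$1"
    pretrain_weights="$2"
    if [ -n "${checkpoints}" ] && [ -f "${checkpoints}.pdparams" ]; then
        printf 'checkpoints:%s\n' "${checkpoints}"
    elif [ -n "${pretrain_weights}" ]; then
        printf 'pretrain_weights:%s\n' "${pretrain_weights}"
    else
        printf 'none\n'
    fi
}

# Demo with a temporary stand-in for a saved checkpoint file.
demo_dir="$(mktemp -d)"
: > "${demo_dir}/latest.pdparams"
select_weights "${demo_dir}/latest" "./pretrain/model"    # checkpoints path wins
select_weights "${demo_dir}/missing" "./pretrain/model"   # falls back to pretrain weights
rm -rf "${demo_dir}"
```

The first call reports the checkpoint path because its `.pdparams` file exists; the second reports the pretrained weights because the checkpoint path cannot be found.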
### 2.2 Eval

First `cd` to the `PaddleOCR/ppstructure` directory.

@@ -19,6 +19,8 @@
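### 2.1 Training

In this chapter, we only cover training of the table structure model. For training the [text detection](../../doc/doc_ch/detection.md) and [text recognition](../../doc/doc_ch/recognition.md) models, please refer to the corresponding documents.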
#### Data preparation

The training data is the public dataset [PubTabNet](https://arxiv.org/abs/1911.10683), which can be downloaded from the [official website](https://github.com/ibm-aur-nlp/PubTabNet). The PubTabNet dataset contains about 500,000 table images, along with HTML-format annotations for each image.

@@ -31,7 +33,7 @@ python3 tools/train.py -c configs/table/table_mv3.yml
python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/table/table_mv3.yml
```
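In the above command, `-c` selects `configs/table/table_mv3.yml` as the configuration file used for training. For a detailed explanation of the configuration file, please refer to this [link](./config.md).
In the above command, `-c` selects `configs/table/table_mv3.yml` as the configuration file used for training. For a detailed explanation of the configuration file, please refer to this [link](../../doc/doc_ch/config.md).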
#### Resume training from a checkpoint