Update pubtab dataset script reference (#15799)

* Update pubtab dataset script reference

Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>

* Update docs/datasets/table_datasets.en.md

Co-authored-by: Wang Xin <xinwang614@gmail.com>

* Update table_datasets.en.md

* Update table_datasets.md

---------

Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
Co-authored-by: Wang Xin <xinwang614@gmail.com>
Emmanuel Ferdman 2025-06-21 17:15:33 +03:00 committed by GitHub
parent 428f1cefd9
commit d54dfa4c85
GPG Key ID: B5690EEEBB952194
2 changed files with 6 additions and 6 deletions


@@ -11,9 +11,9 @@ Here are the commonly used table recognition datasets, which are being updated c
 | dataset | Image download link | PPOCR format annotation download link |
 |---|---|---|
-| PubTabNet |<https://github.com/ibm-aur-nlp/PubTabNet>| jsonl format, which can be loaded directly with [pubtab_dataset.py](../../../ppocr/data/pubtab_dataset.py) |
-| TAL Table Recognition Competition Dataset |<https://ai.100tal.com/dataset>| jsonl format, which can be loaded directly with [pubtab_dataset.py](../../../ppocr/data/pubtab_dataset.py) |
-| WTW Chinese scene table dataset |<https://github.com/wangwen-whu/WTW-Dataset>| Conversion is required to load with [pubtab_dataset.py](../../../ppocr/data/pubtab_dataset.py)|
+| PubTabNet |<https://github.com/ibm-aur-nlp/PubTabNet>| jsonl format, which can be loaded directly with [pubtab_dataset.py](https://github.com/PaddlePaddle/PaddleOCR/blob/main/ppocr/data/pubtab_dataset.py) |
+| TAL Table Recognition Competition Dataset |<https://ai.100tal.com/dataset>| jsonl format, which can be loaded directly with [pubtab_dataset.py](https://github.com/PaddlePaddle/PaddleOCR/blob/main/ppocr/data/pubtab_dataset.py) |
+| WTW Chinese scene table dataset |<https://github.com/wangwen-whu/WTW-Dataset>| Conversion is required to load with [pubtab_dataset.py](https://github.com/PaddlePaddle/PaddleOCR/blob/main/ppocr/data/pubtab_dataset.py)|
 ## 1. PubTabNet


@@ -12,9 +12,9 @@ typora-copy-images-to: images
 | Dataset name | Image download link | PPOCR annotation download link |
 |---|---|---|
-| PubTabNet |<https://github.com/ibm-aur-nlp/PubTabNet>| jsonl format, can be loaded directly with [pubtab_dataset.py](../../../ppocr/data/pubtab_dataset.py) |
-| TAL Table Recognition Competition Dataset |<https://ai.100tal.com/dataset>| jsonl format, can be loaded directly with [pubtab_dataset.py](../../../ppocr/data/pubtab_dataset.py) |
-| WTW Chinese scene table dataset |<https://github.com/wangwen-whu/WTW-Dataset>| Must be converted before it can be loaded with [pubtab_dataset.py](../../../ppocr/data/pubtab_dataset.py) |
+| PubTabNet |<https://github.com/ibm-aur-nlp/PubTabNet>| jsonl format, can be loaded directly with [pubtab_dataset.py](https://github.com/PaddlePaddle/PaddleOCR/blob/main/ppocr/data/pubtab_dataset.py) |
+| TAL Table Recognition Competition Dataset |<https://ai.100tal.com/dataset>| jsonl format, can be loaded directly with [pubtab_dataset.py](https://github.com/PaddlePaddle/PaddleOCR/blob/main/ppocr/data/pubtab_dataset.py) |
+| WTW Chinese scene table dataset |<https://github.com/wangwen-whu/WTW-Dataset>| Must be converted before it can be loaded with [pubtab_dataset.py](https://github.com/PaddlePaddle/PaddleOCR/blob/main/ppocr/data/pubtab_dataset.py) |
 ## 1. PubTabNet dataset
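
In both files the change is the same substitution: the relative link target `../../../ppocr/data/pubtab_dataset.py` becomes an absolute URL under `https://github.com/PaddlePaddle/PaddleOCR/blob/main/`. The sketch below shows one way such a rewrite could be expressed in Python. It is an illustration only, not the method used for this commit; the function name `absolutize_links` and the regular expression are assumptions, and only the two prefixes are taken from the diff.

```python
import re

# Both prefixes are copied from the diff; everything else here is hypothetical.
RELATIVE_PREFIX = "../../../"
ABSOLUTE_PREFIX = "https://github.com/PaddlePaddle/PaddleOCR/blob/main/"

# Match a markdown link target that starts with the relative prefix
# and points somewhere under ppocr/.
LINK_TARGET = re.compile(
    r"\]\(" + re.escape(RELATIVE_PREFIX) + r"(ppocr/[^)\s]+)\)"
)


def absolutize_links(markdown_text: str) -> str:
    """Rewrite relative ppocr/ link targets to absolute GitHub URLs."""
    return LINK_TARGET.sub(
        lambda m: "](" + ABSOLUTE_PREFIX + m.group(1) + ")",
        markdown_text,
    )


# One of the rows removed by the diff, used as sample input.
old_row = (
    "| PubTabNet |<https://github.com/ibm-aur-nlp/PubTabNet>| jsonl format, "
    "which can be loaded directly with "
    "[pubtab_dataset.py](../../../ppocr/data/pubtab_dataset.py) |"
)
new_row = absolutize_links(old_row)
print(new_row)
```

Applying the function to a row that already uses the absolute URL leaves it unchanged, so the rewrite is safe to run more than once.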