PaddleOCR/doc/doc_ch/dataset/ocr_datasets.md
2022-04-26 22:30:22 +08:00

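## OCR Datasets
- [1. Text Detection](#1)
- [2. Text Recognition](#2)

This page collects public datasets commonly used in OCR. The list is updated continuously, and contributions of additional datasets are welcome.
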
<a name="1"></a>
#### 1. Text Detection
| Dataset | Image download URL | PPOCR annotation download URL |
|---|---|---|
| ICDAR 2015 | https://rrc.cvc.uab.es/?ch=4&com=downloads | [train](https://paddleocr.bj.bcebos.com/dataset/train_icdar2015_label.txt) / [test](https://paddleocr.bj.bcebos.com/dataset/test_icdar2015_label.txt) |
| ctw1500 | https://paddleocr.bj.bcebos.com/dataset/ctw1500.zip | Included in the image download |
| total text | https://paddleocr.bj.bcebos.com/dataset/total_text.tar | Included in the image download |
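
As a minimal sketch of how a PPOCR detection annotation file might be read, the snippet below parses a single label line. It assumes each line holds an image path and a JSON list of text regions separated by a tab, with `transcription` and `points` keys, and that `###` marks a region to ignore. The sample line, the image path, and the helper name `parse_det_label_line` are hypothetical and chosen only for illustration; check the downloaded label files before relying on this layout.

```python
import json


def parse_det_label_line(line):
    """Split one label line into (image_path, list of region dicts).

    Assumed layout: "<image path>\t<JSON list of regions>".
    """
    image_path, regions_json = line.rstrip("\n").split("\t", 1)
    regions = json.loads(regions_json)
    return image_path, regions


# Hypothetical sample line, not copied from any real label file.
sample = (
    "ch4_test_images/img_1.jpg\t"
    '[{"transcription": "HELLO", "points": [[10, 20], [110, 20], [110, 50], [10, 50]]}, '
    '{"transcription": "###", "points": [[5, 60], [40, 60], [40, 80], [5, 80]]}]'
)

image_path, annotations = parse_det_label_line(sample)

# Keep only regions with a usable transcription (assumes "###" means "ignore").
readable = [a for a in annotations if a["transcription"] != "###"]

print(image_path, len(annotations), len(readable))
```

To process a whole file, the same helper could be applied to each line of a label file opened with `encoding="utf-8"`.
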
<a name="2"></a>
#### 2. Text Recognition
| Dataset | Image download URL | PPOCR annotation download URL |
|---|---|---|
| English benchmarks (MJ, SJ, IIIT, SVT, IC03, IC13, IC15, SVTP, and CUTE) | [DTRB](https://github.com/clovaai/deep-text-recognition-benchmark#download-lmdb-dataset-for-traininig-and-evaluation-from-here) | Provided in LMDB format; can be loaded directly with [lmdb_dataset.py](../../../ppocr/data/lmdb_dataset.py) |