TDC560

Citation Author(s): Yan Li (University of International Relations)
Submitted by: YAN LI
Last updated: Sat, 04/02/2022 - 02:01
DOI: 10.21227/9jbk-kx89

Categories:

Artificial Intelligence

Abstract

The TDC560 dataset contains 560 difficult images. Some are selected from the testing sets of CTW1500 and TD500; the others were generated by ourselves with text-line annotations. In the selection process, we select images with extreme spatial distances between characters and words. Additionally, to bring hard text-line detection closer to the real world, we enrich the existing image sources with our own data, which has two significant merits: (1) we add many images containing Chinese text, which is relatively scarce in previous benchmarks such as CTW1500; (2) we collect various types of images, such as scene text, design text, and some hard stylized text. Some visualization results are shown in the paper.

Instructions:

This dataset is used for arbitrary-shaped text detection. The images are in the TDC560_image folder, and the corresponding annotations are in the TDC560_label_circum folder. The annotations were labeled with the LabelMe tool, with polygon points listed in clockwise order.
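
The following is a minimal Python sketch of how the annotations might be read and checked, not an official loader. It assumes that each file in TDC560_label_circum is a standard LabelMe JSON file (a top-level "shapes" list whose entries carry a "points" list of [x, y] pairs) with a .json extension; neither detail is stated above, so verify them against the downloaded files. The function names are hypothetical. Because image coordinates have y growing downward, a polygon that is clockwise on screen has a positive shoelace area.

```python
import json
from pathlib import Path


def signed_area(points):
    """Shoelace area of a closed polygon.

    With image coordinates (y grows downward), a positive value
    means the points run clockwise as seen on screen.
    """
    total = 0.0
    n = len(points)
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]
        total += x0 * y1 - x1 * y0
    return total / 2.0


def is_clockwise(points):
    """True if the polygon is clockwise in image coordinates."""
    return signed_area(points) > 0


def polygons_from_labelme(data):
    """Extract polygons from a parsed LabelMe JSON dict (assumed layout)."""
    return [
        [(float(x), float(y)) for x, y in shape["points"]]
        for shape in data.get("shapes", [])
    ]


def load_polygons(json_path):
    """Read one annotation file and return its polygons."""
    with open(json_path, encoding="utf-8") as f:
        return polygons_from_labelme(json.load(f))


if __name__ == "__main__":
    # Assumption: annotations are *.json files inside TDC560_label_circum.
    for path in sorted(Path("TDC560_label_circum").glob("*.json")):
        polygons = load_polygons(path)
        clockwise = sum(is_clockwise(p) for p in polygons)
        print(f"{path.name}: {len(polygons)} polygons, {clockwise} clockwise")
```

Running the script from the dataset root prints, per annotation file, how many text-line polygons it contains and how many follow the clockwise convention, which is a quick sanity check before training a detector.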