SignboardText

Citation Author(s):
Thanh Duc Ngo, University of Information Technology, Vietnam National University Ho Chi Minh City, Vietnam
Tien Do, University of Information Technology, Vietnam National University Ho Chi Minh City, Vietnam
Thua Nguyen, University of Information Technology, Vietnam National University Ho Chi Minh City, Vietnam
Thuyen Tran, University of Information Technology, Vietnam National University Ho Chi Minh City, Vietnam
Duy-Dinh Le, University of Information Technology, Vietnam National University Ho Chi Minh City, Vietnam
Submitted by:
Thanh Ngo
Last updated:
Thu, 05/16/2024 - 11:30
DOI:
10.21227/krzq-sc45

Abstract 

Scene text detection and recognition have attracted much attention in recent years because of their potential applications. Both tasks are sensitive to scene complexity and text variation. Popular benchmark datasets include some of these difficult cases, but only to a limited extent. In this work, we investigate scene text detection and recognition in a domain with extreme challenges. We focus on in-the-wild signboard images, in which text commonly appears in different fonts, sizes, artistic styles, or languages against cluttered backgrounds. We contribute an in-the-wild signboard dataset with 79K text instances at both the line level and the word level across 2,104 scene images.

Instructions: 

This dataset contains images and annotations for scene text detection and recognition. It has two parts: (1) 1,175 images manually labeled with a total of 59,588 text instances at the line and word levels; and (2) 929 signboard images collected from the VinText, Total-Text, and ICDAR15 datasets. Each text instance in the first part has a quadrilateral bounding box and an associated ground-truth character sequence. Images in the second part were selected because they contain signboards; this part comprises 20,261 text instances at the word level. This brings the total number of text instances in the final dataset to 79,814. Following the ICDAR15 standard, we annotated every text instance present in each image with its polygon and text content. All images were annotated manually.
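
The annotations are described as following the ICDAR15 standard, so a reader might load them roughly as sketched below. This is a minimal illustration, not the dataset's official loader. It assumes the common ICDAR15 ground-truth layout (one text instance per line, eight comma-separated corner coordinates followed by the transcription, with `###` marking an instance to ignore). The actual file names and layout in the download may differ, and `TextInstance`, `parse_icdar15_line`, and `parse_icdar15_text` are hypothetical names introduced here.

```python
from typing import List, NamedTuple, Tuple

Point = Tuple[int, int]


class TextInstance(NamedTuple):
    quad: List[Point]  # four corner points of the quadrilateral box
    text: str          # ground-truth character sequence
    ignore: bool       # True when the transcription is "###" (ICDAR15 convention)


def parse_icdar15_line(line: str) -> TextInstance:
    """Parse one line of the assumed form x1,y1,x2,y2,x3,y3,x4,y4,transcription."""
    # Drop a leading UTF-8 byte-order mark and the trailing newline, if present.
    cleaned = line.lstrip("\ufeff").rstrip("\r\n")
    # Split at most 8 times so commas inside the transcription are preserved.
    parts = cleaned.split(",", 8)
    if len(parts) < 9:
        raise ValueError(f"expected 8 coordinates and a transcription: {line!r}")
    coords = [int(float(value)) for value in parts[:8]]
    quad = [(coords[i], coords[i + 1]) for i in range(0, 8, 2)]
    text = parts[8]
    return TextInstance(quad=quad, text=text, ignore=(text == "###"))


def parse_icdar15_text(content: str) -> List[TextInstance]:
    """Parse the full contents of one annotation file, skipping blank lines."""
    return [parse_icdar15_line(ln) for ln in content.splitlines() if ln.strip()]
```

Reading a file with `open(path, encoding="utf-8-sig")` and passing its contents to `parse_icdar15_text` would yield one `TextInstance` per annotated line or word. Limiting the split to eight commas is a deliberate choice, since signboard text can itself contain commas.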