OCR | IEEE DataPort

SinOCR and SinFUND - Sinhala OCR and Form Understanding Datasets

We present the SinOCR and SinFUND datasets, two comprehensive resources designed to advance Optical Character Recognition (OCR) and form understanding for the Sinhala language. SinOCR, the first publicly available and the most extensive dataset for Sinhala OCR to date, includes 100,000 images featuring printed text in 200 different Sinhala fonts and 1,135 images of handwritten text, capturing a wide spectrum of writing styles.

LASCID: Latin and Arabic Scene Character Image Dataset

In international contexts, natural scenes may include text in multiple languages. A Latin and Arabic scene character image dataset is therefore essential for training models to accurately detect and recognize text regions in real-world images. This is crucial for applications such as text translation, image search, content analysis, and autonomous vehicles that need to interpret text in different languages.

CharImageDB: Character Image Dataset

A high-resolution synthetic Latin character dataset is important for many real-world applications in computer vision and artificial intelligence, ranging from image restoration to sophisticated recognition systems.
