OCR

We present the SinOCR and SinFUND datasets, two comprehensive resources designed to advance Optical Character Recognition (OCR) and form understanding for the Sinhala language. SinOCR is the first publicly available Sinhala OCR dataset and the most extensive one to date. It contains 100,000 images of printed text in 200 different Sinhala fonts and 1,135 images of handwritten text that cover a wide range of writing styles.

In international contexts, natural scenes may contain text in multiple languages. A scene character image dataset covering Latin and Arabic scripts is therefore essential for training models to accurately detect and recognize text regions in real-world images. This capability is crucial for applications such as text translation, image search, content analysis, and autonomous vehicles, all of which must interpret text in different languages.

In contemporary digital environments, a high-resolution synthetic Latin character dataset is highly valuable for many real-world applications in computer vision and artificial intelligence, ranging from image restoration to sophisticated recognition systems.