We present the SinOCR and SinFUND datasets, two resources designed to advance Optical Character Recognition (OCR) and form understanding for the Sinhala language. SinOCR, the first publicly available and most extensive dataset for Sinhala OCR to date, comprises 100,000 images of printed text in 200 different Sinhala fonts and 1,135 images of handwritten text, capturing a wide spectrum of writing styles.

[1] Kavishka Gunathilaka, Danusha Hewagama, Supul Pushpakumara, Thanuja Ambegoda, "SinOCR and SinFUND - Sinhala OCR and Form Understanding Datasets", IEEE Dataport, 2024. [Online]. Available: http://dx.doi.org/10.21227/hhez-0r18. Accessed: Dec. 02, 2024.
@data{hhez-0r18-24,
doi = {10.21227/hhez-0r18},
url = {http://dx.doi.org/10.21227/hhez-0r18},
author = {Kavishka Gunathilaka and Danusha Hewagama and Supul Pushpakumara and Thanuja Ambegoda},
publisher = {IEEE Dataport},
title = {SinOCR and SinFUND - Sinhala OCR and Form Understanding Datasets},
year = {2024}
}