A Benchmark Dataset for Manipuri Meetei-Mayek Handwritten Character Recognition

Name: A Benchmark Dataset for Manipuri Meetei-Mayek Handwritten Character Recognition
Creator: Pangambam Singh
License: https://creativecommons.org/licenses/by/4.0/

Citation Author(s): Pangambam Singh (Banaras Hindu University)
Submitted by: Pangambam Singh
Last updated: Fri, 09/27/2019 - 12:25
DOI: 10.21227/fwax-yr43
Data Format: *.mat
Links: Recognition of Meetei Mayek characters using hybrid feature generated from dist…

A neural network based handwritten Meitei Mayek alphabet optical character reco…

An OCR system for the Meetei Mayek script - IEEE Conference Publication

Handwritten Manipuri Meetei-Mayek Classification Using Convolutional Neural Net…

Meitei language

Keywords:

Optical character recognition

Handwritten character recognition

Natural Language Processing

Manipuri

Meetei Mayek

Abstract

A benchmark dataset is essential for developing and comparing any classification or recognition system. To the best of our knowledge, no benchmark dataset for handwritten character recognition of the Manipuri Meetei-Mayek script has so far been available in the public domain.

Manipuri, also referred to as Meeteilon or Meiteilon, is a Sino-Tibetan language and one of the languages listed in the Eighth Schedule of the Constitution of India. It is the official language and lingua franca of the southeastern Himalayan state of Manipur in northeastern India. It is also used as a common language by a significant number of people across Northeast India and in parts of Bangladesh and Myanmar, and it is the most widely spoken language in Northeast India after Bengali and Assamese.

In this work, we introduce a handwritten Manipuri Meetei-Mayek character dataset consisting of more than 5000 samples. The samples were collected during March and April 2019 from a diverse population spanning different age groups (4 to 60 years), genders, educational backgrounds, occupations, and communities in three districts of Manipur, India: Imphal East, Thoubal, and Kangpokpi. Each participant was asked to write all the Manipuri characters on a single A4 sheet. The sheets were scanned, and each character was then manually segmented from the scanned images. The dataset consists of these segmented images of handwritten Meetei-Mayek characters (Mapi Mayek, Lonsum Mayek, Cheitap Mayek, Cheising Mayek, and Khutam Mayek), each 128 × 128 pixels, provided in both .JPG and .MAT formats.
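The internal layout of the .MAT files (variable names, array orientation) is not described on this page, so the following is only a minimal sketch for getting started, assuming SciPy and NumPy are installed and that a file stores grayscale 128 × 128 images in one of a few common array layouts. The helpers `list_mat_variables` and `to_image_stack` are hypothetical and not part of the dataset; the real variable names and shapes should be inspected before relying on them.

```python
import numpy as np
from scipy.io import loadmat


def list_mat_variables(path):
    """Return the user variables stored in a .mat file.

    loadmat() also returns header entries such as '__header__',
    '__version__' and '__globals__'; these are dropped here.
    """
    contents = loadmat(path)
    return {k: v for k, v in contents.items() if not k.startswith("__")}


def to_image_stack(array, side=128):
    """Reshape an image array into shape (N, side, side).

    ASSUMPTION (hypothetical layouts, not documented for this dataset):
      - (N, side, side): already samples-first, returned unchanged
      - (side, side, N): MATLAB-style samples-last, moved to the front
      - (N, side*side): flattened rows, reshaped with row-major order
    Any other shape raises ValueError so the layout can be checked by hand.
    """
    a = np.asarray(array)
    if a.ndim == 3 and a.shape[1:] == (side, side):
        return a
    if a.ndim == 3 and a.shape[:2] == (side, side):
        return np.moveaxis(a, -1, 0)
    if a.ndim == 2 and a.shape[1] == side * side:
        # Row-major (C order) flattening assumed; MATLAB column-major
        # data would need reshape(..., order="F") plus a transpose.
        return a.reshape(-1, side, side)
    raise ValueError(f"unexpected shape {a.shape} for {side}x{side} images")
```

A first step would be to call `list_mat_variables` on a downloaded file and print each variable's `shape` and `dtype` before choosing a layout. Note that `loadmat` reads MATLAB v4 through v7.2 files; a v7.3 (HDF5-based) file would need a reader such as `h5py` instead.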

Instructions:

Cite this dataset as: Pangambam Singh, "A Benchmark Dataset for Manipuri Meetei-Mayek Handwritten Character Recognition", IEEE Dataport, 2019. [Online]. Available: http://dx.doi.org/10.21227/fwax-yr43.