DIDA is a new image-based historical handwritten digit dataset and collected from the Swedish historical handwritten document images between the year 1800 and 1940. It is the largest historical handwritten digit dataset which is introduced to the Optical Character Recognition (OCR) community to help the researchers to test their optical handwritten character recognition methods. To generate DIDA, 250,000 single digits and 200,000 multi-digits are cropped from 75,000 different document images. 


The AOLAH databases are contributions from Aswan faculty of engineering to help researchers in the field of online handwriting recognition to build a powerful system to recognize Arabic handwritten script. AOLAH stands for Aswan On-Line Arabic Handwritten where “Aswan” is the small beautiful city located at the south of Egypt, “On-Line” means that the databases are collected the same time as they are written, “Arabic” cause these databases are just collected for Arabic characters, and “Handwritten” written by the natural human hand.


* There are two databases; first database is for Arabic characters, it consists of 2,520 sample files written by 90 writers using simulation of a stylus pen and a touch screen. The second database is for Arabic characters’ strokes, it consists of 1,530 sample files for 17 strokes. The second database is extracted from the previous accepted database by extracting strokes from characters.
* Writers are volunteers from Aswan faculty of engineering with ages from 18 to 20 years old.
* Natural writings with unrestricted writing styles.
* Each volunteer writes the 28 characters of Arabic script using the GUI.
* It can be used for Arabic online characters recognition.
* The developed tools for collecting the data is code acts as a simulation of a stylus pen and a touch screen, pre-processing data samples of characters are also available for researchers.
* The database is available free of charge (for academic and research purposes) to the researchers.
* The databases available here are the training databases.


This dataset comes up as a benchmark dataset for machines to automatically recognizing the handwritten assamese digists (numerals) by extracting useful features by analyzing the structure. The Assamese language comprises of a total of 10 digits from 0 to 9. We have collected a total of 516 handwritten digits from 52 native assamese people irrespective of their age (12-86 years), gender, educational background etc. The digits are captured in .jpeg format using a paint mobile application developed by us which automatically saves the images in the internal storage of the mobile.


The dataset consists of 60285 character image files which has been randomly divided into 54239 (90%) images as training set 6046 (10%) images as test set. The collection of data samples was carried out in two phases. The first phase consists of distributing a tabular form and asking people to write the characters five times each. Filled-in forms were collected from around 200 different individuals in the age group 12-23 years. The second phase was the collection of handwritten sheets such as answer sheets and classroom notes from students in the same age group.


A benchmark dataset is always required for any classification or recognition system. To the best of our knowledge, no benchmark dataset exists for handwritten character recognition of Manipuri Meetei-Mayek script in public domain so far. Manipuri, also referred to as Meeteilon or sometimes Meiteilon, is a Sino-Tibetan language and also one of the Eight Scheduled languages of Indian Constitution. It is the official language and lingua franca of the southeastern Himalayan state of Manipur, in northeastern India.