Offline Handwritten Text Images for Gender Prediction
- Submitted by: Aryan Verma
- Last updated: Mon, 02/27/2023 - 11:09
- DOI: 10.21227/gart-a309
Abstract
Handwriting is one of the most consequential inventions in human history. Writing lets us convey our thoughts, make business agreements, describe the world in an understandable way, and simplify tasks that were once difficult. Determining gender from offline handwriting is an applied research problem in forensics, psychology, and security, and the need for it is growing as technology evolves. Gender detection from handwriting is difficult because handwriting varies both between writers (interpersonal differences) and within a single writer's samples (intrapersonal differences). A major obstacle is the shortage of data, which this dataset aims to address. The dataset includes handwritten text samples in Hindi and English from 170 people, of whom 137 are men and 33 are women. Each sample contains seven handwritten text images, including a number, quotes, college names, and a person's name in both languages. Having several text forms from the same writer is necessary for robust and effective gender detection from offline handwritten text. In total, the dataset contains 1190 hand-collected images. It is intended to support the development of automated gender classification systems that can have real-world impact.
The dataset contains images of handwritten text samples from 170 participants, written in Hindi and English. Participants submitted their samples through a Google Form that asked each of them for seven different texts.
The main file of the dataset is named dataset.zip. It contains two subfolders, male and female. The male directory contains samples from 137 men, and the female directory contains samples from 33 women. Each sample consists of seven handwritten text images: a number, a Hindi quote, an English quote, a college name in Hindi, a college name in English, a person's name in Hindi, and a person's name in English. Having several text forms from the same writer is necessary for robust and effective gender detection from offline handwritten text. The total number of images is 1190 (170 participants × 7 images each), and all images are in .jpg format.
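The layout described above can be sanity-checked after extracting dataset.zip. The following is a minimal sketch, not part of the dataset itself: the function names and the extraction path `dataset` are hypothetical, and because the description does not say how images are arranged inside the male and female folders, the sketch simply counts .jpg files at any depth under each one.

```python
from pathlib import Path

# Figures taken from the dataset description.
PARTICIPANTS = {"male": 137, "female": 33}
IMAGES_PER_PARTICIPANT = 7


def expected_counts(participants, images_each=IMAGES_PER_PARTICIPANT):
    """Return the number of images each label folder should hold."""
    return {label: n * images_each for label, n in participants.items()}


def count_images(root, labels=("male", "female")):
    """Count .jpg files under each label folder, at any nesting depth.

    Assumption: the internal arrangement of each folder is unknown,
    so files are searched recursively. Missing folders count as 0.
    """
    root = Path(root)
    counts = {}
    for label in labels:
        label_dir = root / label
        if label_dir.is_dir():
            counts[label] = sum(
                1
                for p in label_dir.rglob("*")
                if p.is_file() and p.suffix.lower() == ".jpg"
            )
        else:
            counts[label] = 0
    return counts


if __name__ == "__main__":
    # "dataset" is a hypothetical path to the extracted archive.
    found = count_images("dataset")
    expected = expected_counts(PARTICIPANTS)
    for label in expected:
        print(f"{label}: found {found[label]}, expected {expected[label]}")
    print("total expected:", sum(expected.values()))
```

With the figures from the description, the expected counts are 959 images in male and 231 in female, which sum to the stated total of 1190. A mismatch on a local copy would suggest an incomplete download or a folder layout that differs from the assumption above.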