DEVANAGARI CAPTCHA DATASET OF 1 Million Images : A challenge Test

Citation Author(s):
SANJAY PATE
Kavayitri Bahinabai Chaudhari North Maharashtra University, Jalgaon
PROF. DR. RAKESH RAMTEKE
Kavayitri Bahinabai Chaudhari North Maharashtra University, Jalgaon
Submitted by:
SANJAY PATE
Last updated:
Mon, 04/10/2023 - 23:18
DOI:
10.21227/qtmn-m570

Abstract 

CAPTCHA stands for Completely Automated Public Turing test to tell Computers and Humans Apart. It is a test designed so that humans can pass it while current computer systems cannot, and it is used in many applications to distinguish human users from machines. Text-based CAPTCHAs are the most common type on websites. Because most text-based CAPTCHAs use English letters, they are hard to pass for rural residents who read only their native languages. Devanagari characters are more complex than the English letters and numerals used in standard CAPTCHAs, which makes machine recognition much more difficult. The majority of official websites in India offer information only in Devanagari, yet these websites do not use Devanagari CAPTCHAs. We have therefore created a new text-based CAPTCHA in Devanagari script.
The CAPTCHAs are built from printed (computer font) and handwritten Devanagari character images: 34 letters and 10 digits in each style, for 44 + 44 = 88 character images in total. Four character sets are used: Printed Alphabet (34), Handwritten Alphabet (34), Printed Digit (10), and Handwritten Digit (10). From combinations of these four sets, 11 classes were generated. The CAPTCHA string length is 5, 6, or 7 characters, so each class has three subclasses, one per string length, giving 11 classes × 3 subclasses = 33 types of CAPTCHA image. Each CAPTCHA image is 250 × 90 pixels. Following general CAPTCHA generation principles, noise is added to each image using digital image processing techniques so that it is harder to recognize and not easily broken. A test that requires identifying Devanagari characters is therefore difficult to pass automatically.
For each class, 10,000 CAPTCHA images were created using Python, so the Devanagari CAPTCHA dataset contains 11 classes × 10,000 images = 110,000 (one hundred ten thousand) images, written 1,10,000 in Indian digit grouping.
The dataset is intended for researchers investigating CAPTCHA recognition, in particular those designing OCR systems that recognize and break Devanagari CAPTCHAs.
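The counts in the abstract can be checked with a short Python sketch, which also shows one plausible way to draw a CAPTCHA string of length 5, 6, or 7. This is an illustration under stated assumptions, not the authors' generator: the abstract gives only the counts (34 letters, 10 digits), so the choice of letters below (Unicode consonants U+0915 to U+0939 minus three code points), the `sample_label` helper, and sampling with replacement are all hypothetical. Image rendering and noise are not covered.

```python
import random

# ASSUMPTION: the abstract does not list the 34 letters. For illustration we
# take the Devanagari consonants U+0915..U+0939 and drop three code points
# (U+0929, U+0931, U+0934), which leaves exactly 34 letters.
EXCLUDED = {0x0929, 0x0931, 0x0934}
LETTERS = [chr(cp) for cp in range(0x0915, 0x093A) if cp not in EXCLUDED]

# Devanagari digits U+0966..U+096F (10 digits).
DIGITS = [chr(cp) for cp in range(0x0966, 0x0970)]

# Counts stated in the abstract.
STYLES = 2                  # printed and handwritten
NUM_CLASSES = 11            # classes built from the four character sets
STRING_LENGTHS = (5, 6, 7)  # one subclass per string length
IMAGES_PER_CLASS = 10_000

chars_per_style = len(LETTERS) + len(DIGITS)        # 34 + 10 = 44
total_char_images = STYLES * chars_per_style        # 44 + 44 = 88
num_subclasses = NUM_CLASSES * len(STRING_LENGTHS)  # 11 x 3 = 33
total_images = NUM_CLASSES * IMAGES_PER_CLASS       # 11 x 10,000 = 110,000


def sample_label(pool, length, rng):
    """Draw a random string of `length` characters from `pool`.

    Hypothetical helper: characters are drawn independently, with
    replacement. The source does not say how the strings were sampled.
    """
    if length not in STRING_LENGTHS:
        raise ValueError(f"length must be one of {STRING_LENGTHS}")
    return "".join(rng.choice(pool) for _ in range(length))


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs give the same label
    print(chars_per_style, total_char_images, num_subclasses, total_images)
    print(sample_label(LETTERS + DIGITS, 6, rng))
```

The computed total is 110,000 images, which matches the 11 classes × 10,000 images figure given in the abstract.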
