Standard Dataset

Spam SMS in Dravidian Languages

Citation Author(s):
Ramanujam Elangovan (National Institute of Technology Silchar, Assam, India)
Abirami A M (Thiagarajar College of Engineering, Madurai, Tamil Nadu, India)
Submitted by: Ramanujam Elangovan
DOI: 10.21227/dcym-pd69

Abstract

The Dravidian Spam SMS dataset contains spam and ham (legitimate) messages in English, Tamil, Telugu, Kannada, and Malayalam. Nearly 7,700 messages were collected by circulating a Google Form among friends and other contacts. Language experts proficient in reading and writing each language carefully labeled the messages in that language. The dataset also includes Tamil messages written in the English (Latin) script, for example, “Nee Nalama” (“Are you well?”). Ham messages are mostly ordinary everyday messages. Spam messages include business, annoying, and unnecessary messages sent by anonymous users. Detailed information on the dataset is given in the image. Unlike some other SMS datasets, this dataset does not contain users' personal or banking information.

Instructions:

The dataset is provided in Excel format and has two columns: the message text and its type (spam or ham).
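A minimal Python sketch for loading the file and checking the class balance follows. It assumes the Excel sheet has a header row with columns named `Message` and `Type` and that the type values are "spam" or "ham" in some letter case; these column names and label spellings are guesses, so check them against the actual file. Reading the file requires pandas with an Excel engine such as openpyxl; the label helpers themselves use only the standard library.

```python
from collections import Counter
from collections.abc import Iterable


def normalize_label(raw) -> str:
    """Map a raw 'type' cell to 'spam' or 'ham', ignoring case and surrounding spaces."""
    label = str(raw).strip().lower()
    # Assumption: the file uses only these two label spellings.
    if label not in {"spam", "ham"}:
        raise ValueError(f"unexpected label: {raw!r}")
    return label


def load_rows(path: str, text_col: str = "Message", label_col: str = "Type"):
    """Read the Excel sheet and return a list of (message, label) pairs.

    The default column names are hypothetical; adjust them to the real header row.
    """
    import pandas as pd  # imported lazily so the helpers above work without pandas

    # Drop rows where either the message or its label is empty.
    df = pd.read_excel(path).dropna(subset=[text_col, label_col])
    return [
        (str(msg), normalize_label(lbl))
        for msg, lbl in zip(df[text_col], df[label_col])
    ]


def label_counts(rows: Iterable[tuple[str, str]]) -> Counter:
    """Count how many messages carry each (normalized) label."""
    return Counter(normalize_label(label) for _, label in rows)
```

For instance, `label_counts(load_rows("dravidian_spam_sms.xlsx"))` (a hypothetical file name) returns a `Counter` keyed by "spam" and "ham", which is a quick way to see how imbalanced the two classes are before training a classifier.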

