Standard Dataset
Spam SMS in Dravidian Languages
- Submitted by: Ramanujam Elangovan
- Last updated: Fri, 06/02/2023 - 01:11
- DOI: 10.21227/dcym-pd69
Abstract
The Dravidian Spam SMS dataset contains Spam and Ham messages in English, Tamil, Telugu, Kannada, and Malayalam. Nearly 7,700 messages were collected through a Google Form sent to friends and other contacts. Language experts with reading and writing skills in each language carefully labeled the messages in that language. The dataset also includes Tamil messages written verbatim in English letters, for example, “Nee Nalama”. The Ham messages are mostly ordinary, everyday messages. The Spam messages include business, annoying, and unnecessary messages sent by anonymous users. Detailed information on the dataset is given in the image. Unlike some other datasets, this dataset does not contain users' personal or banking information.
The dataset is in Excel format and has two columns: the message and its type.
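As a rough illustration of working with a two-column layout like this, the sketch below counts messages per type. The column names `message` and `type`, the label spellings, and the sample rows are assumptions made for illustration and are not taken from the dataset file. The hypothetical `load_rows` helper assumes pandas and openpyxl are installed.

```python
from collections import Counter


def label_counts(rows):
    """Count messages per label, normalising label case and whitespace."""
    return Counter(label.strip().lower() for _message, label in rows)


def load_rows(path, message_col="message", label_col="type"):
    """Read (message, label) pairs from an Excel file.

    The column names are assumptions; adjust them to the actual header row.
    Requires pandas and openpyxl.
    """
    import pandas as pd

    frame = pd.read_excel(path)
    # Drop rows where either the message or its label is missing.
    frame = frame.dropna(subset=[message_col, label_col])
    return list(zip(frame[message_col].astype(str), frame[label_col].astype(str)))


# Made-up rows standing in for the real file (labels are illustrative only).
sample = [
    ("Nee Nalama", "ham"),
    ("Win a free prize now", "Spam "),
    ("See you at 5", "Ham"),
]
counts = label_counts(sample)
print(counts)
```

With the real file, `label_counts(load_rows(path))` would give the Spam and Ham totals, provided the column names match.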
Comments
I want to use this dataset for learning.
Please give me access to this dataset. Could you mail it to rushil.anil.nair@gmail.com?
Hi, good day. I would like to access this dataset for my study. I would really appreciate it if you could give me access to download it.
Thanks,