Spam SMS in Dravidian Languages
The Dravidian Spam SMS dataset has Spam and Ham messages in English, Tamil, Telugu, Kannada, and Malayalam languages. Nearly 7700 messages were collected by sending friends and other contacts a Google form. Language experts (reading and writing skills) were used to label the messages of corresponding languages carefully. The dataset also includes the Tamil verbatim messages written in English. For example, “Nee Nalama”. The Ham messages are mostly normal. Spam messages include business, annoying, and unnecessary messages an anonymous user sends. Detailed information on the dataset is given in the image. The dataset does not have the user's personal or banking information like the other datasets.
The dataset is in excel format and it has two columns the message and its type.