Spam SMS in Hindi Language

Citation Author(s):
Rajkamal Tutu Ponnekanty Y, National Institute of Technology Silchar
Ashutosh Sahoo, National Institute of Technology Silchar
Faizal Shanavas Puthiyaveettil, National Institute of Technology Silchar
Ramanujam Elangovan, National Institute of Technology Silchar
Abirami A M, Thiagarajar College of Engineering
Submitted by: Ramanujam Elangovan
Last updated: Wed, 12/18/2024 - 08:51
DOI: 10.21227/5y8x-n678

Abstract 

The Hindi Spam SMS Dataset comprises 3,894 SMS messages, each labeled as either spam or ham. The messages were contributed by students who receive such messages daily. They include messages the students received themselves and messages shared with them by friends and peers, so the collection offers a diverse and realistic sample of everyday Hindi SMS communication. The messages are primarily written in Hindi and reflect the linguistic and cultural context in which they were collected. Ham messages are normal conversations. Spam messages are typically unsolicited promotional content, irrelevant information, or nuisance messages from anonymous senders. The dataset was curated with privacy in mind and contains no sensitive personal or financial information, which distinguishes it from other datasets in this domain.

Instructions: 

The dataset has two columns: Message (the SMS text) and Label (whether the message is spam or ham).
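As an illustration of the two-column layout, the following Python sketch reads the data and tallies the spam/ham balance using only the standard library. It is a minimal sketch under several assumptions. The page lists no data format and no files are uploaded, so the comma-separated layout, the header row `Message,Label`, the lowercase label strings `spam` and `ham`, and the file name `hindi_spam_sms.csv` are all hypothetical. The helper names `load_messages` and `label_counts` are likewise illustrative.

```python
import csv
import io
from collections import Counter
from typing import Iterable, TextIO

# Column names as listed on the dataset page.
EXPECTED_COLUMNS = ["Message", "Label"]
# ASSUMPTION: label strings follow the abstract's "spam or ham" wording;
# the released file may encode them differently (e.g. 0/1).
EXPECTED_LABELS = {"spam", "ham"}


def load_messages(handle: TextIO) -> list[tuple[str, str]]:
    """Read (message, label) pairs from a CSV whose header is Message,Label.

    ASSUMPTION: comma-separated text with a header row; the page gives no format.
    """
    reader = csv.DictReader(handle)
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    rows = []
    for row in reader:
        # Normalise case and whitespace so "Spam " and "spam" count as one label.
        label = row["Label"].strip().lower()
        if label not in EXPECTED_LABELS:
            raise ValueError(f"unexpected label: {row['Label']!r}")
        rows.append((row["Message"], label))
    return rows


def label_counts(rows: Iterable[tuple[str, str]]) -> Counter:
    """Count the messages carrying each label (spam vs ham balance)."""
    return Counter(label for _, label in rows)


# Placeholder rows for illustration only; they are NOT taken from the dataset.
sample = io.StringIO(
    "Message,Label\n"
    "placeholder message one,ham\n"
    "placeholder message two,spam\n"
    "placeholder message three,ham\n"
)
print(label_counts(load_messages(sample)))  # Counter({'ham': 2, 'spam': 1})

# For a real file (hypothetical name), "utf-8-sig" also strips a leading BOM,
# which would otherwise corrupt the first column name:
# with open("hindi_spam_sms.csv", encoding="utf-8-sig", newline="") as f:
#     rows = load_messages(f)
```

Validating the header and label values up front makes a mismatch with these assumptions fail loudly rather than silently skewing the class counts.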

Dataset Files

    Files have not been uploaded for this dataset