Spam SMS in Hindi Language

Citation Author(s):
Rajkamal Tutu Ponnekanty Y (National Institute of Technology Silchar)
Ashutosh Sahoo (National Institute of Technology Silchar)
Faizal Shanavas Puthiyaveettil (National Institute of Technology Silchar)
Ramanujam Elangovan (National Institute of Technology Silchar)
Abirami A M (Thiagarajar College of Engineering)
Submitted by:
Ramanujam Elangovan
DOI:
10.21227/5y8x-n678

Abstract

The Hindi Spam SMS Dataset comprises 3,894 messages, each labeled as either spam or ham. The messages were contributed by students, who collected them from their own daily experience and from messages shared by friends and peers, so the dataset offers a diverse and realistic sample of real-world SMS communication in Hindi. The messages are written primarily in Hindi, reflecting the linguistic and cultural context in which they were collected. Ham messages are normal conversations, while spam messages typically consist of unsolicited promotional content, irrelevant information, or annoying messages from anonymous senders. The dataset was curated with privacy in mind and does not include sensitive personal or financial information, which distinguishes it from other datasets in this domain.

Instructions:

The dataset has two columns: Message and Label.
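
As an illustration of the two-column layout, the sketch below reads rows with Message and Label columns and counts the messages per label. It rests on several assumptions that this page does not confirm: that the data is distributed as UTF-8 CSV with a header row, and that the label values are the strings "spam" and "ham". The helper names load_messages and label_counts are hypothetical, and the three sample rows are made-up placeholders, not taken from the dataset.

```python
import csv
import io
from collections import Counter


def load_messages(stream):
    """Read (message, label) pairs from a CSV text stream.

    Assumes a header row naming the columns 'Message' and 'Label'.
    """
    reader = csv.DictReader(stream)
    rows = []
    for row in reader:
        message = row["Message"].strip()
        label = row["Label"].strip().lower()  # normalise case and spacing
        rows.append((message, label))
    return rows


def label_counts(rows):
    """Count how many messages carry each label."""
    return Counter(label for _, label in rows)


# Made-up placeholder rows in the assumed format (not real dataset content).
SAMPLE_CSV = (
    "Message,Label\n"
    "कल मिलते हैं,ham\n"
    "मुफ्त इनाम जीतें! अभी कॉल करें,spam\n"
    "खाना खा लिया?,ham\n"
)

rows = load_messages(io.StringIO(SAMPLE_CSV))
counts = label_counts(rows)
print(counts)
```

To read an actual file, the in-memory stream could be swapped for open(path, encoding="utf-8", newline=""), where path is wherever the dataset file is stored locally. Specifying UTF-8 explicitly matters here because the messages are in Devanagari script.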

Dataset Files

Files have not been uploaded for this dataset