Spam SMS in Hindi Language
- Submitted by: Ramanujam Elangovan
- Last updated: Wed, 12/18/2024 - 08:51
- DOI: 10.21227/5y8x-n678
Abstract
The Hindi Spam SMS Dataset comprises 3,894 messages, each labeled as either spam or ham. Students contributed messages they encountered in daily use, along with messages shared by friends and peers, so the collection is a diverse and realistic sample of real-world SMS communication in Hindi. The messages are written primarily in Hindi, reflecting the linguistic and cultural context in which they were collected. Ham messages are normal conversations, while spam messages typically consist of unsolicited promotional content, irrelevant information, or annoying messages from anonymous users. The dataset was curated with privacy in mind and does not include sensitive personal or financial information, which distinguishes it from other datasets in this domain.
The dataset has two columns: Message and Label.
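As a rough illustration of working with the two-column layout, the sketch below reads Message/Label pairs and counts the labels. It rests on assumptions the listing does not confirm: that the data is distributed as UTF-8 CSV with a header row naming the columns exactly Message and Label, and that labels are the strings spam and ham (case may vary). The function name load_messages and the two sample rows are invented for illustration and are not taken from the dataset.

```python
import csv
import io
from collections import Counter


def load_messages(handle):
    """Read (message, label) pairs from a CSV file object with a header row.

    Assumes columns named "Message" and "Label"; rows with an empty message
    or a label other than spam/ham are skipped.
    """
    rows = []
    for record in csv.DictReader(handle):
        message = (record.get("Message") or "").strip()
        label = (record.get("Label") or "").strip().lower()
        if message and label in {"spam", "ham"}:
            rows.append((message, label))
    return rows


# Two invented placeholder rows standing in for the real file.
sample = io.StringIO(
    "Message,Label\n"
    "कल मिलते हैं,ham\n"
    "मुफ्त इनाम जीतें,Spam\n"
)
rows = load_messages(sample)
counts = Counter(label for _, label in rows)
print(counts)
```

With the real download, the same function could be called on a file opened with open(path, encoding="utf-8", newline=""), where path is wherever the file was saved. Normalizing labels to lowercase is a defensive choice here, since the listing does not state how they are capitalized.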