Comprehensive Hindi Hostile Post Detection Dataset (CM-HTHPD)

Citation Author(s):
Santosh Rajak
National Institute of Technology, Silchar
Submitted by:
Santosh Rajak
Last updated:
Sat, 03/02/2024 - 08:45
DOI:
10.21227/zxtz-k625
Data Format:
License:

Abstract 

The Comprehensive Hindi Hostile Post Detection Dataset (CM-HTHPD) is a collection of Twitter posts written in Hindi, covering several forms of hostile content. The posts were gathered using the Twitter Developer API and then manually annotated with sentiment labels on the Label Studio platform. The dataset is intended primarily to support research on hostile content detection and sentiment analysis in Hindi-language social media discourse. It contains approximately 8,300 posts.

Instructions: 

Content: The dataset consists of the following columns:

Tweet: Contains the text of the Twitter post.

User: Provides the username associated with each Twitter post, enabling user-based analysis.

Sentiment: Indicates the sentiment category of each post, including Hate speech, Defamation, Offensive language, Abusive content, and Non-Hostile expressions.
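
The column layout above can be checked with a short script before any modelling work. The following is a minimal sketch, not the author's tooling: it assumes the data is distributed as a CSV file whose header uses the column names Tweet, User and Sentiment exactly as listed, and it assumes the label strings shown in EXPECTED_LABELS. The function name summarize_labels and the sample rows are hypothetical and exist only for illustration.

```python
import csv
import io
from collections import Counter

# Column names taken from the description above; the exact header spelling
# in the real file is an assumption.
EXPECTED_COLUMNS = ["Tweet", "User", "Sentiment"]

# Assumed label strings; the actual file may spell or case these differently.
EXPECTED_LABELS = {
    "Hate speech",
    "Defamation",
    "Offensive language",
    "Abusive content",
    "Non-Hostile",
}


def summarize_labels(csv_file):
    """Return (label counts, rows with unexpected labels) for a CSV file object."""
    reader = csv.DictReader(csv_file)
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    counts = Counter()
    unexpected = []
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        label = (row["Sentiment"] or "").strip()
        counts[label] += 1
        if label not in EXPECTED_LABELS:
            unexpected.append((row_number, label))
    return counts, unexpected


# Placeholder rows for illustration only; these are not real dataset entries.
sample = io.StringIO(
    "Tweet,User,Sentiment\n"
    "placeholder text 1,user_a,Non-Hostile\n"
    "placeholder text 2,user_b,Defamation\n"
    "placeholder text 3,user_a,Non-Hostile\n"
    "placeholder text 4,user_c,Unknown\n"
)
counts, unexpected = summarize_labels(sample)
print(counts.most_common())
print(unexpected)
```

With a real file, the same function could be called on a handle opened with encoding="utf-8" and newline="", since Hindi text is normally stored as UTF-8; the file name and encoding would need to be confirmed once the files are available.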

Comments

NA

Submitted by Sourav Choudhary on Wed, 03/20/2024 - 22:15

NA

Submitted by Anubhav Hooda on Mon, 08/26/2024 - 08:57

Good quality dataset for sentiment analysis in Hindi.

Submitted by Nurul Choudhury on Wed, 09/25/2024 - 07:38

Dataset Files

    Files have not been uploaded for this dataset