Comprehensive Hindi Hostile Post Detection Dataset (CM-HTHPD)

Name: Comprehensive Hindi Hostile Post Detection Dataset (CM-HTHPD)
Creator: Santosh Rajak
License: https://creativecommons.org/licenses/by/4.0/
Keywords: Machine Learning

Citation Author(s): Santosh Rajak (National Institute of Technology, Silchar)
Submitted by: Santosh Rajak
Last updated: Sat, 03/02/2024 - 13:45
DOI: 10.21227/zxtz-k625
Data Format: *.csv


Categories:

Machine Learning

Keywords:

Hostile Post

Twitter

Hindi (Devanagari Script)


Abstract

The Comprehensive Hindi Hostile Post Detection Dataset (CM-HTHPD) is a collection of Twitter posts written in Hindi, focusing on various forms of hostile content. The posts were gathered using the Twitter Developer API and then manually annotated with sentiment labels on the Label Studio platform. The dataset is primarily intended to support research on hostile content detection and sentiment analysis in Hindi-language social media discourse. It contains approximately 8,300 entries.

Instructions:

Content: The dataset consists of the following columns:

Tweet: Contains the text of the Twitter post.

User: Provides the username associated with each Twitter post, enabling user-based analysis.

Sentiment: Indicates the sentiment category of each post, including Hate speech, Defamation, Offensive language, Abusive content, and Non-Hostile expressions.
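As a rough illustration of working with the three columns described above, the following sketch counts how many posts fall under each Sentiment label. It assumes the CSV header uses exactly the names Tweet, User, and Sentiment and that the file is UTF-8 encoded; neither is confirmed by this page. The function name count_labels, the SAMPLE rows, and the file name in the usage note are hypothetical.

```python
import csv
import io
from collections import Counter
from typing import TextIO

# Column names as listed in the dataset description. The exact header
# spelling in the real CSV file is an assumption.
EXPECTED_COLUMNS = ("Tweet", "User", "Sentiment")

# Tiny made-up sample in the assumed layout (not real dataset rows).
SAMPLE = (
    "Tweet,User,Sentiment\n"
    "नमस्ते दोस्तों,user_a,Non-Hostile\n"
    "example text one,user_b,Hate speech\n"
    "example text two,user_c,Non-Hostile\n"
)


def count_labels(handle: TextIO) -> Counter:
    """Count posts per Sentiment label in an open CSV text stream."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [name for name in EXPECTED_COLUMNS if name not in header]
    if missing:
        raise ValueError(f"missing expected columns: {missing}")
    # Strip stray whitespace so "Hate speech " and "Hate speech" match.
    return Counter((row["Sentiment"] or "").strip() for row in reader)


if __name__ == "__main__":
    for label, count in count_labels(io.StringIO(SAMPLE)).most_common():
        print(label, count)
```

To run this against the downloaded file, open it with open("cm_hthpd.csv", encoding="utf-8", newline="") (a hypothetical file name) and pass the handle to count_labels. Checking the label distribution first is useful because the five categories are unlikely to be evenly represented.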

Sourav Choudhary Thu, 03/21/2024 - 02:15 Permalink

Anubhav Hooda Mon, 08/26/2024 - 12:57 Permalink

Good quality dataset for sentiment analysis in Hindi.

Nurul Choudhury Wed, 09/25/2024 - 11:38 Permalink

Please give me access to the dataset, as it is required for my research project. I will definitely cite your dataset and follow all the privacy and copyright agreements.

Angana Chakraborty Sat, 01/18/2025 - 04:11 Permalink

I would be grateful if you could provide your email address so that I may share the dataset with you.

Santosh Rajak Fri, 02/21/2025 - 06:55 Permalink

Please give me access to the dataset, as it is required for my PhD work. I will surely cite your dataset and follow all privacy agreements. Please grant access as soon as possible. My email ID is nehatyagiakg@gmail.com

neha tyagi Mon, 03/17/2025 - 03:04 Permalink