Hateful Forms

Citation Author(s):
Submitted by:
Seema Nagar
Last updated:
Wed, 10/18/2023 - 02:50


Using this data, we conduct an extensive investigation into the phenomenon of homophily in the generation of hate speech on Twitter, shedding light on an essential aspect of understanding online hate speech dynamics. We introduce innovative methods to detect multiple forms of hate speech, including manifestations of racism and sexism. Furthermore, we propose and validate novel measures for quantifying familiarity and similarity on Twitter, providing a comprehensive framework for understanding the interactions among users. Leveraging this empirical data from Twitter, our study demonstrates the presence of homophily and explores its variations across different categories of hate speech, encompassing themes related to gender, race, ethnicity, politics, and nationalism. The metrics and insights presented in this research offer valuable contributions to the development of targeted strategies aimed at reducing the prevalence of hate speech in online environments.


Description of each file in the dataset


Section 1: Data files for binary classification of tweets (hateful or not)

  1. The file tweets_30K.csv contains tweets manually annotated at a binary level (hateful or not)
  2. The file graph.edgelist contains the retweet graph among the users

○      Each edge runs from a source node to a target node: an edge from A to B means that user B has retweeted A’s tweet.
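The edge direction described above can be illustrated with a minimal Python sketch. It assumes each line of graph.edgelist holds a whitespace-separated source and target ID; the actual delimiter and ID format are not stated here, and the helper name read_retweet_edges and the sample IDs are hypothetical.

```python
from collections import defaultdict
from io import StringIO


def read_retweet_edges(lines):
    """Parse 'source target' lines into a map: author -> set of users who retweeted them."""
    retweeters = defaultdict(set)
    for line in lines:
        parts = line.split()
        if len(parts) < 2 or parts[0].startswith("#"):
            continue  # skip blank, malformed, or comment lines
        source, target = parts[0], parts[1]
        # Edge source -> target: the target user retweeted the source user's tweet.
        retweeters[source].add(target)
    return retweeters


# Tiny in-memory stand-in for graph.edgelist (hypothetical user IDs).
sample = StringIO("A B\nA C\nB C\n")
retweeters = read_retweet_edges(sample)
print(sorted(retweeters["A"]))  # ['B', 'C']
```

To read the real file, pass an open file handle in place of the in-memory sample, after checking that the assumed line format matches.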


Section 2: Data files for detecting individual hateful forms

  1. The directory Annotated Tweets for each hateful form contains manually annotated tweets for each hateful form

○      Consists of 5 files, one for each hateful form

○      Each file holds the manually labelled tweets for its hateful form

  2. The file original_seeds.csv contains manually identified hashtags for each hateful form
  3. The file tweets_hatefulforms.csv contains the tweets used for detecting the various hateful forms; these are tweets that contain hashtags
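One way the seed hashtags and the hashtag-bearing tweets could be combined is a simple lookup from a tweet's hashtags to candidate hateful forms. The Python sketch below is an illustration only, not the detection method used in the study: the seed mapping, the form names, and the function candidate_forms are all hypothetical, and the column layout of original_seeds.csv is not assumed.

```python
import re

# Hypothetical seed mapping; the real hashtags live in original_seeds.csv.
seeds = {
    "form_a": {"#exampletag1", "#exampletag2"},
    "form_b": {"#exampletag3"},
}

HASHTAG = re.compile(r"#\w+")


def candidate_forms(text, seeds):
    """Return the sorted names of forms whose seed hashtags occur in the tweet text."""
    # Lowercase the extracted hashtags so matching is case-insensitive.
    tags = {tag.lower() for tag in HASHTAG.findall(text)}
    return sorted(form for form, form_tags in seeds.items() if tags & form_tags)


print(candidate_forms("Some text #ExampleTag3 and #exampletag1", seeds))
# ['form_a', 'form_b']
```

A hashtag match of this kind only suggests a candidate form; the manually annotated files remain the reference labels.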


I would like the research community to benefit from the manual annotation we have performed and to advance research on detecting hateful forms and hate speech.

Submitted by Seema Nagar on Wed, 10/18/2023 - 02:53