Abstract

Hate speech has become an alarming worldwide problem in online communication and digital media. The internet brings numerous benefits, but it also has a dark side: individuals are subjected to severe bullying through hate speech, and some cases tragically end in suicide or other self-destructive behavior.

Hate speech detection has been studied extensively for widely researched languages such as English, German, and French, but work on Bengali lags far behind. Bengali is a complex language, and too little data is available for research in this field. Most existing methods for analyzing online text target languages like English and overlook Bengali. Yet hate speech in Bengali is a serious and prevalent issue, especially on platforms such as Facebook and YouTube, and even television shows sometimes feature comments that are offensive and unsuitable for all audiences. Identifying and combating Bengali hate speech remains difficult because effective tools are lacking, which underscores the need for further research.

Before this dataset was created, the scarcity of Bengali hate speech datasets was a significant hurdle. This binary dataset comprises approximately 140,000 speeches, of which 68,000 are labeled hateful and 71,000 non-hateful, making it one of the largest Bengali hate speech datasets available online. It was compiled by merging several existing sources and adjusting their labels so that each one accurately indicates whether hate speech is present.

The availability of such data helps researchers develop and train more effective methods for identifying and curbing hate speech online. This work marks an important step toward a safer and more compassionate internet for all users.