Text Classification | IEEE DataPort

MarathiSarc

Sarcasm detection, the task of predicting whether a given text is sarcastic, is a challenging problem in sentiment analysis. While significant research has been conducted for languages such as English, Czech, and Italian, limited work exists for Indian languages such as Hindi, Tamil, and Bengali. Marathi, the third most spoken language in India, has seen little progress in sarcasm detection, mainly because suitable datasets are lacking.

Categories:

Machine Learning

COVID-19 on YouTube: A Data-Driven Analysis of Sentiment, Toxicity, and Content Recommendations

Please cite the following paper when using this dataset:

Vanessa Su and Nirmalya Thakur, “COVID-19 on YouTube: A Data-Driven Analysis of Sentiment, Toxicity, and Content Recommendations”, Proceedings of the IEEE 15th Annual Computing and Communication Workshop and Conference 2025, Las Vegas, USA, Jan 06-08, 2025 (Paper accepted for publication, Preprint: https://arxiv.org/abs/2412.17180).

Five Years of COVID-19 Discourse on Instagram: A Labeled Instagram Dataset of Over Half a Million Posts for Multilingual Sentiment Analysis

To download this dataset without purchasing an IEEE DataPort subscription, please visit: https://zenodo.org/records/13896353

Please cite the following paper when using this dataset:

Mpox Narrative on Instagram: A Labeled Multilingual Dataset of Instagram Posts on Mpox for Sentiment, Hate Speech, and Anxiety Analysis

To download the dataset without purchasing an IEEE DataPort subscription, please visit: https://zenodo.org/records/13738598

Please cite the following paper when using this dataset:

N. Thakur, “Mpox narrative on Instagram: A labeled multilingual dataset of Instagram posts on mpox for sentiment, hate speech, and anxiety analysis,” arXiv [cs.LG], 2024, URL: https://arxiv.org/abs/2409.05292

Bangla SMS Dataset for Smishing Detection

Categories:

Security

Chinese electric power audit text dataset

Categories:

Machine Learning