ClimateMiSt: Climate Change Misinformation and Stance Detection Dataset
- Submitted by: YeonJung Choi
- Last updated: Sat, 09/21/2024 - 02:35
- DOI: 10.21227/cdaz-jh77
Abstract
Climate change has been a worldwide concern for more than 50 years. Climate change misinformation has also become a critical issue: it casts doubt on the causes and effects of climate change and thereby hinders climate action. Such misinformation has been a major obstacle to mitigating climate change and its effects, and it has even aggravated the problem and polarized the public. In this paper, we introduce ClimateMiSt, a new climate change misinformation and stance detection dataset consisting of both social media data and news article data with manually verified labels. The social media data was collected from Twitter between January 1st, 2022 and September 30th, 2022, and the news article data was collected from 10 different sources. In total, our dataset contains 146,670 tweets and 4,353 news articles. We manually annotate 2,008 tweets with both a veracity label (e.g., misinformation/non-misinformation) and a stance label (e.g., favor/against). We provide several exploratory analyses of our dataset and compare the annotation outcomes for social media posts and news articles in ClimateMiSt. Moreover, we implement state-of-the-art baseline models for both misinformation and stance detection on our dataset. We find that using a knowledge graph built from reliable news articles improves misinformation detection performance, whereas a vanilla text classification model performs best on the stance detection task. To the best of our knowledge, ClimateMiSt is the first climate change dataset that provides both veracity and stance annotations and draws on both news articles and social media. Our dataset can be used for climate change misinformation and stance detection, and can further contribute to related research on climate change.
This dataset contains four files: reliable_submission.csv, unreliable_submission.csv, tweet_full_submission.csv, and tweet_annot_submission.csv.
Each file is described below.
- reliable_submission.csv
- This file contains the content of 546 news articles from 3 reliable sources (i.e., factcheck.org, politifact.com, washingtonpost.com). It can be used for knowledge graph construction, or for misinformation/stance detection together with the unreliable news articles.
- unreliable_submission.csv
- This file contains 3,797 news article URLs from 6 unreliable news outlets. The article content can be extracted using these URLs, and the file can be used for misinformation/stance detection together with the reliable news articles.
- tweet_full_submission.csv
- This file contains the IDs of 146,670 tweets, including the annotated tweets. Two annotation types are available: veracity (0 or 1) and stance ('favor' or 'against'). Unannotated tweets have no labels. This file can be used for misinformation/stance detection.
- tweet_annot_submission.csv
- This file contains the IDs of 2,008 tweets that have both veracity and stance annotations. This file can be used for misinformation/stance detection.
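As a minimal sketch of how the labeled and unlabeled tweets described above might be separated, the Python snippet below splits rows by whether they carry both annotations. The column names (`tweet_id`, `veracity`, `stance`) are assumptions made for illustration, since the actual CSV headers are not listed here, and the in-memory sample merely stands in for tweet_full_submission.csv. Check the real header row and label encoding before relying on it.

```python
import csv
import io
from collections import Counter

# Hypothetical column names: the real CSV headers are not documented here.
ID_COL = "tweet_id"
VERACITY_COL = "veracity"
STANCE_COL = "stance"


def load_rows(handle):
    """Read a CSV file object into a list of dicts keyed by header name."""
    return list(csv.DictReader(handle))


def split_annotated(rows):
    """Separate rows carrying both labels from rows missing either label."""
    annotated, unannotated = [], []
    for row in rows:
        veracity = (row.get(VERACITY_COL) or "").strip()
        stance = (row.get(STANCE_COL) or "").strip()
        # Veracity is described as 0 or 1, stance as 'favor' or 'against'.
        if veracity in {"0", "1"} and stance in {"favor", "against"}:
            annotated.append(row)
        else:
            unannotated.append(row)
    return annotated, unannotated


# Tiny in-memory stand-in for tweet_full_submission.csv (made-up rows).
sample = io.StringIO(
    "tweet_id,veracity,stance\n"
    "1001,1,against\n"
    "1002,,\n"
    "1003,0,favor\n"
)

annotated, unannotated = split_annotated(load_rows(sample))
stance_counts = Counter(row[STANCE_COL] for row in annotated)
print(len(annotated), len(unannotated), dict(stance_counts))
```

To try this on the real file, the `io.StringIO` sample could be replaced with `open("tweet_full_submission.csv", newline="", encoding="utf-8")`, adjusting the column constants to match the actual headers. If the assumptions hold, the annotated subset should match the rows in tweet_annot_submission.csv.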