COVIFN is a CoVID-19-specific dataset that consists of fact-checked fake news scraped from Poynter and true news from news publishers' verified portals. The dataset was pre-processed to remove special characters and non-vital information.
The file contains columns such as:
Date: publish date of news article
country: country the article is about
text: the news article content
label: fake or real news label
URL: the fact-checked site
source: original news source site
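A minimal sketch of reading such a file and counting labels, using only the Python standard library. The column names come from the list above; the two sample rows, the example.org URLs, and the "fake"/"real" label spellings are placeholders assumed for illustration, not values taken from the dataset.

```python
import csv
import io
from collections import Counter

# Column names as listed in the dataset description.
EXPECTED_COLUMNS = ["Date", "country", "text", "label", "URL", "source"]

def read_rows(handle):
    """Read all rows from an open CSV handle after checking the header."""
    reader = csv.DictReader(handle)
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)

def label_counts(rows):
    """Count rows per label value (lower-cased, whitespace stripped)."""
    return Counter(row["label"].strip().lower() for row in rows)

# Hypothetical two-row sample standing in for the real file.
SAMPLE = io.StringIO(
    "Date,country,text,label,URL,source\n"
    "2020-03-01,India,Example claim one,fake,https://example.org/a,https://example.org/s1\n"
    "2020-03-02,USA,Example report two,real,https://example.org/b,https://example.org/s2\n"
)
rows = read_rows(SAMPLE)
counts = label_counts(rows)
```

To run this against the actual file, pass a handle opened with `open(path, newline="", encoding="utf-8")` in place of `SAMPLE`.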
The CoVID19-FNIR dataset contains news stories related to the CoVID-19 pandemic, fact-checked by expert fact-checkers. CoVID19-FNIR is a CoVID-19-specific dataset consisting of fact-checked fake news scraped from Poynter and true news from the verified Twitter handles of news publishers. The data samples were collected from India, the United States of America, and European regions and consist of online posts from social media platforms between February 2020 and June 2020. The dataset went through preprocessing steps that include removing special characters and non-vital information.
The CoVID19-FNIR.zip folder contains the whole dataset. The folder has two files: (1) fakeNews.csv and (2) trueNews.csv. The .csv files contain the news articles and the corresponding fake rating, collected from the USA, India, and European regions. A more detailed description of the data is given in the CoVID19-FNIR_Documentation.pdf file.
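One possible way to merge the two files into a single labeled list is sketched below. The file names fakeNews.csv and trueNews.csv come from the description; the "Text" column and the added "file_label" field are hypothetical, since the actual column names are documented only in CoVID19-FNIR_Documentation.pdf.

```python
import csv
import io

def load_labeled(handle, label):
    """Read every row from a CSV handle and attach a file-level label."""
    return [dict(row, file_label=label) for row in csv.DictReader(handle)]

def combine(fake_handle, true_handle):
    """Merge rows from the fake and true files into one list."""
    return load_labeled(fake_handle, "fake") + load_labeled(true_handle, "true")

# With the extracted archive this would look like:
#   with open("fakeNews.csv", newline="", encoding="utf-8") as f, \
#        open("trueNews.csv", newline="", encoding="utf-8") as t:
#       data = combine(f, t)
# In-memory placeholders; the "Text" column name is an assumption.
fake_file = io.StringIO("Text\nExample fake item\n")
true_file = io.StringIO("Text\nExample true item\nAnother true item\n")
data = combine(fake_file, true_file)
```

Keeping the label in a separate field leaves any rating column already present in the files untouched.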
Acknowledgment: This data collection and documentation was supported in part by the NSF: CO-WY AMP program, the Social Justice Research Center, and McNair Scholars Program, University of Wyoming, USA.
Please cite: Julio A. Saenz, Sindhu Reddy Kalathur Gopal, Diksha Shukla, June 12, 2021, "Covid-19 Fake News Infodemic Research Dataset (CoVID19-FNIR Dataset)", IEEE Dataport, doi: https://dx.doi.org/10.21227/b5bt-5244.
This dataset is provided for the fake news detection task. Besides other fake news detection settings, it is specifically suited to detecting fake news with Natural Language Inference (NLI).
This dataset is designed and stored so that it can be evaluated against both the LIAR test set and the FakeNewsNet (PolitiFact) dataset. There are two folders, each containing three CSV files.
1. 15212 training samples, 1058 validation samples, and 1054 test samples; the test samples are the same as the FakeNewsNet (PolitiFact) data. The classes of this data are "real" and "fake".
2. 15052 training samples, 1265 validation samples, and 1266 test samples; the test samples are the same as the LIAR test data. The classes in this data are "pants-fire", "false", "barely true", "half-true", "mostly-true", and "true".
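As a rough illustration, the two label sets above could be mapped to integer class indices as follows. The ordering of the classes and the normalization of "barely true" to "barely-true" are assumptions made for this sketch, not something the dataset prescribes.

```python
# Class names as listed in the description. "barely true" appears there
# without a hyphen, so spaces are normalized to hyphens before lookup.
BINARY_LABELS = ["fake", "real"]
SIX_WAY_LABELS = [
    "pants-fire", "false", "barely-true",
    "half-true", "mostly-true", "true",
]

def encode(label, classes):
    """Return the index of a label in `classes`; raise ValueError if unknown."""
    key = label.strip().lower().replace(" ", "-")
    return classes.index(key)

binary_id = encode("real", BINARY_LABELS)
six_way_id = encode("barely true", SIX_WAY_LABELS)
```

An unknown label raises `ValueError`, which makes a mismatch between a file and the expected label set visible early.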
The dataset columns:
id: matches the id in the PolitiFact website API (unique for each sample)
date: The time each article was published on the PolitiFact website
speaker: The person or organization to whom the statement relates
statement: A claim published in the media by a person or an organization that has been investigated in the PolitiFact article
sources: The sources used to analyze each statement
paragraph_based_content: Article content stored as a list of paragraphs
fullText_based_content: Full text formed by joining the paragraphs
label: The class for each sample
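For NLI-style use, the columns above could be turned into (premise, hypothesis, label) triples along these lines. Two things are assumed here: that paragraph_based_content is serialized in the CSV as a Python-style list literal, and that each paragraph serves as a premise with the statement as the hypothesis. The sample row is a made-up placeholder.

```python
import ast

def parse_paragraphs(cell):
    """Parse a list-literal cell into a list of non-empty paragraph strings."""
    value = ast.literal_eval(cell)
    if not isinstance(value, list):
        raise ValueError("expected a list of paragraphs")
    return [str(p).strip() for p in value if str(p).strip()]

def nli_pairs(row):
    """Build one (premise, hypothesis, label) triple per paragraph."""
    paragraphs = parse_paragraphs(row["paragraph_based_content"])
    return [(p, row["statement"], row["label"]) for p in paragraphs]

# Hypothetical row using the column names from the description.
row = {
    "statement": "An example claim.",
    "paragraph_based_content": '["First paragraph.", "Second paragraph."]',
    "label": "false",
}
pairs = nli_pairs(row)
```

If the list turns out to be stored in another format (for example JSON), only `parse_paragraphs` would need to change.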
This dataset provides labeled fake news data that can be used for in-depth study of fake news.