Standard Dataset
DFND : Dravidian_Fake News Data
- Submitted by: Eduri Raja
- Last updated: Tue, 06/20/2023 - 07:23
- DOI: 10.21227/nj13-t949
Abstract
DFND is a dataset for detecting fake news in four Dravidian languages: Telugu, Kannada, Tamil, and Malayalam. We collected the data from two kinds of sources. Real news articles were scraped from news websites such as Eenadu, Dinamalar, Kannadaprabha, and Malayala Manorama; fake news articles were scraped from fact-checking websites such as factly and factcrescendo. The collection period runs from January 2021 to December 2022. After collection, we preprocessed the data with a script we designed, and the preprocessed data were then annotated by experts in the corresponding language. The released DFND dataset is the preprocessed version. It contains more than 27,000 news articles, of which 50% are fake and 50% are real.
The DFND.zip archive contains the dataset for all four languages. It has four folders: (1) Telugu, (2) Tamil, (3) Kannada, and (4) Malayalam. Each folder has two files: (1) fake.csv and (2) true.csv.
The dataset has two columns: text and label.
text: A claim published in the media by a person or an organization.
label: The class for each sample.
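The layout described above can be read with a short script. The following is a minimal sketch, not an official loader: it assumes the archive has been extracted so that each language folder sits directly under a root directory, that each CSV file starts with a header row naming the text and label columns, and that the files are UTF-8 encoded. The function names `load_language` and `load_all` are hypothetical, and the label values are returned exactly as stored because the page does not state how they are encoded.

```python
import csv
from pathlib import Path

# Folder and file names as listed in the dataset description.
LANGUAGES = ("Telugu", "Tamil", "Kannada", "Malayalam")
FILENAMES = ("fake.csv", "true.csv")


def load_language(root, language):
    """Return (text, label) pairs from fake.csv and true.csv of one language.

    Assumes each file has a header row with 'text' and 'label' columns.
    """
    rows = []
    for filename in FILENAMES:
        path = Path(root) / language / filename
        # utf-8-sig reads plain UTF-8 and also skips a byte-order mark if present.
        with open(path, newline="", encoding="utf-8-sig") as handle:
            for record in csv.DictReader(handle):
                rows.append((record["text"], record["label"]))
    return rows


def load_all(root):
    """Return a dict mapping each language name to its list of (text, label) pairs."""
    return {language: load_language(root, language) for language in LANGUAGES}
```

Calling `load_all("DFND")` on an extracted copy (the path is an assumption) would give one list per language, which makes it easy to count samples or check the class balance before training.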
Documentation
Attachment | Size |
---|---|
Document.pdf | 48.81 KB |