DFND: Dravidian_Fake News Data

Citation Author(s): Eduri Raja (National Institute of Technology Silchar, India)

Badal Soni (National Institute of Technology Silchar, India)

Samir Kumar Borgohain (National Institute of Technology Silchar, India)
Submitted by: Eduri Raja
Last updated: Tue, 06/20/2023 - 11:23
DOI: 10.21227/nj13-t949
Data Format: .csv

Keywords: NLP, Fake news, Dravidian languages, Low-resource languages

Abstract

DFND is a dataset for detecting fake news in four Dravidian languages: Telugu, Kannada, Tamil, and Malayalam. We collected the data from two kinds of sources. Real news articles were scraped from news websites such as Eenadu, Dinamalar, Kannadaprabha, and Malayala Manorama; fake news articles were scraped from fact-checking websites such as Factly and Fact Crescendo. The collection period runs from January 2021 to December 2022. After collection, the data was preprocessed with a script we designed, and the preprocessed data was then annotated by experts in the corresponding languages. The released DFND dataset is the preprocessed version. It contains more than 27,000 news articles, of which 50% are fake and 50% are real.

Instructions:

The DFND.zip archive contains the dataset for all four Dravidian languages. It has four folders: (1) Telugu, (2) Tamil, (3) Kannada, and (4) Malayalam. Each folder has two files: (1) fake.csv and (2) true.csv.

Each CSV file has two columns: text and label.

text: A claim published in the media by a person or an organization.

label: The class of each sample (fake or real).
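
The layout described above can be read with a few lines of Python. The following is a minimal sketch, not an official loader: it assumes the archive has been extracted so that the four language folders sit directly under one root directory, that the files are UTF-8 encoded, and that each file has a header row naming the text and label columns. The function names load_dfnd and count_by_language, and the added language and source_file fields, are hypothetical helpers introduced only for illustration.

```python
import csv
from collections import Counter
from pathlib import Path

# Folder and file names as listed in the instructions above.
LANGUAGES = ("Telugu", "Tamil", "Kannada", "Malayalam")
FILES = ("fake.csv", "true.csv")


def load_dfnd(root):
    """Read every <root>/<language>/<file> into a list of dicts.

    Assumption: each CSV has a header row with 'text' and 'label'.
    Missing files are skipped rather than treated as errors.
    """
    rows = []
    for language in LANGUAGES:
        for name in FILES:
            path = Path(root) / language / name
            if not path.exists():
                continue
            # utf-8-sig reads plain UTF-8 and also tolerates a leading BOM.
            with path.open(newline="", encoding="utf-8-sig") as handle:
                for record in csv.DictReader(handle):
                    rows.append({
                        "text": record.get("text", ""),
                        "label": record.get("label", ""),
                        "language": language,   # added for convenience
                        "source_file": name,    # fake.csv or true.csv
                    })
    return rows


def count_by_language(rows):
    """Return the number of loaded samples per language."""
    return dict(Counter(row["language"] for row in rows))


if __name__ == "__main__":
    # "DFND" is a placeholder for wherever the archive was extracted.
    samples = load_dfnd("DFND")
    print(count_by_language(samples))
```

Keeping the language and the source file name alongside each row makes it possible to check the label column against the file a sample came from before training a classifier.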
