Dataset for "Integrating Deep Learning Approaches for Identifying News Reprint Relation"

Citation Author(s): Yin Luo, Fangfang Wang, Jun Chen, Lei Wang, Daniel Dajun Zeng
Submitted by: Fangfang Wang
Last updated: Wed, 05/18/2022 - 02:17
DOI: 10.21227/vwam-dn13
Research Article Link: Integrating Deep Learning Approaches for Identifying News Reprint Relation

Categories:

Computational Intelligence

Abstract

# of original news: 30
# of candidate news: 25899
# of reprinted news (no source label): 4234 (537)

Instructions:

This dataset was constructed for news reprint relation identification. It was crawled daily from more than 3000 news portals between January 1, 2018 and June 30, 2018. It consists of 30 popular original news items in the fields of finance, sports, and technology, and 25899 candidate news items that were selected by keyword matching. The reprint relation between each original news item and each of its candidate news items was manually labeled: if the candidate news item reprints the original, the relation is labeled 1; otherwise it is labeled 0.
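
As a minimal sketch of how the 0/1 reprint labels might be consumed, the example below counts reprint and non-reprint pairs. The file layout of the dataset is not described on this page, so the CSV form and the column names `original_id`, `candidate_id`, and `label` are assumptions made for illustration, and the sample rows are invented.

```python
import csv
import io
from collections import Counter

# Hypothetical sample in an assumed CSV layout; the real dataset's
# file format and column names are not documented on this page.
SAMPLE_CSV = """original_id,candidate_id,label
o1,c1,1
o1,c2,0
o1,c3,0
o2,c4,1
o2,c5,1
"""


def summarize_labels(csv_file):
    """Count reprint (1) and non-reprint (0) pairs, overall and per original."""
    totals = Counter()
    reprints_per_original = Counter()
    for row in csv.DictReader(csv_file):
        label = int(row["label"])
        if label not in (0, 1):
            raise ValueError(f"unexpected label: {label}")
        totals[label] += 1
        if label == 1:
            reprints_per_original[row["original_id"]] += 1
    return totals, reprints_per_original


totals, per_original = summarize_labels(io.StringIO(SAMPLE_CSV))
reprint_share = totals[1] / (totals[0] + totals[1])
print(f"reprints: {totals[1]}, non-reprints: {totals[0]}, share: {reprint_share:.2f}")
```

Applied to the counts reported in the abstract (4234 reprinted items among 25899 candidates), the same share is roughly 16%, so the positive class is a minority and a classifier trained on these labels may need to account for that imbalance.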