Dataset for "Integrating Deep Learning Approaches for Identifying News Reprint Relation"

Citation Author(s): Yin Luo, Fangfang Wang, Jun Chen, Lei Wang, Daniel Dajun Zeng
Submitted by: Fangfang Wang
Last updated: Tue, 05/17/2022 - 22:17
DOI: 10.21227/vwam-dn13

Abstract 

Number of original news items: 30
Number of candidate news items: 25899
Number of reprinted news items: 4234 (537 with no source label)

 

Instructions: 

This dataset was constructed for news reprint relation identification. It was crawled from more than 3000 news portals on a daily basis from January 1, 2018 to June 30, 2018. It consists of 30 popular original news items in the fields of finance, sports and technology, and 25899 candidate news items that were chosen by keyword matching. The reprint relation between each original news item and each of its candidate news items was manually labeled. If the candidate news item reprints the original news item, the reprint relation is labeled 1; otherwise it is labeled 0.
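
The labeling scheme described above can be illustrated with a minimal Python sketch. It assumes each labeled pair is represented as a record with an original item identifier, a candidate item identifier and a 0/1 label. The record type `CandidatePair`, its field names and the helper `count_reprints` are hypothetical and are not taken from the dataset files, whose actual layout is not described here. The toy records are made up for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class CandidatePair:
    """One (original, candidate) pair. All field names are hypothetical."""
    original_id: int
    candidate_id: int
    label: int  # 1 = candidate reprints the original, 0 = it does not


def count_reprints(pairs: Iterable[CandidatePair]) -> Dict[int, int]:
    """Count the candidates labeled 1 for each original news item."""
    counts: Dict[int, int] = {}
    for pair in pairs:
        if pair.label not in (0, 1):
            raise ValueError(f"unexpected label: {pair.label}")
        # Adding the label counts only the pairs marked as reprints.
        counts[pair.original_id] = counts.get(pair.original_id, 0) + pair.label
    return counts


# Toy records for illustration only; these are not taken from the dataset.
toy_pairs = [
    CandidatePair(original_id=1, candidate_id=101, label=1),
    CandidatePair(original_id=1, candidate_id=102, label=0),
    CandidatePair(original_id=2, candidate_id=201, label=1),
    CandidatePair(original_id=2, candidate_id=202, label=1),
]
reprint_counts = count_reprints(toy_pairs)
```

With records shaped like this, summing the per-original counts over the full dataset would be expected to give the total number of reprinted news items reported in the abstract.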