Dataset for "Integrating Deep Learning Approaches for Identifying News Reprint Relation"
# of original news: 30; # of candidate news: 25899; # of reprinted news (no source label): 4234 (537)
This dataset was constructed for news reprint relation identification. The data were crawled daily from more than 3000 news portals between January 1, 2018 and June 30, 2018. It consists of 30 popular original news items in the fields of finance, sports, and technology, and 25899 candidate news items that were selected by keyword matching. The reprint relation between each original news item and each of its candidate news items was manually labeled. If the candidate news item reprints the original news item, the reprint relation is labeled 1; otherwise it is labeled 0.
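
The labeling scheme can be illustrated with a minimal Python sketch. The on-disk file format is not described here, so the `Pair` record, its field names, and the toy IDs below are hypothetical; only the 1/0 label convention and the two totals come from the description above.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Pair(NamedTuple):
    """One (original, candidate) pair. Field names are hypothetical."""
    original_id: str
    candidate_id: str
    label: int  # 1 = candidate reprints the original, 0 = it does not


def label_counts(pairs: Iterable[Pair]) -> Counter:
    """Count how many pairs carry each reprint label."""
    return Counter(p.label for p in pairs)


# Toy records for illustration only; these are not taken from the dataset.
pairs = [
    Pair("orig_01", "cand_0001", 1),
    Pair("orig_01", "cand_0002", 0),
    Pair("orig_02", "cand_0003", 0),
]
counts = label_counts(pairs)
print("reprints:", counts[1], "non-reprints:", counts[0])

# Share of positive labels implied by the totals stated above.
N_CANDIDATES = 25899
N_REPRINTS = 4234
reprint_rate = N_REPRINTS / N_CANDIDATES
print(f"reprint rate: {reprint_rate:.4f}")
```

Because positive labels make up well under a quarter of the candidate pairs according to these totals, class imbalance may be worth keeping in mind when choosing evaluation metrics.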