A dataset to extract event location and impact information from disaster-news articles

Citation Author(s):
Sumanta Banerjee
NIT Silchar, India
Submitted by:
Sumanta Banerjee
Last updated:
Wed, 07/03/2024 - 07:46
DOI:
10.21227/hsdv-2t76
Data Format:
License:

Abstract 

Each sample document carries a nine-bit multi-label event classification label. The first bit is 1 if an event is present and 0 if it is absent. Each of the remaining eight bits is 1 or 0 to mark the presence or absence of (i) covid, (ii) flood, (iii) storm, (iv) heavy rain, (v) cloudburst, (vi) landslide, (vii) earthquake, and (viii) tsunami, in that order.

The location labels and the impact labels are built the same way. Each sample document receives a 40-bit label for each, where the i-th bit is 1 if the i-th sentence contains disaster-location (respectively, disaster-impact) information and 0 otherwise. Documents longer than 40 sentences are truncated, and shorter ones are padded to 40.
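A minimal Python sketch of the label layout described above, assuming the labels are available as plain lists of 0/1 integers (for example after calling .tolist() on a tensor). The names EVENT_TYPES, decode_event_label and fit_sentence_labels are hypothetical. The description does not say which end of a long document is cut or which value fills the padding, so keeping the first 40 sentences and padding with 0 are assumptions.

```python
# Event names in bit order (bits 2..9 of the event label), as listed in the description.
EVENT_TYPES = ["covid", "flood", "storm", "heavy rain", "cloudburst",
               "landslide", "earthquake", "tsunami"]

MAX_SENTENCES = 40  # fixed document length used for the sentence-level labels


def decode_event_label(bits):
    """Turn a 9-bit event label into (event_present, [event names])."""
    expected = 1 + len(EVENT_TYPES)
    if len(bits) != expected:
        raise ValueError(f"expected {expected} bits, got {len(bits)}")
    present = bits[0] == 1
    # Pair each of the last eight bits with its event name and keep the set ones.
    names = [name for name, b in zip(EVENT_TYPES, bits[1:]) if b == 1]
    return present, names


def fit_sentence_labels(labels, pad_value=0):
    """Truncate or pad a per-sentence 0/1 label list to exactly MAX_SENTENCES entries.

    Assumption: truncation keeps the first sentences and padding uses pad_value (0).
    """
    fitted = list(labels[:MAX_SENTENCES])
    fitted.extend([pad_value] * (MAX_SENTENCES - len(fitted)))
    return fitted
```

For example, decode_event_label([1, 0, 1, 0, 1, 0, 0, 0, 0]) returns (True, ['flood', 'heavy rain']), and fit_sentence_labels([1, 0, 1]) returns a 40-entry list that starts with 1, 0, 1 and is zero-padded.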

Instructions: 

The CSV file contains the source sentences and their document IDs. The PyTorch tensor files target_event_classes.pt, target_impact_labels.pt, and target_location_labels.pt contain the target labels.
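A stdlib-only Python sketch for reading the sentence CSV and pairing each document's sentences with its sentence-level labels. The column names doc_id and sentence are hypothetical placeholders for the real header. The sketch assumes each label file has been loaded with torch.load (for example torch.load("target_location_labels.pt")) and converted to nested lists with .tolist(), giving one row per document. The description does not state how tensor rows map to document IDs, so that mapping should be checked before relying on it.

```python
import csv

MAX_SENTENCES = 40  # sentence-label length given in the dataset description

# Hypothetical column names; replace them with the CSV's actual header.
DOC_ID_COL = "doc_id"
SENTENCE_COL = "sentence"


def group_sentences(lines):
    """Group sentences by document ID, keeping the order in which rows appear.

    `lines` can be an open file or any iterable of CSV text lines.
    """
    docs = {}  # dicts preserve insertion order, so document order is kept
    for row in csv.DictReader(lines):
        docs.setdefault(row[DOC_ID_COL], []).append(row[SENTENCE_COL])
    return docs


def align(sentences, sentence_labels):
    """Pair each of a document's first MAX_SENTENCES sentences with its 0/1 label.

    Padding positions beyond the real sentence count are dropped by zip().
    """
    return list(zip(sentences[:MAX_SENTENCES], sentence_labels[:MAX_SENTENCES]))
```

With a file on disk, group_sentences(open(path, newline="")) works the same way; opening with newline="" is what the csv module recommends for CSV input.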

Sources of all the news articles: https://www.thehindu.com/archive/, https://indianexpress.com/?s=archive, https://assamtribune.com/archive