A dataset to extract event location and impact information from disaster-news articles
- Submitted by: Sumanta Banerjee
- Last updated: Wed, 07/03/2024 - 07:46
- DOI: 10.21227/hsdv-2t76
Abstract
Each sample document carries a multi-label event classification label of nine bits. The first bit indicates whether an event is present (1) or absent (0). The remaining eight bits indicate the presence (1) or absence (0) of (i) COVID, (ii) flood, (iii) storm, (iv) heavy rain, (v) cloudburst, (vi) landslide, (vii) earthquake, and (viii) tsunami. The location and impact sentence classification labels follow a similar scheme: each sample document is labeled with 40 bits, where the i-th bit is 1 if the i-th sentence contains disaster-location (or disaster-impact) information and 0 otherwise. Documents longer than 40 sentences are truncated, and shorter ones are padded.
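The labeling scheme described above can be illustrated with a small sketch in plain Python. It is not the authors' code. The helper names are hypothetical, and three details are assumptions the abstract does not state: that the first bit is set whenever at least one of the eight type bits is set, that padding uses 0, and that padding is appended at the end.

```python
from typing import Iterable, List, Sequence

# Order of the eight disaster-type bits, as listed in the abstract.
EVENT_TYPES = ["covid", "flood", "storm", "heavy rain",
               "cloudburst", "landslide", "earthquake", "tsunami"]
MAX_SENTENCES = 40  # fixed sentence-label length stated in the abstract


def encode_event_label(present_types: Iterable[str]) -> List[int]:
    """Build a nine-bit label: one 'event present' bit, then one bit per type.

    Assumption: the first bit is 1 whenever any type bit is 1.
    """
    present = {t.lower() for t in present_types}
    unknown = present - set(EVENT_TYPES)
    if unknown:
        raise ValueError(f"unknown event types: {sorted(unknown)}")
    type_bits = [1 if t in present else 0 for t in EVENT_TYPES]
    return [1 if any(type_bits) else 0] + type_bits


def decode_event_label(bits: Sequence[int]) -> List[str]:
    """Return the names of the event types whose bits are set."""
    if len(bits) != 1 + len(EVENT_TYPES):
        raise ValueError("expected a nine-bit label")
    return [t for t, b in zip(EVENT_TYPES, bits[1:]) if b == 1]


def fit_sentence_labels(flags: Sequence[int], pad_value: int = 0) -> List[int]:
    """Truncate or pad per-sentence flags to exactly MAX_SENTENCES entries.

    Assumption: padding is appended at the end and uses pad_value (0 here).
    """
    clipped = list(flags)[:MAX_SENTENCES]
    return clipped + [pad_value] * (MAX_SENTENCES - len(clipped))
```

For example, `encode_event_label(["flood", "landslide"])` sets the event bit plus the second and sixth type bits, and `fit_sentence_labels` turns a three-sentence document into a 40-entry vector.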
The CSV file contains the source sentences and their document IDs. The PyTorch tensor files target_event_classes.pt, target_impact_labels.pt, and target_location_labels.pt contain the target labels.
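A minimal sketch of how these files might be read is given here. The CSV column names `doc_id` and `sentence` are assumptions, since the page does not list the actual headers, and both helper functions are hypothetical. The tensor file names come from the description above. How the tensor rows line up with the document IDs is not stated, so it should be checked against the data before the two are paired.

```python
import csv
import io
import os
from typing import Dict, List, TextIO


def group_sentences(csv_file: TextIO,
                    id_col: str = "doc_id",
                    text_col: str = "sentence") -> Dict[str, List[str]]:
    """Group sentences by document ID, keeping the order found in the file.

    The column names are assumed; adjust them to the real CSV header.
    """
    docs: Dict[str, List[str]] = {}
    for row in csv.DictReader(csv_file):
        docs.setdefault(row[id_col], []).append(row[text_col])
    return docs


def load_targets(directory: str = ".") -> dict:
    """Load the three label files named in the dataset description."""
    import torch  # imported here so the CSV helper works without PyTorch

    names = ["target_event_classes",
             "target_impact_labels",
             "target_location_labels"]
    return {name: torch.load(os.path.join(directory, name + ".pt"),
                             map_location="cpu")
            for name in names}


# Made-up rows for illustration only; they are not taken from the dataset.
sample = io.StringIO(
    "doc_id,sentence\n"
    "0,Heavy rain lashed the city.\n"
    "0,Two houses collapsed.\n"
    "1,Schools reopened today.\n"
)
docs = group_sentences(sample)
```

With the real CSV, `group_sentences(open(path, newline="", encoding="utf-8"))` would be the corresponding call, where `path` points to the downloaded file.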
Source of all the news articles: https://www.thehindu.com/archive/, https://indianexpress.com/?s=archive, https://assamtribune.com/archive