Event Geoparsing Indonesian News Dataset

- Citation Author(s): Agung Dewandaru
- Submitted by: Agung Dewandaru
- DOI: 10.21227/s5rh-rn19
Abstract
This dataset covers four types of geospatial events reported on Indonesian online news portals: flood, earthquake, fire, and accident. The corpus consists of 926 sentences that were manually annotated, toponym-disambiguated, and event-extracted. The sentences were filtered from 83 of the 645,679 documents in our earlier news corpus.
Source: detik.com, kompas.com, cnnindonesia.com
Instructions:
Download the dataset from the Download tab.
The main event extraction corpus is event-geoparsing-corpus.txt.
The toponym disambiguations are listed in toponyms-disambiguated.txt.
event-geoparsing-corpus.txt Notes:
Documents in the corpus are separated by ===
Sentences within a document are separated by an empty line.
A regular line has four slash-separated elements (word/POS tag/event/argument),
e.g.:
- Kerabat/NNP/O/O
- RSCM/NN/B-ORG/Hospital-Arg
For LOC entities, there are two additional fields, (latitude, longitude)/<administrative_level>, e.g.:
Jakarta/NNP/B-PLOC/Published-Arg/(-6.197602429787846, 106.83139222722116)/1
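The layout described above can be read with a short script. The following is a minimal sketch, not the authors' own loader: the names `Token`, `parse_token`, and `parse_corpus` are hypothetical, and it assumes that separator lines begin with === and that any "/" inside a word belongs to the word, so fields are split from the right.

```python
import re
from typing import List, NamedTuple, Optional

# Trailing "/(lat, lon)/level" suffix that the notes describe for LOC entities.
LOC_SUFFIX = re.compile(
    r"/\(\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\)/([^/\s]+)$"
)


class Token(NamedTuple):
    word: str
    pos: str
    event: str
    argument: str
    lat: Optional[float] = None
    lon: Optional[float] = None
    admin_level: Optional[str] = None


def parse_token(line: str) -> Token:
    """Parse one corpus line into a Token (coordinates only for LOC lines)."""
    line = line.strip()
    lat = lon = level = None
    match = LOC_SUFFIX.search(line)
    if match:
        lat, lon = float(match.group(1)), float(match.group(2))
        level = match.group(3)
        line = line[: match.start()]
    # Split from the right so a "/" inside the word itself is preserved.
    parts = line.rsplit("/", 3)
    if len(parts) != 4:
        raise ValueError(f"expected word/POS/event/argument, got: {line!r}")
    word, pos, event, argument = parts
    return Token(word, pos, event, argument, lat, lon, level)


def parse_corpus(text: str) -> List[List[List[Token]]]:
    """Return a nested list: documents -> sentences -> tokens."""
    documents: List[List[List[Token]]] = []
    sentences: List[List[Token]] = []
    tokens: List[Token] = []
    # The appended "===" is a sentinel that closes the last document.
    for line in text.splitlines() + ["==="]:
        stripped = line.strip()
        if stripped and not stripped.startswith("==="):
            tokens.append(parse_token(stripped))
            continue
        if tokens:  # a blank line or a separator ends the current sentence
            sentences.append(tokens)
            tokens = []
        if stripped.startswith("===") and sentences:
            documents.append(sentences)
            sentences = []
    return documents
```

Calling `parse_corpus(open("event-geoparsing-corpus.txt", encoding="utf-8").read())` would then give one list per document; the UTF-8 encoding is also an assumption.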
toponyms-disambiguated.txt Notes:
Contains all toponym (LOC) entities from the corpus, each on a line starting with * (star symbol). Each starred toponym is followed by its candidate referents.
The correct referent starts with -->; every other candidate starts with --
Documents are also separated by ===
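The disambiguation file can be read along the same lines. The sketch below is only an illustration under stated assumptions: the function name `parse_toponyms` is hypothetical, each starred line is assumed to hold one toponym, and candidate lines are kept as raw strings because their internal layout is not described here.

```python
from typing import Any, Dict, List


def parse_toponyms(text: str) -> List[List[Dict[str, Any]]]:
    """Return documents -> toponym entries with their candidate referents."""
    documents: List[List[Dict[str, Any]]] = []
    current_doc: List[Dict[str, Any]] = []
    entry = None
    # The appended "===" is a sentinel that closes the last document.
    for line in text.splitlines() + ["==="]:
        stripped = line.strip()
        if stripped.startswith("==="):
            if current_doc:
                documents.append(current_doc)
            current_doc, entry = [], None
        elif stripped.startswith("*"):
            # A starred line opens a new toponym entry.
            entry = {
                "toponym": stripped[1:].strip(),
                "correct": None,
                "candidates": [],
            }
            current_doc.append(entry)
        elif stripped.startswith("-->") and entry is not None:
            # "-->" marks the correct referent; test it before plain "--".
            referent = stripped[3:].strip()
            entry["correct"] = referent
            entry["candidates"].append(referent)
        elif stripped.startswith("--") and entry is not None:
            entry["candidates"].append(stripped[2:].strip())
    return documents
```

Keeping the correct referent inside `candidates` as well makes it straightforward to count how many alternatives each toponym had.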
Full list of argument roles for each event subtype:
Argument Role | Description |
Subtype: FIRE-EVENT | |
1. Reporter-Arg | News outlet (ARG) |
2. Published-Arg | City of publication (LOC) |
3. DeathVictim-Arg | How many people killed (ARG) |
4. WoundVictim-Arg | How many people wounded (ARG) |
5. Place-Arg | Geopolitical Entities of the place (LOC) |
6. Facility-Arg | Building related (ORG) |
7. Officer-Arg | Officer related (ORG) |
8. Time-Arg | Time of the Event (ARG) |
9. Street-Arg | Street of the Place (ARG) |
10. Official-Arg | Official related or official statement (ORG) |
11. Hospital-Arg | Hospital related (ORG) |
12. HouseBurnt-Arg | House burnt number (ARG) |
13. AffectedRT-Arg | Number of RTs affected (ARG) |
14. DispatchedTrucks-Arg | Number of Firetrucks dispatched (ARG) |
15. AffectedFamily-Arg | Number of families affected (ARG) |
16. MonetaryLoss-Arg | Loss of money (ARG) |
Subtype: ACCIDENT-EVENT | |
1. Reporter-Arg | News company (ARG) |
2. Published-Arg | City of Publication (LOC) |
3. Point-Arg | Location offset of the Accident (ARG) |
4. Vehicle-Arg | Type of Vehicle (ARG) |
5. Plate-Arg | License Plate (ARG) |
6. Place-Arg | Place of accident (LOC) |
7. Hospital-Arg | Hospital related (ORG) |
8. From-Arg | Origin of collided vehicle (LOC) |
9. To-Arg | Destination of collided vehicle (LOC) |
10. Time-Arg | Time of the event (ARG) |
11. AffectedVehicle-Arg | Number of vehicles related (ARG) |
12. MonetaryLoss-Arg | Loss of money (ARG) |
Subtype: QUAKE-EVENT | |
1. Reporter-Arg | News outlet (ARG) |
2. Duration-Arg | Duration of the Quake (ARG) |
3. Central-Arg | Center of Quake (ARG) |
4. Depth-Arg | Depth of Quake (LOC) |
5. Hospital-Arg | Hospital related (ORG) |
6. Time-Arg | Time of the event (ARG) |
7. AffectedFacility-Arg | Number of affected facilities (ARG) |
8. AffectedHouse-Arg | Number of affected House (ARG) |
9. AffectedPeople-Arg | Number of affected people (ARG) |
10. Strength-Arg | Quake's reported strength (ARG) |
Subtype: FLOOD-EVENT | |
1. Reporter-Arg | News outlet (ARG) |
2. Cause-Arg | Cause of the Flood (ARG) or (EVE of RAIN-EVENT or LANDSLIDE-EVENT) |
3. Height-Arg | Height of Water (ARG) |
4. Place-Arg | Place of Flood (LOC) |
5. AffectedDistrict-Arg | Affected number of Districts |
6. AffectedHouse-Arg | Affected number of Houses |
7. AffectedVillage-Arg | Affected number of Villages |
8. AffectedFamily-Arg | Affected number of Families |
9. AffectedCity-Arg | Affected number of Cities |
10. AffectedPeople-Arg | Affected number of People |
11. Hospital-Arg | Hospital related (ORG) |
12. Time-Arg | Time of the event (ARG) |
13. Facility-Arg | Facility affected by flood |
14. AffectedFields-Arg | Area of fields (farms) (ARG) |
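Once a sentence has been tokenized, the argument roles listed above can be gathered per sentence. The following is a rough sketch under two assumptions that the notes do not state explicitly: the label O means "no role" (as in Kerabat/NNP/O/O), and consecutive tokens carrying the same role label form a single mention. The function name `collect_arguments` is hypothetical.

```python
from collections import defaultdict
from itertools import groupby
from typing import Dict, List, Tuple


def collect_arguments(sentence: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (word, argument_label) pairs into role -> list of mentions."""
    roles: Dict[str, List[str]] = defaultdict(list)
    # groupby merges runs of adjacent tokens that share the same label.
    for label, group in groupby(sentence, key=lambda pair: pair[1]):
        if label != "O":  # assumed "no role" marker
            roles[label].append(" ".join(word for word, _ in group))
    return dict(roles)
```

The input is a plain list of (word, argument label) pairs, so it can be built from whatever token representation a reader already has.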