Open Access
Event Geoparsing Indonesian News Dataset
- Submitted by: Agung Dewandaru
- Last updated: Fri, 05/29/2020 - 10:40
- DOI: 10.21227/s5rh-rn19
Abstract
This dataset covers four types of geospatial events reported by Indonesian online news portals: floods, traffic accidents, earthquakes, and fires. The corpus consists of 926 manually annotated, disambiguated, and event-extracted sentences taken from 83 documents, which were filtered out of our earlier news corpus of 645,679 documents on the basis of these four major geospatial events.
Sources: detik.com, kompas.com, cnnindonesia.com
Download the dataset from the Download tab.
The main event extraction corpus is event-geoparsing-corpus.txt.
The toponym disambiguations are listed in toponyms-disambiguated.txt.
event-geoparsing-corpus.txt notes:
Documents in the corpus are separated by ===.
Sentences within a document are separated by an empty line.
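A minimal Python sketch of reading the corpus into documents and sentences. It assumes that each === separator sits on its own line and that the file is UTF-8 encoded; neither detail is stated explicitly above. The function name split_corpus is hypothetical.

```python
from typing import List


def split_corpus(text: str) -> List[List[List[str]]]:
    """Split raw corpus text into documents -> sentences -> token lines.

    Assumption: a document separator is a line containing only "===",
    and sentences are separated by one or more blank lines.
    """
    documents: List[List[List[str]]] = []
    current_doc: List[List[str]] = []   # sentences of the document being read
    current_sent: List[str] = []        # token lines of the sentence being read
    for raw in text.splitlines():
        line = raw.strip()
        if line == "===":
            # Close any open sentence, then close the document.
            if current_sent:
                current_doc.append(current_sent)
                current_sent = []
            if current_doc:
                documents.append(current_doc)
                current_doc = []
        elif not line:
            # A blank line ends the current sentence.
            if current_sent:
                current_doc.append(current_sent)
                current_sent = []
        else:
            current_sent.append(line)
    # Flush whatever remains after the last separator.
    if current_sent:
        current_doc.append(current_sent)
    if current_doc:
        documents.append(current_doc)
    return documents
```

Typical use would be `split_corpus(open("event-geoparsing-corpus.txt", encoding="utf-8").read())`. Empty documents (for example, two consecutive separators) are skipped rather than kept as empty lists.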
Each regular line holds one token with four slash-separated fields (word/POS tag/Event/Argument), e.g.:
- Kerabat/NNP/O/O
- RSCM/NN/B-ORG/Hospital-Arg
LOC entities carry two additional fields, (latitude, longitude) and <administrative_level>, e.g.:
- Jakarta/NNP/B-PLOC/Published-Arg/(-6.197602429787846, 106.83139222722116)/1
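The token line format can be parsed roughly as follows. This is a sketch, not the authors' loader: the Token class and parse_token function are hypothetical, and the pattern for the LOC suffix is inferred from the single Jakarta example, so it assumes decimal coordinates and a numeric administrative level.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# Trailing "/(lat, lon)/level" suffix carried by LOC tokens.
# Assumed format, inferred from the Jakarta example line only.
_LOC_SUFFIX = re.compile(
    r"^(?P<rest>.*)/\((?P<lat>-?\d+(?:\.\d+)?),\s*(?P<lon>-?\d+(?:\.\d+)?)\)/(?P<level>\d+)$"
)


@dataclass
class Token:
    word: str
    pos: str
    event: str       # third field, e.g. "O", "B-ORG", "B-PLOC"
    argument: str    # fourth field, e.g. "O", "Hospital-Arg"
    coords: Optional[Tuple[float, float]] = None  # LOC tokens only
    admin_level: Optional[int] = None             # LOC tokens only


def parse_token(line: str) -> Token:
    line = line.strip()
    coords = None
    level = None
    m = _LOC_SUFFIX.match(line)
    if m:
        # Strip the LOC suffix and keep the four regular fields in "rest".
        coords = (float(m.group("lat")), float(m.group("lon")))
        level = int(m.group("level"))
        line = m.group("rest")
    # Split from the right so a "/" inside the word itself stays in the word.
    parts = line.rsplit("/", 3)
    if len(parts) != 4:
        raise ValueError(f"expected word/POS/Event/Argument, got {line!r}")
    word, pos, event, argument = parts
    return Token(word, pos, event, argument, coords, level)
```

Splitting from the right is a deliberate choice: a token such as a date written with slashes keeps its slashes in `word` instead of shifting every later field.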
toponyms-disambiguated.txt notes:
Contains all toponym (LOC) entities from the corpus (event-geoparsing-corpus.txt); each toponym line starts with * (asterisk). Each starred toponym has a list of potential candidate referents.
The correct referent starts with -->; every other candidate starts with --.
Documents are also separated by ===.
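A minimal sketch for reading the disambiguation file, assuming one marker per line and that the text after each marker is the toponym or candidate referent itself. The Toponym class and parse_disambiguation function are hypothetical. Because "-->" shares its prefix with "--", the correct-referent marker is checked first.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Toponym:
    mention: str                                        # text after the leading "*"
    candidates: List[str] = field(default_factory=list)  # all candidate referents
    correct: Optional[str] = None                       # candidate marked with "-->"


def parse_disambiguation(text: str) -> List[List[Toponym]]:
    documents: List[List[Toponym]] = [[]]
    current: Optional[Toponym] = None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "===":
            # Start a new document, skipping empty ones.
            if documents[-1]:
                documents.append([])
            current = None
        elif line.startswith("*"):
            current = Toponym(mention=line[1:].strip())
            documents[-1].append(current)
        elif line.startswith("-->") and current is not None:
            candidate = line[3:].strip()
            current.candidates.append(candidate)
            current.correct = candidate
        elif line.startswith("--") and current is not None:
            current.candidates.append(line[2:].strip())
        # Blank or unrecognized lines are ignored in this sketch.
    if not documents[-1]:
        documents.pop()
    return documents
```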
Full list of argument roles for each event subtype (argument role: description):
Subtype: FIRE-EVENT
1. Reporter-Arg: News outlet (ARG)
2. Published-Arg: City of publication (LOC)
3. DeathVictim-Arg: How many people were killed (ARG)
4. WoundVictim-Arg: How many people were wounded (ARG)
5. Place-Arg: Geopolitical entities of the place (LOC)
6. Facility-Arg: Related building (ORG)
7. Officer-Arg: Related officer (ORG)
8. Time-Arg: Time of the event (ARG)
9. Street-Arg: Street of the place (ARG)
10. Official-Arg: Related official or official statement (ORG)
11. Hospital-Arg: Related hospital (ORG)
12. HouseBurnt-Arg: Number of houses burnt (ARG)
13. AffectedRT-Arg: Number of RTs (rukun tetangga, neighbourhood units) affected (ARG)
14. DispatchedTrucks-Arg: Number of fire trucks dispatched (ARG)
15. AffectedFamily-Arg: Number of families affected (ARG)
16. MonetaryLoss-Arg: Monetary loss (ARG)
Subtype: ACCIDENT-EVENT
1. Reporter-Arg: News company (ARG)
2. Published-Arg: City of publication (LOC)
3. Point-Arg: Location offset of the accident (ARG)
4. Vehicle-Arg: Type of vehicle (ARG)
5. Plate-Arg: License plate (ARG)
6. Place-Arg: Place of the accident (LOC)
7. Hospital-Arg: Related hospital (ORG)
8. From-Arg: Origin of the collided vehicle (LOC)
9. To-Arg: Destination of the collided vehicle (LOC)
10. Time-Arg: Time of the event (ARG)
11. AffectedVehicle-Arg: Number of vehicles involved (ARG)
12. MonetaryLoss-Arg: Monetary loss (ARG)
Subtype: QUAKE-EVENT
1. Reporter-Arg: News outlet (ARG)
2. Duration-Arg: Duration of the quake (ARG)
3. Central-Arg: Center of the quake (ARG)
4. Depth-Arg: Depth of the quake (LOC)
5. Hospital-Arg: Related hospital (ORG)
6. Time-Arg: Time of the event (ARG)
7. AffectedFacility-Arg: Number of affected facilities (ARG)
8. AffectedHouse-Arg: Number of affected houses (ARG)
9. AffectedPeople-Arg: Number of affected people (ARG)
10. Strength-Arg: Reported strength of the quake (ARG)
Subtype: FLOOD-EVENT
1. Reporter-Arg: News outlet (ARG)
2. Cause-Arg: Cause of the flood (ARG), or (EVE of RAIN-EVENT or LANDSLIDE-EVENT)
3. Height-Arg: Height of the water (ARG)
4. Place-Arg: Place of the flood (LOC)
5. AffectedDistrict-Arg: Number of affected districts
6. AffectedHouse-Arg: Number of affected houses
7. AffectedVillage-Arg: Number of affected villages
8. AffectedFamily-Arg: Number of affected families
9. AffectedCity-Arg: Number of affected cities
10. AffectedPeople-Arg: Number of affected people
11. Hospital-Arg: Related hospital (ORG)
12. Time-Arg: Time of the event (ARG)
13. Facility-Arg: Facility affected by the flood
14. AffectedFields-Arg: Area of affected fields (farms) (ARG)
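The role inventory can be kept as a lookup table, for example to flag Argument labels that do not belong to a given subtype. This is a hedged sketch: ARGUMENT_ROLES and invalid_arguments are hypothetical names, the sets are transcribed from the role list above, and it assumes the Argument field holds either "O" or a bare role name such as "Hospital-Arg". How a sentence's subtype is recorded is not described here, so the subtype is passed in by the caller.

```python
from typing import Dict, Iterable, List, Set

# Argument roles per event subtype, transcribed from the role list above.
ARGUMENT_ROLES: Dict[str, Set[str]] = {
    "FIRE-EVENT": {
        "Reporter-Arg", "Published-Arg", "DeathVictim-Arg", "WoundVictim-Arg",
        "Place-Arg", "Facility-Arg", "Officer-Arg", "Time-Arg", "Street-Arg",
        "Official-Arg", "Hospital-Arg", "HouseBurnt-Arg", "AffectedRT-Arg",
        "DispatchedTrucks-Arg", "AffectedFamily-Arg", "MonetaryLoss-Arg",
    },
    "ACCIDENT-EVENT": {
        "Reporter-Arg", "Published-Arg", "Point-Arg", "Vehicle-Arg",
        "Plate-Arg", "Place-Arg", "Hospital-Arg", "From-Arg", "To-Arg",
        "Time-Arg", "AffectedVehicle-Arg", "MonetaryLoss-Arg",
    },
    "QUAKE-EVENT": {
        "Reporter-Arg", "Duration-Arg", "Central-Arg", "Depth-Arg",
        "Hospital-Arg", "Time-Arg", "AffectedFacility-Arg", "AffectedHouse-Arg",
        "AffectedPeople-Arg", "Strength-Arg",
    },
    "FLOOD-EVENT": {
        "Reporter-Arg", "Cause-Arg", "Height-Arg", "Place-Arg",
        "AffectedDistrict-Arg", "AffectedHouse-Arg", "AffectedVillage-Arg",
        "AffectedFamily-Arg", "AffectedCity-Arg", "AffectedPeople-Arg",
        "Hospital-Arg", "Time-Arg", "Facility-Arg", "AffectedFields-Arg",
    },
}


def invalid_arguments(subtype: str, labels: Iterable[str]) -> List[str]:
    """Return labels that are neither "O" nor a role defined for the subtype."""
    allowed = ARGUMENT_ROLES[subtype]
    return [label for label in labels if label != "O" and label not in allowed]
```

Intersecting the four sets shows which roles every subtype shares (Reporter-Arg, Hospital-Arg, and Time-Arg), which may help when a model is trained on all subtypes at once.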
Dataset Files
- event-geoparsing-corpus.txt (309.97 kB)
- toponyms-disambiguated.txt (221.87 kB)