Event Geoparsing Indonesian News Dataset

Citation Author(s): Agung Dewandaru
Submitted by: Agung Dewandaru
Last updated: Fri, 05/29/2020 - 10:40
DOI: 10.21227/s5rh-rn19

Abstract 

This dataset covers four types of geospatial events reported by Indonesian online news portals: flood, earthquake, fire, and accident. The corpus is composed of 926 manually annotated, disambiguated, and event-extracted sentences that were filtered from 83 of the 645,679 documents in our earlier news corpus, selected for these four major geospatial events.

Source: detik.com, kompas.com, cnnindonesia.com

Instructions: 

Download the dataset from the Download tab.

The main event extraction corpus is event-geoparsing-corpus.txt.

The disambiguations are listed in toponyms-disambiguated.txt.

Notes on event-geoparsing-corpus.txt:

Each document in the corpus is separated by ===

Sentences within a document are separated by an empty line.
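
The document and sentence boundaries described above can be walked with a few lines of Python. This is a minimal sketch, not the author's tooling: the function name split_corpus is hypothetical, and it assumes that a separator line begins with === and that the file is read as text beforehand.

```python
def split_corpus(text):
    """Split raw corpus text into documents.

    Returns a list of documents; each document is a list of sentences,
    and each sentence is a list of non-empty token lines.
    Assumption: a document separator is any line starting with "===".
    """
    documents, sentences, current = [], [], []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("==="):
            # Close the open sentence and the open document.
            if current:
                sentences.append(current)
                current = []
            if sentences:
                documents.append(sentences)
                sentences = []
        elif not line:
            # An empty line ends the current sentence.
            if current:
                sentences.append(current)
                current = []
        else:
            current.append(line)
    # Flush whatever is left after the last line.
    if current:
        sentences.append(current)
    if sentences:
        documents.append(sentences)
    return documents
```

Empty documents (for example, two consecutive separators) are dropped in this sketch; keep them instead if document positions must line up with another file.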

A regular line has four elements separated by slashes (word/POS tag/event/argument), e.g.:

- Kerabat/NNP/O/O

- RSCM/NN/B-ORG/Hospital-Arg

For LOC entities, there are two additional fields, (latitude, longitude) and <administrative_level>, e.g.:

Jakarta/NNP/B-PLOC/Published-Arg/(-6.197602429787846, 106.83139222722116)/1
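
The line layout above can be read with a small parser. The sketch below is an illustration under stated assumptions rather than an official reader: the names Token, LOC_SUFFIX, and parse_token are hypothetical, coordinates are assumed to be plain decimal numbers, and the administrative level is kept as a string because the notes do not define its values.

```python
import re
from typing import NamedTuple, Optional, Tuple

# Trailing "/(lat, lon)/level" suffix that the notes describe for LOC entities.
LOC_SUFFIX = re.compile(
    r"/\(\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\)/([^/]+)$"
)


class Token(NamedTuple):
    word: str
    pos: str
    event: str
    argument: str
    coords: Optional[Tuple[float, float]] = None  # (latitude, longitude)
    admin_level: Optional[str] = None


def parse_token(line):
    """Parse one corpus line into a Token."""
    line = line.strip()
    if line.startswith("- "):
        # Tolerate the list bullets used in the examples.
        line = line[2:]
    coords = admin_level = None
    match = LOC_SUFFIX.search(line)
    if match:
        coords = (float(match.group(1)), float(match.group(2)))
        admin_level = match.group(3)
        line = line[: match.start()]
    # Split from the right so that a "/" inside the word itself survives.
    word, pos, event, argument = line.rsplit("/", 3)
    return Token(word, pos, event, argument, coords, admin_level)
```

For example, parse_token("- RSCM/NN/B-ORG/Hospital-Arg") yields a Token whose argument is "Hospital-Arg" and whose coords is None. A line with fewer than four slash-separated fields raises ValueError, which makes malformed lines easy to spot.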

Notes on toponyms-disambiguated.txt:

This file contains all toponym (LOC) entities from corpus.txt. Each toponym line starts with * (star symbol) and is followed by its candidate referents.

The correct candidate starts with -->; every other candidate starts with --

Documents are also separated by ===
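
A reader for the marker scheme above might look like the following. It is a sketch under assumptions: the function name parse_toponyms is hypothetical, the text that follows each marker is kept as a raw string because the notes do not describe its internal layout, and the sample lines in the usage note are invented for illustration.

```python
def parse_toponyms(text):
    """Group toponyms and their candidate referents per document.

    Returns a list of documents; each document is a list of dicts with
    the keys "toponym", "candidates" (all candidates, in file order)
    and "correct" (the candidate marked with "-->", or None).
    """
    documents, entries = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("==="):
            # Document separator: close the current document.
            if entries:
                documents.append(entries)
            entries = []
        elif line.startswith("*"):
            entries.append(
                {"toponym": line[1:].strip(), "candidates": [], "correct": None}
            )
        elif line.startswith("-->") and entries:
            # Test "-->" before "--", since the former also starts with "--".
            candidate = line[3:].strip()
            entries[-1]["candidates"].append(candidate)
            entries[-1]["correct"] = candidate
        elif line.startswith("--") and entries:
            entries[-1]["candidates"].append(line[2:].strip())
    if entries:
        documents.append(entries)
    return documents
```

With the invented input "* Jakarta\n-- candidate A\n--> candidate B", the single entry has two candidates and its "correct" value is "candidate B".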

 

Full list of argument roles for each event subtype. Each numbered argument role is paired with the description that has the same number.

Subtype: FIRE-EVENT

1. Reporter-Arg: News outlet (ARG)
2. Published-Arg: City of publication (LOC)
3. DeathVictim-Arg: How many people killed (ARG)
4. WoundVictim-Arg: How many people wounded (ARG)
5. Place-Arg: Geopolitical entities of the place (LOC)
6. Facility-Arg: Building related (ORG)
7. Officer-Arg: Officer related (ORG)
8. Time-Arg: Time of the event (ARG)
9. Street-Arg: Street of the place (ARG)
10. Official-Arg: Official related or official statement (ORG)
11. Hospital-Arg: Hospital related (ORG)
12. HouseBurnt-Arg: Number of houses burnt (ARG)
13. AffectedRT-Arg: Number of RTs affected (ARG)
14. DispatchedTrucks-Arg: Number of fire trucks dispatched (ARG)
15. AffectedFamily-Arg: Number of families affected (ARG)
16. MonetaryLoss-Arg: Loss of money (ARG)

Subtype: ACCIDENT-EVENT

1. Reporter-Arg: News company (ARG)
2. Published-Arg: City of publication (LOC)
3. Point-Arg: Location offset of the accident (ARG)
4. Vehicle-Arg: Type of vehicle (ARG)
5. Plate-Arg: License plate (ARG)
6. Place-Arg: Place of accident (LOC)
7. Hospital-Arg: Hospital related (ORG)
8. From-Arg: Origin of collided vehicle (LOC)
9. To-Arg: Destination of collided vehicle (LOC)
10. Time-Arg: Time of the event (ARG)
11. AffectedVehicle-Arg: Number of vehicles related (ARG)
12. MonetaryLoss-Arg: Loss of money (ARG)

Subtype: QUAKE-EVENT

1. Reporter-Arg: News outlet (ARG)
2. Duration-Arg: Duration of the quake (ARG)
3. Central-Arg: Center of quake (ARG)
4. Depth-Arg: Depth of quake (LOC)
5. Hospital-Arg: Hospital related (ORG)
6. Time-Arg: Time of the event (ARG)
7. AffectedFacility-Arg: Number of affected facilities (ARG)
8. AffectedHouse-Arg: Number of affected houses (ARG)
9. AffectedPeople-Arg: Number of affected people (ARG)
10. Strength-Arg: Quake's reported strength (ARG)

Subtype: FLOOD-EVENT

1. Reporter-Arg: News outlet (ARG)
2. Cause-Arg: Cause of the flood (ARG) or (EVE of RAIN-EVENT or LANDSLIDE-EVENT)
3. Height-Arg: Height of water (ARG)
4. Place-Arg: Place of flood (LOC)
5. AffectedDistrict-Arg: Affected number of districts
6. AffectedHouse-Arg: Affected number of houses
7. AffectedVillage-Arg: Affected number of villages
8. AffectedFamily-Arg: Affected number of families
9. AffectedCity-Arg: Affected number of cities
10. AffectedPeople-Arg: Affected number of people
11. Hospital-Arg: Hospital related (ORG)
12. Time-Arg: Time of the event (ARG)
13. Facility-Arg: Facility affected by flood
14. AffectedFields-Arg: Area of fields (farms) (ARG)
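
The role inventory above lends itself to a simple sanity check on parsed annotations. The sketch below is only an illustration: the names ARGUMENT_ROLES and unknown_roles are hypothetical, "O" is assumed to mean "no argument" (as in Kerabat/NNP/O/O), and the event subtype has to be supplied by the caller because these notes do not say how a document's subtype is recorded in the file.

```python
# Argument roles per event subtype, transcribed from the lists above.
ARGUMENT_ROLES = {
    "FIRE-EVENT": {
        "Reporter-Arg", "Published-Arg", "DeathVictim-Arg", "WoundVictim-Arg",
        "Place-Arg", "Facility-Arg", "Officer-Arg", "Time-Arg", "Street-Arg",
        "Official-Arg", "Hospital-Arg", "HouseBurnt-Arg", "AffectedRT-Arg",
        "DispatchedTrucks-Arg", "AffectedFamily-Arg", "MonetaryLoss-Arg",
    },
    "ACCIDENT-EVENT": {
        "Reporter-Arg", "Published-Arg", "Point-Arg", "Vehicle-Arg",
        "Plate-Arg", "Place-Arg", "Hospital-Arg", "From-Arg", "To-Arg",
        "Time-Arg", "AffectedVehicle-Arg", "MonetaryLoss-Arg",
    },
    "QUAKE-EVENT": {
        "Reporter-Arg", "Duration-Arg", "Central-Arg", "Depth-Arg",
        "Hospital-Arg", "Time-Arg", "AffectedFacility-Arg",
        "AffectedHouse-Arg", "AffectedPeople-Arg", "Strength-Arg",
    },
    "FLOOD-EVENT": {
        "Reporter-Arg", "Cause-Arg", "Height-Arg", "Place-Arg",
        "AffectedDistrict-Arg", "AffectedHouse-Arg", "AffectedVillage-Arg",
        "AffectedFamily-Arg", "AffectedCity-Arg", "AffectedPeople-Arg",
        "Hospital-Arg", "Time-Arg", "Facility-Arg", "AffectedFields-Arg",
    },
}


def unknown_roles(subtype, roles):
    """Return, sorted, the roles that are not listed for the given subtype.

    The "O" placeholder (assumed to mean "no argument") is ignored.
    """
    allowed = ARGUMENT_ROLES[subtype]
    return sorted({r for r in roles if r != "O" and r not in allowed})
```

For example, unknown_roles("FIRE-EVENT", ["O", "Hospital-Arg", "Plate-Arg"]) returns ["Plate-Arg"]. A non-empty result is a prompt for manual inspection rather than proof of an annotation error, since the corpus may use roles beyond those listed here.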

 

 
