Entities Extraction for OSINT

Citation Author(s):
Muhammad Ayub
Submitted by:
Muhammad Ayub
Last updated:
Fri, 07/12/2024 - 03:07
DOI:
10.21227/6z7k-nw14
License:
0

Abstract 

This dataset focuses on the Pakistan military and covers five types of entities collected from Wikipedia: weapons, ranks, dates, operations, and locations. The text was annotated with an open-source NER annotator, then cleaned and balanced. The final dataset comprises 660 neutral and 660 anti-military sentiment samples, for a total of 1320 samples. Because the two classes are equal in size, the dataset can be used for sentiment analysis of military-related text without one class dominating the other.

Instructions: 

The dataset contains 1320 samples in two sentiment classes: 660 neutral and 660 anti-military. It focuses on the Pakistan military and covers five types of entities collected from Wikipedia: weapons, ranks, dates, operations, and locations. Annotation was done with an open-source NER annotator, followed by data cleaning and class balancing. Researchers can use the dataset for sentiment analysis of military-related topics.
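The balancing step mentioned above can be illustrated with a minimal sketch. The dataset page does not say how balancing was done or how the files are laid out, so everything here is an assumption: the `(text, label)` record shape, the label strings, the helper name `balance_by_downsampling`, and the choice of random downsampling to the smallest class are all hypothetical and only show one common way to obtain equal class sizes such as 660 and 660.

```python
import random
from collections import Counter


def balance_by_downsampling(samples, seed=0):
    """Downsample every class to the size of the smallest class.

    `samples` is a list of (text, label) tuples. This record shape is an
    assumption for illustration, not the dataset's documented format.
    """
    # Group records by their sentiment label.
    by_label = {}
    for text, label in samples:
        by_label.setdefault(label, []).append((text, label))

    # The smallest class sets the target size for all classes.
    target = min(len(group) for group in by_label.values())

    rng = random.Random(seed)
    balanced = []
    for group in by_label.values():
        # Sample without replacement so no record is duplicated.
        balanced.extend(rng.sample(group, target))

    rng.shuffle(balanced)
    return balanced


# Toy placeholder records (not taken from the dataset).
toy = [(f"neutral sentence {i}", "neutral") for i in range(5)]
toy += [(f"critical sentence {i}", "anti-military") for i in range(3)]

balanced = balance_by_downsampling(toy)
counts = Counter(label for _, label in balanced)
print(counts)
```

After loading the real files, the same `Counter` check is a quick way to confirm that both classes have the stated 660 samples.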