Entities Extraction for OSINT

Citation Author(s): Muhammad Ayub
Submitted by: Muhammad Ayub
Last updated: Fri, 07/12/2024 - 07:07
DOI: 10.21227/6z7k-nw14

Keywords:

Entities Extraction

NER

OSINT

Abstract

This dataset focuses on the Pakistan Military and covers five entity types collected from Wikipedia: weapons, ranks, dates, operations, and locations. The data were labeled with an open-source NER annotator and then cleaned and balanced. The final dataset contains 660 neutral and 660 anti-military sentiment samples, for a total of 1,320 samples. Because the two classes are equal in size, the dataset can serve as a resource for sentiment analysis of public opinion on military-related topics.
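The balancing step is described only by its outcome (660 samples per class). One common way to reach equal class sizes is random downsampling of the larger class to the size of the smaller one. The sketch below assumes that technique and a hypothetical list of (text, label) pairs with the labels "neutral" and "anti-military"; it is an illustration, not necessarily the procedure used to build this dataset.

```python
import random
from collections import Counter, defaultdict


def balance_by_downsampling(samples, seed=0):
    """Randomly downsample every class to the size of the smallest class.

    `samples` is a list of (text, label) pairs. The label strings used in the
    example are illustrative assumptions, not the dataset's actual schema.
    """
    by_label = defaultdict(list)
    for text, label in samples:
        by_label[label].append((text, label))

    # The smallest class determines how many samples each class keeps.
    target = min(len(group) for group in by_label.values())

    rng = random.Random(seed)  # fixed seed for reproducible selection
    balanced = []
    for label in sorted(by_label):  # sorted for deterministic iteration order
        balanced.extend(rng.sample(by_label[label], target))
    rng.shuffle(balanced)  # avoid long runs of a single class
    return balanced


if __name__ == "__main__":
    # Hypothetical raw pool: more neutral than anti-military sentences.
    raw = [(f"neutral sentence {i}", "neutral") for i in range(700)]
    raw += [(f"critical sentence {i}", "anti-military") for i in range(660)]

    balanced = balance_by_downsampling(raw)
    counts = Counter(label for _, label in balanced)
    print(len(balanced), counts["neutral"], counts["anti-military"])  # 1320 660 660
```

Downsampling discards surplus examples rather than duplicating minority ones, so no sentence appears twice; the trade-off is that some collected data goes unused.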

Instructions:

The dataset on the Pakistan Military comprises five entity types collected from Wikipedia: weapons, ranks, dates, operations, and locations. Annotation was performed with an open-source NER annotator, followed by data cleaning and class balancing. The dataset includes 660 neutral and 660 anti-military sentiment samples (1,320 in total) and is intended for sentiment analysis of public attitudes toward military-related topics.
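The file layout of the dataset is not described here. Assuming the annotations are exported as character-offset spans of the form (start, end, label), a format produced by many open-source NER annotators, the hypothetical helper below converts one annotated sentence into token-level BIO tags for training a tagger. The label names (OPERATION, DATE, LOCATION) and the example sentence are illustrative assumptions, not taken from the dataset.

```python
def spans_to_bio(text, entities):
    """Convert character-offset entity spans into token-level BIO tags.

    `entities` is a list of (start, end, label) tuples with `end` exclusive.
    Tokens are whitespace-separated; a token is tagged only if it lies
    entirely inside an entity span.
    """
    # Record (token, start, end) character offsets for each whitespace token.
    tokens = []
    pos = 0
    for tok in text.split():
        start = text.index(tok, pos)
        end = start + len(tok)
        tokens.append((tok, start, end))
        pos = end

    tags = ["O"] * len(tokens)
    for ent_start, ent_end, label in sorted(entities):
        first = True  # the first token of a span gets B-, the rest get I-
        for i, (_, start, end) in enumerate(tokens):
            if start >= ent_start and end <= ent_end:
                tags[i] = ("B-" if first else "I-") + label
                first = False
    return [tok for tok, _, _ in tokens], tags


if __name__ == "__main__":
    # Hypothetical annotated sentence using the five-entity scheme.
    sentence = "Operation Zarb-e-Azb began in June 2014 in North Waziristan"
    spans = [(0, 20, "OPERATION"), (30, 39, "DATE"), (43, 59, "LOCATION")]
    for token, tag in zip(*spans_to_bio(sentence, spans)):
        print(token, tag)
```

Whitespace tokenization keeps the sketch dependency-free. Tokens that only partially overlap a span (for example, a word with trailing punctuation) are left as O, so a proper tokenizer would be needed for real data.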