Standard Dataset

Entities Extraction for OSINT

Citation Author(s): Muhammad Ayub
Submitted by: Muhammad Ayub
DOI: 10.21227/6z7k-nw14

Abstract

This dataset focuses on the Pakistan military. It covers five entity types drawn from Wikipedia text: weapons, ranks, dates, operations, and locations. The entities were labeled with an open-source NER annotation tool, after which the data was cleaned and balanced. The final dataset contains 1,320 samples: 660 with neutral sentiment and 660 with anti-military sentiment. Because both classes are equally represented, the dataset can be used for sentiment analysis of military-related text as well as for entity extraction.
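The page does not describe how the two sentiment classes were balanced. One common approach is random downsampling, in which every class is reduced to the size of the smallest class. The sketch below illustrates that technique under stated assumptions: the records are a hypothetical list of `(text, label)` pairs, the label strings `neutral` and `anti-military` are illustrative, and `balance_by_downsampling` is a hypothetical helper. It is not necessarily the procedure the author used.

```python
import random
from collections import Counter


def balance_by_downsampling(records, seed=0):
    """Randomly downsample every class to the size of the smallest class.

    `records` is assumed (hypothetically) to be a list of (text, label)
    pairs; the dataset's real file format is not documented on this page.
    """
    # Group records by their sentiment label.
    by_label = {}
    for text, label in records:
        by_label.setdefault(label, []).append((text, label))

    # The smallest class determines how many samples each class keeps.
    target = min(len(items) for items in by_label.values())

    # Seeded RNG so the sampled subset is reproducible.
    rng = random.Random(seed)
    balanced = []
    for label in sorted(by_label):
        balanced.extend(rng.sample(by_label[label], target))

    # Shuffle so the classes are interleaved rather than grouped.
    rng.shuffle(balanced)
    return balanced


if __name__ == "__main__":
    # Synthetic placeholder data: 700 neutral and 660 anti-military texts.
    records = [(f"neutral text {i}", "neutral") for i in range(700)]
    records += [(f"anti-military text {i}", "anti-military") for i in range(660)]
    balanced = balance_by_downsampling(records)
    print(len(balanced), Counter(label for _, label in balanced))
```

With these synthetic inputs, both classes end up with 660 samples (1,320 in total). Downsampling discards majority-class data. Oversampling the minority class is the usual alternative when data is scarce.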
