Entities Extraction for OSINT
- Submitted by: Muhammad Ayub
- Last updated: Fri, 07/12/2024 - 03:07
- DOI: 10.21227/6z7k-nw14
Abstract
This dataset focuses on the Pakistan Military. It was built by collecting five types of entities from Wikipedia: weapons, ranks, dates, operations, and locations. An open-source NER annotator was used for annotation, and the annotated data was then cleaned and balanced. The final dataset contains 660 neutral and 660 anti-military sentiment samples, for a total of 1320 samples. Because the two classes are equal in size, the dataset is suited to sentiment analysis and can support research into public sentiment on military-related topics.
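The abstract does not say how the balancing was done. As an illustration only, the sketch below assumes random downsampling of the larger class to the size of the smaller one. The function name `balance_by_downsampling`, the label strings, the toy sentences, and the pre-balancing count of 900 are all hypothetical placeholders, not details of the published dataset.

```python
import random
from collections import Counter


def balance_by_downsampling(samples, seed=0):
    """Downsample every class to the size of the smallest class.

    `samples` is a list of (text, label) pairs. This is an assumed
    balancing strategy, not necessarily the one used for the dataset.
    """
    # Group the samples by their label.
    by_label = {}
    for text, label in samples:
        by_label.setdefault(label, []).append((text, label))

    # The smallest class determines the per-class sample count.
    target = min(len(group) for group in by_label.values())

    # Draw `target` samples without replacement from each class.
    rng = random.Random(seed)
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, target))
    rng.shuffle(balanced)
    return balanced


# Hypothetical toy data: 900 is an invented pre-balancing count.
raw = [(f"neutral sentence {i}", "neutral") for i in range(900)]
raw += [(f"anti-military sentence {i}", "anti-military") for i in range(660)]

balanced = balance_by_downsampling(raw)
counts = Counter(label for _, label in balanced)
print(counts)
```

With these placeholder inputs, each class ends up with 660 samples, matching the 660 + 660 = 1320 split reported in the abstract. Downsampling discards data from the larger class; oversampling the smaller class is an alternative when samples are scarce.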