Troid: Temporal and Cross-Sectional Android Dataset and Its Applications

Citation Author(s):
Ali Al Kinoon
Abdulazi Alghamd
Ahod Alghuried
David Mohaisen
Submitted by:
Ali Al Kinoon
Last updated:
Tue, 06/04/2024 - 17:04
DOI:
10.21227/95my-tf46

Abstract 

Numerous studies in recent years have explored Android malware, covering areas such as malware detection and application analysis. As a result, there is a pressing need for a reliable and scalable malware dataset to support the development and evaluation of such studies. Although several benchmark Android malware datasets are widely used in research, they have significant limitations. First, many of these datasets are outdated and do not capture current malware trends, and some have become obsolete or inaccessible, limiting their usefulness. Second, most datasets contain only the apps themselves (APKs) and lack important metadata features such as content rating, ad coverage, user ratings, and privacy policies. This omission restricts the potential applications of these datasets. This paper introduces Troid, a reliable Android malware dataset sourced from the Google Play Store and covering the period from 2019 to 2023. To label malicious apps, we use VirusTotal and track each app's availability and removal status on the Google Play Store. Using this method, we curate an Android malware dataset of 5,028 samples. We augment the dataset with various features, including privacy policies, metadata, control flow graphs, permissions, API calls, strings, function names, hex dumps, and labels. We believe this benchmark dataset will support a range of research efforts, including Android malware classification and detection, static program analysis, and privacy policy analysis.
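
As a rough illustration of the labeling idea described in the abstract (a VirusTotal verdict combined with Google Play Store availability), the following Python sketch shows one way such a rule could be expressed. It is not the authors' pipeline: the `AppRecord` fields, the `label_app` function, the label strings, and the `min_positives` threshold are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class AppRecord:
    """Hypothetical per-app record; field names are assumptions, not the dataset schema."""
    package: str
    vt_positives: int     # number of VirusTotal engines that flagged the APK
    on_play_store: bool   # whether the store listing is still available


def label_app(app: AppRecord, min_positives: int = 1) -> str:
    """Combine a VirusTotal detection count with store availability.

    min_positives is an assumed threshold; the abstract does not state one.
    """
    flagged = app.vt_positives >= min_positives
    if flagged and not app.on_play_store:
        return "malicious-removed"
    if flagged:
        return "malicious-available"
    return "unflagged"


# Example records with made-up package names and counts.
records = [
    AppRecord("com.example.flashlight", vt_positives=7, on_play_store=False),
    AppRecord("com.example.notes", vt_positives=0, on_play_store=True),
    AppRecord("com.example.cleaner", vt_positives=3, on_play_store=True),
]
labels = {r.package: label_app(r) for r in records}
```

Raising `min_positives` would trade recall for fewer false positives from individual antivirus engines; any real threshold should be taken from the dataset's own documentation.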