The dataset stores a random sampling distribution whose support has cardinality 4,294,967,296 (2^32). The source generator is a fixed symmetric-key cryptographic function with a 64-bit input and a 32-bit output. A total of 17,179,869,184 (2^34) randomly chosen inputs are used to produce the sampling distribution stored in the dataset. The integer-valued distribution is stored as 4,294,967,296 (2^32) entries, one byte per entry.
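The construction described above can be illustrated at a much smaller scale. The following is a minimal sketch, not the author's actual procedure. It assumes that each entry counts how often the corresponding output value occurred, that counts saturate at 255 to fit in one byte, and that truncated HMAC-SHA256 is an acceptable stand-in for the unnamed generator. The names `toy_generator` and `build_histogram` are hypothetical.

```python
import hashlib
import hmac
import random


def toy_generator(key: bytes, x: int, out_bits: int) -> int:
    """Stand-in for the unnamed cipher (assumption): HMAC-SHA256 over a
    64-bit input, truncated to the top `out_bits` bits (out_bits <= 32)."""
    digest = hmac.new(key, x.to_bytes(8, "big"), hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big") >> (32 - out_bits)


def build_histogram(key: bytes, out_bits: int, num_samples: int, seed: int = 0) -> bytearray:
    """Count how often each output value occurs, one byte per entry."""
    rng = random.Random(seed)
    counts = bytearray(1 << out_bits)  # 2^out_bits one-byte entries
    for _ in range(num_samples):
        x = rng.getrandbits(64)        # randomly chosen 64-bit input
        y = toy_generator(key, x, out_bits)
        if counts[y] < 255:            # saturating counter (assumption)
            counts[y] += 1
    return counts


# Scaled-down analogue: 2^10 entries and 2^12 samples, keeping the
# same 4:1 ratio of samples to entries as 2^34 samples over 2^32 entries.
counts = build_histogram(b"demo-key", out_bits=10, num_samples=1 << 12)
print(len(counts), sum(counts))
```

With four samples per entry on average, a one-byte counter is very unlikely to overflow, which is consistent with the one-byte-per-entry storage format.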


The dataset file is 4 GiB in size (4,294,967,296 bytes, one byte per entry). The MD5 checksum is 4ee9 a09a a509 fd70 4152 2fd2 f263 ae25. The SHA256 checksum is d9a4 fb8d d9f0 de29 b1e2 3316 c78d 8e65 4ec7 d60f 7ebc ec9e ee57 6fa2 e392 3b57. Note that both checksums are displayed in groups of four hexadecimal digits.
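A downloaded copy can be checked against the size and checksums listed above. The following is a minimal sketch using Python's standard `hashlib` module. The helper names `file_digests` and `verify` are hypothetical, and the file is read in chunks so that the 4 GiB file never has to fit in memory.

```python
import hashlib
import os

EXPECTED_SIZE = 1 << 32  # 4,294,967,296 bytes, one byte per entry
# Checksums copied from the description; the grouping spaces are removed.
EXPECTED_MD5 = "4ee9 a09a a509 fd70 4152 2fd2 f263 ae25".replace(" ", "")
EXPECTED_SHA256 = (
    "d9a4 fb8d d9f0 de29 b1e2 3316 c78d 8e65 "
    "4ec7 d60f 7ebc ec9e ee57 6fa2 e392 3b57"
).replace(" ", "")


def file_digests(path, chunk_size=1 << 20):
    """Return (size in bytes, MD5 hex digest, SHA256 hex digest) of a file."""
    md5 = hashlib.md5()
    sha256 = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
            sha256.update(chunk)
    return os.path.getsize(path), md5.hexdigest(), sha256.hexdigest()


def verify(path):
    """True if the file matches the published size and both checksums."""
    size, md5_hex, sha256_hex = file_digests(path)
    return (
        size == EXPECTED_SIZE
        and md5_hex == EXPECTED_MD5
        and sha256_hex == EXPECTED_SHA256
    )
```

Calling `verify` with the path of the downloaded file returns `True` only when the size and both digests agree with the published values.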


This dataset is a result of my research on machine learning for Android security. The data was obtained by mapping each analyzed application to a binary vector of the permissions it uses (1 = used, 0 = not used). The samples are labeled by "Type": 1 for malware and 0 for non-malware.
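The encoding described above can be read into feature vectors and labels as follows. This is a minimal sketch that assumes the data is a CSV file with one column per permission plus a "Type" column. The file layout, the two sample rows, and the helper name `load_samples` are illustrative assumptions, not taken from the dataset itself.

```python
import csv
import io

# Made-up sample rows in the assumed layout: permission columns, then Type.
SAMPLE_CSV = """android.permission.INTERNET,android.permission.SEND_SMS,Type
1,1,1
1,0,0
"""


def load_samples(handle, label_column="Type"):
    """Split each row into a binary permission vector and its label."""
    features, labels = [], []
    for row in csv.DictReader(handle):
        labels.append(int(row.pop(label_column)))         # 1 = malware, 0 = non-malware
        features.append([int(v) for v in row.values()])   # 1 = used, 0 = not used
    return features, labels


features, labels = load_samples(io.StringIO(SAMPLE_CSV))
print(features, labels)
```

For a real file, the same function can be given an open file handle in place of the `io.StringIO` object.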

When I did my research, datasets of malware and benign Android applications were not available, so I am sharing part of my research results with the community for future work.