Standard Dataset
Binary Files for Analysis
- Submitted by: yun zhang
- Last updated: Thu, 05/30/2024 - 02:28
- DOI: 10.21227/nz0p-jt19
Abstract
The dataset contains 36,864 binary files compiled from different versions of multiple projects for six architectures (ARM-32, ARM-64, MIPS-32, MIPS-64, X86-32, X86-64) at four compilation optimization levels (O0, O1, O2, O3). Each file corresponds to a specific combination of architecture and optimization level, providing a wide range of samples for studying the properties and characteristics of binary files.
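As a quick illustration of the combinations described above, the following sketch enumerates the architecture and optimization-level pairs. The even split of files across combinations is an assumption made only for the arithmetic; the abstract does not state how the 36,864 files are distributed.

```python
from itertools import product

# Architectures and optimization levels as listed in the abstract.
ARCHITECTURES = ["ARM-32", "ARM-64", "MIPS-32", "MIPS-64", "X86-32", "X86-64"]
OPT_LEVELS = ["O0", "O1", "O2", "O3"]
TOTAL_FILES = 36_864  # total stated in the abstract

# Every (architecture, optimization level) pair: 6 * 4 combinations.
combinations = list(product(ARCHITECTURES, OPT_LEVELS))

# ASSUMPTION: files are spread evenly over the combinations.
# The abstract does not confirm this; it is shown only as arithmetic.
files_per_combination, remainder = divmod(TOTAL_FILES, len(combinations))

if __name__ == "__main__":
    print(len(combinations), files_per_combination, remainder)
```

Under that even-split assumption, the total divides with no remainder, which is at least consistent with every project version having been built once per combination.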
These binary files are primarily intended for research on cross-architecture and cross-version similarity detection. For example, by analyzing the same version of a project compiled for different architectures, one can study code similarity across hardware environments, which is relevant to reverse engineering and binary code porting. Likewise, comparing binaries built from different versions of the same project supports cross-version similarity detection, which helps in understanding functional improvements and other changes between versions. Such research improves the understanding of binary code and supports software security analysis, vulnerability detection, and malware identification.
The first-level folder name indicates the architecture of the binary files, such as arm-32, and the second-level folder name indicates the project, version, and optimization level. For example, binutils-2.30-O0 indicates that the binary files in this folder were compiled from binutils version 2.30 at optimization level O0.
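The naming convention above can be parsed mechanically. The sketch below is a minimal example, not part of the dataset: the function names, the `BinaryInfo` type, the lowercase folder names other than arm-32, the regular expression, and the example file name `objdump` are all assumptions made for illustration.

```python
import re
from collections import defaultdict
from pathlib import PurePosixPath
from typing import NamedTuple, Optional

# ASSUMPTION: folder names follow the lowercase "arm-32" style for all six
# architectures; only "arm-32" is confirmed by the description.
ARCHITECTURES = {"arm-32", "arm-64", "mips-32", "mips-64", "x86-32", "x86-64"}

# ASSUMPTION: second-level names look like <project>-<version>-<O0..O3>,
# generalized from the single example "binutils-2.30-O0".
BUILD_RE = re.compile(r"^(?P<project>.+)-(?P<version>[^-]+)-(?P<opt>O[0-3])$")


class BinaryInfo(NamedTuple):
    arch: str
    project: str
    version: str
    opt: str
    name: str


def parse_binary_path(path: str) -> Optional[BinaryInfo]:
    """Parse '<arch>/<project>-<version>-<opt>/<file>'; return None if it does not match."""
    parts = PurePosixPath(path).parts
    if len(parts) < 3:
        return None
    arch, build, name = parts[-3].lower(), parts[-2], parts[-1]
    match = BUILD_RE.match(build)
    if arch not in ARCHITECTURES or match is None:
        return None
    return BinaryInfo(arch, match["project"], match["version"], match["opt"], name)


def group_across_architectures(paths):
    """Map (project, version, opt, file name) to the architectures it was found under."""
    groups = defaultdict(list)
    for path in paths:
        info = parse_binary_path(path)
        if info is not None:
            key = (info.project, info.version, info.opt, info.name)
            groups[key].append(info.arch)
    return dict(groups)


if __name__ == "__main__":
    # "objdump" is a hypothetical file name used only for this example.
    print(parse_binary_path("arm-32/binutils-2.30-O0/objdump"))
```

Grouping by project, version, optimization level, and file name gives candidate pairs for cross-architecture comparison; dropping the version from the key instead would give candidates for cross-version comparison.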