Dataset for REMaQE evaluation
- Submitted by: Meet Udeshi
- Last updated: Sun, 05/12/2024 - 16:11
- DOI: 10.21227/r7e6-bk48
Abstract
Dataset for evaluation of REMaQE
This dataset contains 3,137 randomly generated math equations, each implemented in both C and Simulink and compiled into ARM 32-bit HardFloat binaries at four different optimization levels.
REMaQE reverse engineers these binaries back into math equations, and the dataset covers a wide variety of binary implementations for evaluating REMaQE's performance.
Each generated equation is implemented as a C function and a Simulink model.
The C function is compiled for the ARM32-HF target using the GCC cross-compiler (arm-linux-gnueabihf-gcc).
The Simulink model is compiled for the same target using Simulink's code generation feature.
Four optimization levels from "-O0" to "-O3" are used during compilation to obtain a variety of implementations.
The file "REMaQEv2_limitations.zip" contains additional binaries that represent the limitations of REMaQE.
# Dataset for REMaQE
Each generated equation is stored in its own folder inside one of the `batch*` folders.
There are 3,137 equations and 25,096 binaries.
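The binary count follows from the build matrix described in the abstract: each equation has a C and a Simulink implementation, and each of those is compiled at four optimization levels. A short Python check of that arithmetic (the constant and function names are illustrative only):

```python
# Build matrix from the abstract: two implementations, four -O levels.
EQUATIONS = 3_137
IMPLEMENTATIONS = ("c", "simulink")
OPT_LEVELS = ("O0", "O1", "O2", "O3")


def expected_binaries(equations: int) -> int:
    """Number of ELF binaries when every equation is built for every
    implementation at every optimization level."""
    return equations * len(IMPLEMENTATIONS) * len(OPT_LEVELS)


print(expected_binaries(EQUATIONS))  # 25096
```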
Each equation folder contains the following files:
- `expressions.json`: Generated math equations
- `simplified.json`: Simplified math equations
- `code.c`: C code
- `O[0-3]`: Folder for each optimization level, containing:
  - `c_accuracy.json`: C equivalence match results
  - `c_bin.diss`: C binary disassembly
  - `c_bin.elf`: C binary
  - `c_build.log`: C build log
  - `c_reversed.json`: REMaQE reversed result of the C binary
  - `error.log`: Error log
  - `simulink.mdl`: Simulink model
  - `simulink_accuracy.json`: Simulink equivalence match results
  - `simulink_bin.diss`: Simulink binary disassembly
  - `simulink_bin.elf`: Simulink binary
  - `simulink_build/`: Simulink build directory
  - `simulink_reversed.json`: REMaQE reversed result of the Simulink binary
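Because every binary sits at a fixed depth in this layout, the dataset can be walked with a single glob pattern. The Python sketch below is an illustration, not part of the dataset's tooling: `BinaryEntry`, `classify` and `iter_binaries` are hypothetical names, and it assumes each equation folder sits directly inside a `batch*` folder with both binaries stored in its `O[0-3]` subfolders.

```python
from pathlib import Path
from typing import Iterator, NamedTuple


class BinaryEntry(NamedTuple):
    batch: str       # the enclosing batch* folder name
    equation: str    # the per-equation folder name
    opt_level: int   # 0..3, parsed from the O[0-3] folder name
    impl: str        # "c" or "simulink", parsed from the *_bin.elf prefix
    path: Path       # the path as passed in


def classify(path: Path) -> BinaryEntry:
    """Split a path ending in batch*/<equation>/O<n>/<impl>_bin.elf."""
    batch, equation, opt_dir, filename = path.parts[-4:]
    impl = filename[: -len("_bin.elf")]
    return BinaryEntry(batch, equation, int(opt_dir[1:]), impl, path)


def iter_binaries(root: Path) -> Iterator[BinaryEntry]:
    """Yield every C and Simulink ELF binary under the dataset root.

    Disassembly (*.diss), logs and JSON files do not match the pattern.
    """
    for path in sorted(root.glob("batch*/*/O[0-3]/*_bin.elf")):
        yield classify(path)
```

For example, `collections.Counter(e.impl for e in iter_binaries(Path(dataset_root)))` would tally C versus Simulink binaries, where `dataset_root` is a placeholder for wherever the archive was extracted.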
The `summary.json` file contains data about all the equations, such as the number of operations and the execution time.
## Build
The `build_c.sh` and `build_simulink.sh` scripts build the C and Simulink binaries, respectively.
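To make the C half of the build matrix concrete, here is a minimal Python sketch of the per-level compiler invocations. It is an assumption, not the contents of `build_c.sh` (which is not reproduced here): it varies only the optimization flag, uses `-c` (assumed) to emit an ELF object since each equation is a C function rather than a full program, and writes each result to the matching `O[0-3]/c_bin.elf`. `c_build_commands` is a hypothetical helper.

```python
from pathlib import Path

# Cross-compiler named in the abstract; targets ARM 32-bit hard-float EABI.
CROSS_GCC = "arm-linux-gnueabihf-gcc"
OPT_FLAGS = ("-O0", "-O1", "-O2", "-O3")


def c_build_commands(equation_dir: Path) -> list[list[str]]:
    """Return one hypothetical GCC invocation per optimization level.

    ASSUMPTION: only the -O flag varies and -c is used to emit an ELF
    object; the real flags are defined in build_c.sh.
    """
    src = equation_dir / "code.c"
    commands = []
    for flag in OPT_FLAGS:
        # "-O2" -> folder "O2", matching the O[0-3] layout of the dataset.
        out = equation_dir / flag.lstrip("-") / "c_bin.elf"
        commands.append([CROSS_GCC, flag, "-c", str(src), "-o", str(out)])
    return commands
```

Each command could then be run with `subprocess.run(cmd, check=True)` after creating its output folder with `Path(cmd[-1]).parent.mkdir(parents=True, exist_ok=True)`.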
## Schema
The JSON schema of each generated JSON file is given below; `<...>` marks a placeholder and `...` marks repeated entries.
```
"expressions.json": {
  "inputs": [<input-names>],
  "outputs": [<output-names>],
  "constants": {
    <constant-name>: <value>
  },
  "expressions": {
    <block-name>: <block-expression>
  }
}
```
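A minimal Python loader for `expressions.json`, mirroring the schema field by field. It assumes each `<value>` is a JSON number and each `<block-expression>` is a string; `GeneratedEquation` and `load_expressions` are hypothetical names.

```python
import json
from dataclasses import dataclass


@dataclass
class GeneratedEquation:
    inputs: list[str]             # input variable names
    outputs: list[str]            # output variable names
    constants: dict[str, float]   # ASSUMPTION: constant values are numbers
    expressions: dict[str, str]   # block name -> block expression (assumed string)


def load_expressions(text: str) -> GeneratedEquation:
    """Parse the contents of one expressions.json file."""
    data = json.loads(text)
    return GeneratedEquation(
        inputs=list(data["inputs"]),
        outputs=list(data["outputs"]),
        constants=dict(data["constants"]),
        expressions=dict(data["expressions"]),
    )
```

Usage would look like `load_expressions((equation_dir / "expressions.json").read_text())`, with `equation_dir` standing in for one equation folder.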
```
"simplified.json": {
  <output-name>: {
    "simplified": <expression>,
    "good": <true/false>,
    "error": <error-msg>
  },
  <next-output>...
}
```
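Since `simplified.json` flags each output with `good`, a consumer will usually want to separate usable expressions from failures. A sketch, assuming `error` may be `null` or absent for good entries; `split_simplified` is a hypothetical helper that takes the already-parsed JSON object.

```python
def split_simplified(data: dict) -> tuple[dict[str, str], dict[str, str]]:
    """Partition a parsed simplified.json object.

    Returns (good, failed): good maps output name -> simplified expression
    for entries with "good": true; failed maps output name -> error message.
    """
    good: dict[str, str] = {}
    failed: dict[str, str] = {}
    for name, entry in data.items():
        if entry.get("good"):
            good[name] = entry["simplified"]
        else:
            # A null or missing error message becomes an empty string.
            failed[name] = entry.get("error") or ""
    return good, failed
```

The input would typically come from `json.load` on an equation's `simplified.json`.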
```
"*_reversed.json": {
  "inputs": {
    <input-name>: <storage-desc>,
    ...
  },
  "outputs": {
    <output-name>: {
      "storage": <storage-desc>,
      "expression": <output-expression>,
      "simplified": <simplified-expression>,
      "parsed": <true/false>,
      "error": <error-msg>
    },
    ...
  },
  "constants": {
    <constant-name>: {
      "storage": <storage-desc>,
      "value": <value>
    }
  }
}
```
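To pull the recovered equations out of a `c_reversed.json` or `simulink_reversed.json` file, only outputs with `"parsed": true` carry a usable `simplified` expression. A sketch with hypothetical helper names; it drops the `<storage-desc>` fields, whose format is not specified here.

```python
def reversed_outputs(data: dict) -> dict[str, str]:
    """Map each output REMaQE parsed successfully to its simplified expression."""
    return {
        name: entry["simplified"]
        for name, entry in data.get("outputs", {}).items()
        if entry.get("parsed")
    }


def reversed_constants(data: dict) -> dict[str, object]:
    """Map each recovered constant name to its value (storage info dropped)."""
    return {name: c["value"] for name, c in data.get("constants", {}).items()}
```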
```
"*_accuracy.json": {
  "matched": <true/false - aggregate>,
  "error": <error-msg>,
  <rev_output>: {
    <gt_output>: {
      "matched": <true/false>,
      "mismatches": [<list of mismatches>],
      "symbol_map": {
        <rev_input>: <gt_input>,
        ...
      }
    }
  },
  <next rev_output>...
}
```
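In the accuracy files, the aggregate `matched` and `error` keys share the top level with the per-output entries, so a reader has to skip them when iterating. The sketch below does that and also computes an overall match rate from the aggregate flag. It assumes no reversed output is literally named `matched` or `error`; `matched_pairs` and `match_rate` are hypothetical helpers.

```python
# Top-level keys that hold aggregate results rather than reversed outputs.
RESERVED_KEYS = {"matched", "error"}


def matched_pairs(acc: dict) -> list[tuple[str, str, dict[str, str]]]:
    """List (rev_output, gt_output, symbol_map) for every matched pair."""
    pairs = []
    for rev_out, candidates in acc.items():
        if rev_out in RESERVED_KEYS or not isinstance(candidates, dict):
            continue
        for gt_out, result in candidates.items():
            if result.get("matched"):
                pairs.append((rev_out, gt_out, dict(result.get("symbol_map", {}))))
    return pairs


def match_rate(accuracy_files: list[dict]) -> float:
    """Fraction of parsed *_accuracy.json objects whose aggregate flag is true."""
    if not accuracy_files:
        return 0.0
    return sum(bool(a.get("matched")) for a in accuracy_files) / len(accuracy_files)
```

Grouping the parsed files by optimization level or by implementation before calling `match_rate` gives per-level or per-implementation rates.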