Dataset for REMaQE evaluation

Citation Author(s):
Meet Udeshi, NYU Tandon School of Engineering
Prashanth Krishnamurthy, NYU Tandon School of Engineering
Hammond Pearce, UNSW Sydney
Ramesh Karri, NYU Tandon School of Engineering
Farshad Khorrami, NYU Tandon School of Engineering
Submitted by: Meet Udeshi
Last updated: Sun, 05/12/2024 - 16:11
DOI: 10.21227/r7e6-bk48

Abstract 

Dataset for evaluation of REMaQE

This dataset contains 3,137 randomly generated math equations that have been compiled into ARM 32-bit HardFloat binaries from C and Simulink implementations at four optimization levels.
REMaQE reverse engineers the binaries back to math equations, and the dataset covers a wide variety of binary implementations to evaluate REMaQE's performance.
Each generated equation is implemented as a C function and a Simulink model.
The C function is compiled for the ARM32-HF target using the GCC compiler (arm-linux-gnueabihf-gcc).
The Simulink model is compiled for the same target using Simulink's code generation feature.
Four optimization levels from "-O0" to "-O3" are used during compilation to obtain a variety of implementations.

The file "REMaQEv2_limitations.zip" contains additional binaries that represent the limitations of REMaQE.

Instructions: 

# Dataset for REMaQE

Each generated equation is present in its own folder in one of the `batch*` folders.
There are 3,137 equations and 25,096 binaries.

For each equation, the files present are:

- `expressions.json`: Generated math equations
- `simplified.json`: Simplified math equations
- `code.c`: C code
- `O[0-3]`: Folder for each optimization level, containing:
  - `c_accuracy.json`: C equivalence match results
  - `c_bin.diss`: C binary disassembly
  - `c_bin.elf`: C binary
  - `c_build.log`: C build log
  - `c_reversed.json`: REMaQE reversed result of C binary
  - `error.log`: Error log
  - `simulink.mdl`: Simulink model
  - `simulink_accuracy.json`: Simulink equivalence match results
  - `simulink_bin.diss`: Simulink binary disassembly
  - `simulink_bin.elf`: Simulink binary
  - `simulink_build/`: Simulink build directory
  - `simulink_reversed.json`: REMaQE reversed result of Simulink binary
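
The layout above implies two binaries (C and Simulink) per optimization level, so eight per equation. A minimal Python sketch of enumerating the expected binary paths for one equation folder follows. The function name `expected_binaries` and the example folder path are hypothetical, and the sketch assumes the `*_bin.elf` files sit directly inside each `O[0-3]` folder.

```python
from pathlib import PurePosixPath

OPT_LEVELS = ("O0", "O1", "O2", "O3")  # folder names from the layout above
SOURCES = ("c", "simulink")            # prefixes of the *_bin.elf files


def expected_binaries(equation_dir):
    """Return the binary paths expected under one equation folder.

    Assumes each O[0-3] folder directly contains c_bin.elf and simulink_bin.elf.
    """
    root = PurePosixPath(equation_dir)
    return [root / opt / f"{src}_bin.elf" for opt in OPT_LEVELS for src in SOURCES]


paths = expected_binaries("batch0/eq_0001")  # hypothetical folder name
print(len(paths))         # 8 binaries per equation
print(3137 * len(paths))  # 25096, the total stated above
```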

The `summary.json` file contains data about all the equations, such as the number of operations and the execution time.

## Build

The `build_c.sh` and `build_simulink.sh` scripts build the binaries.
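
The scripts themselves are not reproduced here. As a rough illustration of the C side only, the sketch below assembles one `arm-linux-gnueabihf-gcc` invocation per optimization level. The helper name `gcc_commands`, the output layout, and the absence of any flags other than the optimization level are assumptions; `build_c.sh` may well pass additional options.

```python
COMPILER = "arm-linux-gnueabihf-gcc"  # GCC cross compiler for ARM32 hard-float


def gcc_commands(source="code.c"):
    """Build one compiler argument list per optimization level (-O0 to -O3).

    Assumption: output goes to O<n>/c_bin.elf; the real script may differ.
    """
    return [
        [COMPILER, f"-O{level}", "-o", f"O{level}/c_bin.elf", source]
        for level in range(4)
    ]


for cmd in gcc_commands():
    print(" ".join(cmd))
```

Each argument list could be handed to `subprocess.run` on a machine where the cross toolchain is installed.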

## Schema

The JSON schema of each generated file is as follows.

"expressions.json": {
  "inputs": [<input-names>],
  "outputs": [<output-names>],
  "constants": {
    <constant-name>: <value>
  },
  "expressions": {
    <block-name>: <block-expression>
  }
}
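
As an illustration of the `expressions.json` schema, the sketch below checks that a parsed document has the four top-level keys with the expected container types. It assumes the file content is the inner object (without the `"expressions.json"` label); the helper name `check_expressions` and the sample names and expression syntax are hypothetical.

```python
import json

REQUIRED_KEYS = {"inputs", "outputs", "constants", "expressions"}


def check_expressions(doc):
    """Return a list of problems found in a parsed expressions.json document."""
    problems = [f"missing key: {k}" for k in sorted(REQUIRED_KEYS - set(doc))]
    for key in ("inputs", "outputs"):
        if key in doc and not isinstance(doc[key], list):
            problems.append(f"{key} should be a list")
    for key in ("constants", "expressions"):
        if key in doc and not isinstance(doc[key], dict):
            problems.append(f"{key} should be an object")
    return problems


# Hypothetical example content; real names and expression syntax may differ.
sample = json.loads("""
{
  "inputs": ["u0"],
  "outputs": ["y0"],
  "constants": {"c0": 2.5},
  "expressions": {"y0": "c0 * u0"}
}
""")
print(check_expressions(sample))  # []
```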

"simplified.json": {
  <output-name>: {
    "simplified": <expression>,
    "good": <true/false>,
    "error": <error-msg>
  },
  <next-output>...
}
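
A small sketch of consuming `simplified.json`: it keeps only the outputs whose `good` flag is true. Reading `good` as "simplification succeeded" is an assumption, and the helper name `usable_outputs` and the sample values are hypothetical.

```python
def usable_outputs(simplified):
    """Map output name -> simplified expression for entries marked good."""
    return {
        name: entry["simplified"]
        for name, entry in simplified.items()
        if entry.get("good")
    }


# Hypothetical content following the schema above.
sample = {
    "y0": {"simplified": "2.5*u0", "good": True, "error": None},
    "y1": {"simplified": None, "good": False, "error": "<error-msg>"},
}
print(usable_outputs(sample))  # {'y0': '2.5*u0'}
```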

"*_reversed.json": {
  "inputs": {
    <input-name>: <storage-desc>,
    ...
  },
  "outputs": {
    <output-name>: {
      "storage": <storage-desc>,
      "expression": <output-expression>,
      "simplified": <simplified-expression>,
      "parsed": <true/false>,
      "error": <error-msg>
    },
    ...
  },
  "constants": {
    <constant-name>: {
      "storage": <storage-desc>,
      "value": <value>
    }
  }
}
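
The following sketch shows one way to split the outputs of a `*_reversed.json` document by their `parsed` flag. The helper name `parsed_expressions` is hypothetical, and the storage descriptors in the sample are left as placeholders because their format is not specified here.

```python
def parsed_expressions(reversed_doc):
    """Split outputs into (parsed, failed) dictionaries keyed by output name.

    parsed maps to the simplified expression, failed maps to the error message.
    """
    parsed, failed = {}, {}
    for name, out in reversed_doc.get("outputs", {}).items():
        if out.get("parsed"):
            parsed[name] = out.get("simplified")
        else:
            failed[name] = out.get("error")
    return parsed, failed


# Hypothetical content; names, expressions, and storage values are placeholders.
sample = {
    "inputs": {"in0": "<storage-desc>"},
    "outputs": {
        "out0": {
            "storage": "<storage-desc>",
            "expression": "in0*2.5",
            "simplified": "2.5*in0",
            "parsed": True,
            "error": None,
        },
        "out1": {
            "storage": "<storage-desc>",
            "expression": None,
            "simplified": None,
            "parsed": False,
            "error": "<error-msg>",
        },
    },
    "constants": {},
}
ok, bad = parsed_expressions(sample)
print(ok)   # {'out0': '2.5*in0'}
print(bad)  # {'out1': '<error-msg>'}
```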

"*_accuracy.json": {
  "matched": <true/false - aggregate>,
  "error": <error-msg>,
  <rev_output>: {
    <gt_output>: {
      "matched": <true/false>,
      "mismatches": [<list of mismatches>],
      "symbol_map": {
        <rev_input>: <gt_input>,
        ...
      }
    }
  },
  <next rev_output>...
}
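
Because the accuracy files mix the aggregate `matched` and `error` keys with one key per reversed output, a reader has to skip the two reserved keys when iterating. A minimal sketch follows, assuming `rev` refers to the reversed (REMaQE) side and `gt` to the ground-truth side; the helper name `matched_pairs` and the sample values are hypothetical.

```python
RESERVED = {"matched", "error"}  # top-level keys that are not reversed outputs


def matched_pairs(accuracy):
    """List (rev_output, gt_output, symbol_map) for every pair marked matched."""
    pairs = []
    for rev_out, candidates in accuracy.items():
        if rev_out in RESERVED:
            continue
        for gt_out, result in candidates.items():
            if result.get("matched"):
                pairs.append((rev_out, gt_out, result.get("symbol_map", {})))
    return pairs


# Hypothetical content following the schema above.
sample = {
    "matched": True,
    "error": None,
    "out0": {
        "y0": {"matched": True, "mismatches": [], "symbol_map": {"in0": "u0"}},
        "y1": {"matched": False, "mismatches": ["<mismatch>"], "symbol_map": {}},
    },
}
print(matched_pairs(sample))  # [('out0', 'y0', {'in0': 'u0'})]
```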