Datasets
Standard Dataset
Security Patch Variant
- Submitted by: Lin Li
- Last updated: Thu, 10/17/2024 - 03:10
- DOI: 10.21227/5qwj-tb49
Abstract
Security patches play a crucial role in the battle
against Open Source Software (OSS) vulnerabilities. Meanwhile,
to facilitate the development of OSS projects, both upstream and
downstream developers often maintain multiple branches. Due
to the different code contexts among branches, multiple security
patch variants exist for the same vulnerability. Hence, to ease the
management of OSS vulnerabilities, locating all patch variants
of an OSS vulnerability is critically important. However, existing
works are mainly designed to locate one or several patches for a
vulnerability and cannot locate all of its patch variants.
In this paper, we study the problem of how to accurately locate
all variants of a given security patch. We motivate the problem
with a preliminary study, which shows that it is rather challenging
to locate all patch variants, even with a reference patch, due
to the diverse practice of OSS developers in backporting patches.
To overcome these challenges, we propose a new patch location
method to locate all variants of a patch in a code repository
(e.g., a piece of software or a specific version of it). Based on our
findings in the preliminary study, our method employs a rule-based model
and incorporates two-dimensional code commit features specifically
designed for the task of patch variant location: similarity features
and representative features. On a ground-truth dataset of patch
variants, our method achieves a precision of 99.68% and a recall of
98.81%, significantly outperforming two state-of-the-art baselines
(PATCHSCOUT and TRACER). Moreover, our method shows a strong capability
in locating patch variants in both upstream and downstream code
repositories.
# SPV Prototype

SPV is a tool for locating security patch variants. This repository releases the experimental data and source code.
### Environment Configuration
1. Please install the following software at the recommended versions:
   - Python 3.9
   - MySQL 8.0
   - git 2.25
2. Fill in the MySQL configuration and the root directory of *SPV* in `$WORKDIR$/spv/config.ini`:

   ```
   [root_path]
   root_path = $WORKDIR$/spv/

   [mysql]
   host = 10.176.xxx.xx
   port = 8888
   user = your_name
   pw = your_passwd
   db = your_database
   ```
3. Install Python dependencies as follows:

   ```
   $ cd $WORKDIR$/spv/
   $ pip install -r requirements.txt
   ```
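For reference, the `config.ini` layout from step 2 can be parsed with Python's standard `configparser`. This is only a sketch using the placeholder values from the example above, not code from *SPV* itself:

```python
import configparser

# A sample config.ini in the format shown above (placeholder values)
SAMPLE = """\
[root_path]
root_path = $WORKDIR$/spv/

[mysql]
host = 10.176.xxx.xx
port = 8888
user = your_name
pw = your_passwd
db = your_database
"""

config = configparser.ConfigParser()
config.read_string(SAMPLE)  # for a real file: config.read("config.ini")

root_path = config["root_path"]["root_path"]
port = config["mysql"].getint("port")  # getint() converts "8888" to 8888
```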
### Quick Start
#### 1. Data preparation

- **Reference patches**

  *SPV* needs a reference patch as input. Specifically, an Excel file with the header `['CVE', 'GitHub Repository', 'Reference Patch(es)']` is required. This file should contain the necessary information about the reference patches, which can be collected manually. An example is provided in `$WORKDIR$/spv/input/reference.xlsx`.
- **Local code repository**

  *SPV* locates the variants of the reference patch in a local git repository. It is recommended to save the git repositories under `$WORKDIR$/spv/repo/`. For example, to clone the *zulip* repository from the remote:

  ```
  $ cd $WORKDIR$/spv/repo/
  $ git clone https://github.com/zulip/zulip.git
  ```

  Note that the local repository must be saved with exactly the directory name specified in the `'GitHub Repository'` field of the reference information file.
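As a sanity check before running *SPV*, the expected schema of the reference file can be sketched in plain Python. The CVE and commit values below are hypothetical, and the `validate_reference_rows` helper is not part of *SPV*:

```python
# Required header of SPV's reference file, as described above
REQUIRED_COLUMNS = ["CVE", "GitHub Repository", "Reference Patch(es)"]

def validate_reference_rows(rows):
    """Check that each row dict carries exactly the fields SPV expects."""
    for row in rows:
        missing = [c for c in REQUIRED_COLUMNS if c not in row]
        if missing:
            raise ValueError(f"row {row!r} is missing columns: {missing}")
    return True

# Hypothetical entry: the 'GitHub Repository' value must equal the
# directory name of the local clone under $WORKDIR$/spv/repo/
example = [{
    "CVE": "CVE-2017-0881",
    "GitHub Repository": "zulip",
    "Reference Patch(es)": "<commit hash of the reference patch>",
}]

validate_reference_rows(example)
```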
#### 2. Cache the repository information to the database.

*SPV* extracts the necessary information from the local git repositories and caches it in a database. To achieve this, prepare a list of the directory names of the local repositories, like `$WORKDIR$/spv/input/repo_list.json`:

```
[
    "zulip"
]
```

Then, cache them with the following commands:

```
$ cd $WORKDIR$/spv/src/
$ python spv.py -cache repo_list.json --commit --title --diff
```

By default, *SPV* will search for the repositories under `$WORKDIR$/spv/repo/`. You can change this directory by modifying `repodir` in the `[repository]` section of `config.ini`.
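Since the repository list is plain JSON, it can also be generated with Python's `json` module. A minimal sketch (it writes to the current directory rather than `$WORKDIR$/spv/input/`):

```python
import json

# Directory names of the local clones under $WORKDIR$/spv/repo/
repos = ["zulip"]

with open("repo_list.json", "w") as f:
    json.dump(repos, f, indent=4)
```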
#### 3. Locate variants.

*SPV* locates variants based on the cached information in the database. To do so, prepare a list of CVEs to be predicted, like `$WORKDIR$/spv/input/cve_list.json`:

```
[
    "CVE-2017-0881",
    "CVE-2020-14194",
    "CVE-2021-30477"
]
```
Run *SPV* to locate the patches:

```
$ cd $WORKDIR$/spv/src/
$ python spv.py -predict cve_list.json
```
By default, *SPV* will look for the reference patch information in `$WORKDIR$/spv/input/reference.xlsx`. You can use `-infofile` to specify another file, like:

```
$ cd $WORKDIR$/spv/src/
$ python spv.py -predict cve_list.json -infofile new_reference.xlsx
```
#### 4. Results

By default, *SPV* saves the results under `$WORKDIR$/spv/results/` with the name `predict-{date}.json`.
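The result file is JSON, so it can be inspected with a few lines of Python. A sketch, where the `load_latest_results` helper is hypothetical and the internal structure of the result file is not documented here:

```python
import glob
import json
import os

def load_latest_results(results_dir):
    """Return the newest predict-{date}.json in results_dir, or None."""
    pattern = os.path.join(results_dir, "predict-*.json")
    candidates = sorted(glob.glob(pattern), key=os.path.getmtime)
    if not candidates:
        return None
    with open(candidates[-1]) as f:
        return json.load(f)

# e.g. the default output directory $WORKDIR$/spv/results/
results = load_latest_results("results")
```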
### Experiment reproduction
#### 1. Data
Files in *data* directory provide necessary data to reproduce the experiments of our paper.
The dataset we construct for Research Questions 1~3 includes:

- `reference.xlsx` contains the reference patches of 737 CVEs.
- `ground_truth.xlsx` contains the patch variants of 737 CVEs.
- `rq1_cve.json` contains CVEs used for RQ1.
- `rq1_shuffled.xlsx` is composed of 10 epochs. Each epoch has exactly the same CVEs as `rq1_cve.json` but in a different order.
- `rq2_cve.json` contains CVEs used for RQ2.
- `rq3_cve.json` contains CVEs used for RQ3.
The newly collected dataset from NVD for RQ4 includes:

- `rq4_cve.json` contains 432 CVEs used for RQ4.
- `rq4-reference.xlsx` contains the reference patches of the 432 CVEs.
- `rq4-checked_cve.json` contains 45 CVEs for which we manually collected ground truth.
- `rq4-ground_truth.xlsx` contains the manually collected patch variants of the 45 CVEs.
#### 2. Reproduction
**Research Question 1**
Run the command:

```
$ cd $WORKDIR$/spv/src/
$ python exp/rq1_exp.py
```
**Research Question 2 & 3**
Follow the steps in *Quick Start* and use the corresponding files (`$WORKDIR$/spv/data/reference.xlsx`), but add `--exp` for the experiments. `--exp` is used to accommodate the range of affected branches in the dataset.
For example, run RQ2 by:

```
$ cd $WORKDIR$/spv/src/
$ python spv.py -predict ../data/rq2_cve.json -infofile ../data/reference.xlsx --exp
```
**Research Question 4**
Follow the steps in *Quick Start* and use the corresponding files (`$WORKDIR$/spv/data/rq4-reference.xlsx`).
#### 3. Training (optional)
1. Load `training_info.sql` into your MySQL database; it contains the pair set for training.
2. Add `train=True` to the call `main(infofile, shuffled_file)` at the end of `exp/rq1_exp.py` and run:

   ```
   $ cd $WORKDIR$/spv/src/
   $ python exp/rq1_exp.py
   ```