Data for Novel Approaches to Stability for Enhanced Privacy-Preserving Machine Learning

Citation Author(s):
Aidan Gao
Submitted by:
Aidan Gao
Last updated:
Sat, 12/28/2024 - 16:16
DOI:
10.21227/z7bt-w351

Abstract 

Recently, machine learning models have grown considerably in size and popularity, leading to concerns about dataset privacy, especially for sensitive data containing personal information. To prevent training data from being extrapolated from model weights, various privacy frameworks ensure that the outputs of machine learning models do not reveal their training data. However, this often diminishes model performance because noise must be added to the model weights. Making models more resistant to minor variations in their input improves their stability, which in turn reduces the amount of noise needed to preserve privacy. This paper explores several techniques to improve stability and mitigate the adverse effects of privatization under the Probably Approximately Correct (PAC) Privacy framework, covering both neural networks and linear regression. The neural network stability methods focus on varied clipping and pruning techniques, along with the novel tree-net applied in the context of stability. The linear regression methods share these clipping techniques and introduce a novel group-based clipping method in place of batch-based clipping. Linear regression testing uses data embedding to further improve accuracy and introduces dynamic baseline training, a new method of stability training. Using these methods, we raise the test accuracy of a privatized ResNet-20 on CIFAR-10 from 58.5% to 72.5% while upholding the same level of privacy.
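The contrast between batch-based and group-based clipping can be made concrete with a small sketch. This is one plausible reading, not the paper's implementation: it assumes batch-based clipping rescales each per-sample gradient to a maximum norm before averaging over the batch, while group-based clipping first averages gradients within fixed-size groups and clips each group mean. The names `batch_clip`, `group_clip`, and `group_size` are hypothetical.

```python
import torch

def batch_clip(grads: torch.Tensor, max_norm: float) -> torch.Tensor:
    """Clip each per-sample gradient to max_norm, then average over the batch."""
    norms = grads.flatten(1).norm(dim=1)                # per-sample L2 norms, shape (B,)
    scale = max_norm / norms.clamp(min=max_norm)        # 1.0 if within the bound, <1 otherwise
    scale = scale.view(-1, *([1] * (grads.dim() - 1)))  # reshape for broadcasting
    return (grads * scale).mean(dim=0)

def group_clip(grads: torch.Tensor, max_norm: float, group_size: int) -> torch.Tensor:
    """Average gradients within fixed-size groups, clip each group mean to
    max_norm, then average the clipped group means."""
    clipped = []
    for group in grads.split(group_size, dim=0):
        mean = group.mean(dim=0)
        norm = mean.norm()
        clipped.append(mean * (max_norm / norm.clamp(min=max_norm)))
    return torch.stack(clipped).mean(dim=0)

# Example: 64 per-sample gradients of a 10-dimensional parameter vector.
g = torch.randn(64, 10)
print(batch_clip(g, 1.0).shape)                 # torch.Size([10])
print(group_clip(g, 1.0, group_size=8).shape)   # torch.Size([10])
```

Averaging within a group before clipping smooths out per-sample outliers, so the clipped quantity varies less across runs; that lower sensitivity is the kind of stability that lets a privacy mechanism add less noise.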

Instructions: 

The dataset is split into four sets: the sim train and test sets are CIFAR-10 embedded via ImageNet, and the cifar-100 train and test sets are embedded via CIFAR-100.
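As a starting point for working with the files, here is a minimal loading sketch, assuming the four splits ship as NumPy `.npy` arrays; the file names below are hypothetical placeholders for whatever the downloaded archive actually contains.

```python
import numpy as np

# Hypothetical file names -- replace with the actual names in the download.
splits = {
    "sim_train": "sim_train.npy",            # CIFAR-10 embedded via ImageNet
    "sim_test": "sim_test.npy",
    "cifar100_train": "cifar100_train.npy",  # embedded via CIFAR-100
    "cifar100_test": "cifar100_test.npy",
}

# Load every split and report its shape as a quick sanity check.
data = {name: np.load(path, allow_pickle=True) for name, path in splits.items()}
for name, arr in data.items():
    print(name, getattr(arr, "shape", type(arr)))
```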