Standard Dataset
Data for Novel Approaches to Stability for Enhanced Privacy-Preserving Machine Learning
- Submitted by: Aidan Gao
- Last updated: Sat, 12/28/2024 - 16:16
- DOI: 10.21227/z7bt-w351
Abstract
Recently, machine learning models have seen considerable growth in size and popularity, leading to concerns regarding dataset privacy, especially around sensitive data containing personal information. To address data extrapolation from model weights, various privacy frameworks ensure that the outputs of machine learning models do not reveal their training data. However, this often results in diminished model performance due to the necessary addition of noise to model weights. By enhancing models' resistance to minor variations in input, their stability improves, reducing the amount of noise necessary while still preserving privacy. This paper explores several techniques to improve stability and mitigate the adverse effects of privatization within the Probably Approximately Correct (PAC) Privacy framework in machine learning, covering both neural networks and linear regressions. Neural network stability methods focus on varying clipping and pruning techniques, in addition to the novel tree-net applied in the context of stability. Linear regression methods include shared clipping techniques and a novel group-based clipping method in place of batch-based clipping. Linear regression testing additionally uses data embedding to further improve accuracy and introduces dynamic baseline training, a new method of stability training. Using these methods, we enhance the test accuracy of a privatized ResNet-20 on CIFAR-10 from 58.5% to 72.5% while upholding the same level of privacy.
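The abstract refers to data embedding as the step that produces the accompanying files. The exact embedding backbone and preprocessing are not specified on this page; the sketch below shows one plausible way such features could be produced, assuming an ImageNet-pretrained ResNet from torchvision with its classification head removed.

```python
import torch
import torchvision

# Sketch only: the backbone, preprocessing, and output file name are assumptions,
# not the authors' exact pipeline.
weights = torchvision.models.ResNet50_Weights.IMAGENET1K_V2
backbone = torchvision.models.resnet50(weights=weights)
backbone.fc = torch.nn.Identity()   # drop the classifier; keep the 2048-d features
backbone.eval()

preprocess = weights.transforms()   # resize/normalize as the backbone expects
cifar = torchvision.datasets.CIFAR10(root="./data", train=True,
                                     download=True, transform=preprocess)
loader = torch.utils.data.DataLoader(cifar, batch_size=256)

features, labels = [], []
with torch.no_grad():
    for x, y in loader:
        features.append(backbone(x))
        labels.append(y)

torch.save({"features": torch.cat(features), "labels": torch.cat(labels)},
           "cifar10_imagenet_embedded.pt")
```

The embedded features can then be fed to the privatized linear regression models described in the abstract instead of raw pixels.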
The dataset is split into four sets: the sim train and test sets are CIFAR-10 embedded via ImageNet, and the cifar-100 train and test sets are embedded via CIFAR-100.
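A minimal loading sketch follows, assuming the four splits are stored as NumPy arrays; the file names below are placeholders, since the actual archive layout is not listed on this page.

```python
import numpy as np

# Placeholder file names: substitute the names used in the downloaded archive.
splits = {
    "sim_train": "sim_train.npy",
    "sim_test": "sim_test.npy",
    "cifar100_train": "cifar-100_train.npy",
    "cifar100_test": "cifar-100_test.npy",
}

data = {name: np.load(path, allow_pickle=True) for name, path in splits.items()}
for name, arr in data.items():
    print(name, getattr(arr, "shape", arr))
```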