Standard Dataset
Data for Novel Approaches to Stability for Enhanced Privacy-Preserving Machine Learning
- Submitted by: Aidan Gao
- Last updated: Sat, 12/28/2024 - 16:16
- DOI: 10.21227/z7bt-w351
Abstract
Recently, machine learning models have seen considerable growth in size and popularity, leading to concerns regarding dataset privacy, especially around sensitive data containing personal information. To address data extrapolation from model weights, various privacy frameworks ensure that the outputs of machine learning models do not reveal their training data. However, this often results in diminished model performance due to the necessary addition of noise to model weights. By enhancing models' resistance to minor variations in input, their stability improves, reducing the amount of noise necessary while still preserving privacy. This paper explores several techniques to improve stability and mitigate the adverse effects of privatization within the Probably Approximately Correct (PAC) Privacy framework in machine learning, covering both neural networks and linear regressions. Neural network stability methods focus on varying clipping and pruning techniques, in addition to the novel tree-net applied in the context of stability. Linear regression methods include shared clipping techniques and a novel group-based clipping method in place of batch-based clipping. Linear regression testing additionally uses data embedding to further improve accuracy and introduces dynamic baseline training, a new method of stability training. Using these methods, we enhance the test accuracy of a privatized ResNet-20 on CIFAR-10 from 58.5% to 72.5% while upholding the same level of privacy.
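The abstract refers to data embedding as the step that produces the accompanying files. The exact embedding backbone and preprocessing are not specified on this page; the sketch below shows one plausible way such features could be produced, assuming an ImageNet-pretrained ResNet from torchvision with its classification head removed.

```python
import torch
import torchvision

# Sketch only: the backbone, preprocessing, and output file name are assumptions,
# not the authors' exact pipeline.
weights = torchvision.models.ResNet50_Weights.IMAGENET1K_V2
backbone = torchvision.models.resnet50(weights=weights)
backbone.fc = torch.nn.Identity()   # drop the classifier; keep the 2048-d features
backbone.eval()

preprocess = weights.transforms()   # resize/normalize as the backbone expects
cifar = torchvision.datasets.CIFAR10(root="./data", train=True,
                                     download=True, transform=preprocess)
loader = torch.utils.data.DataLoader(cifar, batch_size=256)

features, labels = [], []
with torch.no_grad():
    for x, y in loader:
        features.append(backbone(x))
        labels.append(y)

torch.save({"features": torch.cat(features), "labels": torch.cat(labels)},
           "cifar10_imagenet_embedded.pt")
```

The embedded features can then be fed to the privatized linear regression models described in the abstract instead of raw pixels.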
The dataset is split into four sets: the sim train and test sets are CIFAR-10 embedded via ImageNet, and the cifar-100 train and test sets are embedded via CIFAR-100.
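A minimal loading sketch follows, assuming the four splits are stored as NumPy arrays; the file names below are placeholders, since the actual archive layout is not listed on this page.

```python
import numpy as np

# Placeholder file names: substitute the names used in the downloaded archive.
splits = {
    "sim_train": "sim_train.npy",
    "sim_test": "sim_test.npy",
    "cifar100_train": "cifar-100_train.npy",
    "cifar100_test": "cifar-100_test.npy",
}

data = {name: np.load(path, allow_pickle=True) for name, path in splits.items()}
for name, arr in data.items():
    print(name, getattr(arr, "shape", arr))
```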