Modified FedHome

Citation Author(s): Arash Ahmadi (University of Oklahoma)

Sarah Sharif (University of Oklahoma)

Yaser 'Mike' Banad (University of Oklahoma)
Submitted by: Yaser Banad
Last updated: Mon, 06/03/2024 - 19:23
DOI: 10.21227/5m4q-m158
Data Format: .py
Research Article Link: A Comparative Study of Sampling Methods with Cross-Validation in the FedHome Fr…

Categories:

Artificial Intelligence

Keywords:

Federated learning

Personalized in-home health monitoring

Class imbalance

Oversampling techniques

Decentralized edge devices

Abstract

This paper presents a comparative study of sampling methods within the FedHome framework, which is designed for personalized in-home health monitoring. FedHome combines federated learning (FL) with generative convolutional autoencoders (GCAE) to train models on decentralized edge devices while prioritizing data privacy. A notable challenge in this domain is class imbalance in health data: critical events such as falls are underrepresented, which degrades model performance. To address this, the study evaluates six oversampling techniques: SMOTE, Borderline-SMOTE, Random OverSampler, SMOTE-Tomek, SVM-SMOTE, and SMOTE-ENN. Each method is tested on FedHome's public implementation over 200 training rounds, with and without stratified K-fold cross-validation. The findings indicate that SMOTE-ENN achieves the most consistent test accuracy, with a standard deviation range of 0.0167-0.0176, the narrowest among the reported samplers. In contrast, SMOTE and SVM-SMOTE show higher variability, with wider standard deviation ranges of 0.0157-0.0180 and 0.0155-0.0180, respectively. Random OverSampler likewise shows a wide range of 0.0155-0.0176. SMOTE-Tomek, with a range of 0.0160-0.0175, is more stable than these three but less stable than SMOTE-ENN. These findings highlight the potential of SMOTE-ENN to enhance the reliability and accuracy of personalized health monitoring systems within the FedHome framework.
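As a rough illustration of how oversampling can be combined with stratified K-fold cross-validation, the sketch below uses only the Python standard library. It is not the project's code: it implements plain random oversampling (the simplest of the six techniques) rather than the imblearn samplers, the labels are made-up toy data, and applying the oversampler to the training folds only is an assumption based on common practice, not something stated in the abstract.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(labels, k, seed=0):
    """Assign sample indices to k folds so each fold keeps similar class ratios."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the indices of each class round-robin across the folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def random_oversample(indices, labels, seed=0):
    """Duplicate minority-class indices at random until every class matches the largest."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx in indices:
        by_class[labels[idx]].append(idx)
    target = max(len(members) for members in by_class.values())
    balanced = []
    for members in by_class.values():
        balanced.extend(members)
        balanced.extend(rng.choices(members, k=target - len(members)))
    return balanced


# Toy imbalanced labels (hypothetical): 90 "walk" samples and 10 "fall" samples.
labels = ["walk"] * 90 + ["fall"] * 10
folds = stratified_folds(labels, k=5)

for fold_id, test_idx in enumerate(folds):
    # Train on every fold except the held-out one.
    train_idx = [i for f, fold in enumerate(folds) if f != fold_id for i in fold]
    # Assumption: only the training split is oversampled; the test fold is left untouched.
    train_balanced = random_oversample(train_idx, labels, seed=fold_id)
    print(fold_id,
          Counter(labels[i] for i in test_idx),
          Counter(labels[i] for i in train_balanced))
```

Each held-out fold keeps the original 9:1 class ratio, while the training split is balanced before fitting. In this sketch, swapping `random_oversample` for a synthetic sampler such as SMOTE-ENN would change how minority samples are produced but not the fold logic.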

Instructions:

Installation

To set up the project, you need to install the required packages. Since the project relies on PyTorch for training, it is recommended to install the CUDA version to enable GPU support for faster execution. You can find the appropriate installation commands for your operating system and GPU at the following link:

PyTorch Installation Guide: https://pytorch.org/get-started/locally/

For a standard installation without GPU support, you can simply run the following command in your terminal or command prompt:

pip install torch
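After installation, a quick check along the following lines may help confirm whether PyTorch can see a GPU. This is a minimal sketch and not part of the project; the helper name pick_device is hypothetical, and the only PyTorch call used is torch.cuda.is_available().

```python
import importlib.util


def pick_device():
    """Return "cuda" if PyTorch sees a GPU, "cpu" if not, or None if torch is missing."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch  # imported lazily so the check itself never fails on a missing install
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
if device is None:
    print("PyTorch is not installed")
else:
    print(f"PyTorch will use: {device}")
```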

-----------------------------------

Additionally, you need to install the following packages:

pip install torchvision scikit-learn ujson opacus==0.15.0 h5py imblearn calmsize
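To confirm that the packages above are importable, a small check like the one below can be run before generating the dataset. It is a sketch under stated assumptions: the helper names are hypothetical, and the import names are the usual ones for these packages (for example, scikit-learn is imported as sklearn), not something the project documents.

```python
import importlib.util
from importlib import metadata

# Import names, which can differ from the pip names (scikit-learn -> sklearn).
REQUIRED_MODULES = [
    "torch", "torchvision", "sklearn", "ujson",
    "opacus", "h5py", "imblearn", "calmsize",
]


def missing_modules(names):
    """Return the subset of module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


missing = missing_modules(REQUIRED_MODULES)
print("Missing modules:", ", ".join(missing) if missing else "none")

# The install command pins opacus to 0.15.0, so report what is actually installed.
print("opacus version:", installed_version("opacus"))
```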

-----------------------------------

Dataset Generation

To generate the dataset, execute the following command in the dataset directory:

python generate_har.py

-----------------------------------

Running the Code

To run the code, execute the following command in the system folder using the command prompt or terminal:

python -u main.py -lr 0.01 -lbs 10 -nc 30 -jr 1 -nb 6 -data har -m harcnn -algo FedHome -gr 200 -fd x -did 1 > har-FedHome-cross-validation-o-fold-x.out

-----------------------------------

Note: The -fd x argument sets the fold number to use. Replace x with the desired fold number, both in the argument and in the output file name. The graphs are saved as .jpg files in the current folder.
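To run several folds in sequence, the command above could be wrapped in a short script such as the following. This is only a sketch: the helper names build_command and run_folds are hypothetical, the flags are copied from the command above, and the fold numbers passed in are an assumption, since the valid range of -fd is not stated here.

```python
import subprocess
import sys
from pathlib import Path


def build_command(fold):
    """Return the argument list and log file name for one fold."""
    args = [
        sys.executable, "-u", "main.py",
        "-lr", "0.01", "-lbs", "10", "-nc", "30", "-jr", "1", "-nb", "6",
        "-data", "har", "-m", "harcnn", "-algo", "FedHome",
        "-gr", "200", "-fd", str(fold), "-did", "1",
    ]
    log_name = f"har-FedHome-cross-validation-o-fold-{fold}.out"
    return args, log_name


def run_folds(folds, system_dir="system"):
    """Run main.py once per fold from the system folder, one log file per fold."""
    for fold in folds:
        args, log_name = build_command(fold)
        with open(Path(system_dir) / log_name, "w") as log:
            # check=True stops the loop if a run exits with an error.
            subprocess.run(args, cwd=system_dir, stdout=log, check=True)


# Example (assumed fold numbers; adjust to the folds the project actually supports):
# run_folds([1, 2, 3, 4, 5])
```

Running the folds one after another keeps a single GPU from being shared between runs; each log file follows the naming pattern of the original command.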

To recreate all the graphs presented in the paper, run the graph.py script, which is located in the graph folder, with this command:

python graph.py

-----------------------------------
