Non-IID IoT Person Detection (NIPD)

Citation Author(s):
University of Electronic Science and Technology of China
Submitted by:
Kangning Yin
Last updated:
Mon, 11/07/2022 - 20:14
Data Format:


Federated learning (FL), a privacy-preserving distributed machine learning approach, is being rapidly adopted in wireless communication networks, where it enables IoT clients to obtain well-trained models while keeping their data local. In the security field, object detection can quickly retrieve objects of interest in images and videos. Combined with FL, it can be deployed on edge devices with limited computing power to process video data directly at the edge, which reduces transmission pressure on the network, relieves the computational burden on the central server, and lowers the processing latency of the video surveillance system. However, because devices are deployed in different scenarios, the data they collect are not independent and identically distributed (non-IID), and the global model aggregated by FL performs poorly as a result. Meanwhile, current research lacks publicly available object detection data sets from real scenarios, which hinders the study of the non-IID problem on wireless Internet of Things (IoT) edge devices. To this end, we open source a non-IID IoT person detection (NIPD) data set, collected from 5 different cameras, with a total of 10,000 surveillance images of real scenes containing 55,160 people. We have built an experimental platform based on this data set and report some experiments and reflections. The data set can be used to test how well IoT camera devices solve real-world problems with FL algorithms, and it also serves as a benchmark for the non-IID data distribution problem in real-world applications.


Setting 1: Non-IID data, directly reflected by differences across cameras in the total number of samples, the number and size of objects, camera angles, lighting conditions, etc. The amount of data held by each client is roughly the same, around 1,600 training images. From Camera_1 to Camera_5, the numbers of training images are 1590, 1574, 1581, 1583 and 1607, and the numbers of test images are 410, 426, 419, 417 and 393.
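The Setting 1 counts can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the released data set tooling: the variable and function names are hypothetical, and only the numbers are taken from the description above.

```python
# Setting 1 image counts per camera, copied from the description above.
# All names here (CAMERAS, TRAIN_S1, ...) are illustrative, not from the data set.
CAMERAS = ["Camera_1", "Camera_2", "Camera_3", "Camera_4", "Camera_5"]
TRAIN_S1 = dict(zip(CAMERAS, [1590, 1574, 1581, 1583, 1607]))
TEST_S1 = dict(zip(CAMERAS, [410, 426, 419, 417, 393]))


def per_camera_totals(train, test):
    """Return the train + test image count for each camera."""
    return {cam: train[cam] + test[cam] for cam in train}


totals_s1 = per_camera_totals(TRAIN_S1, TEST_S1)
grand_total_s1 = sum(totals_s1.values())
train_share_s1 = sum(TRAIN_S1.values()) / grand_total_s1

for cam in CAMERAS:
    print(cam, TRAIN_S1[cam], TEST_S1[cam], totals_s1[cam])
print("all cameras:", grand_total_s1)
print("training share:", train_share_s1)
```

Each camera contributes 2,000 images, which agrees with the 10,000-image total stated in the abstract.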

Setting 2: The amount of data is imbalanced, reflected in the differing numbers of images under each camera. We set the hyperparameter α to 0.1 and re-partition the Setting 1 training data to make it imbalanced. From Camera_1 to Camera_5, the numbers of training images are 1590, 1412, 1284, 1117 and 937, and the numbers of test images remain 410, 426, 419, 417 and 393.
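To illustrate what the Setting 2 imbalance means for aggregation, the sketch below computes each camera's share of the training data and uses those shares as weights in a sample-count-weighted average, in the style of FedAvg. This is an assumption made for illustration: the description does not state which aggregation rule the experimental platform uses, and the toy parameter vectors and all names are hypothetical.

```python
# Setting 2 training image counts per camera, copied from the description above.
CAMERAS = ["Camera_1", "Camera_2", "Camera_3", "Camera_4", "Camera_5"]
TRAIN_S2 = dict(zip(CAMERAS, [1590, 1412, 1284, 1117, 937]))


def aggregation_weights(counts):
    """Weight each client by its share of the total training images."""
    total = sum(counts.values())
    return {cam: n / total for cam, n in counts.items()}


def weighted_average(client_params, weights):
    """Element-wise weighted average of equally long parameter lists."""
    length = len(next(iter(client_params.values())))
    return [
        sum(weights[cam] * client_params[cam][i] for cam in client_params)
        for i in range(length)
    ]


weights_s2 = aggregation_weights(TRAIN_S2)
imbalance_ratio_s2 = max(TRAIN_S2.values()) / min(TRAIN_S2.values())

# Hypothetical toy parameters: client k holds the vector [k, 1.0].
toy_params = {cam: [float(k), 1.0] for k, cam in enumerate(CAMERAS)}
global_params = weighted_average(toy_params, weights_s2)

for cam in CAMERAS:
    print(cam, round(weights_s2[cam], 4))
print("largest / smallest client:", round(imbalance_ratio_s2, 3))
print("aggregated toy parameters:", global_params)
```

Under this weighting, Camera_1 influences the aggregated model more than Camera_5 because it holds more training images, which is one way the imbalance in Setting 2 can affect the global model.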