Sanitation Dataset

Submitted by:
zhao zhang
Last updated:
Tue, 05/17/2022 - 22:17


A new dataset named Sanitation is released to evaluate the performance of human activity recognition (HAR) algorithms and to benefit researchers in this field. It contains seven types of daily work activity data collected from sanitation workers. We provide two .csv files: the raw dataset “sanitation.csv” and a pre-processed feature dataset suited to machine-learning-based human activity recognition methods.



The raw data were collected by a wrist-worn smartwatch equipped with a triaxial accelerometer. An SD card and a SIM card were installed for storage and real-time data transmission, respectively.

The self-collected Sanitation dataset was gathered in an open environment. While the sanitation workers carried out their daily work activities with the smartwatch worn on the right or left hand, the data were collected continuously at a frequency of 25 Hz and sent to the receiving server through the SIM card. The seven types of activity are: Walk, Run, Sweep, Bsweep (sweeping with a big broom), Clean, Dump and Daily activities (such as sitting and smoking).

The size of the whole dataset is 266555 × 3, that is, 266555 samples, each containing the acceleration values of the X, Y and Z axes. There are 81739 samples of Bsweep, 36502 samples of Clean, 45439 samples of Daily, 29518 samples of Dump, 3903 samples of Run, 60028 samples of Sweep and 9426 samples of Walk.
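The counts above imply a strongly imbalanced class distribution. The following minimal sketch, which uses only the figures stated in this description, checks that the per-class counts add up to the stated total and converts each count into a share of the dataset and a recording duration at 25 Hz. The names `CLASS_COUNTS` and `class_summary` are illustrative and not part of the dataset.

```python
# Per-class sample counts as stated in the dataset description.
CLASS_COUNTS = {
    "Bsweep": 81739,
    "Clean": 36502,
    "Daily": 45439,
    "Dump": 29518,
    "Run": 3903,
    "Sweep": 60028,
    "Walk": 9426,
}
SAMPLING_RATE_HZ = 25  # sampling frequency stated in the description


def class_summary(counts, rate_hz):
    """Return {label: (share_of_total, duration_in_minutes)}."""
    total = sum(counts.values())
    return {
        label: (n / total, n / rate_hz / 60.0)
        for label, n in counts.items()
    }


total_samples = sum(CLASS_COUNTS.values())
summary = class_summary(CLASS_COUNTS, SAMPLING_RATE_HZ)

# Print the classes from most to least frequent.
for label, (share, minutes) in sorted(summary.items(), key=lambda kv: -kv[1][0]):
    print(f"{label:<7} {share:6.1%} {minutes:7.1f} min")
print("total samples:", total_samples)
```

Because Run and Walk together make up only a small fraction of the samples, class-aware metrics or resampling may be worth considering when evaluating a classifier on this data.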

The first three columns of the sanitation.csv file contain the acceleration data of the X-axis, Y-axis and Z-axis, respectively. The acceleration unit is g, where 1 g = 9.81 m/s². The fourth column contains the activity label of each sampling point.
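As a rough illustration of this column layout, the sketch below reads four-column rows and converts the acceleration values from g to m/s². It assumes that the file has no header row and that the label can be handled as plain text; neither is confirmed by this description, so adjust as needed. The helper `read_samples` and the demo rows are made up for illustration and are not taken from sanitation.csv.

```python
import csv
import io

G_IN_M_PER_S2 = 9.81  # conversion factor given in the description


def read_samples(handle):
    """Yield (ax, ay, az, label) with acceleration converted to m/s^2.

    Assumes four columns per row (X, Y, Z in g, then the label) and no
    header row; skip the first row if the real file has one.
    """
    for row in csv.reader(handle):
        if not row:
            continue  # ignore blank lines
        x, y, z = (float(v) * G_IN_M_PER_S2 for v in row[:3])
        yield x, y, z, row[3].strip()


# Made-up rows for illustration only, not taken from sanitation.csv.
demo = io.StringIO("0.0,0.0,1.0,Walk\n0.5,-0.5,1.0,Sweep\n")
samples = list(read_samples(demo))
print(samples)
```

For the real file, the same function can be given a handle opened with `open("sanitation.csv", newline="")`.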


The preprocessed dataset is produced by dividing the whole time series into 5026 windows using sliding-window segmentation and generating 57 features for each window. Both time-domain and frequency-domain features are extracted.
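The window length, the overlap and the exact list of 57 features are not given in this description. The sketch below therefore only illustrates the general idea of sliding-window segmentation followed by feature extraction, using a hypothetical window of 50 samples with a step of 25 and three example features (mean, standard deviation, dominant frequency) on a synthetic signal. It should not be read as the procedure used to build the provided feature file.

```python
import cmath
import math
import statistics


def sliding_windows(signal, size, step):
    """Split a sequence into fixed-length windows; the ragged tail is dropped."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, step)]


def dominant_frequency(window, rate_hz):
    """Frequency in Hz of the largest non-DC bin of a naive DFT."""
    n = len(window)
    mean = sum(window) / n
    best_k, best_mag = 0, -1.0
    for k in range(1, n // 2 + 1):
        coeff = sum((v - mean) * cmath.exp(-2j * math.pi * k * t / n)
                    for t, v in enumerate(window))
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    return best_k * rate_hz / n


def window_features(window, rate_hz):
    """Two time-domain features and one frequency-domain feature."""
    return {
        "mean": statistics.fmean(window),
        "std": statistics.pstdev(window),
        "dominant_hz": dominant_frequency(window, rate_hz),
    }


RATE_HZ = 25            # sampling frequency stated in the description
WINDOW, STEP = 50, 25   # hypothetical values, not taken from the dataset

# Synthetic single-axis signal: a 2.5 Hz sine sampled for 8 seconds.
signal = [math.sin(2 * math.pi * 2.5 * t / RATE_HZ) for t in range(200)]

windows = sliding_windows(signal, WINDOW, STEP)
features = [window_features(w, RATE_HZ) for w in windows]
print(len(windows), features[0])
```

On real data the same functions would be applied to each axis separately, with each window labeled, for example, by the most frequent activity label among its samples.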

Yong Zhang


March 28, 2019



I would like to download this dataset for my research experiments. Thank you in advance.

Submitted by Ky Nguyen on Wed, 02/19/2020 - 03:50

I would like to download this dataset for my course practice. Thank you.

Submitted by mahdi sheikh on Fri, 07/23/2021 - 13:19
Submitted by zhao zhang on Sat, 07/24/2021 - 04:28