Wearable RGB Camera Images of Human Locomotion Environments

Citation Author(s):
University of Waterloo
Submitted by:
Brock Laschowski
Last updated:
Mon, 07/27/2020 - 11:28


As with autonomous vehicles, information about the upcoming environment could improve the control of wearable biomechatronic devices for assisting human locomotion. To the authors' knowledge, this research represents the first documented investigation using machine vision and deep convolutional neural networks for environment recognition to support the predictive control of robotic lower-limb prostheses and exoskeletons. One participant was instrumented with a battery-powered, chest-mounted RGB camera system. Approximately 10 hours of video footage were collected while the participant ambulated throughout unknown outdoor and indoor environments. The sampled images were preprocessed and individually labelled. A deep convolutional neural network was developed and trained to automatically recognize three walking environments: level-ground, incline staircases, and decline staircases. The environment recognition system achieved 94.85% overall image classification accuracy. Extending these preliminary findings, future research should incorporate other environment classes (e.g., incline ramps) and integrate the environment recognition system with electromechanical sensors and/or surface electromyography for automated locomotion mode recognition. The challenges associated with implementing deep learning on wearable biomechatronic devices are discussed.

Reference: Laschowski B, McNally W, McPhee J, and Wong A. (2019). Preliminary Design of an Environment Recognition System for Controlling Robotic Lower-Limb Prostheses and Exoskeletons. IEEE International Conference on Rehabilitation Robotics, pp. 868-873. DOI: 10.1109/ICORR.2019.8779540.


One subject was instrumented with a battery-powered, chest-mounted RGB camera system (GoPro Hero4 Session). The subject walked around the university campus while collecting images throughout unknown outdoor and indoor environments with variable lighting, occlusions, signal noise, and intraclass variations. Data were collected at various times throughout the day to account for different lighting conditions. The sampled field-of-view was approximately 3 m ahead of the subject. Images were collected at 60 frames/second with a 1280×720-pixel resolution. Approximately 10 hours of video footage (2,055,240 images) were collected over ten 1-hour walking sessions. The dataset includes 10 individual folders, each corresponding to one data collection session. Since there were minimal differences between consecutive images at 60 frames/second, the dataset was downsampled to 1 frame/second. Images were cropped to a 1:1 aspect ratio and resized to 224×224 pixels using bilinear interpolation.
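The downsampling and cropping steps described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' preprocessing code: the function names are hypothetical, and a centred crop is assumed because the description states only that images were cropped to a 1:1 aspect ratio, not where the crop was taken.

```python
def downsample_indices(n_frames: int, src_fps: int = 60, dst_fps: int = 1) -> range:
    """Return the indices of the frames kept when reducing src_fps to dst_fps.

    Keeps every (src_fps // dst_fps)-th frame, starting from frame 0.
    """
    if dst_fps <= 0 or src_fps % dst_fps != 0:
        raise ValueError("src_fps must be a positive integer multiple of dst_fps")
    step = src_fps // dst_fps
    return range(0, n_frames, step)


def center_crop_box(width: int, height: int) -> tuple:
    """Return (left, top, right, bottom) of the largest centred square.

    ASSUMPTION: the crop is centred; the dataset description does not say so.
    """
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)


# Frames retained from the full 60 frames/second recording.
kept = downsample_indices(2_055_240, src_fps=60, dst_fps=1)

# Square crop region for one 1280x720 frame.
box = center_crop_box(1280, 720)
```

Keeping one frame in every 60 from 2,055,240 frames leaves 34,254 images, which matches the number of labelled images reported for the dataset. The resulting box could then be passed to an image library for the crop and the bilinear resize to 224×224, for example Pillow's `Image.crop` followed by `Image.resize` with a bilinear resampling filter.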

Overall, 34,254 sampled images were manually labelled, including 27,030 for level-ground environments, 3,943 for incline staircases, and 3,281 for decline staircases. For transitions from staircases to level-ground environments, the images were labelled as staircases whenever the staircase was visible inside the sampled field-of-view. For transitions from level-ground to staircase environments, the images were labelled as staircases provided that the subject was within 1-2 steps of the staircase and facing it. The images were labelled by one designated researcher for consistency. The image files were labelled “imagenumber_0” for level-ground environments, “imagenumber_1” for incline staircases, and “imagenumber_2” for decline staircases. Prospective users are welcome to implement their own labelling schemes.
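For users who want to read the class labels directly from the file names, the following Python sketch parses the trailing "_0", "_1", or "_2" suffix and computes each class's share of the reported image counts. The helper names and the ".jpg" extension in the example are assumptions made for illustration; the file extension is not stated in this description.

```python
from pathlib import Path

# Label codes as described for the dataset's file names.
LABELS = {0: "level-ground", 1: "incline staircase", 2: "decline staircase"}

# Image counts per class as reported in the dataset description.
REPORTED_COUNTS = {0: 27_030, 1: 3_943, 2: 3_281}


def parse_label(filename: str) -> int:
    """Extract the class code from a name of the form '<imagenumber>_<code>'."""
    stem = Path(filename).stem  # drops the directory and the extension, if any
    _, sep, suffix = stem.rpartition("_")
    if not sep or not suffix.isdigit() or int(suffix) not in LABELS:
        raise ValueError(f"no valid class label in file name: {filename!r}")
    return int(suffix)


def class_shares(counts: dict) -> dict:
    """Return each class's fraction of the total number of images."""
    total = sum(counts.values())
    return {code: n / total for code, n in counts.items()}


# Hypothetical file name; the '.jpg' extension is an assumption.
label = parse_label("session01/000123_1.jpg")
print(LABELS[label])

shares = class_shares(REPORTED_COUNTS)
```

The shares make the class imbalance explicit: roughly four in five images are level-ground, so users training a classifier may want to account for this, for instance through class weighting or resampling.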