EvIs-Kitchen
- Submitted by: Yuzhe Hao
- Last updated: Mon, 07/08/2024 - 15:58
- DOI: 10.21227/7nkz-9p74
Abstract
The Egocentric video and Inertial sensor data Kitchen activity dataset (EvIs-Kitchen) is the first V-S-S interaction-focused dataset for the egocentric human activity recognition (ego-HAR) task.
It consists of sequences of everyday kitchen activities involving rich interactions among the subject's body, objects, and the environment.
In addition to the egocentric videos recorded by a GoPro camera, our dataset includes inertial sensor data recorded by Fitbit watches worn on the subject's wrists; the sensor streams are synchronized and correlated with the video stream.
In total, our dataset contains 4,527 action samples from 12 subjects and 7 recipes, labeled with 35 verb classes and 56 noun classes.
Further documentation is available on the Introduction page: https://yuzhehao.github.io/EvIs-Kitchen-Introduction/
Our dataset contains four major folders: /Annotation, /Video, /RGB-frames, and /Sensor.
/Annotation:
The annotations of all action segments are stored in a single CSV file. Each line in this file annotates one sample with the following fields:
- narration_id ("S01R01_011"): "S01" means this action is from subject 1, "R01" means it is from recipe 1, and "011" is the index of this action within the entire cooking process.
- verb ("crack"): the verb label of this action segment.
- noun ("egg"): the noun label of this action segment.
- start_frame (4215): the index of the frame (in both the RGB-frames sequence and the Sensor sequence) at which this action starts.
- stop_frame (4394): the index of the frame (in both the RGB-frames sequence and the Sensor sequence) at which this action ends.
- start_time (02:20.5): the time in the video (minutes:seconds) at which this action starts.
- stop_time (02:26.4): the time in the video (minutes:seconds) at which this action ends.
- temporal_length (5976): how long this action lasts, in milliseconds.
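The fields above can be read with Python's standard library. This is a minimal sketch, not an official loader: it assumes the CSV has a header row whose column names match the field names listed above, and that start_time/stop_time use the MM:SS.s form shown in the examples. The names `ActionSample`, `parse_time`, and `parse_row` are hypothetical helpers introduced here for illustration.

```python
import csv
import io
import re
from dataclasses import dataclass

# Assumed narration_id layout, e.g. "S01R01_011" -> subject 1, recipe 1, action 11.
NARRATION_ID = re.compile(r"^S(\d+)R(\d+)_(\d+)$")


@dataclass
class ActionSample:  # hypothetical container, not part of the dataset release
    subject: int
    recipe: int
    action_index: int
    verb: str
    noun: str
    start_frame: int
    stop_frame: int
    start_s: float
    stop_s: float
    duration_ms: int


def parse_time(value: str) -> float:
    """Convert an assumed "MM:SS.s" timestamp (e.g. "02:20.5") to seconds."""
    minutes, seconds = value.split(":")
    return int(minutes) * 60 + float(seconds)


def parse_row(row: dict) -> ActionSample:
    """Turn one CSV row (keyed by the assumed header names) into an ActionSample."""
    match = NARRATION_ID.match(row["narration_id"])
    if match is None:
        raise ValueError(f"unexpected narration_id: {row['narration_id']!r}")
    subject, recipe, index = (int(g) for g in match.groups())
    return ActionSample(
        subject=subject,
        recipe=recipe,
        action_index=index,
        verb=row["verb"],
        noun=row["noun"],
        start_frame=int(row["start_frame"]),
        stop_frame=int(row["stop_frame"]),
        start_s=parse_time(row["start_time"]),
        stop_s=parse_time(row["stop_time"]),
        duration_ms=int(row["temporal_length"]),
    )


# Usage with a one-row in-memory sample built from the example values.
example = io.StringIO(
    "narration_id,verb,noun,start_frame,stop_frame,start_time,stop_time,temporal_length\n"
    "S01R01_011,crack,egg,4215,4394,02:20.5,02:26.4,5976\n"
)
samples = [parse_row(r) for r in csv.DictReader(example)]
print(samples[0].subject, samples[0].recipe, samples[0].action_index)  # 1 1 11
print(samples[0].start_s)  # 140.5
```

For the real file, replace the `io.StringIO` sample with `open(path, newline="")`; if the actual header names differ, only the keys in `parse_row` need to change.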
/Video:
The original raw videos recorded by the GoPro camera, at 1920x1080 resolution and 60 fps. Each MP4 file covers the complete process of one subject cooking one of the recipes and contains many action segments.
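Because the raw videos run at 60 fps while the annotation frame indices refer to 30 fps sequences, locating a segment in an MP4 file takes a small conversion. In the example annotation, 4215 / 30 = 140.5 s, which matches start_time 02:20.5, so the sketch below assumes frame 0 of the 30 fps sequence sits at t = 0 of the video. The function names are hypothetical.

```python
FRAMES_FPS = 30  # rate of the /RGB-frames and /Sensor sequences
VIDEO_FPS = 60   # rate of the raw GoPro MP4 files in /Video


def frame_to_seconds(frame_index: int, fps: int = FRAMES_FPS) -> float:
    """Time in seconds of a frame index, assuming frame 0 is at t = 0."""
    return frame_index / fps


def seconds_to_frame(t: float, fps: int = FRAMES_FPS) -> int:
    """Nearest frame index at the given rate for a time in seconds."""
    return round(t * fps)


def to_video_frame(frame_index: int) -> int:
    """Map a 30 fps annotation frame index to the matching 60 fps video frame.

    Integer arithmetic is exact here because 60 is a multiple of 30.
    """
    return frame_index * VIDEO_FPS // FRAMES_FPS


print(frame_to_seconds(4215))  # 140.5
print(to_video_frame(4215))    # 8430
```

The resulting video frame index (or the time in seconds) can then be used to seek in the MP4 with whichever video-decoding library you already use.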
/RGB-frames:
The 30 fps frame sequence of each long video in the /Video directory. Each folder contains the frame sequence for the corresponding long cooking video.
All frame images are resized to 228x128 to reduce redundancy and save GPU memory during training.
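A training pipeline usually reads only a fixed number of frames per action segment. The sketch below picks evenly spaced frame indices between start_frame and stop_frame. Uniform, chunk-centred sampling is our own choice for illustration, not something the dataset prescribes; we also assume stop_frame is inclusive. Because the image file naming inside each folder is not documented here, the function returns indices only, and `sample_frame_indices` is a hypothetical name.

```python
def sample_frame_indices(start_frame: int, stop_frame: int, num_frames: int) -> list[int]:
    """Pick num_frames indices spread evenly over [start_frame, stop_frame].

    The segment (treated as inclusive) is split into num_frames equal chunks and
    the centre of each chunk is taken. Short segments may yield repeated indices.
    """
    if stop_frame < start_frame:
        raise ValueError("stop_frame must not precede start_frame")
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    length = stop_frame - start_frame + 1  # number of frames in the segment
    step = length / num_frames
    return [start_frame + int(step * i + step / 2) for i in range(num_frames)]


# Example: 8 frames from the "crack egg" segment (frames 4215 to 4394).
print(sample_frame_indices(4215, 4394, 8))
```

The returned indices can then be mapped to image files once the folder's naming scheme is known.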
/Sensor:
The 30 fps inertial sensor data recorded by the Fitbit watches, stored in npy format. Each npy file contains the complete sensor data sequence for the corresponding long cooking video.
In each sensor data sequence, every frame has shape (2, 10). The first dimension indexes the hand, in the order [left, right]. The second dimension holds the 10 inertial sensor channels: a 3-axis accelerometer, a 3-axis gyroscope, and a 4-component orientation. The order of the 10 channels is: [acc-x, acc-y, acc-z, gyro-x, gyro-y, gyro-z, ori-a, ori-b, ori-c, ori-d].
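The layout above maps directly onto NumPy slicing. This is a minimal sketch assuming each npy file stores frames along the first axis, i.e. an array of shape (T, 2, 10), and that stop_frame is inclusive; `load_sensor` and `segment` are hypothetical helpers. The usage lines write a synthetic all-zeros array to an in-memory buffer as a stand-in for a real file.

```python
import io

import numpy as np

# Documented per-frame layout: axis 0 = hand, axis 1 = channel.
HANDS = ("left", "right")
CHANNELS = ("acc-x", "acc-y", "acc-z", "gyro-x", "gyro-y", "gyro-z",
            "ori-a", "ori-b", "ori-c", "ori-d")
ACC, GYRO, ORI = slice(0, 3), slice(3, 6), slice(6, 10)


def load_sensor(file) -> np.ndarray:
    """Load one sensor sequence (path or file object) and check it is (T, 2, 10)."""
    data = np.load(file)
    if data.ndim != 3 or data.shape[1:] != (2, 10):
        raise ValueError(f"expected shape (T, 2, 10), got {data.shape}")
    return data


def segment(data: np.ndarray, start_frame: int, stop_frame: int) -> dict:
    """Cut one action segment (stop_frame inclusive) and split it per hand and sensor."""
    clip = data[start_frame:stop_frame + 1]
    return {
        hand: {
            "acc": clip[:, h, ACC],
            "gyro": clip[:, h, GYRO],
            "ori": clip[:, h, ORI],
        }
        for h, hand in enumerate(HANDS)
    }


# Usage with a synthetic 300-frame sequence instead of a real npy file.
buf = io.BytesIO()
np.save(buf, np.zeros((300, 2, 10), dtype=np.float32))
buf.seek(0)
parts = segment(load_sensor(buf), 100, 159)
print(parts["left"]["acc"].shape)  # (60, 3)
```

With a real file, pass its path to `load_sensor` and use the start_frame/stop_frame values from the annotation CSV, since the Sensor sequence shares the same 30 fps frame indexing.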