Thermal image dataset for person detection - UNIRI-TID

Citation Author(s):
Mate
Kristo
Department of Informatics University of Rijeka
Marina
Ivasic-Kos
Department of Informatics University of Rijeka
Miran
Pobar
Department of Informatics University of Rijeka
Submitted by:
Marina Ivasic-Kos
Last updated:
Sun, 07/12/2020 - 19:02
DOI:
10.21227/yec9-yy29

Abstract 

We built an original dataset of thermal videos and images that simulate illegal movement around the border and in protected areas, designed for training machine learning and deep learning models. The videos were recorded in areas around the forest, at night, in different weather conditions (clear weather, rain, and fog), with people in different body positions (upright, hunched) and at different movement speeds (regular walking, running), at different ranges from the camera. In addition to standard camera lenses, telephoto lenses were also used to test their impact on the quality of thermal images and on person detection under different weather conditions and at different distances from the camera. The resulting dataset comprises 7,412 manually labeled images extracted from video frames captured in the long-wave infrared (LWIR) segment of the electromagnetic (EM) spectrum.

Instructions: 


About 20 minutes of recorded material from the clear-weather scenario, 13 minutes from the fog scenario, and about 15 minutes from the rainy-weather scenario were processed. The longer videos were cut into sequences, and individual frames were extracted from these sequences, resulting in 11,900 images for the clear-weather scenario, 4,905 images for the fog scenario, and 7,030 images for the rainy-weather scenario.
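Extracting individual frames from cut sequences amounts to sampling frame indices at some stride and saving the corresponding frames. A minimal sketch of the index-selection step; the frame rate and stride here are hypothetical, not the authors' actual processing parameters (the saved frames themselves would be read with a video library such as OpenCV):

```python
def sample_frame_indices(duration_s, fps, every_n):
    """Return the indices of frames to extract from a clip of the given
    duration (seconds) at the given frame rate, keeping every n-th frame."""
    total_frames = int(duration_s * fps)
    return list(range(0, total_frames, every_n))

# Hypothetical example: a 2-second clip at 10 fps, keeping every 5th frame.
print(sample_frame_indices(2, 10, 5))  # → [0, 5, 10, 15]
```

Varying the stride trades dataset size against redundancy between consecutive frames, which matters for thermal footage where adjacent frames are nearly identical.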

A total of 6,111 frames were manually annotated so that they could be used to train supervised models for person detection. The frames were selected to cover the different weather conditions, so the set contains 2,663 frames shot in clear weather, 1,135 frames in fog, and 2,313 frames in rain.

The annotations were made using the open-source Yolo BBox Annotation Tool, which can simultaneously store annotations in the three most popular machine learning annotation formats (YOLO, VOC, and MS COCO), so all three annotation formats are available. Each image annotation consists of the centroid position of the bounding box around each object of interest, the size of the bounding box in terms of width and height, and the corresponding class label (Human or Dog).
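Since the YOLO format stores each box as a class index plus a normalized centroid and size, converting a line to absolute pixel corners (as used in VOC) is a small calculation. A minimal sketch, assuming the standard YOLO text layout (`class cx cy w h`, all coordinates normalized to [0, 1]); the image dimensions and class-index mapping below are illustrative assumptions, not taken from the dataset files:

```python
def yolo_to_voc(line, img_w, img_h):
    """Convert one YOLO annotation line ('class cx cy w h', normalized)
    to a VOC-style box of absolute corners (xmin, ymin, xmax, ymax)."""
    cls, cx, cy, w, h = line.split()
    # Scale normalized centroid and size to pixel units.
    cx, w = float(cx) * img_w, float(w) * img_w
    cy, h = float(cy) * img_h, float(h) * img_h
    xmin, ymin = cx - w / 2, cy - h / 2
    return int(cls), round(xmin), round(ymin), round(xmin + w), round(ymin + h)

# Hypothetical example: class 0, a centered box covering half of a 640x512 frame.
print(yolo_to_voc("0 0.5 0.5 0.5 0.5", 640, 512))  # → (0, 160, 128, 480, 384)
```

The inverse conversion (VOC corners back to normalized YOLO values) follows the same arithmetic in reverse, which is why tools like the one used here can emit all three formats from a single internal representation.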