For the task of detecting casualties and persons in search and rescue scenarios in drone images and videos, we built our database, called SARD. The actors in the footage simulated exhausted and injured persons as well as "classic" types of movement of people in nature, such as running, walking, standing, sitting, or lying down. Since the type of terrain and background determines the possible events and scenarios in captured images and videos, the shots include persons on macadam roads, in quarries, in low and high grass, in forest shade, and the like.
We built an original dataset of thermal videos and images that simulate illegal movements around the border and in protected areas, designed for training machine learning and deep learning models. The videos were recorded in areas around the forest, at night, in different weather conditions (clear weather, rain, and fog), with people in different body positions (upright, hunched) and at different movement speeds (regular walking, running), at different ranges from the camera.
About 20 minutes of recorded material from the clear-weather scenario, 13 minutes from the fog scenario, and about 15 minutes from the rainy-weather scenario were processed. The longer videos were cut into sequences, and individual frames were extracted from these sequences, resulting in 11,900 images for the clear-weather, 4,905 images for the fog, and 7,030 images for the rainy-weather scenario.
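The frame-extraction step described above can be sketched as follows. This is a hedged illustration, not the authors' pipeline: the use of OpenCV, the file names, and the sampling stride are assumptions.

```python
# Sketch: cut a recorded video into individual frames at a fixed stride.
# OpenCV (cv2) is assumed for decoding; it is imported defensively so the
# pure index logic below still runs without it.
try:
    import cv2  # only needed for actual video decoding
except ImportError:
    cv2 = None


def sample_indices(n_frames: int, step: int) -> list:
    """Indices of the frames kept when every `step`-th frame is extracted."""
    return list(range(0, n_frames, step))


def extract_frames(video_path: str, out_dir: str, step: int = 5) -> list:
    """Save every `step`-th frame of `video_path` as a JPEG in `out_dir`."""
    if cv2 is None:
        raise RuntimeError("OpenCV is required for video decoding")
    cap = cv2.VideoCapture(video_path)
    idx, saved = 0, []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            name = f"{out_dir}/frame_{idx:06d}.jpg"
            cv2.imwrite(name, frame)
            saved.append(name)
        idx += 1
    cap.release()
    return saved
```

Extracting at a stride rather than keeping every frame reduces near-duplicate images from consecutive frames.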
A total of 6,111 frames were manually annotated so that they could be used to train the supervised model for person detection. The frames were selected to cover the different weather conditions, so the set contains 2,663 frames shot in clear weather, 1,135 frames of fog, and 2,313 frames of rain.
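The weather-balanced selection above amounts to stratified sampling over the extracted frame pools. The sketch below is illustrative only; the pool names, quotas-as-dict structure, and random seed are assumptions, though the per-condition counts it checks are the ones reported above.

```python
# Illustrative stratified selection: draw a fixed quota of frames from each
# weather condition's pool of extracted frames.
import random


def stratified_select(frame_pools: dict, quotas: dict, seed: int = 0) -> dict:
    """Pick `quotas[cond]` frames at random from `frame_pools[cond]`."""
    rng = random.Random(seed)
    return {cond: rng.sample(frame_pools[cond], quotas[cond]) for cond in quotas}


# The per-condition counts reported in the text sum to the stated total:
quotas = {"clear": 2663, "fog": 1135, "rain": 2313}
assert sum(quotas.values()) == 6111
```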
The annotations were made using the open-source Yolo BBox Annotation Tool, which can simultaneously store annotations in the three most popular machine learning annotation formats (YOLO, VOC, and MS COCO), so all three annotation formats are available. Each image annotation consists of the centroid position of the bounding box around each object of interest, the size of the bounding box in terms of width and height, and the corresponding class label (Human or Dog).
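The centroid-plus-size encoding described above is the standard YOLO box geometry. A minimal sketch of decoding it into pixel corners follows; the sample annotation line and the class-index assignment (0 = Human) are assumptions for illustration.

```python
# Sketch: convert a YOLO-style box (normalized centroid cx, cy and
# normalized width/height w, h) into pixel-corner coordinates.
def yolo_to_pixel(cx: float, cy: float, w: float, h: float,
                  img_w: int, img_h: int) -> tuple:
    """Return (x_min, y_min, x_max, y_max) in pixels."""
    x_min = (cx - w / 2) * img_w
    y_min = (cy - h / 2) * img_h
    x_max = (cx + w / 2) * img_w
    y_max = (cy + h / 2) * img_h
    return x_min, y_min, x_max, y_max


# One annotation line as stored on disk: "class cx cy w h"
line = "0 0.5 0.5 0.25 0.5"
cls, cx, cy, w, h = line.split()
box = yolo_to_pixel(float(cx), float(cy), float(w), float(h), 640, 480)
# box == (240.0, 120.0, 400.0, 360.0)
```

Because the YOLO coordinates are normalized to the image size, the same annotation file remains valid if the images are resized.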
We introduce a new robotic RGB-D dataset with difficult luminosity conditions: ONERA.ROOM. It comprises RGB-D data (as pairs of images) and corresponding annotations in PASCAL VOC format (XML files).
It targets people detection in (mostly) indoor and outdoor environments. People in the field of view can be standing, but also lying on the ground, as after a fall.
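Since the annotations ship as PASCAL VOC XML, reading one file reduces to standard XML parsing. The sketch below uses Python's standard library; the sample XML and the class name "person" are illustrative, not taken from the dataset.

```python
# Sketch: parse one PASCAL VOC annotation file into (label, box) tuples.
import xml.etree.ElementTree as ET

# Illustrative VOC annotation; real files follow the same structure.
SAMPLE = """<annotation>
  <filename>seq0_000001.jpg</filename>
  <object>
    <name>person</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>220</ymax></bndbox>
  </object>
</annotation>"""


def parse_voc(xml_text: str) -> list:
    """Return a list of (label, x_min, y_min, x_max, y_max) per object."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        bb = obj.find("bndbox")
        boxes.append((
            obj.findtext("name"),
            int(bb.findtext("xmin")), int(bb.findtext("ymin")),
            int(bb.findtext("xmax")), int(bb.findtext("ymax")),
        ))
    return boxes


print(parse_voc(SAMPLE))  # [('person', 10, 20, 110, 220)]
```

Unlike YOLO's normalized centroids, VOC stores absolute pixel corners, so no image-size lookup is needed to recover the box.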
To facilitate the use of some deep learning software, a folder tree with relative symbolic links (thus avoiding extra space) will gather all the sequences in three folders:

|
|— image
|   |— sequenceName0_imageNumber_timestamp0.jpg
|   |— sequenceName0_imageNumber_timestamp1.jpg
|   |— sequenceName0_imageNumber_timestamp2.jpg
|   |— sequenceName0_imageNumber_timestamp3.jpg
|   |— …
|— depth_8bits
|   |— sequenceName0_imageNumber_timestamp0.png
|   |— sequenceName0_imageNumber_timestamp1.png
|   |— sequenceName0_imageNumber_timestamp2.png
|   |— sequenceName0_imageNumber_timestamp3.png
|   |— …
|— annotations
|   |— sequenceName0_imageNumber_timestamp0.xml
|   |— sequenceName0_imageNumber_timestamp1.xml
|   |— sequenceName0_imageNumber_timestamp2.xml
|   |— sequenceName0_imageNumber_timestamp3.xml
|   |— …
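The gathering step above can be reproduced with relative symbolic links, so no file is duplicated on disk. This is a sketch under the stated assumptions: the function name, the extension-to-folder mapping, and the directory layout are hypothetical, though they follow the tree shown.

```python
# Sketch: link every file of one sequence into the image / depth_8bits /
# annotations gathering folders using *relative* symlinks (no extra space).
import os


def link_into(dataset_root: str, sequence_dir: str) -> None:
    """Create relative symlinks for one sequence's files, routed by extension.
    Assumes files already use the sequenceName_imageNumber_timestamp naming."""
    folders = {".jpg": "image", ".png": "depth_8bits", ".xml": "annotations"}
    for fname in os.listdir(sequence_dir):
        ext = os.path.splitext(fname)[1]
        if ext not in folders:
            continue
        dst_dir = os.path.join(dataset_root, folders[ext])
        os.makedirs(dst_dir, exist_ok=True)
        # Relative target, so the tree stays valid if the root moves as a whole.
        target = os.path.relpath(os.path.join(sequence_dir, fname), dst_dir)
        link = os.path.join(dst_dir, fname)
        if not os.path.lexists(link):
            os.symlink(target, link)
```

Using `os.path.relpath` rather than absolute paths is what keeps the links portable when the whole dataset directory is copied or mounted elsewhere.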