The AirMuseum dataset is intended for multi-robot stereo-visual and inertial Simultaneous Localization And Mapping (SLAM). It consists of five indoor multi-robot scenarios acquired with ground and aerial robots in a former Air Museum at ONERA Meudon, France. These scenarios were designed to exhibit specific opportunities and challenges associated with collaborative SLAM. Each scenario includes synchronized sequences from multiple robots, with stereo images and inertial measurements.


The dataset is organized as follows:

  • holds the calibrations of the cameras and the IMU sensors.
  • holds the calibration of the mounted AprilTag markers for robots B and C. It consists of the estimated pose of each marker's frame with respect to the reference frame attached to one of the robot's cameras.
  • scenarioX_robotY holds the acquisitions of robotY in scenarioX as ROS .bag files, as well as the associated ground-truth trajectories (ground truth is provided both for the frame attached to cam100 and for the body (inertial) frame).
  • scenarioX_trajectories.mp4 is a top-view video of the robot trajectories, accelerated x20 (robotA is red, robotB is green, robotC is blue and the drone is orange).
  • scenarioX_preview.mp4 is a preview of the robots' visual acquisitions, accelerated x1.5.

Additional updated details may be found on the associated GitHub repository ( and in the associated article: AirMuseum: a heterogeneous multi-robot dataset for stereo-visual and inertial Simultaneous Localization And Mapping - Rodolphe Dubois, Alexandre Eudes and Vincent Frémont - 2020 IEEE International Conference on Multisensor Fusion and Integration (MFI 2020).


The CHU Surveillance Violence Dataset (CSVD) is a collection of CCTV footage of violent and non-violent actions, aimed at characterizing how violent actions are composed of more specific actions. We produced several simple action classes for violent and non-violent actions to add variety and to better balance simple and complex action classes. The data are provided as RGB videos and Action Silhouette Videos (enhanced optical-flow images), together with their localized actions.


This dataset is for date-fruit grading. It contains graded images of three types of dates: Ajwa (grades 1, 2, and 3), Mabroom (grades 1, 2, and 3), and dried Sukkary (grades 1 and 2).


This dataset contains images of three types of dates with their grades:

- Ajwah: grade 1, grade 2, and grade 3

- Mabroom: grade 1, grade 2, and grade 3

- Sukkary: grade 1 and grade 2


The dataset contains 2,400 vehicle images for license plate detection. The images were taken by operational commercial cameras installed on a highway and at the entrance of a shopping mall. Most images contain a single vehicle, but some contain two or more. For each pixel-domain image, there are two corresponding images generated from the encoded High Efficiency Video Coding (HEVC) stream using our methods.



• 2,400 Pixel Domain Images
• 2,400 HEVC Domain Images Generated from Our Block Partition Method
• 2,400 HEVC Domain Images Generated from Our Prediction Based Method

• Each training set contains 1,800 images.
• Each test set contains 600 images.


Images are named with numbers from 100,001 to 102,400 in each domain. The same number identifies a pixel-domain image and its HEVC-domain representations.


Each image is accompanied by a file containing its license plate annotations in YOLO format.
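A YOLO annotation file conventionally holds one object per line as `class_id x_center y_center width height`, with the four coordinates normalized to [0, 1] by the image width and height. The sketch below, which assumes that standard Darknet/YOLO layout (the exact files are not shown here), converts such lines to pixel-space corner boxes. The function name `yolo_to_pixel_boxes` is hypothetical, and the image size must be supplied by the caller.

```python
from typing import List, Tuple

def yolo_to_pixel_boxes(label_text: str, img_w: int, img_h: int) -> List[Tuple[int, float, float, float, float]]:
    """Convert YOLO-format label lines to (class_id, x_min, y_min, x_max, y_max) in pixels.

    Assumes the standard layout: class_id x_center y_center width height,
    with coordinates normalized to [0, 1] by the image width and height.
    """
    boxes = []
    for line in label_text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        cls = int(parts[0])
        xc, yc, w, h = (float(v) for v in parts[1:5])
        # Undo the normalization and convert center/size to corner coordinates.
        x_min = (xc - w / 2) * img_w
        y_min = (yc - h / 2) * img_h
        x_max = (xc + w / 2) * img_w
        y_max = (yc + h / 2) * img_h
        boxes.append((cls, x_min, y_min, x_max, y_max))
    return boxes
```

For example, `yolo_to_pixel_boxes("0 0.5 0.5 0.25 0.5", 800, 400)` returns `[(0, 300.0, 100.0, 500.0, 300.0)]`.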



|   +---HEVCDomain_BlockPartition
|   |   +---Test
|   |   \---Train
|   +---HEVCDomain_PredictionUnit
|   |   +---Test
|   |   \---Train
|   \---PixelDomain
|       +---Test
|       \---Train
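Because the same number is used across domains, one way to check that a split is complete is to compare file stems (names without extension) in the three domain folders. The sketch below assumes that `root` is the unnamed parent folder of the tree above and that images and their YOLO label files share a stem; file extensions are not assumed. The function name `matched_ids` is hypothetical.

```python
from pathlib import Path
from typing import Dict, List, Set

# Domain folder names as shown in the directory tree above.
DOMAINS = ["PixelDomain", "HEVCDomain_BlockPartition", "HEVCDomain_PredictionUnit"]

def matched_ids(root: Path, split: str) -> List[str]:
    """Return the image numbers present in all three domains for one split ("Train" or "Test")."""
    stems_per_domain: Dict[str, Set[str]] = {}
    for domain in DOMAINS:
        folder = root / domain / split
        # An image and its label file share a stem, so the set keeps one entry per number.
        stems_per_domain[domain] = {p.stem for p in folder.iterdir() if p.is_file()}
    common = set.intersection(*stems_per_domain.values())
    # Report numbers that exist in one domain but not in all of them.
    for domain, stems in stems_per_domain.items():
        missing = stems - common
        if missing:
            print(f"{domain}/{split}: {len(missing)} file stem(s) without a counterpart")
    return sorted(common)
```

On a complete copy of the dataset, this would be expected to return 1,800 numbers for `"Train"` and 600 for `"Test"`, matching the split sizes listed earlier.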



We introduce HUMAN4D, a large and multimodal 4D dataset that contains a variety of human activities simultaneously captured by a professional marker-based MoCap, a volumetric capture and an audio recording system. By capturing 2 female and 2 male professional actors performing various full-body movements and expressions, HUMAN4D provides a diverse set of motions and poses encountered as part of single- and multi-person daily, physical and social activities (jumping, dancing, etc.), along with multi-RGBD (mRGBD), volumetric and audio data. Despite the existence of multi-view color datasets c


* The paper describing this dataset is currently under review. The full dataset will be published together with the paper; in the meantime, further parts of the dataset will be uploaded.

The dataset includes multi-view RGBD, 3D/2D pose, volumetric (mesh/point-cloud/3D character) and audio data along with metadata for spatiotemporal alignment.

The full dataset is split by subject, activity, and modality.

There are also two benchmarking subsets, H4D1 for single-person and H4D2 for two-person sequences, respectively.

The formats are:

  • mRGBD: *.png
  • 3D/2D poses: *.npy
  • volumetric (mesh/point cloud): *.ply
  • 3D character: *.fbx
  • metadata: *.txt, *.json
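Since the folder layout below subject, activity, and modality is not detailed here, a simple way to inventory a downloaded portion is to group files by extension using the format list above. This is a minimal sketch under that assumption: it ignores folder names entirely, and because meshes and point clouds share the *.ply extension they land in one "volumetric" bucket. The names `EXT_TO_MODALITY` and `index_by_modality` are hypothetical.

```python
from collections import defaultdict
from pathlib import Path
from typing import Dict, List

# Extension-to-modality table taken from the format list above.
EXT_TO_MODALITY = {
    ".png": "mRGBD",
    ".npy": "pose",          # 3D/2D poses
    ".ply": "volumetric",    # meshes and point clouds share this extension
    ".fbx": "3D character",
    ".txt": "metadata",
    ".json": "metadata",
}

def index_by_modality(root: Path) -> Dict[str, List[Path]]:
    """Recursively group files under `root` by modality; unknown extensions go to "other"."""
    index: Dict[str, List[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            modality = EXT_TO_MODALITY.get(path.suffix.lower(), "other")
            index[modality].append(path)
    return dict(index)
```

The *.npy pose files can then be opened with `numpy.load`, although their array shapes are not documented in this description.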



Parking Slot Detection dataset

Annotations include the angle, type, and location of each parking slot.




The dataset was compiled for human posture recognition. It covers four postures:

  1. Sitting
  2. Standing
  3. Bending
  4. Lying

There are 1,200 images for each posture, 4,800 in total. Each image is 512 x 512 pixels.


The dataset is organized by posture, with the following directory structure:

    • Sitting - Contains 1200 images.
    • Standing - Contains 1200 images.
    • Bending - Contains 1200 images.
    • Lying - Contains 1200 images.

To use the dataset, simply unzip the archive. The images have already been preprocessed, and the final images contain the relevant silhouettes.
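Because each posture has its own folder, class labels can be derived directly from the folder names. The sketch below assumes that the four folders sit directly under the directory where the archive was unzipped (passed as `root`); the names `POSTURES` and `list_samples` are hypothetical. On the full dataset it would be expected to return 4,800 balanced (path, label) pairs.

```python
from pathlib import Path
from typing import List, Tuple

# Class folders listed in the directory structure above; the index is the label.
POSTURES = ["Sitting", "Standing", "Bending", "Lying"]

def list_samples(root: Path) -> List[Tuple[Path, int]]:
    """Return (image_path, label_index) pairs, with labels following the POSTURES order."""
    samples = []
    for label, posture in enumerate(POSTURES):
        for path in sorted((root / posture).iterdir()):
            if path.is_file():
                samples.append((path, label))
    return samples
```

A stratified train/test split can then be drawn from this list, since every class has the same number of images.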


Images of various foods, taken with different cameras and under different lighting conditions. The images can be used to design and test computer vision techniques that recognize foods and estimate their calories and nutritional content.


Please note that, when fully visible, the human thumb in each image measures approximately 5 cm by 1.2 cm.
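The thumb gives a reference of known size, so pixel measurements of the food can be converted to physical units. The sketch below assumes that you have already measured the thumb's length and width in pixels (for example, from a segmentation mask, which is not part of this description) and that the thumb and food lie at roughly the same distance from the camera. It averages the scale implied by both dimensions; lengths scale linearly and areas scale with the square of that factor. The function names are hypothetical.

```python
THUMB_LENGTH_CM = 5.0  # approximate full-view thumb length stated above
THUMB_WIDTH_CM = 1.2   # approximate full-view thumb width stated above

def cm_per_pixel(thumb_length_px: float, thumb_width_px: float) -> float:
    """Average the cm-per-pixel scale implied by the thumb's length and width."""
    return (THUMB_LENGTH_CM / thumb_length_px + THUMB_WIDTH_CM / thumb_width_px) / 2

def food_area_cm2(food_area_px: float, scale_cm_per_px: float) -> float:
    """Convert a pixel area to cm^2; area scales with the square of the length scale."""
    return food_area_px * scale_cm_per_px ** 2
```

For instance, a thumb measuring 250 px by 60 px gives a scale of 0.02 cm per pixel, so a 10,000-pixel food region corresponds to about 4 cm².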

For more information, please see:

P. Pouladzadeh, A. Yassine, and S. Shirmohammadi, “FooDD: Food Detection Dataset for Calorie Measurement Using Food Images”, in New Trends in Image Analysis and Processing - ICIAP 2015 Workshops, V. Murino, E. Puppo, D. Sona, M. Cristani, and C. Sansone, Lecture Notes in Computer Science, Springer, Volume 9281, 2015, ISBN: 978-3-319-23221-8, pp 441-448. DOI: 10.1007/978-3-319-23222-5_54


A dataset of videos, recorded by an in-car camera, of drivers in an actual car talking, singing, remaining silent, and yawning. The drivers have various facial characteristics (male and female, with and without glasses/sunglasses, different ethnicities). The dataset is intended primarily for developing and testing yawning-detection algorithms and models, but it can also be used for face and mouth recognition and tracking. The videos were taken under natural and varying illumination conditions. The videos come in two sets, as described next:


You may use all videos for research. You may also display screenshots of some (but not all) videos in your own publications. Please check the "Allow Researchers to use picture in their paper" column in the table to see whether you may use a screenshot of a particular video. If that column is "no" for a particular video, you are NOT allowed to display pictures from that video in your own publications.

The videos are unlabeled, since the yawning sequences are easy to identify visually. For more details, please see:

S. Abtahi, M. Omidyeganeh, S. Shirmohammadi, and B. Hariri, “YawDD: A Yawning Detection Dataset”, Proc. ACM Multimedia Systems, Singapore, March 19-21, 2014, pp. 24-28. DOI: 10.1145/2557642.2563678