A dataset of videos, recorded by an in-car camera, of drivers in an actual car who have various facial characteristics (male and female, with and without glasses/sunglasses, different ethnicities) and are talking, singing, being silent, or yawning. It is intended primarily for developing and testing yawning-detection algorithms and models, but it can also be used for face and mouth recognition and tracking. The videos were taken under natural and varying illumination conditions. The videos come in two sets, as described next: 


You can use all videos for research. You may also display screenshots of some (but not all) videos in your own publications. Check the "Allow Researchers to use picture in their paper" column in the table to see whether you may use a screenshot of a particular video. If that column is “no” for a particular video, you are NOT allowed to display pictures from that video in your own publications.

The videos are unlabeled because the yawning sequences are easy to identify visually. For more details, please see:

S. Abtahi, M. Omidyeganeh, S. Shirmohammadi, and B. Hariri, “YawDD: A Yawning Detection Dataset”, Proc. ACM Multimedia Systems, Singapore, March 19-21, 2014, pp. 24-28. DOI: 10.1145/2557642.2563678


The first bit of light is the gesture of being, on a massive screen of the black panorama. A small point of existence, a gesture of being. The universal appeal of gesture reaches far beyond the barriers of languages and planets. These are the microtransactions of symbols and patterns which carry traces of the common ancestors of many civilizations. Gesture recognition is important for communication between computer systems and humans, and many studies of gesture recognition systems are currently under way.


This is an eye tracking dataset of 84 computer game players who played the side-scrolling cloud game Somi. The game was streamed in the form of video from the cloud to the player. The dataset consists of 135 raw videos (YUV) at 720p and 30 fps with eye tracking data for both eyes (left and right). Male and female players were asked to play the game in front of a remote eye-tracking device. For each player, we recorded gaze points, video frames of the gameplay, and mouse and keyboard commands.


- The AVI offset is the frame of the AVI file at which data gathering started.

- The 1st frame of each YUV file is the 901st frame of its corresponding AVI file.

- For detailed info and instructions, please see:

Hamed Ahmadi, Saman Zad Tootaghaj, Sajad Mowlaei, Mahmoud Reza Hashemi, and Shervin Shirmohammadi, “GSET Somi: A Game-Specific Eye Tracking Dataset for Somi”, Proc. ACM Multimedia Systems, Klagenfurt am Wörthersee, Austria, May 10-13, 2016, 6 pages. DOI: 10.1145/2910017.2910616
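The frame offset in the notes above matters when gaze data are aligned with the raw videos. The following minimal Python sketch shows the frame mapping and the byte offset of a frame. It assumes that the .yuv files are headerless, 8-bit planar YUV 4:2:0 (I420) at 1280x720. The description does not state the pixel format, and all constant and function names are hypothetical, not part of the dataset's tooling. The per-file AVI offset is not modeled here.

```python
# Hypothetical helpers for aligning the raw YUV videos with their AVI files.
# ASSUMPTION: each .yuv file is headerless, 8-bit planar YUV 4:2:0 (I420).
WIDTH, HEIGHT = 1280, 720   # 720p, as stated in the dataset description
FPS = 30                    # frame rate stated in the dataset description
YUV_TO_AVI_OFFSET = 900     # YUV frame 1 is AVI frame 901 (both 1-based)


def yuv_frame_bytes(width: int = WIDTH, height: int = HEIGHT) -> int:
    """Bytes per 4:2:0 frame: a full-size Y plane plus two quarter-size chroma planes."""
    return width * height * 3 // 2


def yuv_to_avi_frame(yuv_frame: int) -> int:
    """Map a 1-based YUV frame number to the 1-based AVI frame number."""
    if yuv_frame < 1:
        raise ValueError("frame numbers are 1-based")
    return yuv_frame + YUV_TO_AVI_OFFSET


def yuv_frame_time(yuv_frame: int) -> float:
    """Seconds from the start of the YUV file to the given 1-based frame."""
    return (yuv_frame - 1) / FPS


def read_yuv_frame(path: str, yuv_frame: int) -> bytes:
    """Return the raw bytes of one 1-based frame from a headerless .yuv file."""
    size = yuv_frame_bytes()
    with open(path, "rb") as f:
        f.seek((yuv_frame - 1) * size)
        data = f.read(size)
    if len(data) != size:
        raise ValueError(f"frame {yuv_frame} lies beyond the end of {path}")
    return data
```

Under this assumption, each frame occupies 1,382,400 bytes, so frame n starts at byte (n - 1) x 1,382,400.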


A set of chest CT datasets from multi-centre hospitals, covering five categories.


A three-month Coffee Leaf Rust dataset generated by the Cyber Physical Data Collection System.


Deep facial features with identity labels, generated from the CelebA dataset using the FaceNet network (128 real-valued features). The dataset contains:

- full dataset
- training dataset
- validation dataset

Link to CelebA dataset: http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html
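FaceNet features are commonly compared by Euclidean distance after L2 normalization. The Python sketch below illustrates a simple same-identity check on two 128-dimensional vectors. The file format of the dataset is not described here, so the vectors are taken as plain Python lists. The decision threshold of 1.1 is a hypothetical placeholder that would need tuning on the validation split.

```python
import math
from typing import Sequence

EMBEDDING_DIM = 128  # number of FaceNet features per face, as stated above


def l2_normalize(v: Sequence[float]) -> list:
    """Scale a vector to unit length (FaceNet embeddings live on the unit hypersphere)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def embedding_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two L2-normalized 128-d feature vectors."""
    if len(a) != EMBEDDING_DIM or len(b) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM}-dimensional features")
    return math.dist(l2_normalize(a), l2_normalize(b))


def same_identity(a: Sequence[float], b: Sequence[float], threshold: float = 1.1) -> bool:
    """Decide whether two faces share an identity.

    The default threshold is a hypothetical placeholder; tune it on the
    validation split instead of relying on this value.
    """
    return embedding_distance(a, b) < threshold
```

Because the vectors are normalized, distances fall between 0 (same direction) and 2 (opposite directions).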


The dataset contains medical signs in sign language, recorded in several modalities: color frames, depth frames, infrared frames, body index frames, color body frames mapped to the depth scale, and 2D/3D skeleton information in color and depth scales and in camera space. Most signs are at the word level. 55 signs are each performed twice by 16 persons (55x16x2 = 1760 performances in total).



The signs were collected at Shahid Beheshti University, Tehran, and show local gestures. The SignCol software (code: https://github.com/mohaEs/SignCol , paper: https://doi.org/10.1109/ICSESS.2018.8663952 ) was used to define the signs and to connect to Microsoft Kinect v2 for collecting the multimodal data, including frames and skeletons. Two demonstration videos of the signs are available on YouTube: vomit: https://youtu.be/yl6Tq7J9CH4 , asthma spray: https://youtu.be/PQf8p_YNYfo . Demonstration videos of SignCol are also available at https://youtu.be/_dgcK-HPAak and https://youtu.be/yMjQ1VYWbII .

The dataset consists of 13 zip files in total. Sample_AND_Codes.zip contains the readme, sample code, and sample data. Sample_Videos.zip contains sample videos. Each of the remaining 11 zip files contains 5 signs (e.g. Signs(11-15).zip). For a quick start, see Sample_AND_Codes.zip.
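With 55 signs split into 11 archives of 5, the archive that holds a given sign can be computed directly. The Python sketch below assumes that every archive follows the Signs(first-last).zip pattern. Only Signs(11-15).zip is confirmed by the description, so the other names are extrapolated and may not match the actual files.

```python
NUM_SIGNS = 55
SIGNS_PER_ZIP = 5


def zip_for_sign(sign: int) -> str:
    """Return the archive expected to contain sign X (1-based).

    ASSUMPTION: every archive follows the Signs(first-last).zip pattern seen in
    Signs(11-15).zip; only that one name is confirmed by the description.
    """
    if not 1 <= sign <= NUM_SIGNS:
        raise ValueError(f"sign must be in 1..{NUM_SIGNS}")
    first = (sign - 1) // SIGNS_PER_ZIP * SIGNS_PER_ZIP + 1
    last = first + SIGNS_PER_ZIP - 1
    return f"Signs({first}-{last}).zip"
```

For example, sign 13 maps to Signs(11-15).zip, and the 55 signs map to exactly 11 distinct archive names.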

Each performed gesture is stored in a directory named Sign_X_Performer_Y_Z, which denotes the Xth sign performed by the Yth person in the Zth repetition (X = 1, ..., 55; Y = 1, ..., 16; Z = 1, 2). The actual names of the signs are listed in the file table_signs.csv.
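The directory naming scheme is regular enough to parse and validate programmatically. The Python sketch below extracts X, Y, and Z from a directory name and lists all 1760 expected names, which is useful for checking that a download is complete. The helper names are hypothetical. `expected_dir_names` assumes the numbers are not zero-padded, which the description does not state.

```python
import re
from typing import List, NamedTuple, Optional

NUM_SIGNS, NUM_PERFORMERS, NUM_REPETITIONS = 55, 16, 2

# \d+ also accepts zero-padded numbers, in case some names are padded.
DIR_PATTERN = re.compile(r"^Sign_(\d+)_Performer_(\d+)_(\d+)$")


class Performance(NamedTuple):
    sign: int        # X
    performer: int   # Y
    repetition: int  # Z


def parse_performance_dir(name: str) -> Optional[Performance]:
    """Parse a Sign_X_Performer_Y_Z directory name; return None if it does not match."""
    m = DIR_PATTERN.match(name)
    if m is None:
        return None
    p = Performance(*(int(g) for g in m.groups()))
    if not (1 <= p.sign <= NUM_SIGNS
            and 1 <= p.performer <= NUM_PERFORMERS
            and 1 <= p.repetition <= NUM_REPETITIONS):
        return None
    return p


def expected_dir_names() -> List[str]:
    """All directory names the full dataset should contain (assumes unpadded numbers)."""
    return [
        f"Sign_{x}_Performer_{y}_{z}"
        for x in range(1, NUM_SIGNS + 1)
        for y in range(1, NUM_PERFORMERS + 1)
        for z in range(1, NUM_REPETITIONS + 1)
    ]
```

Comparing `expected_dir_names()` with the directories actually found on disk reveals any missing performances.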

Each directory includes 7 subdirectories:

1.      Times: timing information for each frame, saved in a CSV file.

2.      Color Frames: RGB frames saved as 8-bit *.jpg files of size 1920x1080.

3.      Infrared Frames: infrared frames saved as 8-bit *.jpg files of size 512x424.

4.      Depth Frames: depth frames saved as 8-bit *.jpg files of size 512x424.

5.      Body Index Frames: body index frames in depth-frame scale, saved as 8-bit *.jpg files of size 512x424.

6.      Body Skels data: for each frame, a CSV file with 25 rows, one per body joint, and columns specifying the joint type, its locations, and the coordinate spaces. Each joint location is saved in three spaces: 3D camera space, 2D depth space (image), and 2D color space (image). Only 21 of the 25 joints are visible in this dataset.

7.      Color Body Frames: RGB body frames mapped to the depth-frame scale, saved as 8-bit *.jpg files of size 512x424.


Frames are saved as sets of numbered images. The MATLAB script PrReadFrames_AND_CreateVideo.m shows how to read the frames and, if required, how to create videos from them.
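When working outside MATLAB, the numbered frames must be read in numeric order. A plain lexicographic sort would place 10.jpg before 2.jpg. The Python sketch below is a hypothetical alternative for listing the frames, not a port of PrReadFrames_AND_CreateVideo.m. It assumes that each file name ends in its frame number, which the description does not spell out.

```python
import re
from pathlib import Path
from typing import List


def frame_number(path: Path) -> int:
    """Use the last run of digits in the file name (without extension) as the frame index."""
    digits = re.findall(r"\d+", path.stem)
    if not digits:
        raise ValueError(f"no frame number in {path.name}")
    return int(digits[-1])


def sorted_frames(frame_dir: str, pattern: str = "*.jpg") -> List[Path]:
    """List the frames in a directory in numeric rather than lexicographic order."""
    return sorted(Path(frame_dir).glob(pattern), key=frame_number)
```

The resulting list can then be passed to any image or video library to build a clip.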

The 21 visible joints are Spine Base, Spine Mid, Neck, Head, Shoulder Left, Elbow Left, Wrist Left, Hand Left, Shoulder Right, Elbow Right, Wrist Right, Hand Right, Hip Left, Knee Left, Hip Right, Knee Right, Spine Shoulder, Hand Tip Left, Thumb Left, Hand Tip Right, and Thumb Right. The MATLAB script PrReadSkels_AND_CreateVideo.m shows an example of how to read the joint information, flip it, and draw the skeleton on the depth and color scales.
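The 21 joints listed above are the 25 joints of the Kinect v2 JointType enumeration minus the four ankle and foot joints. The Python sketch below keeps only the visible joints from one frame's 25 skeleton rows. It assumes that the rows appear in Kinect v2 JointType order, which the description does not confirm. If the joint-type column in the CSV is used instead, filtering on it directly is more robust.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")

# Joint order of the Kinect v2 JointType enumeration (indices 0-24).
KINECT_V2_JOINTS = [
    "SpineBase", "SpineMid", "Neck", "Head",
    "ShoulderLeft", "ElbowLeft", "WristLeft", "HandLeft",
    "ShoulderRight", "ElbowRight", "WristRight", "HandRight",
    "HipLeft", "KneeLeft", "AnkleLeft", "FootLeft",
    "HipRight", "KneeRight", "AnkleRight", "FootRight",
    "SpineShoulder", "HandTipLeft", "ThumbLeft", "HandTipRight", "ThumbRight",
]

# The 4 joints missing from the list of 21 visible joints.
NOT_VISIBLE = {"AnkleLeft", "FootLeft", "AnkleRight", "FootRight"}
VISIBLE_JOINTS = [j for j in KINECT_V2_JOINTS if j not in NOT_VISIBLE]


def visible_rows(rows: Sequence[T]) -> List[T]:
    """Keep the 21 visible joints from one frame's 25 skeleton rows.

    ASSUMPTION: rows appear in Kinect v2 JointType order.
    """
    if len(rows) != len(KINECT_V2_JOINTS):
        raise ValueError(f"expected {len(KINECT_V2_JOINTS)} rows, got {len(rows)}")
    return [row for name, row in zip(KINECT_V2_JOINTS, rows) if name not in NOT_VISIBLE]
```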

Updated information about the dataset and the corresponding paper is available in the GitHub repository MedSLset.

Terms and conditions for the use of the dataset: 

1- This dataset is released for academic research purposes only.

2- Please cite both the paper and the dataset if you find this data useful for your research. You can find the references and BibTeX entries at MedSLset.

3- You must not distribute the dataset or any parts of it to others. 

4- The dataset includes only image, text, and video files and has been scanned with malware-protection software. You accept full responsibility for your use of the dataset. This data comes with no warranty or guarantee of any kind, and you accept full liability.

5- You will treat people appearing in this data with respect and dignity.

6- You will not try to identify or recognize the persons in the dataset.


We built an original dataset of thermal videos and images that simulate illegal movements around the border and in protected areas and are designed for training machine learning and deep learning models. The videos were recorded in areas around the forest, at night, in different weather conditions (clear weather, rain, and fog), with people in different body positions (upright, hunched) and at different movement speeds (regular walking, running) at different ranges from the camera.



About 20 minutes of recorded material from the clear-weather scenario, 13 minutes from the fog scenario, and about 15 minutes from the rainy-weather scenario were processed. The longer videos were cut into sequences, and individual frames were extracted from these sequences. This yielded 11,900 images for the clear-weather scenario, 4,905 images for the fog scenario, and 7,030 images for the rainy-weather scenario.

A total of 6,111 frames were manually annotated so that they could be used to train a supervised model for person detection. The frames were selected to cover the different weather conditions: 2,663 frames shot in clear weather, 1,135 frames in fog, and 2,313 frames in rain.

The annotations were made using the open-source Yolo BBox Annotation Tool, which can simultaneously store annotations in the three most popular machine learning annotation formats (YOLO, VOC, and MS COCO), so all three annotation formats are available. Each image annotation consists of the centroid position of the bounding box around each object of interest, the size of the bounding box in terms of width and height, and the corresponding class label (Human or Dog).
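The three formats describe the same box in different ways. A YOLO label line stores the class index followed by the normalized centroid, width, and height. VOC uses pixel corners, and COCO uses the pixel top-left corner plus the box size. The Python sketch below converts between them as an illustration. The image size must be supplied by the caller because the resolution of the thermal images is not stated here. The sketch does not map class indices to Human or Dog because their order is not given, and it performs no rounding or clamping.

```python
from typing import NamedTuple, Tuple


class YoloBox(NamedTuple):
    class_id: int
    cx: float  # box centre x, normalized by image width
    cy: float  # box centre y, normalized by image height
    w: float   # box width, normalized by image width
    h: float   # box height, normalized by image height


def parse_yolo_line(line: str) -> YoloBox:
    """Parse one 'class cx cy w h' line of a YOLO label file."""
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
    return YoloBox(int(parts[0]), *(float(p) for p in parts[1:]))


def yolo_to_voc(box: YoloBox, img_w: int, img_h: int) -> Tuple[float, float, float, float]:
    """Corner format (xmin, ymin, xmax, ymax) in pixels, as used by Pascal VOC."""
    x_min = (box.cx - box.w / 2) * img_w
    y_min = (box.cy - box.h / 2) * img_h
    x_max = (box.cx + box.w / 2) * img_w
    y_max = (box.cy + box.h / 2) * img_h
    return x_min, y_min, x_max, y_max


def yolo_to_coco(box: YoloBox, img_w: int, img_h: int) -> Tuple[float, float, float, float]:
    """Top-left corner plus size (x, y, width, height) in pixels, as used by MS COCO."""
    x_min, y_min, _, _ = yolo_to_voc(box, img_w, img_h)
    return x_min, y_min, box.w * img_w, box.h * img_h
```

For a hypothetical 640x480 image, the line "0 0.5 0.5 0.25 0.5" becomes (240, 120, 400, 360) in VOC form and (240, 120, 160, 240) in COCO form.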



Mosquito bites result in the deaths of more than 1 million people every year. Certain mosquito genera are major disease vectors. Aedes is the main vector of the arboviruses that cause dengue and yellow fever, while malaria is transmitted by Anopheles mosquitoes. Image-based mosquito species classification can help in implementing strategies to prevent the spread of mosquito-borne disease. Automated classification can also ease the laborious, time-consuming work of entomologists while improving accuracy.


The dataset consists of images of two mosquito genera, Aedes and Culex.

There are 810 images in the Aedes class and 594 images in the Culex class.
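The two classes are moderately imbalanced (810 vs. 594 images). One common remedy when training a classifier is inverse-frequency class weighting. The Python sketch below computes these weights using the same formula as scikit-learn's `class_weight="balanced"`: n_samples / (n_classes x n_samples_in_class). This is an illustrative choice, not a procedure prescribed by the dataset.

```python
from typing import Dict

# Image counts per class, as stated above.
CLASS_COUNTS = {"Aedes": 810, "Culex": 594}


def balanced_class_weights(counts: Dict[str, int]) -> Dict[str, float]:
    """Inverse-frequency weights: n_samples / (n_classes * n_samples_in_class).

    Each class then contributes the same total weight to the loss.
    """
    total = sum(counts.values())
    n_classes = len(counts)
    return {cls: total / (n_classes * n) for cls, n in counts.items()}
```

With 1,404 images in total, this gives a weight of about 0.87 for Aedes and about 1.18 for Culex.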


Currency recognition and classification is an important task, and it is especially crucial for visually impaired people. It helps them with day-to-day financial transactions, such as paying shopkeepers while traveling and exchanging money at banks, hospitals, and elsewhere. The main objectives in creating this dataset were:

        1)      Create a dataset of old and new Indian currency.

        2)      Create a dataset of Thai currency.

        3)      Ensure that the dataset consists of high-quality images.


The dataset consists of 10 classes of Indian banknotes (10 New, 10 Old, 20, 50 New, 50 Old, 100 New, 100 Old, 200, 500, and 2000) and 5 classes of Thai banknotes (20, 50, 100, 500, and 2000).
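Several denominations (20, 500, and 2000) appear in both currencies. A combined 15-class classifier therefore needs labels that include the currency. The Python sketch below builds such a label set from the class names above. The "INR"/"THB" prefixes (ISO 4217 codes) and the label strings are a hypothetical naming choice, not names used by the dataset itself.

```python
from typing import Dict, List

INDIAN_CLASSES = ["10 New", "10 Old", "20", "50 New", "50 Old",
                  "100 New", "100 Old", "200", "500", "2000"]
THAI_CLASSES = ["20", "50", "100", "500", "2000"]

# Prefix each class with an ISO 4217 currency code (a naming choice made here,
# not taken from the dataset) so that denominations that occur in both
# currencies, such as 20 or 500, get distinct labels.
LABELS: List[str] = [f"INR {c}" for c in INDIAN_CLASSES] + [f"THB {c}" for c in THAI_CLASSES]
LABEL_TO_INDEX: Dict[str, int] = {label: i for i, label in enumerate(LABELS)}
```

The Indian classes occupy indices 0-9 and the Thai classes occupy indices 10-14.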