Computer Vision

HUMAN4D: A Human-Centric Multimodal Dataset for Motions & Immersive Media

We introduce HUMAN4D, a large and multimodal 4D dataset that contains a variety of human activities simultaneously captured by a professional marker-based MoCap, a volumetric capture and an audio recording system. By capturing 2 female and 2 male professional actors performing various full-body movements and expressions, HUMAN4D provides a diverse set of motions and poses encountered as part of single- and multi-person daily, physical and social activities (jumping, dancing, etc.), along with multi-RGBD (mRGBD), volumetric and audio data. Despite the existence of multi-view color datasets c

Categories:: Artificial Intelligence
Computer Vision
Image Processing
Machine Learning

1578 Views

PIL-park

Parking Slot Detection dataset

angle, type, and location of each parking slot

Categories:: Computer Vision
Transportation

450 Views

Silhouettes for Human Posture Recognition

The data set has been consolidated for the task of Human Posture Recognition. The data set consists of four postures namely -

Sitting,
Standing,
Bending and,
Lying.

There are 1200 images for each of the postures listed above. The images have a dimension of 512 x 512 px.

Categories:: Computer Vision
Machine Learning

1744 Views

FooDD: Food Detection Dataset for Calorie Measurement Using Food Images

Images of various foods, taken with different cameras and different lighting conditions. Images can be used to design and test Computer Vision techniques that can recognize foods and estimate their calories and nutrition.

Categories:: Computer Vision

14862 Views

YawDD: Yawning Detection Dataset

A dataset of videos, recorded by an in-car camera, of drivers in an actual car with various facial characteristics (male and female, with and without glasses/sunglasses, different ethnicities) talking, singing, being silent, and yawning. It can be used primarily to develop and test algorithms and models for yawning detection, but also recognition and tracking of face and mouth. The videos are taken in natural and varying illumination conditions. The videos come in two sets, as described next:

Categories:: Computer Vision

37169 Views

YoGiR-1

The first bit of light is the gesture of being, on a massive screen of the black panorama. A small point of existence, a gesture of being. The universal appeal of gesture is far beyond the barriers of languages and planets. These are the microtransactions of symbols and patterns which have traces of the common ancestors of many civilizations.

Categories:: Artificial Intelligence
Computer Vision
Image Processing
Machine Learning
Health

244 Views

GSET Somi: A Game-Specific Eye Tracking Dataset for Somi

This is an eye tracking dataset of 84 computer game players who played the side-scrolling cloud game Somi. The game was streamed in the form of video from the cloud to the player. The dataset consists of 135 raw videos (YUV) at 720p and 30 fps with eye tracking data for both eyes (left and right). Male and female players were asked to play the game in front of a remote eye-tracking device. For each player, we recorded gaze points, video frames of the gameplay, and mouse and keyboard commands.

Categories:: Computer Vision
Image Processing

1074 Views

Coffee Leaf Rust Dataset

Three month Coffee Leaf Rust dataset generated by the Cyber Physical Data Collection System.

Categories:: Artificial Intelligence
Computer Vision
Image Processing
Machine Learning

3789 Views

Deep facial features calculated from CelebA dataset with identity (201804 objects, 128 features, 10021 identities)

Deep facial features with identity generated from CelebA dataset using facenet network (128 real-valued features). Dataset contains:
- full dataset
- training dataset
- validation dataset
Link to CelebA dataset: http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html

Categories:: Computer Vision
Image Processing
Machine Learning

360 Views

DiSPLaY Multimodal MedSLset (Medical Sign Language Set)

The dataset contains medical signs of the sign language including different modalities of color frames, depth frames, infrared frames, body index frames, mapped color body on depth scale, and 2D/3D skeleton information in color and depth scales and camera space. The language level of the signs is mostly Word and 55 signs are performed by 16 persons two times (55x16x2=1760 performance in total).

Categories:: Computer Vision
Image Processing
Machine Learning
Biomedical and Health Sciences
Sensors

2438 Views

Computer Vision

Computer Vision

Pages