Computer Vision

Visual storytelling, also known as multi-image captioning, refers to describing a set of images rather than a single image. The Visual Storytelling Task (VST) takes a set of images as input and aims to generate a coherent story relevant to those images. In this work, we bridge the gap by presenting a new dataset for expressive and coherent story creation: the Sequential Storytelling Image Dataset (SSID), which consists of open-source video frames accompanied by story-like annotations.


The FMK (Finger Major Knuckle) dataset was created to support experiments on identity verification using the major knuckles of the middle finger and the thumb as modalities. The images were captured with the rear camera of an OPPO A12 smartphone. The dataset covers 20 different subjects between the ages of 30 and 67. For each subject there are 3 images of the major knuckle of the middle finger and 3 images of the major knuckle of the thumb. The FMK dataset is intended for testing and evaluation.
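The subject and image counts above imply a total dataset size, which can be checked with a short sketch. The index layout and the `build_index` helper below are hypothetical illustrations, not the dataset's actual file naming or tooling; only the three counts come from the description.

```python
from itertools import product

# Figures taken from the dataset description.
NUM_SUBJECTS = 20
FINGERS = ("middle", "thumb")
IMAGES_PER_FINGER = 3


def build_index():
    """Enumerate (subject, finger, shot) triples.

    The ID scheme (1-based subject and shot numbers) is an assumption
    made for illustration; the real files may be organized differently.
    """
    return list(
        product(
            range(1, NUM_SUBJECTS + 1),
            FINGERS,
            range(1, IMAGES_PER_FINGER + 1),
        )
    )


index = build_index()
total_images = len(index)
images_per_subject = len(FINGERS) * IMAGES_PER_FINGER

print(f"images per subject: {images_per_subject}")
print(f"total images: {total_images}")
```

Under these figures each subject contributes 6 images, so any verification protocol has only a handful of genuine pairs per finger to work with.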


Recently, contactless fingerprint authentication has gained appeal among biometric researchers. Contactless fingerprint systems offer several advantages, such as ease of capture and affordability, over conventional fingerprint identification systems, which require the user's finger to make direct contact with the sensor.


This dataset is extracted from the International Skin Imaging Collaboration (ISIC) archive. Similar datasets are used for the annual ISIC Challenge, which gives the computer science community an opportunity to produce algorithms that can outperform professional dermatologists. The submitted dataset contains approximately 1,000 images of malignant melanomas and approximately 1,000 images of benign lesions.
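Because the two classes are roughly equal in size, a classifier that always predicts one class scores about 50% accuracy, which is the floor any model should beat. The sketch below illustrates this. It assumes exactly 1,000 images per class (the description only gives approximate counts), and the label strings and the `majority_baseline_accuracy` helper are hypothetical.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy of always predicting the most frequent label."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


# Assumed counts: the description says "approximately 1,000" per class.
labels = ["malignant"] * 1000 + ["benign"] * 1000

baseline = majority_baseline_accuracy(labels)
print(f"majority-class baseline accuracy: {baseline:.2f}")
```

With balanced classes, plain accuracy is a reasonable first metric. With skewed classes it would be much less informative.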


The SYPHAX dataset was collected in the city of Sfax, Tunisia, the second-largest Tunisian city after the capital. A total of 2008 images were gathered manually, one by one. Each image poses text detection challenges in natural scenes, reflecting the real complexity of 15 different routes (downtown, Nasryia, Sidi Mansour, Sakiet Ezziet, Sakiet Eddayer, Mahdia road, Tunis main road, Chehia, Taniour, Lafran, Elayn, Gremda, Manzel Chaker, Matar route, Gabes road) along with ring roads, intersections, and roundabouts.


This dataset consists of high-resolution visible-spectrum (RGB) and thermal infrared (TIR) images of two vineyards (Vitis vinifera L.) of the Mouhtaro and Merlot varieties. The images were captured by an Unmanned Aerial Vehicle (UAV) carrying TIR and RGB sensors at three times during a cultivation period.


In real-time communications, WebRTC-based multimedia applications are increasingly prevalent because they can be integrated smoothly into Web browsing sessions. The browsing experience is significantly better than in scenarios where browser add-ons and/or plug-ins are used. Still, the end user's Quality of Experience (QoE) in WebRTC sessions may be affected by network impairments, such as delays and losses.


This dataset was initially collected by Mrs Athira P K with the help of teachers and students of Rahmania School for Handicapped, Kozhikode, Kerala, India. The dataset was later extended by many other BTech and MTech students with the help of their friends.

The MUDRA NITC dataset consists of videos of static and dynamic gestures of Indian Sign Language. The static gestures mainly comprise videos of static alphabet signs and preprocessed image frames.


With the increasing use of drones for surveillance and monitoring, there is a growing need for reliable and efficient object detection algorithms that can detect and track objects in aerial images and videos. Datasets of aerial videos captured from drones are essential for developing and testing such algorithms.