
Computer Vision

The visual sensor captures images of the crane loading operation scene while the motion control commands are simultaneously collected from the crane's operational control end. A neural network model is trained in an end-to-end manner to predict the crane's motion control commands from the captured images.
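The end-to-end mapping described above (scene image in, control commands out) can be sketched as a tiny forward pass. This is a minimal illustration only, not the authors' model: the frame size (64x64), hidden width, number of commands, and the randomly initialised weights standing in for a trained network are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: a 64x64 grayscale frame mapped to 3 control
# commands (e.g. hoist, trolley, and slew velocities).
IMG_PIXELS, HIDDEN, N_COMMANDS = 64 * 64, 32, 3

# Randomly initialised weights stand in for a trained network.
W1 = rng.normal(0.0, 0.01, (IMG_PIXELS, HIDDEN))
W2 = rng.normal(0.0, 0.01, (HIDDEN, N_COMMANDS))

def predict_commands(frame: np.ndarray) -> np.ndarray:
    """Forward pass: flattened frame -> ReLU hidden layer -> command vector."""
    x = frame.reshape(-1) / 255.0   # normalise pixel intensities to [0, 1]
    h = np.maximum(0.0, x @ W1)     # ReLU hidden layer
    return h @ W2                   # raw command outputs

frame = rng.integers(0, 256, (64, 64)).astype(np.float64)
commands = predict_commands(frame)
print(commands.shape)  # (3,)
```

In practice the mapping would be learned by regressing the network outputs against the logged control commands; here the weights are random purely to keep the sketch runnable.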


We release a large-scale endoscopic video dataset covering seven types of intraoperative adverse events (iAEs) across heterogeneous surgical domains. Source domain: Cholec80 is re-annotated for iAE detection from laparoscopic cholecystectomy videos. Target domain: dViAEs comprises robot-assisted colorectal and HPB surgery videos.


ITDAV-25 (Indian Thermal Dataset for Autonomous Vehicles) is a thermal image dataset specifically curated to advance research in Advanced Driver Assistance Systems (ADAS), particularly for environments characterized by low visibility, night-time conditions, and inclement weather. The dataset comprises 13,688 raw thermal images, collected without any synthetic augmentation techniques.


The use of technology in cricket has increased significantly in recent years, leading to overlapping computer vision-based research efforts. This study aims to extract front pitch view shots from cricket broadcasts using deep learning. Front pitch view (FPV) shots capture the ball delivery by the bowler and the stroke played by the batter. FPV shots are valuable for highlight generation, automatic commentary generation, and analysis of bowling and batting techniques. We classify each broadcast video frame as FPV or non-FPV using deep-learning models.


Electrocardiogram (ECG) interpretation is critical for diagnosing a wide range of cardiovascular conditions. To streamline and accelerate the development of deep learning models in this domain, we present a novel, image-based version of the PTB Diagnostic ECG Database tailored for use with convolutional neural networks (CNNs), vision transformers (ViTs), and other image classification architectures. This enhanced dataset consists of 516 grayscale .png images, each representing a 12-lead ECG signal arranged as a 2D matrix (12 × T, where T is the number of time steps).
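The 12 × T image arrangement described above can be sketched with numpy: each of the 12 leads becomes one row of pixels, with amplitudes rescaled to 8-bit grayscale. This is a generic illustration under assumed inputs (a synthetic 12-lead signal of 500 samples), not the dataset's actual conversion pipeline.

```python
import numpy as np

def ecg_to_image(signals: np.ndarray) -> np.ndarray:
    """Scale a (12, T) multi-lead ECG array to an 8-bit grayscale matrix."""
    lo, hi = signals.min(), signals.max()
    scaled = (signals - lo) / (hi - lo)       # normalise amplitudes to [0, 1]
    return (scaled * 255).astype(np.uint8)    # one pixel row per lead

t = np.linspace(0.0, 2.0 * np.pi, 500)
# Phase-shifted sinusoids stand in for real lead recordings.
leads = np.stack([np.sin(t + k) for k in range(12)])
img = ecg_to_image(leads)
print(img.shape, img.dtype)  # (12, 500) uint8
```

Saving such a matrix as a .png gives an image a CNN or ViT can consume directly, which is the convenience the dataset provides.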

We evaluate the performance of our proposed method using four benchmark datasets: MNIST, CIFAR-10, Traffic-sign Recognition (TSR), and Room-occupancy Detection (ROD). Each dataset is divided into training and test sets, with specific proportions as described below. MNIST consists of grayscale images of handwritten digits in 10 distinct classes; it includes 60,000 training images and 10,000 test images, each formatted as a 28x28 pixel grayscale map. CIFAR-10, unlike MNIST, is a dataset of color images.
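The train/test partitioning described above can be sketched as a shuffled index split. The function name and the fixed seed are illustrative assumptions; the MNIST proportions (60,000 train / 10,000 test) come from the text.

```python
import numpy as np

def train_test_split(n_samples: int, n_test: int, seed: int = 0):
    """Shuffle sample indices and hold out the last n_test as the test set."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    return idx[:-n_test], idx[-n_test:]

# MNIST-style split: 70,000 images into 60,000 training and 10,000 test.
train_idx, test_idx = train_test_split(70_000, 10_000)
print(len(train_idx), len(test_idx))  # 60000 10000
```

Because the permutation covers every index exactly once, the two subsets are disjoint and exhaustive, which is what a held-out evaluation requires.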

The CIFAR-10 and CIFAR-100 datasets comprise 32x32-pixel color images in 10 and 100 categories, respectively.

The Tiny-ImageNet dataset consists of 200 categories with 120,000 samples in total: each class contains 500 training images, 50 validation images, and 50 test images, with each image sized at 64x64.


The Tiny-ImageNet dataset contains 200 categories and approximately 120,000 samples. The CIFAR-10 and CIFAR-100 datasets contain 10 and 100 categories, respectively.

All experiments were conducted on a server equipped with two NVIDIA A100 GPUs (each with 80GB of memory), running Ubuntu 20.04 with CUDA 11.8 and the PyTorch 1.8 framework. The server has 256GB of memory and is powered by a 64-core Intel(R) Xeon(R) Gold 6326 CPU @ 2.90GHz.


To promote the development of camouflaged object detection technology, a visible-infrared artificial camouflage dataset (VIAC) is constructed. To simulate and replicate real-world scenarios, we customize and procure a set of metal models and camouflage materials to construct artificial camouflage environments. Using DJI drones equipped with a dual-mode (visible and infrared) imaging system, we conduct coordinated aerial photography in complex outdoor settings, acquiring 1,500 pairs of high-quality visible and infrared images.


HPGEN is a synthetic image dataset produced with a generative model that provides explicit control over head pose. It offers a promising solution to dataset bias in head pose estimation, as current benchmarks suffer from a limited number of images, imbalanced data distributions, high annotation costs, and ethical concerns.
