We chose 8 publicly available CT volumes of COVID-19-positive patients from https://doi.org/10.5281/zenodo.3757476 and used 3D Slicer to generate volumetric annotations of 512×512 dimension for the 5 lung lobes, namely the right upper lobe, right middle lobe, right lower lobe, left upper lobe and left lower lobe. These annotations were validated by a radiologist with over 15 years of experience. 


CT volumes can be downloaded from https://doi.org/10.5281/zenodo.3757476

Volumetric annotations for the 5 lobe segments, namely the right upper lobe, right middle lobe, right lower lobe, left upper lobe and left lower lobe, are saved as segments 1 to 5, respectively. 

For scans with the prefix coronacases_00x, the corresponding annotations are uploaded with the suffix lobes.

The scans and annotations measure 512×512 and are in .nii format.
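As a sketch of how the segment convention above can be used, the following splits an annotation volume into one boolean mask per lobe. It assumes NumPy; the nibabel call and the filename in the comment are illustrative only, following the prefix/suffix convention described above.

```python
import numpy as np

# Segment labels 1-5 follow the convention stated in the dataset description.
LOBE_LABELS = {
    1: "right upper lobe",
    2: "right middle lobe",
    3: "right lower lobe",
    4: "left upper lobe",
    5: "left lower lobe",
}

def lobe_masks(label_volume):
    """Split an integer annotation volume into one boolean mask per lobe."""
    return {name: label_volume == label for label, name in LOBE_LABELS.items()}

# The .nii annotation files can be read e.g. with nibabel (filename illustrative):
# import nibabel as nib
# labels = nib.load("coronacases_001_lobes.nii").get_fdata().astype(int)
# masks = lobe_masks(labels)
```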


The simulated InSAR building dataset contains 312 simulated SAR image pairs generated from 39 different building models, each simulated at 8 viewing angles. The training set contains 216 samples and the test set 96. Each simulated InSAR sample contains three channels: the master SAR image, the slave SAR image, and the interferometric phase image. This dataset supports the CVCMFF Net for building semantic segmentation of InSAR images.


The current maturity of autonomous underwater vehicles (AUVs) has made their deployment practical and cost-effective, such that many scientific, industrial and military applications now include AUV operations. However, the logistical difficulties and high costs of operating at sea are still critical limiting factors in further technology development, the benchmarking of new techniques and the reproducibility of research results. To overcome this problem, we present a freely available dataset suitable for testing control, navigation, sensor processing algorithms and other tasks.


This repository contains the AURORA dataset, a multi-sensor dataset for robotic ocean exploration.

It is accompanied by the report "AURORA, A multi sensor dataset for robotic ocean exploration", by Marco Bernardi, Brett Hosking, Chiara Petrioli, Brian J. Bett, Daniel Jones, Veerle Huvenne, Rachel Marlow, Maaten Furlong, Steve McPhail and Andrea Munafo.

Exemplar python code is provided at https://github.com/noc-mars/aurora.


The dataset provided in this repository includes data collected during cruise James Cook 125 (JC125) of the National Oceanography Centre, using the Autonomous Underwater Vehicle Autosub 6000. It is composed of two AUV missions: M86 and M87.

  • M86 contains a sample of multi-beam echosounder data in .all format. It also contains CTD and navigation data in .csv format.

  • M87 contains a sample of the camera and side-scan sonar data. The camera data comprise 8 of the 45320 images of the original dataset and are provided in .raw format (pixels are ordered in Bayer format); each image is 2448×2048 pixels. The side-scan sonar folder contains a one-ping sample of side-scan data provided in .xtf format.

  • The AUV navigation file is provided as part of the data available for each mission in .csv form.
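As a rough illustration of how the .raw camera files might be read and debayered, here is a NumPy sketch. The per-pixel bit depth is not stated in this description, and an RGGB pattern is assumed; both are assumptions to adjust against the actual data.

```python
import numpy as np

WIDTH, HEIGHT = 2448, 2048  # image size stated in the dataset description

def load_bayer_raw(path, width=WIDTH, height=HEIGHT, dtype=np.uint8):
    """Read a raw Bayer frame into a 2D array.
    8-bit depth is an assumption; use e.g. np.uint16 if the files are 16-bit."""
    data = np.fromfile(path, dtype=dtype)
    return data.reshape(height, width)

def demosaic_rggb(bayer):
    """Very rough demosaic by 2x2 binning, assuming an RGGB pattern."""
    r  = bayer[0::2, 0::2].astype(np.float32)
    g1 = bayer[0::2, 1::2].astype(np.float32)
    g2 = bayer[1::2, 0::2].astype(np.float32)
    b  = bayer[1::2, 1::2].astype(np.float32)
    return np.stack([r, (g1 + g2) / 2.0, b], axis=-1)
```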


Results (including reported and extra results) for LSstab. Please refer to our paper "Efficient real-time video stabilization with a novel least squares formulation and parallel AC-RANSAC".


Stabilization results for LSstab. Please refer to our paper "Efficient real-time video stabilization with a novel least squares formulation and parallel AC-RANSAC".


Stabilization results include:

(1) stabilized videos reported in the paper

(2) extra stabilized videos

(3) challenging videos that LSstab fails to stabilize. 


This is a dataset of diabetic foot (DF) images. The dataset was collected by Lifang Liu from Shanghai Municipal Eighth People's Hospital. It contains 1211 images: 507 DF images and 704 non-DF images. The DF images cover all Wagner grades of DF. The non-DF images mainly contain other chronic wounds of the feet and legs, such as acne, pressure ulcers and venous embolism of the lower extremities.


Deep learning undoubtedly has had a huge impact on the computer vision community in recent years. In light field imaging, machine learning-based applications have significantly outperformed their conventional counterparts. Furthermore, multi- and hyperspectral light fields have shown promising results in light field-related applications such as disparity or shape estimation. Yet, a multispectral light field dataset, enabling data-driven approaches, is missing. Therefore, we propose a new synthetic multispectral light field dataset with depth and disparity ground truth.



When using this dataset, please cite our corresponding paper:

Maximilian Schambach and Michael Heizmann:

"A Multispectral Light Field Dataset and Framework for Light Field Deep Learning"

IEEE Access, 2020

DOI: 10.1109/ACCESS.2020.3033056



The dataset consists of 500 randomly generated scenes as well as 7 hand-crafted scenes for detailed performance evaluation. 

The scenes are rendered as multispectral light fields of shape (11, 11, 512, 512, 13) with depth and disparity ground truth for every subaperture view.

The light fields are provided with 16-bit uint precision, the depths and disparities with 32-bit float precision.

The scenes are rendered in two different camera configurations: one corresponding to a light field camera in the unfocused design (plenoptic 1.0) with a focused main lens (annotated with "F"), and one where the main lens is focused at infinity (annotated with "INF"), which is equivalent to a camera array with parallel optical axes. In the "F" configuration, disparities range from ca. -2.5 px to 3 px, where a disparity of 0 px corresponds to the focal plane. In the "INF" configuration, the focus is set to infinity, hence all disparities are positive.

We provide the raw rendered data, the abstract source files (so rendering of additional ground truth is possible) as well as multiple pre-patched and converted versions.


Dataset Content

We provide the following dataset files for downloading:



Contains the complete RAW rendered data (both the F and INF camera configurations), including the multispectral light fields in the ENVI format, traced depth maps and converted disparity maps for every subaperture view in the PFM format. 

The light fields are of shape (11, 11, 512, 512, 13), the depth and disparity maps of shape (11, 11, 512, 512, 1), saved as 2D images in the subaperture image view. 

To load the light fields and disparities, you may use our Python library plenpy.

To patch the RAW data into a .h5 dataset, see the provided Python scripts contained in SCRIPTS.zip . 

We provide pre-patched versions of the dataset (see below). If the pre-patched versions do not fit your needs (e.g. you need a different spatial resolution), use our provided patch script.

The patched .h5 data can then directly be used with our deep learning framework LFCNN.



Contains the hand-crafted dataset challenges. Includes the RAW rendered data, as well as conversions to NumPy's .npy format for (11, 11) and (9, 9) angular resolutions.

Further contains composed .h5 files to be used directly with our deep learning framework LFCNN.



Identical to CHALLENGES_MULTISPECTRAL but converted to RGB.


Pre-patched data


Multispectral dataset in the "F" configuration, patched to (11, 11, 36, 36, 13) light field patches with the corresponding disparity map of the central view.


Multispectral dataset in the "F" configuration, patched to (9, 9, 36, 36, 13) light field patches with the corresponding disparity map of the central view.


Same as previous, but in the "INF" camera configuration.


Same as DATASET_MULTISPECTRAL_PATCHED_F_9x9_36x36.zip but converted to RGB.


Same as DATASET_MULTISPECTRAL_PATCHED_INF_9x9_36x36.zip but converted to RGB.


The SCRIPTS.zip file contains a script to convert a set of light fields to a patched set saved in the .h5 format. 

Use these scripts to patch the raw dataset into light field patches of a self-defined shape.

See the script for comments on usage.
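To see the idea behind the patching step before opening SCRIPTS.zip, here is a simplified NumPy sketch that cuts a light field into non-overlapping spatial patches. The actual provided scripts may differ (e.g. overlapping patches, different dataset names); the h5py lines in the comment are illustrative only.

```python
import numpy as np

def patch_lightfield(lf, patch_size=36):
    """Cut a light field of shape (u, v, s, t, ch) into non-overlapping
    spatial patches of shape (u, v, patch_size, patch_size, ch)."""
    u, v, s, t, ch = lf.shape
    patches = [lf[:, :, i:i + patch_size, j:j + patch_size, :]
               for i in range(0, s - patch_size + 1, patch_size)
               for j in range(0, t - patch_size + 1, patch_size)]
    return np.stack(patches)

# Writing the result to .h5 (requires h5py; dataset name is hypothetical):
# import h5py
# with h5py.File("dataset.h5", "w") as f:
#     f.create_dataset("lightfield", data=patch_lightfield(lf))
```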


ALL-IDB (Acute Lymphoblastic Leukemia) Image Database for Image Processing

The ALL-IDB dataset comprises two subsets. The first subset has 260 segmented lymphocytes, of which 130 belong to the leukaemia class and the remaining 130 to the non-leukaemia class; it requires only classification. The second subset has 108 non-segmented blood images belonging to the leukaemia and non-leukaemia groups, and thus requires both segmentation and classification.




Optical Character Recognition (OCR) systems are used to convert document images, either printed or handwritten, into their electronic counterparts. But dealing with handwritten text is much more challenging than printed text due to the erratic writing styles of individuals. The problem becomes more severe when the input image is a doctor's prescription. Before feeding such an image to the OCR engine, classifying printed and handwritten text is a necessity, as a doctor's prescription contains both handwritten and printed text, which must be processed separately.


Annotated image dataset of household objects from the RoboFEI@Home team

This data set contains two sets of pictures of household objects, created by the RoboFEI@Home team to develop object detection systems for a domestic robot.

The first data set was created with objects from a local supermarket. Product brands are typical of Brazil. The second data set is composed of objects from the RoboCup@Home 2018 OPL competition.


This data set contains two separate sets of annotated images. Common features of the image sets:

  • Images are saved in JPG format
  • Annotations are made with labelImg
  • Both sets contain videos in MP4 format to test trained detection models
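Since the annotations are made with labelImg, whose default output is Pascal VOC XML, they can be parsed with the Python standard library. This is a minimal sketch; the field names follow the VOC convention, and the class name in the example is one of the classes listed below.

```python
import xml.etree.ElementTree as ET

def parse_voc_annotation(xml_string):
    """Parse one labelImg (Pascal VOC) annotation into (class, box) tuples,
    where box is (xmin, ymin, xmax, ymax) in pixel coordinates."""
    root = ET.fromstring(xml_string)
    boxes = []
    for obj in root.iter("object"):
        name = obj.findtext("name")
        bb = obj.find("bndbox")
        box = tuple(int(float(bb.findtext(k)))
                    for k in ("xmin", "ymin", "xmax", "ymax"))
        boxes.append((name, box))
    return boxes
```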

Set 1

166 annotated images with 1028 objects of the following 13 classes:

  1. cereal
  2. chocolate_milk
  3. heineken
  4. iron_man
  5. medicine
  6. milk_bottle
  7. milk_box
  8. monster
  9. purple_juice
  10. red_juice
  11. shampoo
  12. tea_box
  13. yellow_juice

There are also 28 videos for testing, shot with multiple smartphones.

Set 2

388 annotated images with 1737 objects of the following 20 classes:

  1. apple
  2. basket
  3. cereal
  4. chocolate_drink
  5. cloth_opl
  6. coke
  7. crackers
  8. grape_juice
  9. help_me_carry_opl
  10. noodles
  11. orange
  12. orange_juice
  13. paprika
  14. potato_chips
  15. pringles
  16. sausages
  17. scrubby
  18. sponge_opl
  19. sprite
  20. tray

There is also a single long video and 398 unannotated images for testing.


The data uploaded here shall support the paper 

Decision Tree Analysis of  ...

which has been submitted to IEEE Transactions on Medical Imaging (2020, September 25) by the authors

Julian Mattes, Wolfgang Fenz, Stefan Thumfart, Gerhard Haitchi, Pierre Schmit, Franz A. Fellner

During review, the data shall only be visible to the reviewers of this paper. Afterwards, this abstract will be modified and complemented, and a dataset image will be uploaded.