Computer Vision

Visual tracking has seen remarkable advancements, largely driven by the availability of large-scale training datasets that have enabled the development of highly accurate and robust algorithms. While significant progress has been made in tracking general objects, research on more challenging scenarios, such as tracking camouflaged objects, remains limited. Camouflaged objects, which blend seamlessly with their surroundings or other objects, present unique challenges for detection and tracking in complex environments.

We collect an SfM dataset composed of 17 object-centric, texture-poor scenes with accurate ground-truth poses. In our dataset, low-textured objects are placed on a texture-less plane. For each object, we record a video sequence of around 30 seconds while moving around the object. The ground-truth pose of each frame is estimated with ARKit followed by bundle adjustment (BA) post-processing, with the help of textured markers, which are cropped out of the test images. To impose larger viewpoint changes, we sample 60 subset image bags for each scene, similar to the IMC dataset.
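Drawing sparse random subsets ("bags") of frames from a continuous orbit means neighbouring images in a bag are usually far apart in time, and therefore in viewpoint. The following is a minimal sketch of that idea, assuming frames are indexed by integer IDs. The bag size, the frame count, and the function name `sample_image_bags` are hypothetical; the authors' exact sampling rule (and that of IMC) may differ.

```python
import random


def sample_image_bags(frame_ids, num_bags=60, bag_size=10, seed=0):
    """Sample `num_bags` random subsets ("bags") of frames from one scene.

    Hypothetical sketch: bag_size is an assumed value; the dataset
    description only states that 60 bags are sampled per scene.
    """
    if bag_size > len(frame_ids):
        raise ValueError("bag_size exceeds the number of available frames")
    rng = random.Random(seed)  # seeded so the bags are reproducible
    bags = []
    for _ in range(num_bags):
        # Sample without replacement, then sort so each bag keeps temporal order.
        bags.append(sorted(rng.sample(frame_ids, bag_size)))
    return bags


# Assumption: a ~30 s video at 30 fps yields roughly 900 frames.
frame_ids = list(range(900))
bags = sample_image_bags(frame_ids)
```

A fixed seed keeps the bags identical across runs, which matters if several methods are compared on the same subsets.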

This dataset is designed to advance research in Visual Question Answering (VQA), specifically addressing challenges related to language priors and compositional reasoning. Each question is labeled according to which of these two issues it is susceptible to, allowing for targeted evaluation of VQA models. The dataset consists of 33,051 training images and 14,165 validation images, along with 571,244 training questions and 245,087 validation questions. Of the training questions, 313,664 focus on compositional reasoning, while 257,580 pertain to language priors.
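The stated counts can be cross-checked with a few lines of arithmetic. The two label categories sum exactly to the number of training questions, which is consistent with each training question carrying exactly one of the two labels. The variable names below are illustrative only.

```python
# Image and question counts as stated in the dataset description.
train_images, val_images = 33_051, 14_165
train_questions, val_questions = 571_244, 245_087
compositional, language_prior = 313_664, 257_580

# The two label categories account for every training question.
assert compositional + language_prior == train_questions

total_questions = train_questions + val_questions
print(f"total questions: {total_questions:,}")  # → total questions: 816,331
print(f"compositional share: {compositional / train_questions:.1%}")  # → compositional share: 54.9%
print(f"questions per training image: {train_questions / train_images:.1f}")  # → questions per training image: 17.3
```

Both splits average roughly 17.3 questions per image, and images and questions alike follow an approximately 70/30 train/validation split.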

In recent years, the field of visual tracking has made significant progress through the use of large-scale training datasets. These datasets have supported the development of sophisticated algorithms that improve the accuracy and stability of visual object tracking. However, most research has focused on favorable illumination conditions, neglecting the challenges of tracking in low-light environments. In low-light scenes, lighting may change dramatically, targets may lack distinct texture features, and in some scenarios targets may not be directly observable.

The ability to train a robot to recognize human gestures is critical for enabling close-proximity Human-Robot Interaction (HRI). To that end, generating an appropriate dataset for the corresponding Machine Learning (ML) algorithm is essential. In this work, we introduce new datasets for hand gesture recognition. Given the difficulty of physically recording thousands of hand gestures, we started with the basic hand gestures and developed additional synthetic gestures, thus creating a comprehensive set.

Hyperspectral images are represented by numerous narrow wavelength bands in the visible and near-infrared parts of the electromagnetic spectrum. As hyperspectral imagery gains traction for general computer vision tasks, there is an increased need for large and comprehensive datasets for use as training data.

Recent advancements in sensor technology allow us to capture hyperspectral data cubes at higher spatial and temporal resolution. However, there are few publicly available multi-purpose

The IARPA WRIVA program aims to develop software systems that can create photorealistic, navigable 3D site models from a highly limited corpus of imagery, including ground-level, surveillance-height, airborne-altitude, and satellite imagery. Additionally, where imagery lacks geolocation metadata or camera-parameter information, or is corrupted by artifacts, WRIVA seeks to detect and correct these problems so that the imagery can be incorporated into site modelling and other downstream image processing and analysis algorithms.

Using the PVIFS-02 whole-sky imagers, we collected 500,000 independent cloud images from 2021 to 2023, captured in a southern city and a northern city in China. The cloud images collected in southern China are clear, with obvious cloud edges. In contrast, the cloud images from northern China appear relatively blurred. This difference is attributed to the geography of northern China, which is frequently affected by sand and dust, leading to a certain degree of image blurring. This blurring makes cloud detection and classification more challenging.

The EuroSAT-SAR dataset is a SAR version of the EuroSAT dataset. We matched each Sentinel-2 image in EuroSAT with one Sentinel-1 patch according to their geospatial coordinates, ending up with 27,000 dual-pol Sentinel-1 SAR images divided into 10 classes. The EuroSAT-SAR dataset was collected as a downstream task in the FG-MAE work, to serve as a CIFAR-like, clean, balanced, ML-ready dataset for remote sensing SAR image recognition.
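One simple way to pair patches "according to their geospatial coordinates" is nearest-neighbour matching of patch centres by great-circle distance. The following is a minimal sketch under that assumption only; the actual FG-MAE pipeline may instead match on footprints or tile grids. The functions, the centre-point representation, and the `max_km` threshold are all hypothetical.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two lat/lon points (spherical Earth)."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(min(1.0, math.sqrt(a)))  # clamp guards rounding error


def match_patches(s2_centers, s1_centers, max_km=0.5):
    """Pair each Sentinel-2 patch with the nearest Sentinel-1 patch centre.

    Both inputs map a patch ID to its (lat, lon) centre. Returns {s2_id: s1_id};
    Sentinel-2 patches with no Sentinel-1 patch within max_km stay unmatched.
    Hypothetical sketch: brute force, O(len(s2) * len(s1)).
    """
    matches = {}
    for s2_id, (lat, lon) in s2_centers.items():
        best_id, best_d = None, float("inf")
        for s1_id, (lat1, lon1) in s1_centers.items():
            d = haversine_km(lat, lon, lat1, lon1)
            if d < best_d:
                best_id, best_d = s1_id, d
        if best_d <= max_km:
            matches[s2_id] = best_id
    return matches
```

For 27,000 patches, the brute-force loop would normally be replaced with a spatial index such as a k-d tree or a grid lookup. The distance threshold keeps a Sentinel-2 patch from being paired with a distant Sentinel-1 patch when no overlapping acquisition exists.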

The NEU-DET dataset is a collection of images featuring surface defects on hot-rolled steel strips. These defects are categorized into six classes: cracks (cr), inclusions (in), patches (pa), pitted surfaces (ps), rolled-in scale (rs), and scratches (sc). The dataset contains 300 grayscale images per category, for a total of 1,800 images, each 200×200 pixels in size.
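The class list and image geometry are enough to set up label indices and estimate memory use before loading anything. The following is a minimal sketch that assumes, hypothetically, that filenames begin with the class abbreviation followed by an underscore (for example `cr_12.bmp`). Check the naming in your own copy of the dataset, because the released files may use full class names instead.

```python
# Class abbreviations from the NEU-DET description, in a fixed order.
CLASSES = ["cr", "in", "pa", "ps", "rs", "sc"]
CLASS_TO_INDEX = {abbr: i for i, abbr in enumerate(CLASSES)}

IMAGES_PER_CLASS = 300
IMAGE_SIZE = (200, 200)  # height x width, one grayscale channel


def label_from_filename(name):
    """Map a filename such as 'cr_12.bmp' to its integer class index.

    Hypothetical naming scheme '<abbr>_<number>.<ext>'; raises KeyError
    for a prefix that is not one of the six class abbreviations.
    """
    abbr = name.split("_", 1)[0].lower()
    return CLASS_TO_INDEX[abbr]


# Shape of the full dataset stacked into one array, and its size as uint8 pixels.
dataset_shape = (len(CLASSES) * IMAGES_PER_CLASS, *IMAGE_SIZE)
dataset_bytes = dataset_shape[0] * dataset_shape[1] * dataset_shape[2]
```

At one byte per grayscale pixel, the whole dataset occupies 1,800 × 200 × 200 = 72,000,000 bytes (72 MB), so it fits comfortably in memory for training.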
