Computer Vision
This paper describes a dataset of droplet images captured using the sessile drop technique, intended for applications in wettability analysis, surface characterization, and machine learning model training. The dataset comprises both original and synthetically augmented images to increase its diversity and robustness for training machine learning models. The original, non-augmented portion of the dataset consists of 420 images of sessile droplets. To increase dataset size and variability, an augmentation process was applied, generating 1008 additional images.
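The abstract does not specify which augmentations were used, so the following is only a minimal sketch of the kind of pipeline that could expand 420 droplet images into additional variants; the specific transforms (flip, 90-degree rotation, brightness jitter) and their parameters are assumptions, not the paper's method.

```python
import numpy as np

def augment(image, rng):
    """Apply simple geometric and photometric augmentations to one image.

    `image` is an HxWxC uint8 array. The transforms below are illustrative
    assumptions, not the dataset's documented augmentation pipeline.
    """
    out = image.copy()
    if rng.random() < 0.5:
        out = out[:, ::-1]                # random horizontal flip
    k = int(rng.integers(0, 4))
    out = np.rot90(out, k)                # random 90-degree rotation
    gain = rng.uniform(0.8, 1.2)          # mild brightness jitter
    out = np.clip(out.astype(np.float32) * gain, 0, 255).astype(np.uint8)
    return out

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)  # stand-in image
aug = augment(img, rng)
```

Running `augment` several times per source image with different random draws is one common way to reach a fixed multiple of the original set.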
The OnePose dataset contains over 450 video sequences of 150 objects. For each object, multiple video recordings are provided, accompanied by camera poses and 3D bounding box annotations. These sequences are collected in different background environments, and each averages 30 seconds in length, covering all views of the object. The dataset is randomly divided into training and validation sets. For each object in the validation set, we assign one mapping sequence for building the SfM map and use a test sequence for the evaluation.
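The random train/validation division described above can be sketched as follows; the split ratio and seed are illustrative assumptions, since the text does not state them.

```python
import random

def split_objects(object_ids, val_ratio=0.2, seed=0):
    """Randomly split object IDs into training and validation sets.

    The 80/20 ratio and fixed seed are assumptions for illustration;
    the dataset's actual split proportions are not given in the abstract.
    """
    rng = random.Random(seed)
    ids = list(object_ids)
    rng.shuffle(ids)
    n_val = int(len(ids) * val_ratio)
    return ids[n_val:], ids[:n_val]

train_ids, val_ids = split_objects(range(150))
```

Each validation object would then have one of its sequences designated for SfM mapping and another held out for evaluation.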
Humanoid robots are anticipated to play integral roles in domestic settings, aiding individuals with everyday tasks. Given the prevalence of translucent storage containers in households, characterized by their practicality and transparency, it becomes imperative to equip humanoid robots with the capability to localize and manipulate these objects accurately. Consequently, 6D pose estimation of objects is a crucial area of research to advance robotic manipulation.
Decentralized Collaborative Simultaneous Localization and Mapping (C-SLAM) is essential to enable multi-robot missions in unknown environments without relying on pre-existing localization and communication infrastructure. This technology is anticipated to play a key role in the exploration of the Moon, Mars, and other planets. In this work, we introduce a novel dataset collected during C-SLAM experiments involving three robots operating on a Mars analogue terrain.
This dataset is part of the research presented in the paper "Phase-based Video Magnification with Handheld Cameras," published in IEEE Transactions on Circuits and Systems for Video Technology. It contains video magnification results of various objects, including an exciter, toy, cantilever beam, and simulated ball, all captured under handheld camera motion interference. The dataset includes magnified video results processed using multiple motion magnification methods, such as Phase-based, Acceleration, Jerk-aware, BVMF, VS-Phase-based, and the proposed method ("Ours").
Visual tracking has seen remarkable advancements, largely driven by the availability of large-scale training datasets that have enabled the development of highly accurate and robust algorithms. While significant progress has been made in tracking general objects, research on more challenging scenarios, such as tracking camouflaged objects, remains limited. Camouflaged objects, which blend seamlessly with their surroundings or other objects, present unique challenges for detection and tracking in complex environments.
We collect an SfM dataset composed of 17 object-centric, texture-poor scenes with accurate ground-truth poses. In our dataset, low-textured objects are placed on a texture-less plane. For each object, we record a video sequence of around 30 seconds circling the object. Per-frame ground-truth poses are estimated by ARKit with bundle-adjustment (BA) post-processing, aided by textured markers that are cropped out of the test images. To impose larger viewpoint changes, we sample 60 subset image bags for each scene, similar to the IMC dataset.
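Sampling subset image bags per scene can be sketched as below; the bag size and sampling scheme are assumptions for illustration, since only the count of 60 bags per scene is stated.

```python
import random

def sample_image_bags(frame_ids, num_bags=60, bag_size=10, seed=0):
    """Randomly sample fixed-size image bags from a scene's frame IDs.

    `bag_size` and uniform sampling without replacement are illustrative
    assumptions, loosely following IMC-style evaluation bags.
    """
    rng = random.Random(seed)
    return [sorted(rng.sample(frame_ids, bag_size)) for _ in range(num_bags)]

bags = sample_image_bags(list(range(300)), num_bags=60, bag_size=10)
```

Evaluating relative pose only within each bag enforces the larger viewpoint changes the text describes.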
This dataset is designed to advance research in Visual Question Answering (VQA), specifically addressing challenges related to language priors and compositional reasoning. It incorporates question labels categorizing queries based on their susceptibility to either issue, allowing for targeted evaluation of VQA models. The dataset consists of 33,051 training images and 14,165 validation images, along with 571,244 training questions and 245,087 validation questions. Among the training questions, 313,664 focus on compositional reasoning, while 257,580 pertain to language prior.
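Because every question carries a label marking which failure mode it probes, targeted evaluation amounts to grouping questions by that label. A minimal sketch follows; the record layout and label strings are hypothetical, since the abstract does not specify the file format.

```python
# Hypothetical question records; the "label" field marks whether a question
# targets compositional reasoning or language priors (field names assumed).
questions = [
    {"qid": 1, "label": "compositional"},
    {"qid": 2, "label": "language_prior"},
    {"qid": 3, "label": "compositional"},
]

def split_by_label(records):
    """Group question IDs by their failure-mode label for targeted evaluation."""
    groups = {}
    for r in records:
        groups.setdefault(r["label"], []).append(r["qid"])
    return groups

groups = split_by_label(questions)
```

Per-label accuracy can then be reported separately, e.g. over the 313,664 compositional and 257,580 language-prior training questions.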
In recent years, the field of visual tracking has made significant progress with the application of large-scale training datasets. These datasets have supported the development of sophisticated algorithms, enhancing the accuracy and stability of visual object tracking. However, most research has primarily focused on favorable illumination conditions, neglecting the challenges of tracking in low-light environments. In low-light scenes, lighting may change dramatically, targets may lack distinct texture features, and in some scenarios, targets may not be directly observable.