Part of the pseudo-right image generated in the KITTI3D dataset
- Submitted by: Yuguang Shi
- Last updated: Sat, 05/11/2024 - 10:44
- DOI: 10.21227/9jba-rs12
Abstract
One of the key problems in 3D object detection is reducing the accuracy gap between methods based on LiDAR sensors and those based on monocular cameras. A recently proposed Pseudo-Stereo framework for monocular 3D detection has received considerable attention in the community. However, existing practice has three problems: (1) it relies on a high-performance monocular depth estimator, (2) the generated image suffers from visual holes, deformations, and artifacts, and (3) it is difficult to make compatible with geometry-based stereo detectors. In this work, we propose a novel pseudo-stereo 3D detection framework without depth estimation, called PS-SVDM. This framework uses a diffusion model to generate a high-quality virtual right view from a left image, mimicking the signal of a stereo camera. With this representation, various existing stereo image-based detection algorithms can be applied. We further explore the application of PS-SVDM in depth-free stereo 3D detection, and the final framework is compatible with most stereo detectors. Experiments on the KITTI-3D Car category show that our method ranks 1st among published monocular 3D detectors.
Comments
Single-View Diffusion Model for Pseudo-Stereo 3D Object Detection in Autonomous Driving