Part of the pseudo-right images generated on the KITTI 3D dataset

Submitted by:
Yuguang Shi
Last updated:
Sat, 05/11/2024 - 10:44


One of the key problems in 3D object detection is narrowing the accuracy gap between methods based on LiDAR sensors and those based on monocular cameras. A recently proposed Pseudo-Stereo framework for monocular 3D detection has received considerable attention in the community. However, existing approaches suffer from three problems: (1) they rely on a high-performance monocular depth estimator, (2) the generated images suffer from visual holes, deformations, and artifacts, and (3) they are difficult to make compatible with geometry-based stereo detectors. In this work, we propose a novel pseudo-stereo 3D detection framework that requires no depth estimation, called PS-SVDM. The framework uses a diffusion model to generate a high-quality virtual right view from a left image, mimicking the signal of a stereo camera. With this representation, various existing stereo image-based detection algorithms can be applied. We further explore the application of PS-SVDM to depth-free stereo 3D detection, and the final framework is compatible with most stereo detectors. Experiments on the KITTI 3D Car category show that our method ranks 1st among published monocular 3D detectors.
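Problems (1) and (2) concern pseudo-stereo pipelines that first estimate depth and then synthesize the right view from it. The sketch below is an assumed, simplified illustration of one common way to do that synthesis, forward warping with a depth map; it is not PS-SVDM, which replaces this step with a diffusion model. In a rectified stereo pair a pixel at column x in the left image appears at x - d in the right image, with disparity d = f * B / Z (focal length f in pixels, baseline B, depth Z). The function names and the toy focal length and baseline are hypothetical. The example shows how holes appear wherever no left pixel lands, such as the region disoccluded beside a near object.

```python
def disparity_from_depth(depth_m, focal_px, baseline_m):
    """Disparity in pixels for a rectified stereo rig: d = f * B / Z."""
    return focal_px * baseline_m / depth_m


def forward_warp_left_to_right(left, depth, focal_px, baseline_m, hole=None):
    """Hypothetical depth-based warp of a left image into a virtual right view.

    Each left pixel (y, x) is moved to column x - d. When several pixels land
    on the same target, the one with the largest disparity (nearest to the
    camera) wins, acting as a simple z-buffer. Targets that receive no pixel
    keep the `hole` value, which is where visual holes come from.
    """
    h, w = len(left), len(left[0])
    right = [[hole] * w for _ in range(h)]
    best_disp = [[-1.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            d = disparity_from_depth(depth[y][x], focal_px, baseline_m)
            xr = int(round(x - d))
            if 0 <= xr < w and d > best_disp[y][xr]:
                best_disp[y][xr] = d
                right[y][xr] = left[y][x]
    return right


if __name__ == "__main__":
    # Toy 1x6 row: pixel values are their column indices; columns 2-3 are a near object.
    left = [[0, 1, 2, 3, 4, 5]]
    depth = [[5.0, 5.0, 2.5, 2.5, 5.0, 5.0]]
    print(forward_warp_left_to_right(left, depth, focal_px=10.0, baseline_m=0.5))
```

With these toy values the background shifts by 1 pixel and the near object by 2, so the warped row is `[[2, 3, None, 4, 5, None]]`: one hole is the disoccluded background beside the object and the other is the right border, which no left pixel covers. Filling such gaps without a depth estimate is the motivation stated above for generating the right view directly.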


For paper submission only


Single-View Diffusion Model for Pseudo-Stereo 3D Object Detection in Autonomous Driving

Submitted by Yuguang Shi on Sat, 05/11/2024 - 10:45