Name: Part of the pseudo-right image generated in the KITTI3D dataset
Creator: Yuguang Shi
License: https://creativecommons.org/licenses/by/4.0/
Keywords: Artificial Intelligence

Abstract

One of the key problems in 3D object detection is reducing the accuracy gap between methods based on LiDAR sensors and those based on monocular cameras. A recently proposed framework for monocular 3D detection based on Pseudo-Stereo has received considerable attention in the community. However, existing practice has three problems: (1) it relies on a high-performance monocular depth estimator, (2) the generated image suffers from visual holes, deformations, and artifacts, and (3) it is difficult to make compatible with geometry-based stereo detectors. In this work, we propose a novel pseudo-stereo 3D detection framework without depth estimation, called PS-SVDM. This framework uses a diffusion model to generate a high-quality virtual right view from a left image, mimicking the signal of a stereo camera. With this representation, we can apply various existing stereo image-based detection algorithms. We further explore the application of PS-SVDM to depth-free stereo 3D detection, and the final framework is compatible with most stereo detectors. Experiments conducted on the KITTI-3D Car category show that our method ranks 1st among published monocular 3D detectors.

Instructions:

For paper submission only

Comments

Single-View Diffusion Model for Pseudo-Stereo 3D Object Detection in Autonomous Driving

Submitted by Yuguang Shi on Sat, 05/11/2024 - 10:45

Dataset Files

view128 (2).zip (1.08 GB)
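
The internal layout of the archive is not documented on this page, so it may help to list its contents before pairing the pseudo-right images with the corresponding KITTI left images. The following is a minimal sketch using only the Python standard library. The helper name `list_images_in_zip` and the set of image extensions are assumptions made for illustration, not part of the dataset.

```python
import zipfile
from pathlib import PurePosixPath

# Assumption: the generated views are stored as common raster image files.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def list_images_in_zip(zip_source):
    """Return the sorted archive member names that look like image files.

    `zip_source` may be a path to a zip file or a binary file-like object.
    Hypothetical helper: it only inspects member names, nothing is extracted.
    """
    with zipfile.ZipFile(zip_source) as archive:
        return sorted(
            name
            for name in archive.namelist()
            if not name.endswith("/")  # skip directory entries
            and PurePosixPath(name).suffix.lower() in IMAGE_EXTENSIONS
        )
```

Calling `list_images_in_zip("view128 (2).zip")` on the downloaded file should return the image member names, which can then be checked against the KITTI image identifiers.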
