Abstract

Data augmentation has been widely adopted for object detection in 2D images and 3D point clouds. However, existing multimodal data augmentation does little more than borrow from single-modal work, and the challenge is to keep the augmented image and point cloud consistent and plausible at the same time. We propose a novel multimodal data augmentation method based on ground truth sampling (GT sampling) for generating content-rich synthetic scenes. We first build a GT database and a scene ground database from the raw training set, then use the context of the image and point cloud to guide the paste location and filtering strategy of the samples. We demonstrate the effectiveness of this training strategy for multimodal 3D object detectors on the publicly available KITTI dataset. Our experiments evaluate different superimposition strategies, ranging from context-free GT sampling in raw scenes to context-guided, semantics-informed positioning and filtering in new training scenes. Our method outperforms existing GT sampling methods with more than 15% relative performance improvement on benchmark datasets. In ablation studies, our sample pasting strategy brings a +2.81% gain compared to previous work. This superior performance demonstrates that the multimodal context of modeled objects is crucial for placing them in the correct environment.
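The pasting step summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it covers only the point-cloud side, stands in for the scene ground database with a hypothetical `ground_height(x, y)` callback, and reduces the filtering strategy to an axis-aligned bird's-eye-view collision check that ignores yaw. The function names, the box layout `(x, y, z, dx, dy, dz)`, and all parameters are assumptions made for illustration.

```python
import random


def bev_overlap(a, b):
    """Axis-aligned bird's-eye-view overlap test for boxes (x, y, z, dx, dy, dz).

    Yaw is ignored for brevity, so this is a simplification of a real check.
    """
    return (abs(a[0] - b[0]) < (a[3] + b[3]) / 2
            and abs(a[1] - b[1]) < (a[4] + b[4]) / 2)


def paste_samples(scene_boxes, samples, ground_height, x_range, y_range,
                  rng, max_tries=20):
    """Paste GT samples into a scene at ground-aware, collision-free positions.

    scene_boxes   : boxes already present in the scene
    samples       : list of (box, points) pairs drawn from a GT database
    ground_height : hypothetical callback (x, y) -> ground z at that location
    Returns a list of (new_box, moved_points) for the samples that were placed.
    """
    occupied = [tuple(b) for b in scene_boxes]
    pasted = []
    for box, points in samples:
        dx, dy, dz = box[3:6]
        for _attempt in range(max_tries):
            x = rng.uniform(*x_range)
            y = rng.uniform(*y_range)
            # Rest the box on the ground: its centre is half a height above it.
            z = ground_height(x, y) + dz / 2
            new_box = (x, y, z, dx, dy, dz)
            # Filtering: reject positions that collide with any existing object.
            if any(bev_overlap(new_box, other) for other in occupied):
                continue
            shift = (x - box[0], y - box[1], z - box[2])
            moved = [(px + shift[0], py + shift[1], pz + shift[2])
                     for px, py, pz in points]
            pasted.append((new_box, moved))
            occupied.append(new_box)
            break
    return pasted
```

A sample that finds no free position within `max_tries` attempts is dropped. A multimodal version would also have to paste the matching image patch and resolve occlusions between pasted and existing objects, which this sketch leaves out.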

Instructions:

Context-guided Ground Truth Sampling for Multi-Modality Data Augmentation in Autonomous Driving

Dataset Files

train_log.zip (482.83 kB)
