X-ScanRefer

Citation Author(s):
Yiwei Ma
Submitted by:
Yiwei Ma
Last updated:
Mon, 07/08/2024 - 15:58
DOI:
10.21227/r6s1-wr17
License:
0

Abstract 

ScanRefer provides a clear correspondence between expressions and instances in 3D point cloud scenes, enabling effective identification of target objects. However, the explicit mention of the target object in the expression creates a shortcut: a model can filter out negative samples by object name alone, which eases learning. To mitigate overreliance on this shortcut, we manually processed the ScanRefer dataset. Specifically, we replaced the name of the referred object with the term "object" while preserving the names of other objects. For example, consider the expression "The trash can is to the left of the bookshelf. It is behind the chair." After processing, "trash can" is replaced with "object" while "bookshelf" and "chair" are kept unchanged, resulting in the sentence "The object is to the left of the bookshelf. It is behind the chair." With the explicit mention of the target object removed, the model is compelled to rely on other information in the expression, such as attributes and positional relationships, to identify the target instance. We use the term X-ScanRefer to refer to the processed dataset.
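The replacement described above was done manually, but the idea can be illustrated with a short Python sketch. The function name `mask_target_name` and its parameters are hypothetical and not part of the dataset or any released tooling; the sketch assumes the target object's name is known and appears literally in the expression.

```python
import re


def mask_target_name(expression: str, target_name: str, placeholder: str = "object") -> str:
    """Replace whole-word mentions of the target object's name with a placeholder.

    Hypothetical helper for illustration only; the dataset itself was
    processed manually, not with this function.
    """
    # Word boundaries keep "chair" from matching inside "armchair";
    # re.escape guards against regex metacharacters in the name.
    pattern = re.compile(r"\b" + re.escape(target_name) + r"\b", flags=re.IGNORECASE)
    return pattern.sub(placeholder, expression)


original = "The trash can is to the left of the bookshelf. It is behind the chair."
masked = mask_target_name(original, "trash can")
print(masked)
```

A simple string substitution like this would not handle synonyms, plurals, or paraphrased mentions of the target, which is one reason manual processing is more reliable than an automatic rule.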
