# Transparent object pose estimation dataset
- Submitted by: Munkhtulga Byambaa
- Last updated: Mon, 07/08/2024 - 15:59
- DOI: 10.21227/v95h-ge13
## Abstract
Grasping and manipulating transparent objects with a robot is a challenge in robot vision. To perform robotic grasping successfully, 6D object pose estimation is needed. However, transparent objects are difficult to recognize because their appearance varies with the background, and modern 3D sensors cannot collect reliable depth data on transparent surfaces because those surfaces are translucent, refractive, and specular. To address these challenges, we propose a method for 6D pose estimation of transparent objects for manipulation. Given a single RGB image of transparent objects, the 2D keypoints are estimated using a deep neural network. The PnP algorithm then takes the camera intrinsics, the object model size, and the keypoints as inputs to estimate the 6D pose of the object. Finally, the predicted poses of the transparent objects are used for grasp planning. Our experiments demonstrate that our picking system is capable of grasping transparent objects from different backgrounds. To the best of our knowledge, this is the first time a robot has grasped transparent objects from a single RGB image. Furthermore, the experiments show that our method outperforms 6D pose estimation baselines and generalizes to real-world images.
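The PnP step described in the abstract can be illustrated with OpenCV's PnP solver. The sketch below is not the authors' implementation: the intrinsics, the cuboid dimensions, and the simulated keypoints are placeholders, and in the real pipeline the 2D points would come from the keypoint network.

```python
import numpy as np
import cv2

# Placeholder intrinsics -- in practice, read fx, fy, cx, cy from _camera_settings.json.
K = np.array([[768.0,   0.0, 480.0],
              [  0.0, 768.0, 270.0],
              [  0.0,   0.0,   1.0]])

# 3D model points in the object frame: here, the 8 vertices of a hypothetical
# 6 x 9 x 6 cm bounding cuboid (cf. cuboid_dimensions in the dataset).
dx, dy, dz = 6.0, 9.0, 6.0
object_points = np.array([[sx * dx / 2, sy * dy / 2, sz * dz / 2]
                          for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)])

# Simulate keypoint predictions by projecting the model with a known pose;
# in the real pipeline these 2D points come from the keypoint network.
rvec_gt = np.array([0.1, -0.2, 0.05])
tvec_gt = np.array([2.0, -1.0, 60.0])        # object ~60 cm in front of the camera
image_points, _ = cv2.projectPoints(object_points, rvec_gt, tvec_gt, K, None)

# PnP: recover rotation (Rodrigues vector) and translation from the 2D-3D pairs.
ok, rvec, tvec = cv2.solvePnP(object_points, image_points, K, None)
R, _ = cv2.Rodrigues(rvec)                   # 3x3 rotation matrix
print(ok, tvec.ravel())                      # should be close to tvec_gt
```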
# Dataset overview
This dataset is a collection of synthetic images with ground truth annotations for research in object detection and 6D pose estimation.
The dataset contains 25k domain-randomized images and 27k photorealistic images per object.
Objects are 5 transparent objects from the [ClearGrasp dataset](https://sites.google.com/view/cleargrasp/synthetic-dataset?authuser=0).
Each frame consists of RGBD images and 3D poses, per-pixel semantic segmentation, and 2D/3D bounding box coordinates for all object instances.
The images show the objects falling onto different surfaces in different scenes, with both wooden (beech, walnut, and three types of oak) and uniformly colored backgrounds, captured by a [custom plug-in](https://github.com/NVIDIA/Dataset_Synthesizer) for Unreal Engine 4.
The dataset can be used for research in pose estimation, depth estimation from a single or stereo pair of cameras, semantic segmentation, and other applications within computer vision and robotics.
## File details
The details of the files are as follows.
### Setting files
In each data folder containing frames, there are two files describing the exported scene (a short parsing sketch follows this list):
* `_object_settings.json` includes information about the objects exported. This includes
- the names of the exported object classes (`exported_object_classes`)
- details about the exported object classes (`exported_objects`), including
- the name of the class (`class`)
- numerical class ID for semantic segmentation (`segmentation_class_id`). For `mixed`, this number uniquely identifies the object class, but for `single`, this number is always 255, since there is just one object.
- 4x4 Euclidean transformation (`fixed_model_transform`). This transformation is applied to the original publicly available object model in order to center and align it with the coordinate system (translation values are in centimeters; see the NDDS plug-in linked above). Note that this is actually the transpose of the matrix.
- dimensions of the 3D bounding cuboid along the XYZ axes (`cuboid_dimensions`)
* `_camera_settings.json` includes the intrinsics of both cameras (`camera_settings`).
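A minimal sketch of reading these two settings files, assuming only the field names listed above (the folder path is hypothetical):

```python
import json
import numpy as np

folder = "pr_mixed_25k/scene_01"   # hypothetical path to a data folder

with open(f"{folder}/_object_settings.json") as f:
    obj_settings = json.load(f)
with open(f"{folder}/_camera_settings.json") as f:
    cam_settings = json.load(f)

print(obj_settings["exported_object_classes"])         # names of exported classes
for obj in obj_settings["exported_objects"]:
    seg_id = obj["segmentation_class_id"]               # 255 for `single` data
    # The stored matrix is the transpose of the actual transform.
    fixed_model_transform = np.array(obj["fixed_model_transform"]).T
    print(obj["class"], seg_id, obj["cuboid_dimensions"])

# One entry per virtual camera, each holding that camera's intrinsics.
for cam in cam_settings["camera_settings"]:
    print(cam)
```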
### Image files
The image files are as follows (see the loading sketch after this list):
- RGB images: JPEG-compressed images from the virtual cameras
- depth images: Depth along the optical axis (in 0.1 mm increments)
- segmentation images: Each pixel indicates the numerical ID of the object whose surface is visible at that pixel
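A sketch of loading one frame's images. The file names and the 16-bit depth encoding are assumptions; only the 0.1 mm depth increment is stated above.

```python
import cv2

prefix = "000000.left"   # hypothetical frame prefix

# RGB image (JPEG-compressed).
rgb = cv2.imread(f"{prefix}.jpg", cv2.IMREAD_COLOR)

# Depth along the optical axis, stored in 0.1 mm increments; assuming a
# 16-bit single-channel image, convert to meters.
depth_raw = cv2.imread(f"{prefix}.depth.png", cv2.IMREAD_UNCHANGED)
depth_m = depth_raw.astype("float32") * 1e-4

# Each segmentation pixel holds the segmentation_class_id of the visible object.
seg = cv2.imread(f"{prefix}.seg.png", cv2.IMREAD_UNCHANGED)
```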
### Annotation files
Each annotation file includes
- XYZ position and orientation of the camera in the world coordinate frame (`camera_data`)
- for each object,
- class name (`class`)
- visibility, defined as the fraction of the object that is not occluded, ranging from 0 (fully occluded) to 1 (fully visible) (`visibility`)
- XYZ position (in centimeters) and orientation (`location` and `quaternion_xyzw`)
- 4x4 transformation (redundant, can be computed from previous) (`pose_transform_permuted`)
- 3D position of the centroid of the bounding cuboid (in centimeters) (`cuboid_centroid`)
- 2D projection of the previous onto the image (in pixels) (`projected_cuboid_centroid`)
- 2D bounding box of the object in the image (in pixels) (`bounding_box`)
- 3D coordinates of the vertices of the 3D bounding cuboid (in centimeters) (`cuboid`)
- 2D coordinates of the projection of the above (in pixels) (`projected_cuboid`)
*Note:* Like the `fixed_model_transform`, the `pose_transform_permuted` is actually the transpose of the matrix. Moreover, after transposing, the columns are permuted, and there is a sign flip (due to UE4's use of a left-handed coordinate system). Specifically, if `A` is the matrix given by `pose_transform_permuted`, then the actual transform is given by `A^T * P`, where `^T` denotes transpose, `*` denotes matrix multiplication, and the permutation matrix `P` is given below; a code sketch applying this relation follows the matrix.
```
    [ 0  0  1]
P = [ 1  0  0]
    [ 0 -1  0]
```
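Putting the note above into code, the sketch below reads one annotation file and applies the stated relation to the 3x3 rotation block of `pose_transform_permuted`. The file name and the `objects` key are assumptions; the field names are as listed above.

```python
import json
import numpy as np

# Permutation that undoes UE4's left-handed axis convention (matrix P above).
P = np.array([[0,  0, 1],
              [1,  0, 0],
              [0, -1, 0]], dtype=np.float64)

with open("000000.left.json") as f:   # hypothetical annotation file name
    frame = json.load(f)

camera_data = frame["camera_data"]    # camera position and orientation
for obj in frame["objects"]:          # per-object entries ("objects" key assumed)
    location = np.array(obj["location"])             # centimeters
    quat_xyzw = np.array(obj["quaternion_xyzw"])
    cuboid_3d = np.array(obj["cuboid"])              # (8, 3), centimeters
    cuboid_2d = np.array(obj["projected_cuboid"])    # (8, 2), pixels

    # pose_transform_permuted is stored transposed and permuted; applying
    # A^T * P to the 3x3 rotation block recovers the actual rotation.
    A = np.array(obj["pose_transform_permuted"])     # 4x4
    R_actual = A.T[:3, :3] @ P
    print(obj["class"], obj["visibility"], location, R_actual)
```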
## Dataset Files
- pr_mixed_25k.zip (28.51 GB)
- pr_single.zip (70.65 GB)
- dr_mixed_30k.zip (37.05 GB)
## Documentation

| Attachment | Size |
|---|---|
| readme.txt | 5.05 KB |