# Transparent object pose estimation dataset
- Submitted by: Munkhtulga Byambaa
- Last updated: Mon, 07/08/2024 - 15:59
- DOI: 10.21227/v95h-ge13
## Abstract
Grasping and manipulating transparent objects with a robot is a challenge in robot vision. To perform robotic grasping successfully, 6D object pose estimation is needed. However, transparent objects are difficult to recognize because their appearance varies with the background, and modern 3D sensors cannot collect reliable depth data on transparent surfaces because those surfaces are translucent, refractive, and specular. To address these challenges, we propose a method for 6D pose estimation of transparent objects for manipulation. Given a single RGB image of transparent objects, the 2D keypoints are estimated using a deep neural network. The PnP algorithm then takes the camera intrinsics, the object model size, and the keypoints as inputs to estimate the 6D pose of the object. Finally, the predicted poses of the transparent objects are used for grasp planning. Our experiments demonstrate that our picking system is capable of grasping transparent objects from different backgrounds. To the best of our knowledge, this is the first time a robot has grasped transparent objects from a single RGB image. Furthermore, the experiments show that our method outperforms 6D pose estimation baselines and generalizes to real-world images.
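The PnP step described in the abstract can be illustrated with OpenCV's PnP solver. The sketch below is not the authors' implementation: the intrinsics, the cuboid dimensions, and the simulated keypoints are placeholders, and in the real pipeline the 2D points would come from the keypoint network.

```python
import numpy as np
import cv2

# Placeholder intrinsics -- in practice, read fx, fy, cx, cy from _camera_settings.json.
K = np.array([[768.0,   0.0, 480.0],
              [  0.0, 768.0, 270.0],
              [  0.0,   0.0,   1.0]])

# 3D model points in the object frame: here, the 8 vertices of a hypothetical
# 6 x 9 x 6 cm bounding cuboid (cf. cuboid_dimensions in the dataset).
dx, dy, dz = 6.0, 9.0, 6.0
object_points = np.array([[sx * dx / 2, sy * dy / 2, sz * dz / 2]
                          for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)])

# Simulate keypoint predictions by projecting the model with a known pose;
# in the real pipeline these 2D points come from the keypoint network.
rvec_gt = np.array([0.1, -0.2, 0.05])
tvec_gt = np.array([2.0, -1.0, 60.0])        # object ~60 cm in front of the camera
image_points, _ = cv2.projectPoints(object_points, rvec_gt, tvec_gt, K, None)

# PnP: recover rotation (Rodrigues vector) and translation from the 2D-3D pairs.
ok, rvec, tvec = cv2.solvePnP(object_points, image_points, K, None)
R, _ = cv2.Rodrigues(rvec)                   # 3x3 rotation matrix
print(ok, tvec.ravel())                      # should be close to tvec_gt
```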
# Dataset overview
This dataset is a collection of synthetic images with ground truth annotations for research in object detection and 6D pose estimation.
The dataset contains 25k domain-randomized images and 27k photorealistic images per object.
Objects are 5 transparent objects from the [ClearGrasp dataset](https://sites.google.com/view/cleargrasp/synthetic-dataset?authuser=0).
Each frame consists of RGBD images and 3D poses, per-pixel semantic segmentation, and 2D/3D bounding box coordinates for all object instances.
The images show the objects falling onto different surfaces in different scenes, with both wooden (beech, walnut, and three types of oak) and uniformly colored backgrounds, captured by a [custom plug-in](https://github.com/NVIDIA/Dataset_Synthesizer) for Unreal Engine 4.
The dataset can be used for research in pose estimation, depth estimation from a single or stereo pair of cameras, semantic segmentation, and other applications within computer vision and robotics.
## File details
The details of the files are as follows.
### Setting files
In each data folder containing frames, there are two files describing the exported scene (a short parsing sketch follows this list):
* `_object_settings.json` includes information about the objects exported. This includes
- the names of the exported object classes (`exported_object_classes`)
- details about the exported object classes (`exported_objects`), including
- the name of the class (`class`)
- numerical class ID for semantic segmentation (`segmentation_class_id`). For `mixed`, this number uniquely identifies the object class, but for `single`, this number is always 255, since there is just one object.
- 4x4 Euclidean transformation (`fixed_model_transform`). This transformation is applied to the original publicly available object model in order to center and align it with the coordinate system (translation values are in centimeters; see the NDDS plug-in linked above). Note that this is actually the transpose of the matrix.
- dimensions of the 3D bounding cuboid along the XYZ axes (`cuboid_dimensions`)
* `_camera_settings.json` includes the intrinsics of both cameras (`camera_settings`).
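A minimal sketch of reading these two settings files, assuming only the field names listed above (the folder path is hypothetical):

```python
import json
import numpy as np

folder = "pr_mixed_25k/scene_01"   # hypothetical path to a data folder

with open(f"{folder}/_object_settings.json") as f:
    obj_settings = json.load(f)
with open(f"{folder}/_camera_settings.json") as f:
    cam_settings = json.load(f)

print(obj_settings["exported_object_classes"])         # names of exported classes
for obj in obj_settings["exported_objects"]:
    seg_id = obj["segmentation_class_id"]               # 255 for `single` data
    # The stored matrix is the transpose of the actual transform.
    fixed_model_transform = np.array(obj["fixed_model_transform"]).T
    print(obj["class"], seg_id, obj["cuboid_dimensions"])

# One entry per virtual camera, each holding that camera's intrinsics.
for cam in cam_settings["camera_settings"]:
    print(cam)
```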
### Image files
The image files are as follows (see the loading sketch after this list):
- RGB images: JPEG-compressed images from the virtual cameras
- depth images: Depth along the optical axis (in 0.1 mm increments)
- segmentation images: Each pixel indicates the numerical ID of the object whose surface is visible at that pixel
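A sketch of loading one frame's images. The file names and the 16-bit depth encoding are assumptions; only the 0.1 mm depth increment is stated above.

```python
import cv2

prefix = "000000.left"   # hypothetical frame prefix

# RGB image (JPEG-compressed).
rgb = cv2.imread(f"{prefix}.jpg", cv2.IMREAD_COLOR)

# Depth along the optical axis, stored in 0.1 mm increments; assuming a
# 16-bit single-channel image, convert to meters.
depth_raw = cv2.imread(f"{prefix}.depth.png", cv2.IMREAD_UNCHANGED)
depth_m = depth_raw.astype("float32") * 1e-4

# Each segmentation pixel holds the segmentation_class_id of the visible object.
seg = cv2.imread(f"{prefix}.seg.png", cv2.IMREAD_UNCHANGED)
```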
### Annotation files
Each annotation file includes
- XYZ position and orientation of the camera in the world coordinate frame (`camera_data`)
- for each object,
- class name (`class`)
- visibility, defined as the fraction of the object that is not occluded, ranging from 0 (fully occluded) to 1 (fully visible) (`visibility`)
- XYZ position (in centimeters) and orientation (`location` and `quaternion_xyzw`)
- 4x4 transformation (redundant, can be computed from previous) (`pose_transform_permuted`)
- 3D position of the centroid of the bounding cuboid (in centimeters) (`cuboid_centroid`)
- 2D projection of the previous onto the image (in pixels) (`projected_cuboid_centroid`)
- 2D bounding box of the object in the image (in pixels) (`bounding_box`)
- 3D coordinates of the vertices of the 3D bounding cuboid (in centimeters) (`cuboid`)
- 2D coordinates of the projection of the above (in pixels) (`projected_cuboid`)
*Note:* Like the `fixed_model_transform`, the `pose_transform_permuted` is actually the transpose of the matrix. Moreover, after transposing, the columns are permuted, and there is a sign flip (due to UE4's use of a left-handed coordinate system). Specifically, if `A` is the matrix given by `pose_transform_permuted`, then the actual transform is given by `A^T * P`, where `^T` denotes transpose, `*` denotes matrix multiplication, and the permutation matrix `P` is given below; a code sketch applying this relation follows the matrix.
```
    [ 0  0  1]
P = [ 1  0  0]
    [ 0 -1  0]
```
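Putting the note above into code, the sketch below reads one annotation file and applies the stated relation to the 3x3 rotation block of `pose_transform_permuted`. The file name and the `objects` key are assumptions; the field names are as listed above.

```python
import json
import numpy as np

# Permutation that undoes UE4's left-handed axis convention (matrix P above).
P = np.array([[0,  0, 1],
              [1,  0, 0],
              [0, -1, 0]], dtype=np.float64)

with open("000000.left.json") as f:   # hypothetical annotation file name
    frame = json.load(f)

camera_data = frame["camera_data"]    # camera position and orientation
for obj in frame["objects"]:          # per-object entries ("objects" key assumed)
    location = np.array(obj["location"])             # centimeters
    quat_xyzw = np.array(obj["quaternion_xyzw"])
    cuboid_3d = np.array(obj["cuboid"])              # (8, 3), centimeters
    cuboid_2d = np.array(obj["projected_cuboid"])    # (8, 2), pixels

    # pose_transform_permuted is stored transposed and permuted; applying
    # A^T * P to the 3x3 rotation block recovers the actual rotation.
    A = np.array(obj["pose_transform_permuted"])     # 4x4
    R_actual = A.T[:3, :3] @ P
    print(obj["class"], obj["visibility"], location, R_actual)
```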
## Dataset Files
- pr_mixed_25k.zip (28.51 GB)
- pr_single.zip (70.65 GB)
- dr_mixed_30k.zip (37.05 GB)
## Documentation

| Attachment | Size |
|---|---|
| readme.txt | 5.05 KB |