Abstract 

 

Several commonly used object detection datasets exist, including COCO (in multiple versions) and ImageNet, which provide large-scale annotations for 80 and 1000 object classes, respectively. However, very few datasets cover the specific objects identified as hazards by visually impaired people (VIP), such as wheel-bins, trash-bags, e-scooters, advertising boards, and bollards. Furthermore, annotations for these objects are not available in existing sources.

We identified a publicly available 3D-scan dataset (without annotations) comprising a variety of the required objects, including benches, advertising boards, poles, and wheel-bins [1]. The 3D scans were captured using an RGB-D camera from varying perspectives, orientations, distances, and angles, producing a more natural representation of data diversity than augmented data generation (such as zoom in/out, translation, rotation, and shear). We transformed the 3D scans of the required objects (bins, advertising boards, poles, and benches) into corresponding image frames.
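As a rough illustration of this step, the sketch below samples JPEG frames from a scan exported as an ordinary RGB video; the file name and sampling interval are assumptions for illustration, not part of the original pipeline.

# Minimal sketch: sample JPEG frames from an RGB scan video.
# Assumes the scan is available as a standard video file; the file name
# and sampling step below are illustrative only.
import cv2
import os

def extract_frames(video_path, out_dir, step=10):
    """Save every `step`-th frame of `video_path` as a JPEG in `out_dir`."""
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    idx, saved = 0, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            cv2.imwrite(os.path.join(out_dir, f"frame_{saved:05d}.jpg"), frame)
            saved += 1
        idx += 1
    cap.release()
    return saved

# Example (hypothetical file name):
# extract_frames("wheel_bin_scan.mp4", "frames/wheel_bin", step=15)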

For the trash-bags and e-scooters, we used publicly available Google images (with NY-CC license). The annotated car and person datasets were acquired from public sources [2] and [3], respectively. We then annotated the images using a public annotation tool (DarkLabel: https://darkpgmr.tistory.com/16) in the form required for object detection (bounding boxes and class labels).
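DarkLabel supports several export formats; the minimal sketch below only shows how a labelled box can be written out in the Pascal VOC (.xml) layout used for this dataset. The image name, size, class label, and coordinates are illustrative assumptions.

# Minimal sketch: write bounding-box annotations in Pascal VOC (.xml) layout.
import xml.etree.ElementTree as ET

def write_voc_xml(out_path, filename, width, height, objects):
    """`objects` is a list of (label, xmin, ymin, xmax, ymax) tuples."""
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    ET.SubElement(size, "width").text = str(width)
    ET.SubElement(size, "height").text = str(height)
    ET.SubElement(size, "depth").text = "3"
    for label, xmin, ymin, xmax, ymax in objects:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = label
        box = ET.SubElement(obj, "bndbox")
        for tag, val in zip(("xmin", "ymin", "xmax", "ymax"),
                            (xmin, ymin, xmax, ymax)):
            ET.SubElement(box, tag).text = str(val)
    ET.ElementTree(root).write(out_path)

# Example (hypothetical values):
# write_voc_xml("img_0001.xml", "img_0001.jpeg", 1280, 720,
#               [("trash-bag", 410, 300, 560, 470)])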

 

 

[1] S. Choi, Q.-Y. Zhou, S. Miller, and V. Koltun, “A Large Dataset of Object Scans,” arXiv, 2016.

 

 

[2] J. Krause, M. Stark, J. Deng, and L. Fei-Fei, “3D Object Representations for Fine-Grained Categorization,” in 4th IEEE Workshop on 3D Representation and Recognition (3dRR-13), ICCV, Sydney, Australia, 2013.

 

 

[3] Y. Deng, P. Luo, C. C. Loy, and X. Tang, “Pedestrian Attribute Recognition at Far Distance,” in Proceedings of the 22nd ACM International Conference on Multimedia, 2014.

 

 

Instructions: 

Dataset and Annotations for the Outdoor Objects Identified as Potential Hazards by Visually Impaired People

 

A.    Folder ‘annotated dataset used for custom training’

This folder contains the dataset and annotations for 8 objects (cars, bins, poles, persons, benches, wheel-bins, advertising boards, trash-bags) used for the custom training of the proposed object detection models (YOLOv5s and Mask R-CNN). The images are in JPEG format, while the annotations are in VOC (.xml) format, comprising the objects’ class and bounding box information.
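The VOC (.xml) annotations can be read directly for Mask R-CNN training, whereas YOLOv5 expects one .txt label file per image with normalized “class x_center y_center width height” entries. The following is a minimal conversion sketch; the class-index mapping, label strings, and file name are assumptions and may differ from the actual files.

# Minimal sketch: convert one VOC (.xml) annotation to YOLO-style label lines.
import xml.etree.ElementTree as ET

# Assumed class-index mapping (illustrative only).
CLASS_IDS = {"car": 0, "bin": 1, "pole": 2, "person": 3,
             "bench": 4, "wheel-bin": 5, "advertising board": 6, "trash-bags": 7}

def voc_to_yolo(xml_path):
    root = ET.parse(xml_path).getroot()
    w = float(root.find("size/width").text)
    h = float(root.find("size/height").text)
    lines = []
    for obj in root.findall("object"):
        label = obj.find("name").text
        box = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (float(box.find(t).text)
                                  for t in ("xmin", "ymin", "xmax", "ymax"))
        xc, yc = (xmin + xmax) / 2 / w, (ymin + ymax) / 2 / h
        bw, bh = (xmax - xmin) / w, (ymax - ymin) / h
        lines.append(f"{CLASS_IDS[label]} {xc:.6f} {yc:.6f} {bw:.6f} {bh:.6f}")
    return lines

# Example (hypothetical file name):
# print("\n".join(voc_to_yolo("img_0001.xml")))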

 

B.    Folder ‘Additional Dataset’

We also produced an additional dataset comprising 41,411 images of the aforementioned objects. The objects in the 3D scans (http://redwood-data.org/3dscan/dataset.html) were segmented using an unsupervised algorithm (scale-invariant pattern matching), and the identified regions were automatically cropped and stored in corresponding image files (JPEG).
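The segmentation algorithm itself is not reproduced here; the sketch below only illustrates the cropping and saving step, assuming the segmentation returns pixel bounding boxes for each identified region. File names and coordinates are illustrative.

# Minimal sketch of the crop-and-save step only; `regions` stands in for the
# boxes returned by the unsupervised segmentation, which is not shown here.
import os
from PIL import Image

def crop_regions(frame_path, regions, out_dir, prefix="obj"):
    """Crop each (xmin, ymin, xmax, ymax) region and save it as a JPEG."""
    os.makedirs(out_dir, exist_ok=True)
    frame = Image.open(frame_path).convert("RGB")
    for i, (xmin, ymin, xmax, ymax) in enumerate(regions):
        crop = frame.crop((xmin, ymin, xmax, ymax))
        crop.save(os.path.join(out_dir, f"{prefix}_{i:03d}.jpeg"), "JPEG")

# Example (hypothetical values):
# crop_regions("frames/bench/frame_00010.jpg", [(120, 80, 420, 360)], "crops/bench")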