Semantic scene segmentation has primarily been addressed by forming representations of single images, with both supervised and unsupervised methods. The problem of semantic segmentation in dynamic scenes has recently begun to receive attention through video object segmentation approaches. What is not yet known is how much extra information the temporal dynamics of the visual scene carry that is complementary to the information available in the individual frames of the video.

Instructions: 

 

The MIT DriveSeg (Manual) Dataset is a forward-facing, frame-by-frame, pixel-level semantically labeled dataset captured from a moving vehicle during continuous daylight driving through a crowded city street.

The dataset can be downloaded from the IEEE DataPort or demoed as a video.

 

Technical Summary

Video data - 2 minutes 47 seconds (5,000 frames), 1080p (1920x1080), 30 fps

Class definitions (12) - vehicle, pedestrian, road, sidewalk, bicycle, motorcycle, building, terrain (horizontal vegetation), vegetation (vertical vegetation), pole, traffic light, and traffic sign
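For orientation, a minimal MATLAB sketch for stepping through the video frame by frame is shown below; the video file name is hypothetical, and loading of the pixel-level labels is not shown because their storage format is not described here.

% Minimal sketch (hypothetical file name): iterate over the DriveSeg video.
% Pixel-level label loading is omitted; adapt to the label format you download.
v = VideoReader('driveseg_manual.mp4');   % 1080p (1920x1080), 30 fps
frameCount = 0;
while hasFrame(v)
    frame = readFrame(v);                 % 1920x1080x3 uint8 frame
    frameCount = frameCount + 1;          % ~5,000 frames in total
end
fprintf('Read %d frames\n', frameCount);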

 

Technical Specifications, Open Source Licensing and Citation Information

Ding, L., Terwilliger, J., Sherony, R., Reimer, B. & Fridman, L. (2020). MIT DriveSeg (Manual) Dataset for Dynamic Driving Scene Segmentation. Massachusetts Institute of Technology AgeLab Technical Report 2020-1, Cambridge, MA. (pdf)

Ding, L., Terwilliger, J., Sherony, R., Reimer, B. & Fridman, L. (2020). MIT DriveSeg (Manual) Dataset. IEEE Dataport. DOI: 10.21227/mmke-dv03.

 

Related Research

Ding, L., Terwilliger, J., Sherony, R., Reimer, B. & Fridman, L. (2019). Value of Temporal Dynamics Information in Driving Scene Segmentation. arXiv preprint arXiv:1904.00758. (link)

 

Attribution and Contact Information

This work was done in collaboration with the Toyota Collaborative Safety Research Center (CSRC). For more information, click here.

For any questions related to this dataset, or requests to remove identifying information, please contact driveseg@mit.edu.

 


Synthetic Aperture Radar (SAR) images can be extensively informative owing to their resolution and availability. However, removing speckle noise from these images requires several pre-processing steps. In recent years, deep learning-based techniques have brought significant improvements in denoising and image restoration. However, further research has been hampered by the lack of data suitable for training deep neural network-based systems. With this paper, we propose a standard synthetic dataset for training speckle reduction algorithms.

Instructions: 

In Virtual SAR, we have infused images with varying levels of noise, which helps improve accuracy on the blind denoising task. The holdout set can be created using images from the USC SIPI Aerials database and the provided MATLAB script (preprocess_holdout.m), tested with MATLAB R2019b.
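As a rough illustration only (this is not the logic of preprocess_holdout.m), a MATLAB sketch of infusing multiplicative speckle at varying levels into a clean aerial image might look like the following; the file name and variance values are assumptions.

% Illustrative sketch only, not preprocess_holdout.m. Requires the Image
% Processing Toolbox. File name and noise variances are placeholders.
I = im2double(imread('aerial_clean.png'));    % hypothetical clean aerial image
variances = [0.01 0.05 0.10];                 % assumed low/medium/high noise levels
for k = 1:numel(variances)
    J = imnoise(I, 'speckle', variances(k));  % multiplicative noise: J = I + n.*I
    imwrite(J, sprintf('aerial_speckle_%02d.png', k));
end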

 

Use for research purposes is free of charge. If you use this dataset, please cite the following paper along with the dataset: Virtual SAR: A Synthetic Dataset for Deep Learning based Speckle Noise Reduction Algorithms.


This is the data for the paper "Environmental Context Prediction for Lower Limb Prostheses with Uncertainty Quantification," published in IEEE Transactions on Automation Science and Engineering, 2020. DOI: 10.1109/TASE.2020.2993399. For more details, please refer to https://research.ece.ncsu.edu/aros/paper-tase2020-lowerlimb.

Instructions: 

Seven able-bodied subjects and one transtibial amputee participated in this study. Subject_001 to Subject_007 are able-bodied participants and Subject_008 is a transtibial amputee.

 

Each folder in the subject_xxx.zip file contains one continuous session of data with the following items (a loading sketch follows the list):

1. folder named "rpi_frames": the frames collected from the lower limb camera. Frame rate: 10 frames per second. 

2. folder named "tobii_frames": the frames collected from the on-glasses camera. Frame rate: 10 frames per second. 

3. labels_fps10.mat: synchronized terrain labels, gaze from the eye-tracking glasses, GPS coordinates, and IMU signals. 

3.1 cam_time: the timestamps for the videos, GPS, gazes, and labeled terrains (unit: second). 10 Hz.

3.2 imu_time: the timestamps for the IMU sensors (unit: second). 40 Hz.

3.3 GPS: the GPS coordinates (latitude, longitude).

3.4 rpi_FrameIds, tobii_FrameIds: the frame IDs for the lower-limb and on-glasses cameras, respectively. The IDs correspond to the filenames in "rpi_frames" and "tobii_frames", respectively.

3.5 rpi_IMUs, tobii_IMUs: the IMU signals from the two devices. Columns: (accel_x, accel_y, accel_z, gyro_x, gyro_y, gyro_z).

3.6 terrains: the type of terrain the subject is currently on. Six terrains: tile, brick, grass, cement, upstairs, downstairs. "undefined" and "unlabelled" can be regarded as the same kind of data and should be discarded.
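A minimal MATLAB loading sketch is given below; the field names follow the description above, but the frame file naming pattern is an assumption and may need adjusting.

% Hedged sketch: pair one lower-limb frame with its terrain label and the
% nearest IMU sample. Frame file naming ('%d.jpg') is an assumption.
S = load('labels_fps10.mat');               % cam_time, imu_time, GPS, FrameIds, IMUs, terrains
k = 100;                                    % k-th synchronized 10 Hz sample
frameId = S.rpi_FrameIds(k);
terrain = S.terrains(k);                    % terrain label at the same timestamp
img = imread(fullfile('rpi_frames', sprintf('%d.jpg', frameId)));
[~, idx] = min(abs(S.imu_time - S.cam_time(k)));   % nearest 40 Hz IMU sample
imuSample = S.rpi_IMUs(idx, :);             % [accel_x accel_y accel_z gyro_x gyro_y gyro_z]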

 

The following sessions were collected during busy hours (many pedestrians were around):

'subject_005/01', 

'subject_005/02'

'subject_006/01', 

'subject_006/02', 

'subject_007/01', 

'subject_007/02', 

The following sessions were collected during non-busy hours (few pedestrians were around):

'subject_005/03', 

'subject_005/04',

'subject_006/03', 

'subject_006/04',

'subject_007/03', 

'subject_007/04',

'subject_008/01',

'subject_008/02'

The remaining sessions were collected without regard to specific hours (i.e., neither designated busy nor non-busy).

For the following sessions, the data collection devices were not optimized (e.g. non-optimal brightness balance). We therefore recommend using these sessions for training or validation but not for testing.

'subject_001/02'

'subject_003/01'

'subject_003/02'

'subject_003/03'

'subject_004/01'

'subject_004/02'


As one of the research directions at OLIVES Lab @ Georgia Tech, we focus on recognizing textures and materials in real-world images, which plays an important role in object recognition and scene understanding. Aiming at describing objects or scenes with more detailed information, we explore how to computationally characterize apparent or latent properties (e.g. surface smoothness) of materials, i.e., computational material characterization, which moves a step further beyond material recognition.

Instructions: 

Dataset Characteristics and Filename Formats

 

The "CoMMonS_FullResolution" folder includes 6912 full-resolution images (2560x1920). The "CoMMonS_Sampled" folder includes sampled images (resolution: 300x300), which are sampled from full-resolution images with different positions (x, y), rotation angles (r), zoom levels (z), a touching direction ("pile"), a lightness condition ("l5"), and a camera function setting ("ed3u"). This "CoMMonS_Sampled" folder is an example of a dataset subset for training and testing (e.g. 5: 1). Our dataset focuses on material characterization for one material (fabric) in terms of one of three properties (fiber length, smoothness, and toweling effect), facilitating a fine-grained texture classification. In this particular case, the dataset is used for a standard supervised problem of material quality evaluation. It takes fabric samples with human expert ratings as training inputs, and takes fabric samples without human subject ratings as testing inputs to predict quality ratings of the testing samples. The texture patches are classified into 4 classes according to each surface property measured by human sense of touch. For example, the human expert rates surface fiber length into 4 levels, from 1 (very short) to 4 (long), and similarly for smoothness and toweling effect. In short, the "CoMMonS_Sampled" folder includes 9 subfolders, each of which includes both sampled images and attribute class labels.



This aerial image dataset consists of more than 22,000 independent buildings extracted from aerial images with 0.0075 m spatial resolution, covering 450 km^2 in Christchurch, New Zealand. Most of the aerial images are down-sampled to 0.3 m ground resolution and cropped into 8,189 non-overlapping 512x512 tiles. These tiles make up the whole dataset. They are split into three parts: 4,736 tiles for training, 1,036 tiles for validation, and 2,416 tiles for testing.


This dataset contains "Pristine" and "Distorted" videos recorded in different places. The distortions with which the videos were recorded are "Focus", "Exposure", and "Focus + Exposure", each at low (1), medium (2), and high (3) levels, forming a total of 10 conditions (including the pristine videos). In addition, distorted videos were exported in three different qualities according to the H.264 compression format used in the DIGIFORT software: High Quality (HQ, H.264 at 100%), Medium Quality (MQ, H.264 at 75%), and Low Quality (LQ).

Instructions: 

  

0. This dataset is intended to evaluate "Visual Quality Assessment" (VQA) and "Visual Object Tracking" (VOT) algorithms. It has 4,476 videos with different distortions and their bounding box annotations ([x (x coordinate) y (y coordinate) w (width) h (height)]) for each frame. It also contains a MATLAB script that generates the video sequences for VOT algorithm evaluation.

 

1. Move the "generateSequences.m" file to the "surveillanceVideosDataset" Folder.

 

2. Open the script and modify the following parameters as needed:

 

%---------------------------------------------------------------%
% Sequence settings and image nomenclature
imagesType = '.jpg';
imgFolder = 'img';
gtName = 'groundtruth.txt';
imgNomenclature = ['%04d' imagesType];
%---------------------------------------------------------------%

 

This configuration will create a folder like the following for each video:

 

0001SequenceExample (Folder)

- - img (Folder)

- - - - 0001.jpg (Image)

- - - - 0002.jpg (Image)

- - - - ....

- - - - ....

- - - - ....

- - - - 0451.jpg (Image)

- - groundtruth.txt (txt file: Bounding Box Annotations)

 

3. Press "Run" and wait until the sequences are built. The process can take a long time due to the 

number of videos. You will need 33 GB for the videos, 30 MB for the Bounding Box annotations and 230 

GB for the sequences (.jpg format).
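Once the sequences are built, a minimal MATLAB sketch for overlaying an annotated bounding box on one frame could look like the following; the sequence folder name is the example above, and the delimiter of groundtruth.txt is an assumption (check the actual file).

% Hedged sketch: draw the [x y w h] box for one frame of a generated sequence.
% Requires the Computer Vision Toolbox for insertShape.
seqDir = '0001SequenceExample';                        % example sequence folder
gt = readmatrix(fullfile(seqDir, 'groundtruth.txt'));  % one [x y w h] row per frame (assumed delimiter)
frameIdx = 1;
img = imread(fullfile(seqDir, 'img', sprintf('%04d.jpg', frameIdx)));
img = insertShape(img, 'Rectangle', gt(frameIdx, 1:4), 'LineWidth', 3);
imshow(img);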

 


 

 


The PRIME-FP20 dataset is established for development and evaluation of retinal vessel segmentation algorithms in ultra-widefield (UWF) fundus photography (FP). PRIME-FP20 provides 15 high-resolution UWF FP images acquired using the Optos 200Tx camera (Optos plc, Dunfermline, United Kingdom), the corresponding labeled binary vessel maps, and the corresponding binary masks for the valid data region for the images. For each UWF FP image, a concurrently captured UWF fluorescein angiography (FA) is also included. 

Instructions: 

UWF FP images, UWF FA images, labeled UWF FP vessel maps, and binary UWF FP validity masks are provided, where the file names indicate the correspondence among them.
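A minimal MATLAB sketch for scoring a predicted vessel map only inside the valid data region is shown below; the file names are hypothetical, so use the correspondence encoded in the actual file names.

% Hedged sketch: restrict evaluation of a predicted vessel map to the valid
% region. File names are placeholders.
gtVessels = imread('uwf_fp_01_vessels.png') > 0;   % labeled binary vessel map
validMask = imread('uwf_fp_01_mask.png') > 0;      % valid data region mask
pred      = imread('my_prediction_01.png') > 0;    % your algorithm's output
tp = nnz( pred &  gtVessels & validMask);
fp = nnz( pred & ~gtVessels & validMask);
fn = nnz(~pred &  gtVessels & validMask);
diceScore = 2*tp / (2*tp + fp + fn);               % Dice coefficient within the mask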

 

Users of the dataset should cite the following paper:

L. Ding, A. E. Kuriyan, R. S. Ramchandran, C. C. Wykoff, and G. Sharma, "Weakly-supervised vessel detection in ultra-widefield fundus photography via iterative multi-modal registration and learning," IEEE Trans. Medical Imaging, accepted for publication, to appear.

 


Dataset associated with a paper in IEEE Transactions on Pattern Analysis and Machine Intelligence:

"The perils and pitfalls of block design for EEG classification experiments"

DOI: 10.1109/TPAMI.2020.2973153

 If you use this code or data, please cite the above paper.

Instructions: 

See the paper "The perils and pitfalls of block design for EEG classification experiments" on IEEE Xplore.

DOI: 10.1109/TPAMI.2020.2973153

Code for analyzing the dataset is included in the online supplementary materials for the paper.

The code and the appendix from the online supplementary materials are also included here.

If you use this code or data, please cite the above paper.


 

Instructions: 

The dataset is stored as a tarball (.tar.gz). Data can be extracted on most Linux systems using the command `tar -xzvf ASIs.tar.gz`. On macOS, the Archive Utility should be able to extract .tar.gz files by default. On Windows, third-party software such as 7-Zip can extract tarballs; alternatively, the Windows Subsystem for Linux can be used with the command `tar -xzvf ASIs.tar.gz`.

 
