
Datasets & Competitions

Semantic scene segmentation has primarily been addressed by forming representations of single images, with both supervised and unsupervised methods. Semantic segmentation in dynamic scenes has recently begun to receive attention through video object segmentation approaches. What is not yet known is how much extra information the temporal dynamics of the visual scene carry that is complementary to the information available in the individual frames of the video.



The MIT DriveSeg (Manual) Dataset is a forward-facing, frame-by-frame, pixel-level semantically labeled dataset captured from a moving vehicle during continuous daylight driving through a crowded city street.

The dataset can be downloaded from the IEEE DataPort or demoed as a video.


Technical Summary

Video data - 2 minutes 47 seconds (5,000 frames), 1080p (1920x1080), 30 fps

Class definitions (12) - vehicle, pedestrian, road, sidewalk, bicycle, motorcycle, building, terrain (horizontal vegetation), vegetation (vertical vegetation), pole, traffic light, and traffic sign
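The summary above can be sanity-checked with a short sketch. The frame count, frame rate, and class names come from the summary itself; the list ordering below is an assumption for illustration (the authoritative class-index mapping is given in the technical report cited below).

```python
# Class names as listed in the Technical Summary.
# NOTE: the ordering here is an assumption, not the official index mapping.
CLASSES = [
    "vehicle", "pedestrian", "road", "sidewalk", "bicycle", "motorcycle",
    "building", "terrain", "vegetation", "pole", "traffic light", "traffic sign",
]

FRAMES = 5000  # total labeled frames
FPS = 30       # frames per second

def clip_duration(frames: int = FRAMES, fps: int = FPS) -> tuple[int, int]:
    """Return the clip length as (minutes, seconds)."""
    total_seconds = frames / fps
    return int(total_seconds // 60), round(total_seconds % 60)

# 5,000 frames at 30 fps -> (2, 47), matching the stated 2 minutes 47 seconds
print(len(CLASSES), clip_duration())
```

This confirms the internal consistency of the published specs: 5,000 frames at 30 fps is about 166.7 seconds, i.e. 2 minutes 47 seconds, and the class list contains exactly 12 entries.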


Technical Specifications, Open Source Licensing and Citation Information

Ding, L., Terwilliger, J., Sherony, R., Reimer, B. & Fridman, L. (2020). MIT DriveSeg (Manual) Dataset for Dynamic Driving Scene Segmentation. Massachusetts Institute of Technology AgeLab Technical Report 2020-1, Cambridge, MA. (pdf)

Ding, L., Terwilliger, J., Sherony, R., Reimer, B. & Fridman, L. (2020). MIT DriveSeg (Manual) Dataset. IEEE Dataport. DOI: 10.21227/mmke-dv03.


Related Research

Ding, L., Terwilliger, J., Sherony, R., Reimer. B. & Fridman, L. (2019). Value of Temporal Dynamics Information in Driving Scene Segmentation. arXiv preprint arXiv:1904.00758. (link)


Attribution and Contact Information

This work was done in collaboration with the Toyota Collaborative Safety Research Center (CSRC).

For any questions related to this dataset, or requests to remove identifying information, please contact