Semantic scene segmentation has primarily been addressed by forming representations of single images both with supervised and unsupervised methods. The problem of semantic segmentation in dynamic scenes has begun to recently receive attention with video object segmentation approaches. What is not known is how much extra information the temporal dynamics of the visual scene carries that is complimentary to the information available in the individual frames of the video.
MIT DriveSeg (Manual) Dataset is a forward facing frame-by-frame pixel level semantic labeled dataset captured from a moving vehicle during continuous daylight driving through a crowded city street.
Video data - 2 minutes 47 seconds (5,000 frame) 1080P (1920x1080) 30 fps
Class definitions (12) - vehicle, pedestrian, road, sidewalk, bicycle, motorcycle, building, terrain (horizontal vegetation), vegetation (vertical vegetation), pole, traffic light, and traffic sign
Ding, L., Terwilliger, J., Sherony, R., Reimer, B. & Fridman, L. (2020). MIT DriveSeg (Manual) Dataset for Dynamic Driving Scene Segmentation. Massachusetts Institute of Technology AgeLab Technical Report 2020-1, Cambridge, MA. (pdf)
Ding, L., Terwilliger, J., Sherony, R., Reimer, B. & Fridman, L. (2020). MIT DriveSeg (Manual) Dataset. IEEE Dataport. DOI: 10.21227/mmke-dv03.
Ding, L., Terwilliger, J., Sherony, R., Reimer. B. & Fridman, L. (2019). Value of Temporal Dynamics Information in Driving Scene Segmentation. arXiv preprint arXiv:1904.00758. (link)
This work was done in collaboration with the Toyota Collaborative Safety Research Center (CSRC). For more information, click here.
For any questions related to this dataset or requests to remove Identifying information please contact firstname.lastname@example.org.