Standard Dataset

DEM_building_Illinois

Citation Author(s):
Yifan Chen (University of Illinois at Urbana-Champaign)
Aiman Soliman (National Center for Supercomputing Applications)
Volodymyr Kindratenko (National Center for Supercomputing Applications)
Shirui Luo (National Center for Supercomputing Applications)
Rauf Makharov (University of Illinois at Urbana-Champaign)
Submitted by:
Yifan Chen
Last updated:
DOI:
10.21227/p2fp-mn16
Data Format:

Abstract

The lack of high-quality labeled data is considered one of the main bottlenecks for training machine learning and deep learning models. Weakly supervised learning, which uses incomplete, coarse, or inaccurate labels, is an alternative strategy for overcoming the scarcity of training data. We trained a U-Net model to segment building footprints from a high-resolution digital elevation model (DEM), using existing labels from the open-access Microsoft building footprints data set. Comparison against an independent, manually labeled benchmark indicated that weak supervision succeeded: the quality of the model prediction (IoU: 0.876) surpassed that of the original Microsoft data (IoU: 0.672) by approximately 0.20 IoU, that is, about 20 percentage points. Moreover, adding extra channels of elevation derivatives, such as slope, aspect, and profile curvature, did not enhance the weak learning process, as the model learned directly from the original elevation data. Our results demonstrate the value of using existing data for training deep learning models even if they are noisy and incomplete.
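The IoU (intersection over union) scores quoted in the abstract compare a predicted building mask with a reference mask. The snippet below is a minimal sketch of that metric on small binary masks, followed by the arithmetic for the gap between the two reported scores. The function name `iou` and the toy masks are illustrative assumptions, not taken from the authors' code.

```python
def iou(pred, truth):
    """Intersection over union of two binary masks (equal-shaped 2D lists)."""
    inter = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            p, t = bool(p), bool(t)
            inter += p and t   # pixel is building in both masks
            union += p or t    # pixel is building in at least one mask
    # Two empty masks agree perfectly; treat that case as IoU = 1.
    return inter / union if union else 1.0


# Toy masks (hypothetical data): 1 = building, 0 = background.
pred = [[1, 1, 0],
        [0, 1, 0]]
truth = [[1, 0, 0],
         [0, 1, 1]]
score = iou(pred, truth)

# Gap between the two IoU values reported in the abstract.
model_iou = 0.876
microsoft_iou = 0.672
absolute_gain = model_iou - microsoft_iou       # difference in IoU
relative_gain = absolute_gain / microsoft_iou   # gain relative to the baseline
```

On the reported numbers, the absolute difference is about 0.20 IoU, which corresponds to a relative improvement of roughly 30 percent over the Microsoft labels.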

Instructions:

label_clip.tif contains the Microsoft building footprint labels.

DEM_clip.tif is the digital elevation model (DEM) raster.

manual_label_raster_lzw.tif contains manual labels for a certain area.

The dataset preprocessing and sampling script can be found in the GitHub repository referenced in our paper.
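For orientation only, here is a minimal sketch of how aligned training tiles might be cut from the DEM and label rasters. It assumes the rasters share the same pixel grid. The function name `tile_windows`, the 256-pixel tile size, and the stride are hypothetical choices. The authors' own script in their repository is the authoritative reference and may sample differently.

```python
def tile_windows(height, width, tile=256, stride=256):
    """Yield (row_off, col_off, tile_height, tile_width) for every full tile.

    Only tiles that fit entirely inside a raster of size height x width
    are produced; partial tiles at the right and bottom edges are skipped.
    """
    for row in range(0, height - tile + 1, stride):
        for col in range(0, width - tile + 1, stride):
            yield (row, col, tile, tile)


# Hypothetical raster size; the real size comes from the files themselves.
windows = list(tile_windows(600, 1000))
```

Applying the same window offsets to DEM_clip.tif and label_clip.tif would give matching input and label patches, provided the two files are indeed aligned.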