The lack of high-quality labeled data is considered one of the main bottlenecks for training machine learning and deep learning models. Weakly supervised learning, which uses incomplete, coarse, or inaccurate labels, is an alternative strategy for overcoming the scarcity of training data. We trained a U-Net model to segment building footprints from a high-resolution digital elevation model (DEM), using existing labels from the open-access Microsoft building footprints data set. Comparison against an independent, manually labeled benchmark indicated that the weakly supervised learning succeeded: the quality of the model prediction (IoU: 0.876) surpassed that of the original Microsoft data (IoU: 0.672) by approximately 20 percentage points. Moreover, adding extra channels of elevation derivatives, such as slope, aspect, and profile curvature, did not enhance the weak learning process, as the model learned directly from the original elevation data. Our results demonstrate the value of using existing data for training deep learning models even if they are noisy and incomplete.
label_clip.tif contains the Microsoft building footprint labels.
DEM_clip.tif is the digital elevation model (DEM).
manual_label_raster_lzw.tif contains the manual labels for a selected area.
The dataset preprocessing and sampling scripts can be found in the GitHub repository referenced in our paper.
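
The IoU (intersection over union) scores quoted above compare a binary building mask against the manual labels. The snippet below is a minimal sketch of that metric, not the evaluation code from the repository. The function name iou, the toy masks, and the choice to return 1.0 when both masks are empty are assumptions made for illustration. In practice the masks would be read from label_clip.tif and manual_label_raster_lzw.tif with a raster library, which is not shown here.

```python
def iou(pred, truth):
    """Intersection over union of two binary masks.

    Both masks are 2D sequences (lists of rows) of 0/1 values with the
    same shape. Hypothetical helper, not part of the repository.
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of rows")

    intersection = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        if len(pred_row) != len(truth_row):
            raise ValueError("masks must have the same number of columns")
        for p, t in zip(pred_row, truth_row):
            p, t = bool(p), bool(t)
            intersection += p and t  # pixel is building in both masks
            union += p or t          # pixel is building in at least one mask

    # Assumption: two empty masks are treated as a perfect match.
    return intersection / union if union else 1.0


# Toy 2x3 masks: 2 pixels overlap, 4 pixels are building in either mask.
pred = [[1, 1, 0],
        [0, 1, 0]]
truth = [[1, 0, 0],
         [0, 1, 1]]
print(iou(pred, truth))  # 0.5
```

The same function can score both the model prediction and the Microsoft labels against the manual labels, provided all masks cover the same extent at the same resolution.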