CAlving Fronts and where to Find thEm: a benchmark dataset and methodology for automatic glacier calving front extraction from SAR imagery
The temporal variability of the calving front positions of marine-terminating glaciers permits inference of frontal ablation. Frontal ablation, the sum of the calving rate and the melt rate at the terminus, contributes significantly to the mass balance of glaciers. Therefore, the glacier area has been declared an Essential Climate Variable product by the World Meteorological Organization. The presented dataset provides the information needed to train deep learning techniques that automate calving front delineation. The dataset includes Synthetic Aperture Radar (SAR) images of seven glaciers distributed around the globe. Five of them are located in Antarctica: Crane, Dinsmoor-Bombardier-Edgeworth, Mapple, Jorum, and the Sjögren-Inlet Glacier. The remaining two are the Jakobshavn Isbrae Glacier in Greenland and the Columbia Glacier in Alaska. Several images were taken of each glacier, forming a time series. The time series span the period from 1995 to 2020. The images have different spatial resolutions, as they were captured by different satellites: Sentinel-1, TerraSAR-X, TanDEM-X, ENVISAT, European Remote Sensing Satellites 1 and 2 (ERS-1/2), ALOS PALSAR, and RADARSAT-1. Along with the SAR images, two types of labels are provided so that deep learning techniques can be trained in a supervised manner. One label gives the position of the calving front. The other label gives the position of the different landscape regions: glacier, rock outcrop, ocean including ice mélange, and an area where no information is available, consisting of SAR shadows, layover regions, and areas outside the swath. The two labels allow different approaches to calving front delineation, as the calving front can also be extracted from landscape region predictions during post-processing. As additional information for post-processing, the dataset includes a bounding box for the dynamic calving front for each image.
This bounding box excludes nearly static calving fronts that are also visible in the images; these are not of interest but would still be predicted as calving fronts by deep learning techniques. Hence, all front predictions outside the bounding box can be discarded during post-processing. To ensure that the trained deep learning techniques generalize to new, unseen glaciers, the dataset is split into a training set and an out-of-sample test set. The latter should be used only to test the performance of the trained front delineation algorithm after all hyperparameters have been optimized. The test set comprises the time series of Mapple and Columbia. More information on the dataset and how to use it can be found in the related paper.
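The two post-processing steps described above (extracting a front from landscape region predictions, then discarding front pixels outside the bounding box) can be sketched as follows. This is a minimal illustration, not the method of the related paper. It assumes a zone prediction given as a 2D grid of integer class ids, with the hypothetical ids GLACIER = 1 and OCEAN = 2 (the actual pixel values of the zones labels are not stated here), and a bounding box given as inclusive (row_min, col_min, row_max, col_max). The layout of the bounding box text files is not specified in this description, so reading them is left out.

```python
from typing import List, Set, Tuple

# Hypothetical class ids; the real pixel values of the zones labels may differ.
GLACIER = 1
OCEAN = 2

Pixel = Tuple[int, int]  # (row, col)


def front_from_zones(zones: List[List[int]]) -> Set[Pixel]:
    """Return the glacier pixels that touch an ocean pixel (4-neighbourhood)."""
    rows, cols = len(zones), len(zones[0])
    front = set()
    for r in range(rows):
        for c in range(cols):
            if zones[r][c] != GLACIER:
                continue
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and zones[nr][nc] == OCEAN:
                    front.add((r, c))
                    break
    return front


def clip_to_bounding_box(front: Set[Pixel], box: Tuple[int, int, int, int]) -> Set[Pixel]:
    """Keep only the front pixels inside the (inclusive) bounding box."""
    row_min, col_min, row_max, col_max = box
    return {
        (r, c)
        for r, c in front
        if row_min <= r <= row_max and col_min <= c <= col_max
    }


# Toy 4x4 zone map: glacier on the left, ocean on the right.
toy_zones = [
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [1, 1, 2, 2],
]
toy_front = front_from_zones(toy_zones)
# Assumed box covering rows 1 to 2 only; rows 0 and 3 stand in for a static front.
dynamic_front = clip_to_bounding_box(toy_front, (1, 0, 2, 3))
```

On real predictions an array library would be the more practical choice; plain lists are used here only to keep the sketch free of dependencies.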
The dataset has four subfolders: bounding_boxes, fronts, sar_images, and zones. The bounding_boxes folder contains the bounding box for each image as a separate text file. The fronts, sar_images, and zones folders are each divided into test and train subfolders. The sar_images folder holds the SAR images for training and testing as png files. The fronts and zones folders contain the labels (fronts: calving front position; zones: position of landscape regions) for each of the images in the sar_images folder. The labels are png files with the same size and location as the corresponding SAR image.
The naming scheme of all files is: Glacier_Date_Satellite_SpatialResolutionInMeter_QualityFactor_Orbit(_Modality).png
The modality gives the type of label (front or zones). The quality factor (with 1 being the best and 6 the worst) is based on the opinion of the expert who labelled the data. Images with a quality factor of 6 were hard for the expert to interpret, so the labels for these images may contain some inaccuracies.
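The naming scheme above can be split into its fields with a few lines of Python. This is a sketch under two assumptions: no field contains an underscore itself, and the date, resolution, and orbit are kept as plain strings because their exact formats are not given here. The names CaffeName, parse_name, and sar_image_name are hypothetical helpers, and the file name in the example is made up for illustration.

```python
from pathlib import Path
from typing import NamedTuple, Optional


class CaffeName(NamedTuple):
    glacier: str
    date: str
    satellite: str
    resolution_m: str
    quality_factor: int          # 1 (best) to 6 (worst)
    orbit: str
    modality: Optional[str]      # "front" or "zones" for labels, None for SAR images


def parse_name(filename: str) -> CaffeName:
    """Split a file name into the fields of the naming scheme."""
    parts = Path(filename).stem.split("_")
    if len(parts) not in (6, 7):
        raise ValueError(f"unexpected file name: {filename!r}")
    modality = parts[6] if len(parts) == 7 else None
    return CaffeName(
        parts[0], parts[1], parts[2], parts[3], int(parts[4]), parts[5], modality
    )


def sar_image_name(label_filename: str) -> str:
    """Name of the SAR image a label belongs to (modality suffix dropped)."""
    parts = Path(label_filename).stem.split("_")
    return "_".join(parts[:6]) + ".png"


# Made-up example name; real date formats and satellite abbreviations may differ.
example = "Mapple_2010-01-01_TSX_7_2_034_front.png"
info = parse_name(example)
image_name = sar_image_name(example)
```

Parsed this way, the quality factor can be used to set aside images rated 6, whose labels may be less accurate, for example with `info.quality_factor < 6`.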