Refined Massachusetts Building Dataset

- Submitted by: TASKIN KAVZOGLU
- DOI: 10.21227/ne5s-wg52
Abstract
The Massachusetts dataset, created using vector data from the OpenStreetMap (OSM) platform, was found to contain various types of labeling errors. Because OSM data are continuously updated by volunteer contributors, manual data entry carries a risk of inconsistency and inaccuracy [20]. The resolution of the images also exacerbates labeling errors by contributing to problems such as blurred building boundaries [21]. These errors were analyzed and categorized into six main groups: (1) mislabeling, (2) inclusion of non-building elements, (3) false positive estimates, (4) missing labels, (5) spatial misalignment, and (6) object contamination (Fig. 5). The red lines in the figure show the building boundaries in the dataset. A thorough search and update process was conducted to resolve these problems. All images were cropped into 512×512 patches with 50% overlap, which resulted in white (empty) regions in some patches. To prevent these regions from affecting the training process, the image patches containing white pixels and their corresponding labels were removed. The final dataset consists of 1,495 training, 320 testing, and 320 validation images.
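The cropping and filtering step described above can be illustrated with a minimal sketch. It is not the authors' code: the function name `extract_patches`, the assumption that images are RGB arrays of shape (H, W, 3) with white encoded as 255 in every channel, the rule that a single white pixel is enough to discard a patch, and the choice to skip partial windows at the image border are all assumptions made for illustration.

```python
import numpy as np

def extract_patches(image, label, patch=512, overlap=0.5, white=255):
    """Crop aligned image/label patches and drop those containing white pixels.

    Assumes `image` has shape (H, W, 3) and `label` has shape (H, W).
    """
    stride = int(patch * (1 - overlap))  # 256 pixels for 50% overlap
    height, width = image.shape[:2]
    kept = []
    # Only full windows are taken; border handling is an assumption.
    for top in range(0, height - patch + 1, stride):
        for left in range(0, width - patch + 1, stride):
            img_patch = image[top:top + patch, left:left + patch]
            # A pixel counts as white (empty) when every channel equals `white`.
            is_white = np.all(img_patch == white, axis=-1)
            if is_white.any():
                continue  # discard the patch together with its label
            lbl_patch = label[top:top + patch, left:left + patch]
            kept.append((img_patch, lbl_patch))
    return kept
```

Removing the image patch and its label together keeps the two sets aligned, which is what the description above requires.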
Instructions:
Unzip the file to extract the training, testing, and validation datasets in TIFF format (512×512 patch size).
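For readers who prefer to script the extraction, the following is a small sketch using only the Python standard library. The function name `extract_tiffs` is hypothetical, and the archive name and the folder layout inside it are not stated on this page, so the code simply collects every `.tif` or `.tiff` file it finds after unzipping.

```python
import zipfile
from pathlib import Path

def extract_tiffs(archive_path, out_dir):
    """Unzip the archive into out_dir and return the extracted TIFF paths, sorted."""
    out_dir = Path(out_dir)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(out_dir)
    # The internal folder structure is unknown, so search recursively.
    return sorted(
        p for p in out_dir.rglob("*")
        if p.is_file() and p.suffix.lower() in {".tif", ".tiff"}
    )
```

The returned paths can then be grouped by their parent folder to separate the training, testing, and validation sets, assuming the archive stores them in separate folders.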