GeoNRW
- Submitted by: Naoto Yokoya
- Last updated: Tue, 05/17/2022 - 22:21
- DOI: 10.21227/s5xq-b822
Abstract
This dataset consists of orthorectified aerial photographs, LiDAR-derived digital elevation models, and segmentation maps with 10 classes. The data were acquired through the open data program of the German state of North Rhine-Westphalia (https://www.opengeodata.nrw.de/produkte/) and refined with OpenStreetMap. Please check the license information (http://www.govdata.de/dl-de/by-2-0). Preprocessing consists of resampling the 0.1 m resolution photographs to 1 m, taking the first LiDAR return and averaging within each 1 m² cell to match the resolution of the photographs, and rasterizing the vector files of the land cover data. In total, the dataset contains 7783 triplets of 1000 x 1000 pixels each.
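To illustrate the change in resolution described above, the following is a minimal sketch of block averaging with NumPy: every non-overlapping 10 x 10 block of a 0.1 m raster is averaged into one 1 m pixel. This is an assumption for illustration only. The abstract does not say which resampling method was used for the photographs, and the LiDAR averaging operates on point returns rather than on a raster. The function name `block_mean` is hypothetical.

```python
import numpy as np


def block_mean(image, factor):
    """Downsample by averaging non-overlapping factor x factor blocks.

    Works for 2-D rasters (rows, cols) and for rasters with trailing
    channel axes, e.g. (rows, cols, 3).
    """
    rows, cols = image.shape[:2]
    if rows % factor or cols % factor:
        raise ValueError("image dimensions must be divisible by factor")
    # Split each spatial axis into (number of blocks, factor) and
    # average over the two 'factor' axes.
    blocks = image.reshape(
        rows // factor, factor, cols // factor, factor, *image.shape[2:]
    )
    return blocks.mean(axis=(1, 3))


# Synthetic stand-in for a 0.1 m raster: 100 x 100 pixels covering 10 m x 10 m.
fine = np.arange(100 * 100, dtype=np.float64).reshape(100, 100)
coarse = block_mean(fine, factor=10)  # 10 x 10 pixels at 1 m
```

Block averaging is only one of several plausible resampling choices; the published code at https://github.com/gbaier/geonrw is the reference for what was actually done.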
Dataset description
The data was mostly acquired over urban areas in North Rhine-Westphalia, Germany. Since the acquisition dates of the aerial photographs and the LiDAR data do not match exactly, the two can differ in what they show and in which season; for example, trees change or lose their leaves in autumn. In our experience, these differences are not drastic but should be kept in mind.
We have included two Python scripts: plot_examples.py creates the example image used on this website, and calc_and_plot_stats.py calculates and plots the class statistics. Furthermore, we published the code used to create the dataset at https://github.com/gbaier/geonrw, which makes it easy to extend the dataset with other areas in North Rhine-Westphalia. The repository also contains a PyTorch data loader.
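As a rough idea of what computing class statistics can look like, the sketch below counts how often each label occurs across a collection of segmentation maps and converts the counts to fractions. It is not the code from calc_and_plot_stats.py. The function name `class_fractions`, the parameter `num_labels`, and the assumption that labels are non-negative integers in the range 0 to `num_labels - 1` are all hypothetical; check the actual label encoding before relying on it.

```python
import numpy as np


def class_fractions(seg_maps, num_labels):
    """Return the fraction of pixels per label over all segmentation maps.

    Assumes integer labels in [0, num_labels - 1] (an assumption, not
    something the dataset description states).
    """
    counts = np.zeros(num_labels, dtype=np.int64)
    for seg in seg_maps:
        tile_counts = np.bincount(np.asarray(seg).ravel(), minlength=num_labels)
        if tile_counts.size > num_labels:
            raise ValueError("found a label outside the assumed range")
        counts += tile_counts
    return counts / counts.sum()


# Two tiny synthetic label maps with three labels.
toy_maps = [
    np.array([[0, 1], [1, 2]]),
    np.array([[2, 2], [2, 2]]),
]
fractions = class_fractions(toy_maps, num_labels=3)
```

Accumulating integer counts tile by tile keeps memory use flat, which matters when iterating over thousands of 1000 x 1000 maps.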
This multimodal dataset should be useful for a variety of tasks, such as image segmentation using multiple inputs, height estimation from the aerial photographs, or semantic image synthesis.
Organization
Similar to the original source of the data (https://www.opengeodata.nrw.de/produkte/geobasis/lbi/dop/dop_jp2_f10_paketiert/), we organize all samples by the city they were acquired over. Each filename, e.g., 345_5668_rgb.jp2, consists of the tile's UTM zone 32N coordinates and the data type (RGB, DEM, or seg for land cover).
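The naming scheme above can be parsed with a few lines of Python. The sketch below makes several assumptions that the description does not spell out: that the first number is the easting and the second the northing, that the data type tag is the last underscore-separated field, and that tags can be compared case-insensitively. `TileName` and `parse_tile_name` are hypothetical names.

```python
import re
from pathlib import Path
from typing import NamedTuple


class TileName(NamedTuple):
    easting: int   # first number in the filename (assumed to be the easting)
    northing: int  # second number in the filename (assumed to be the northing)
    kind: str      # data type tag, lower-cased, e.g. "rgb"


# <number>_<number>_<tag>.<extension>
_TILE_PATTERN = re.compile(r"^(\d+)_(\d+)_([A-Za-z]+)\.\w+$")


def parse_tile_name(path):
    """Split a tile filename into its coordinates and data type tag."""
    name = Path(path).name  # ignore any leading directory, e.g. the city
    match = _TILE_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unexpected tile filename: {name!r}")
    easting, northing, kind = match.groups()
    return TileName(int(easting), int(northing), kind.lower())


tile = parse_tile_name("345_5668_rgb.jp2")
```

Because each tile covers 1000 x 1000 pixels at 1 m, the two numbers are presumably given in kilometres, but that unit is likewise an assumption.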
File formats
All data is geocoded and can be opened using QGIS (https://www.qgis.org/). The aerial photographs are stored as JPEG2000 files, the land cover maps and digital elevation models both as GeoTIFFs. The accompanying scripts show how to read the data into Python.
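For readers who want a starting point before opening the accompanying scripts, here is a minimal sketch that reads one raster with the third-party rasterio package. It is not taken from those scripts. It assumes rasterio is installed and that the underlying GDAL build includes a JPEG2000 driver for the photographs; the function name `read_tile` is hypothetical.

```python
def read_tile(path):
    """Read one raster file and return (data, crs, transform).

    data has shape (bands, rows, cols); crs and transform carry the
    georeferencing stored in the file.
    """
    # Imported lazily so this module can be loaded without rasterio installed.
    import rasterio

    with rasterio.open(path) as src:
        data = src.read()
        return data, src.crs, src.transform


# Example call (the path is a placeholder, not a real location in the archive):
# data, crs, transform = read_tile("345_5668_rgb.jp2")
```

The same function works for the GeoTIFF elevation models and land cover maps, since rasterio selects the driver from the file itself.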
Dataset Files
- nrw_dataset.tar.gz (30.15 GB)
- plot_scripts.zip (410.59 kB)
Open Access dataset files are accessible to all logged-in users. A free IEEE account is sufficient; IEEE membership is not required.