CloudPatch-7 Hyperspectral Dataset
- Submitted by: OLCAY KURSUN
- Last updated: Sun, 09/08/2024 - 13:47
- DOI: 10.21227/fgb9-qs51
Abstract
The "CloudPatch-7 Hyperspectral Dataset" is a manually curated collection of hyperspectral images for pixel classification of atmospheric cloud classes. This labeled dataset contains 380 patches, each a 50x50 pixel grid, derived from 28 larger, unlabeled parent images of approximately 4402-by-1600 pixels. The images were captured with the Resonon PIKA XC2 camera and span 462 spectral bands from 400 to 1000 nm. Each patch is extracted from a parent image such that all of its pixels fall within one of seven atmospheric conditions: Dense Dark Cumuliform Cloud, Dense Bright Cumuliform Cloud, Semi-transparent Cumuliform Cloud, Dense Cirroform Cloud, Semi-transparent Cirroform Cloud, Clear Sky - Low Aerosol Scattering (dark), and Clear Sky - Moderate to High Aerosol Scattering (bright). Incorporating contextual information from surrounding pixels enhances pixel classification into these 7 classes, making this dataset a valuable resource for spectral analysis, environmental monitoring, atmospheric science research, and testing machine learning applications that require contextual data. The parent images are very large, but they can be made available upon request.
We manually labeled 50x50 patches derived from larger, unlabeled parent images. Each patch is extracted ensuring its pixels fall within one of seven atmospheric conditions:
- Class-1: Dense Dark Cumuliform Cloud (46 patches)
- Class-2: Dense Bright Cumuliform Cloud (77 patches)
- Class-3: Semi-transparent Cumuliform Cloud (72 patches)
- Class-4: Dense Cirroform Cloud (25 patches)
- Class-5: Semi-transparent Cirroform Cloud (28 patches)
- Class-6: Clear Sky - Low Aerosol Scattering (dark) (68 patches)
- Class-7: Clear Sky - Moderate to High Aerosol Scattering (bright) (64 patches)
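The class sizes above are uneven (25 to 77 patches), which may matter when training a classifier. As a minimal sketch, the snippet below sums the patch counts from the list and derives inverse-frequency class weights. The weighting formula is a common heuristic chosen here for illustration, not something prescribed by the dataset authors, and the integer keys 1 to 7 simply mirror the class numbering in the list; the label encoding actually stored in the files may differ.

```python
# Patch counts per class, copied from the list above.
patch_counts = {1: 46, 2: 77, 3: 72, 4: 25, 5: 28, 6: 68, 7: 64}

total_patches = sum(patch_counts.values())
n_classes = len(patch_counts)

# Inverse-frequency ("balanced") heuristic: total / (n_classes * count).
# ASSUMPTION: this weighting is illustrative, not part of the dataset.
class_weights = {
    c: total_patches / (n_classes * n) for c, n in patch_counts.items()
}

for c in sorted(patch_counts):
    print(f"Class-{c}: {patch_counts[c]:3d} patches, weight {class_weights[c]:.2f}")
print("Total patches:", total_patches)
```

Because the pickle file stores the same number of pixels for every patch, weights computed per patch carry over unchanged to the pixel level.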
The pickle file contains 38,000 pixels in total: 100 pixels from each of the 380 patches, each pixel with 462 wavelengths. We also provide the HSI files in bip format (with their headers) so that you can exploit the spatial arrangement of the pixels and/or use more than 100 pixels per patch.
Load the Data from the Pickle File:
import pickle
# Load the data from the pickle file
with open('data.pickle', 'rb') as file:
    data = pickle.load(file)
# Access data components
X = data['X'] # Features (38000 pixels, each with 462 wavelengths)
y = data['y'] # Labels (classes for each pixel)
patch_id = data['patch_id'] # Patch ID for each pixel
parent_image_id = data['parent_image_id'] # Parent image ID for each pixel
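After loading, it can be worth confirming that the arrays match the description (38,000 pixels, 462 bands, 380 patches, 100 pixels per patch). The helper below is a hypothetical sketch, not part of the dataset: `summarize` is a name introduced here, it assumes the loaded objects are array-like, and it is demonstrated on a small synthetic stand-in rather than the real file.

```python
import numpy as np

def summarize(X, y, patch_id):
    """Return basic shape and grouping statistics for the loaded arrays."""
    X, y, patch_id = np.asarray(X), np.asarray(y), np.asarray(patch_id)
    patches, pixels_per_patch = np.unique(patch_id, return_counts=True)
    classes, pixels_per_class = np.unique(y, return_counts=True)
    return {
        "n_pixels": X.shape[0],
        "n_bands": X.shape[1],
        "n_patches": len(patches),
        # Distinct per-patch pixel counts; a single value means uniform patches.
        "pixels_per_patch": sorted(set(pixels_per_patch.tolist())),
        "pixels_per_class": dict(zip(classes.tolist(), pixels_per_class.tolist())),
    }

# Synthetic stand-in (NOT the real data): 6 patches x 100 pixels x 462 bands.
rng = np.random.default_rng(0)
demo_patch_id = np.repeat(np.arange(6), 100)
demo_y = np.repeat(np.array([1, 1, 2, 2, 3, 3]), 100)
demo_X = rng.random((600, 462))

summary = summarize(demo_X, demo_y, demo_patch_id)
print(summary)
```

On the real data, calling `summarize(X, y, patch_id)` should report the figures quoted in the description if the file is as documented.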
Use GroupShuffleSplit to split by patch rather than by pixel, so that all pixels from one patch fall on the same side of the train-test split:
from sklearn.model_selection import GroupShuffleSplit
# Initialize the GroupShuffleSplit (test_size=0.8 holds out 80% of the patches for testing)
gss = GroupShuffleSplit(test_size=0.8, n_splits=1)
# Generate indices to split data into training and test sets
train_idx, test_idx = next(gss.split(X, y, groups=patch_id))
# Create the training and testing sets
X_train, X_test = X[train_idx], X[test_idx]
y_train, y_test = y[train_idx], y[test_idx]
# For more robust evaluation, use multiple splits by increasing n_splits as follows
gss = GroupShuffleSplit(test_size=0.8, n_splits=10)
accuracies = []
for train_idx, test_idx in gss.split(X, y, groups=patch_id):
    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]
    # Train your model here and evaluate accuracy
    # e.g., model.fit(X_train, y_train)
    #       accuracy = model.score(X_test, y_test)
    #       accuracies.append(accuracy)
# Print average accuracy over all splits
# print(f'Average accuracy: {sum(accuracies) / len(accuracies):.2f}')
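To make the loop above concrete, here is a minimal end-to-end sketch on synthetic data. The classifier (a random forest), the synthetic array sizes, and the fixed `random_state` are all assumptions made for illustration; they are not the authors' experimental setup. The in-loop assertion checks the property that motivates the grouped split: no patch contributes pixels to both the training and the test set.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupShuffleSplit

# Synthetic stand-in (NOT the real data): 40 patches x 20 pixels x 16 bands.
rng = np.random.default_rng(0)
n_patches, pixels_per_patch, n_bands = 40, 20, 16
patch_labels = np.arange(n_patches) % 4              # one class per patch
patch_id = np.repeat(np.arange(n_patches), pixels_per_patch)
y = np.repeat(patch_labels, pixels_per_patch)
X = rng.normal(size=(len(y), n_bands)) + y[:, None]  # class-dependent mean

gss = GroupShuffleSplit(test_size=0.8, n_splits=5, random_state=0)
accuracies = []
for train_idx, test_idx in gss.split(X, y, groups=patch_id):
    # No patch may appear on both sides of the split.
    assert not set(patch_id[train_idx]) & set(patch_id[test_idx])
    model = RandomForestClassifier(n_estimators=50, random_state=0)
    model.fit(X[train_idx], y[train_idx])
    accuracies.append(model.score(X[test_idx], y[test_idx]))

print(f"Mean accuracy: {np.mean(accuracies):.2f} (+/- {np.std(accuracies):.2f})")
```

For a stricter evaluation, one option is to pass `parent_image_id` as `groups` instead of `patch_id`, so that patches cut from the same parent image never straddle the split.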
To read and display the bip files instead of the pickle file, you can use the script below (see the related dataset at https://ieee-dataport.org/documents/cloud-radiance-hsi):
import os
import spectral
def main():
    directory = 'data/'
    rgb_bands = [198, 123, 62]  # selected 650 nm for Red, 550 nm for Green, 470 nm for Blue
    # Process all hyperspectral images in the directory and display an RGB render
    for root, dirs, files in os.walk(directory):
        for file_name in files:
            if file_name.endswith('.hdr'):
                hdr_file = os.path.join(root, file_name)
                print("Opening image file:", hdr_file)
                img = spectral.open_image(hdr_file)
                spectral.imshow(img, bands=rgb_bands, figsize=(10, 10))

if __name__ == '__main__':
    main()
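The script hard-codes the band indices used for the RGB render. If other wavelengths are needed, a band index can be looked up from the list of band-center wavelengths. The sketch below shows one way to do that; `nearest_band` is a hypothetical helper introduced here, and the evenly spaced `centers` array is only a placeholder. The camera's real band centers are not evenly spaced in this way, so the indices printed here will not match the ones in the script. In practice the centers should be read from the header file (with the spectral library they are typically available as `img.bands.centers` when the header lists wavelengths).

```python
import numpy as np

def nearest_band(centers_nm, target_nm):
    """Return the index of the band whose center is closest to target_nm."""
    centers_nm = np.asarray(centers_nm, dtype=float)
    return int(np.argmin(np.abs(centers_nm - target_nm)))

# ASSUMPTION: evenly spaced placeholder centers, not the camera calibration.
centers = np.linspace(400.0, 1000.0, 462)

# Look up bands for the same target wavelengths the script mentions.
rgb_bands = [nearest_band(centers, nm) for nm in (650, 550, 470)]
print(rgb_bands)
```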
More details are available at:
Yan, H.; Zheng, R.; Mallela, S.; Russell, R.; Kursun, O. Collection of a Hyperspectral Atmospheric Cloud Dataset and Enhancing Pixel Classification through Patch-Origin Embedding. Remote Sens. 2024, 16, 3315. https://doi.org/10.3390/rs16173315
Dataset Files
- CloudPatch7_HSI_dataset.zip (785.13 MB)
- spectral_data.mat.zip (350.98 MB)