CloudPatch-7 Hyperspectral Dataset

Citation Author(s):
Hua
Yan
Auburn University at Montgomery
Rachel
Zheng
Auburn University at Montgomery
Brandon
Boehm
Auburn University at Montgomery
Sameer
Shaga
Auburn University at Montgomery
Derienne
Black
Auburn University at Montgomery
Luis
Cueva Parra
High Point University
Randy
Russell
Auburn University at Montgomery
Olcay
Kursun
Auburn University at Montgomery
Submitted by:
OLCAY KURSUN
Last updated:
Tue, 05/28/2024 - 18:50
DOI:
10.21227/fgb9-qs51

Abstract 

The "CloudPatch-7 Hyperspectral Dataset" is a manually curated collection of hyperspectral images focused on pixel classification of atmospheric cloud classes. The labeled dataset contains 380 patches, each a 50x50 pixel grid, extracted from 28 larger, unlabeled parent images approximately 5000x1500 pixels in size. Captured with the Resonon PIKA XC2 camera, the images span 462 spectral bands from 400 to 1000 nm. Each patch is extracted from a parent image so that all of its pixels fall within one of seven atmospheric conditions: Dense Dark Cumuliform Cloud, Dense Bright Cumuliform Cloud, Semi-transparent Cumuliform Cloud, Dense Cirroform Cloud, Semi-transparent Cirroform Cloud, Clear Sky - Low Aerosol Scattering (dark), and Clear Sky - Moderate to High Aerosol Scattering (bright). Incorporating contextual information from surrounding pixels improves classification into these seven classes, making the dataset a valuable resource for spectral analysis, environmental monitoring, atmospheric science research, and for testing machine learning methods that exploit contextual data. The parent images are very large, but they can be made available upon request.

Instructions: 

We manually labeled 380 patches, each a 50x50 pixel grid, derived from 28 larger, unlabeled parent images approximately 5000x1500 pixels in size. Each patch is extracted so that all of its pixels fall within one of seven atmospheric conditions:

  • Class-1: Dense Dark Cumuliform Cloud (46 patches)
  • Class-2: Dense Bright Cumuliform Cloud (77 patches)
  • Class-3: Semi-transparent Cumuliform Cloud (72 patches)
  • Class-4: Dense Cirroform Cloud (25 patches)
  • Class-5: Semi-transparent Cirroform Cloud (28 patches)
  • Class-6: Clear Sky - Low Aerosol Scattering (dark) (68 patches)
  • Class-7: Clear Sky - Moderate to High Aerosol Scattering (bright) (64 patches)
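
For convenience, the class numbering above can be kept in a small lookup table. The snippet below is an optional helper, not part of the distributed files; the integer keys follow the Class-N numbering above, and the exact label encoding inside the pickle file should be verified against `data['y']`:

```python
# Map class IDs (per the Class-N numbering above) to human-readable names.
# NOTE: verify that these IDs match the label encoding used in data['y'].
CLASS_NAMES = {
    1: "Dense Dark Cumuliform Cloud",
    2: "Dense Bright Cumuliform Cloud",
    3: "Semi-transparent Cumuliform Cloud",
    4: "Dense Cirroform Cloud",
    5: "Semi-transparent Cirroform Cloud",
    6: "Clear Sky - Low Aerosol Scattering (dark)",
    7: "Clear Sky - Moderate to High Aerosol Scattering (bright)",
}

# Patch counts per class, as listed above (they sum to 380).
PATCH_COUNTS = {1: 46, 2: 77, 3: 72, 4: 25, 5: 28, 6: 68, 7: 64}
```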

The pickle file contains a total of 38,000 pixels (100 pixels per patch), each pixel having 462 wavelengths. We also provide the BIP (and accompanying header) HSI files so that you can exploit the spatial layout of the pixels and/or use more pixels per patch.

 
Load the Data from the Pickle File:

import pickle

# Load the data from the pickle file
with open('data.pickle', 'rb') as file:
    data = pickle.load(file)

# Access data components
X = data['X']  # Features (38000 pixels, each with 462 wavelengths)
y = data['y']  # Labels (classes for each pixel)
patch_id = data['patch_id']  # Patch ID for each pixel
parent_image_id = data['parent_image_id']  # Parent image ID for each pixel
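
A quick sanity check of the loaded arrays against the numbers stated above (38,000 pixels, 462 wavelengths, 100 pixels per patch, 380 patches) might look like the following. Here `X`, `y`, and `patch_id` are assumed to be NumPy arrays; small synthetic stand-ins are used so the sketch runs even without the pickle file:

```python
import numpy as np

def summarize(X, y, patch_id):
    """Report basic dataset statistics. For CloudPatch-7, expect
    38,000 pixels, 462 wavelengths, 380 patches, and 7 classes."""
    n_pixels, n_bands = X.shape
    patches = np.unique(patch_id)
    classes = np.unique(y)
    print(f"{n_pixels} pixels x {n_bands} bands, "
          f"{len(patches)} patches, {len(classes)} classes")
    return n_pixels, n_bands, len(patches), len(classes)

# Synthetic stand-in with the same structure (4 patches of 100 pixels each):
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(400, 462)).astype(np.float32)
y_demo = np.repeat([1, 2, 3, 4], 100)
patch_demo = np.repeat(np.arange(4), 100)
summarize(X_demo, y_demo, patch_demo)
```

With the real data, call `summarize(X, y, patch_id)` on the arrays loaded from the pickle file instead.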
        
Perform GroupShuffleSplit for a Train-Test Split over the patches, not the pixels:

from sklearn.model_selection import GroupShuffleSplit

# Initialize GroupShuffleSplit: hold out 80% of the patches for testing
gss = GroupShuffleSplit(test_size=0.8, n_splits=1, random_state=0)

# Generate indices to split data into training and test sets
train_idx, test_idx = next(gss.split(X, y, groups=patch_id))

# Create the training and testing sets
X_train, X_test = X[train_idx], X[test_idx]
y_train, y_test = y[train_idx], y[test_idx]

# For more robust evaluation, use multiple splits by increasing n_splits as follows
gss = GroupShuffleSplit(test_size=0.8, n_splits=10)
accuracies = []
for train_idx, test_idx in gss.split(X, y, groups=patch_id):
    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]

    # Train your model here and evaluate accuracy
    # e.g., model.fit(X_train, y_train)

    # accuracy = model.score(X_test, y_test)
    # accuracies.append(accuracy)

# Print average accuracy over all splits
# print(f'Average accuracy: {sum(accuracies) / len(accuracies):.2f}')
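
To make the placeholder above concrete, here is one way the loop could be filled in end to end. The classifier choice (a random forest) is arbitrary and only illustrative, and small synthetic arrays stand in for the pickle contents so the sketch runs on its own; with the real data, use the `X`, `y`, and `patch_id` loaded earlier:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupShuffleSplit

# Synthetic stand-in for the pickle contents (shapes shrunk for speed):
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 20)).astype(np.float32)  # pixels x bands
patch_id = np.repeat(np.arange(10), 100)            # 10 patches of 100 pixels
y = patch_id % 7 + 1                                # 7 classes, one per patch

gss = GroupShuffleSplit(test_size=0.8, n_splits=10, random_state=0)
accuracies = []
for train_idx, test_idx in gss.split(X, y, groups=patch_id):
    # All pixels of a patch land on the same side of the split.
    model = RandomForestClassifier(n_estimators=50, random_state=0)
    model.fit(X[train_idx], y[train_idx])
    accuracies.append(model.score(X[test_idx], y[test_idx]))

print(f"Average accuracy over {len(accuracies)} splits: "
      f"{np.mean(accuracies):.2f}")
```

Grouping by `patch_id` prevents pixels from the same patch leaking between the training and test sets, which would otherwise inflate the reported accuracy.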
        
To read and display the BIP files (instead of the pickle file):

You can use the script provided below (see the related dataset at https://ieee-dataport.org/documents/cloud-radiance-hsi):


import os
import spectral

def main():
    directory = 'data/'
    rgb_bands = [198, 123, 62]  # selected 650 nm for Red, 550 nm for Green, 470 nm for Blue

    # Process all hyperspectral images in the directory and display an RGB render
    for root, dirs, files in os.walk(directory):
        for file_name in files:
            if file_name.endswith('.hdr'):
                hdr_file = os.path.join(root, file_name)
                print("Opening image file:", hdr_file)
                img = spectral.open_image(hdr_file)
                spectral.imshow(img, bands=rgb_bands, figsize=(10, 10))

if __name__ == '__main__':
    main()
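
If the `spectral` package is unavailable, a raw BIP cube can also be read directly with NumPy, since BIP (band-interleaved-by-pixel) stores all bands of each pixel contiguously. The sketch below assumes you take the image dimensions and data type from the accompanying .hdr file; the `lines`, `samples`, and `uint16` values shown here are illustrative only, and a tiny synthetic cube is written first so the example is self-contained:

```python
import os
import numpy as np

def read_bip(path, lines, samples, bands, dtype=np.uint16):
    """Read a raw BIP cube into a (lines, samples, bands) array.
    In BIP order, all bands of a pixel are stored contiguously."""
    cube = np.fromfile(path, dtype=dtype)
    return cube.reshape(lines, samples, bands)

# Illustrative dimensions only; take the real ones from the .hdr file.
lines, samples, bands = 60, 80, 462
np.zeros(lines * samples * bands, dtype=np.uint16).tofile("demo.bip")

img = read_bip("demo.bip", lines, samples, bands)
patch = img[10:60, 20:70, :]  # a 50x50 spatial patch with all 462 bands
print(img.shape, patch.shape)
os.remove("demo.bip")  # clean up the synthetic stand-in file
```

Slicing the cube spatially, as in the last step, is how 50x50 patches like those in this dataset can be cut from a parent image.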
    
Funding Agency: 
NSF
Grant Number: 
2003740 and 2411519