CloudPatch-7 Hyperspectral Dataset

Citation Author(s):
Hua
Yan
Auburn University at Montgomery
Rachel
Zheng
Auburn University at Montgomery
Brandon
Boehm
Auburn University at Montgomery
Sameer
Shaga
Auburn University at Montgomery
Derienne
Black
Auburn University at Montgomery
Luis
Cueva Parra
High Point University
Randy
Russell
Auburn University at Montgomery
Olcay
Kursun
Auburn University at Montgomery
Submitted by:
OLCAY KURSUN
Last updated:
Tue, 05/28/2024 - 18:50
DOI:
10.21227/fgb9-qs51

Abstract 

The "CloudPatch-7 Hyperspectral Dataset" is a manually curated collection of hyperspectral images focused on pixel classification of atmospheric cloud classes. The labeled dataset contains 380 patches, each a 50x50 pixel grid, extracted from 28 larger, unlabeled parent images approximately 5000x1500 pixels in size. Captured with the Resonon PIKA XC2 camera, the images span 462 spectral bands from 400 to 1000 nm. Each patch is extracted from a parent image so that all of its pixels fall within one of seven atmospheric conditions: Dense Dark Cumuliform Cloud, Dense Bright Cumuliform Cloud, Semi-transparent Cumuliform Cloud, Dense Cirroform Cloud, Semi-transparent Cirroform Cloud, Clear Sky - Low Aerosol Scattering (dark), and Clear Sky - Moderate to High Aerosol Scattering (bright). Incorporating contextual information from surrounding pixels improves classification into these seven classes, making the dataset a valuable resource for spectral analysis, environmental monitoring, atmospheric science research, and for testing machine learning methods that exploit contextual data. The parent images are very large, but they can be made available upon request.

Instructions: 

We manually labeled 380 patches, each a 50x50 pixel grid, derived from 28 larger, unlabeled parent images approximately 5000x1500 pixels in size. Each patch is extracted so that all of its pixels fall within one of seven atmospheric conditions:

  • Class-1: Dense Dark Cumuliform Cloud (46 patches)
  • Class-2: Dense Bright Cumuliform Cloud (77 patches)
  • Class-3: Semi-transparent Cumuliform Cloud (72 patches)
  • Class-4: Dense Cirroform Cloud (25 patches)
  • Class-5: Semi-transparent Cirroform Cloud (28 patches)
  • Class-6: Clear Sky - Low Aerosol Scattering (dark) (68 patches)
  • Class-7: Clear Sky - Moderate to High Aerosol Scattering (bright) (64 patches)
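
For convenience, the class numbering above can be kept in a small lookup table. The snippet below is an optional helper, not part of the distributed files; the integer keys follow the Class-N numbering above, and the exact label encoding inside the pickle file should be verified against `data['y']`:

```python
# Map class IDs (per the Class-N numbering above) to human-readable names.
# NOTE: verify that these IDs match the label encoding used in data['y'].
CLASS_NAMES = {
    1: "Dense Dark Cumuliform Cloud",
    2: "Dense Bright Cumuliform Cloud",
    3: "Semi-transparent Cumuliform Cloud",
    4: "Dense Cirroform Cloud",
    5: "Semi-transparent Cirroform Cloud",
    6: "Clear Sky - Low Aerosol Scattering (dark)",
    7: "Clear Sky - Moderate to High Aerosol Scattering (bright)",
}

# Patch counts per class, as listed above (they sum to 380).
PATCH_COUNTS = {1: 46, 2: 77, 3: 72, 4: 25, 5: 28, 6: 68, 7: 64}
```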

The pickle file contains a total of 38,000 pixels (100 pixels per patch), each pixel having 462 wavelengths. We also provide the BIP (and accompanying header) HSI files so that you can exploit the spatial layout of the pixels and/or use more pixels per patch.

 
Load the Data from the Pickle File:

import pickle

# Load the data from the pickle file
with open('data.pickle', 'rb') as file:
    data = pickle.load(file)

# Access data components
X = data['X']  # Features (38000 pixels, each with 462 wavelengths)
y = data['y']  # Labels (classes for each pixel)
patch_id = data['patch_id']  # Patch ID for each pixel
parent_image_id = data['parent_image_id']  # Parent image ID for each pixel
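
A quick sanity check of the loaded arrays against the numbers stated above (38,000 pixels, 462 wavelengths, 100 pixels per patch, 380 patches) might look like the following. Here `X`, `y`, and `patch_id` are assumed to be NumPy arrays; small synthetic stand-ins are used so the sketch runs even without the pickle file:

```python
import numpy as np

def summarize(X, y, patch_id):
    """Report basic dataset statistics. For CloudPatch-7, expect
    38,000 pixels, 462 wavelengths, 380 patches, and 7 classes."""
    n_pixels, n_bands = X.shape
    patches = np.unique(patch_id)
    classes = np.unique(y)
    print(f"{n_pixels} pixels x {n_bands} bands, "
          f"{len(patches)} patches, {len(classes)} classes")
    return n_pixels, n_bands, len(patches), len(classes)

# Synthetic stand-in with the same structure (4 patches of 100 pixels each):
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(400, 462)).astype(np.float32)
y_demo = np.repeat([1, 2, 3, 4], 100)
patch_demo = np.repeat(np.arange(4), 100)
summarize(X_demo, y_demo, patch_demo)
```

With the real data, call `summarize(X, y, patch_id)` on the arrays loaded from the pickle file instead.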
        
Perform GroupShuffleSplit for a Train-Test Split over the patches, not the pixels:

from sklearn.model_selection import GroupShuffleSplit

# Initialize GroupShuffleSplit: hold out 80% of the patches for testing
gss = GroupShuffleSplit(test_size=0.8, n_splits=1, random_state=0)

# Generate indices to split data into training and test sets
train_idx, test_idx = next(gss.split(X, y, groups=patch_id))

# Create the training and testing sets
X_train, X_test = X[train_idx], X[test_idx]
y_train, y_test = y[train_idx], y[test_idx]

# For more robust evaluation, use multiple splits by increasing n_splits as follows
gss = GroupShuffleSplit(test_size=0.8, n_splits=10)
accuracies = []
for train_idx, test_idx in gss.split(X, y, groups=patch_id):
    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]

    # Train your model here and evaluate accuracy
    # e.g., model.fit(X_train, y_train)

    # accuracy = model.score(X_test, y_test)
    # accuracies.append(accuracy)

# Print average accuracy over all splits
# print(f'Average accuracy: {sum(accuracies) / len(accuracies):.2f}')
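
To make the placeholder above concrete, here is one way the loop could be filled in end to end. The classifier choice (a random forest) is arbitrary and only illustrative, and small synthetic arrays stand in for the pickle contents so the sketch runs on its own; with the real data, use the `X`, `y`, and `patch_id` loaded earlier:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupShuffleSplit

# Synthetic stand-in for the pickle contents (shapes shrunk for speed):
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 20)).astype(np.float32)  # pixels x bands
patch_id = np.repeat(np.arange(10), 100)            # 10 patches of 100 pixels
y = patch_id % 7 + 1                                # 7 classes, one per patch

gss = GroupShuffleSplit(test_size=0.8, n_splits=10, random_state=0)
accuracies = []
for train_idx, test_idx in gss.split(X, y, groups=patch_id):
    # All pixels of a patch land on the same side of the split.
    model = RandomForestClassifier(n_estimators=50, random_state=0)
    model.fit(X[train_idx], y[train_idx])
    accuracies.append(model.score(X[test_idx], y[test_idx]))

print(f"Average accuracy over {len(accuracies)} splits: "
      f"{np.mean(accuracies):.2f}")
```

Grouping by `patch_id` prevents pixels from the same patch leaking between the training and test sets, which would otherwise inflate the reported accuracy.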
        
To read and display the BIP files (instead of the pickle file):

You can use the script provided below (see the related dataset at https://ieee-dataport.org/documents/cloud-radiance-hsi):


import os
import spectral

def main():
    directory = 'data/'
    rgb_bands = [198, 123, 62]  # selected 650 nm for Red, 550 nm for Green, 470 nm for Blue

    # Process all hyperspectral images in the directory and display an RGB render
    for root, dirs, files in os.walk(directory):
        for file_name in files:
            if file_name.endswith('.hdr'):
                hdr_file = os.path.join(root, file_name)
                print("Opening image file:", hdr_file)
                img = spectral.open_image(hdr_file)
                spectral.imshow(img, bands=rgb_bands, figsize=(10, 10))

if __name__ == '__main__':
    main()
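
If the `spectral` package is unavailable, a raw BIP cube can also be read directly with NumPy, since BIP (band-interleaved-by-pixel) stores all bands of each pixel contiguously. The sketch below assumes you take the image dimensions and data type from the accompanying .hdr file; the `lines`, `samples`, and `uint16` values shown here are illustrative only, and a tiny synthetic cube is written first so the example is self-contained:

```python
import os
import numpy as np

def read_bip(path, lines, samples, bands, dtype=np.uint16):
    """Read a raw BIP cube into a (lines, samples, bands) array.
    In BIP order, all bands of a pixel are stored contiguously."""
    cube = np.fromfile(path, dtype=dtype)
    return cube.reshape(lines, samples, bands)

# Illustrative dimensions only; take the real ones from the .hdr file.
lines, samples, bands = 60, 80, 462
np.zeros(lines * samples * bands, dtype=np.uint16).tofile("demo.bip")

img = read_bip("demo.bip", lines, samples, bands)
patch = img[10:60, 20:70, :]  # a 50x50 spatial patch with all 462 bands
print(img.shape, patch.shape)
os.remove("demo.bip")  # clean up the synthetic stand-in file
```

Slicing the cube spatially, as in the last step, is how 50x50 patches like those in this dataset can be cut from a parent image.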
    
Funding Agency: 
NSF
Grant Number: 
2003740 and 2411519