Abstract

This dataset, titled "Synthetic Sand Boil Dataset for Levee Monitoring: Generated Using DreamBooth Diffusion Models," provides synthetic images for studying and developing semantic segmentation models that detect sand boils in levee systems. Sand boils threaten levee integrity and pose significant risks during floods, so accurate and efficient monitoring is needed. The images are generated with DreamBooth diffusion models and supplied as pixel-aligned image-mask pairs that depict the varied environments typical of levee systems. Because annotated sand boil data is difficult to obtain in sufficient quantity, the dataset offers a scalable and cost-effective alternative to traditional data collection. Each image is accompanied by a segmentation mask for analysis and model training. The dataset is intended for researchers and practitioners who want to improve levee monitoring with deep learning and semantic segmentation, in support of environmental monitoring and disaster prevention.

Instructions:

This dataset contains 930 synthetic training images, generated with DreamBooth diffusion models, and 51 real test images for semantic segmentation of sand boils in levee systems. Each image has a corresponding segmentation mask and convex hull coordinates. The synthetic images offer a scalable and cost-effective alternative to traditional data collection, and the real images allow trained models to be evaluated on actual levee scenes.

Data Format:

Images: Provided in PNG format, with a resolution suitable for semantic segmentation tasks.
Annotations: Segmentation masks are provided in JSON format, detailing pixel-level classifications for each image.
Convex Hull Coordinates: Included in a separate JSON file, compatible with annotation tools like the VGG Image Annotator (VIA).
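
Because the convex hull file is described as VIA-compatible, one way to read it is sketched below. The sketch assumes the layout of a VIA 2.x annotation export: a top-level object keyed by image, where each entry holds a `filename` and a `regions` list whose `shape_attributes` carry `all_points_x` and `all_points_y`. The dataset description does not confirm this layout, so the actual file may differ. The function name `load_via_polygons` and the sample coordinates are hypothetical.

```python
import json


def load_via_polygons(via_json_text):
    """Return {filename: [polygon, ...]} from VIA-style annotation JSON.

    Each polygon is a list of (x, y) tuples. Assumes the VIA 2.x export
    layout, which is not confirmed by the dataset description.
    """
    data = json.loads(via_json_text)
    # VIA project files nest the per-image entries under this key;
    # plain annotation exports keep them at the top level.
    entries = data.get("_via_img_metadata", data)

    polygons = {}
    for entry in entries.values():
        regions = entry.get("regions", [])
        # Older VIA exports store regions as a dict keyed by index.
        if isinstance(regions, dict):
            regions = list(regions.values())
        shapes = []
        for region in regions:
            shape = region.get("shape_attributes", {})
            xs = shape.get("all_points_x")
            ys = shape.get("all_points_y")
            # Keep only regions that carry a matching list of x and y points.
            if xs and ys and len(xs) == len(ys):
                shapes.append(list(zip(xs, ys)))
        polygons[entry["filename"]] = shapes
    return polygons


# Tiny inline example in the assumed layout (coordinates are made up).
example = json.dumps({
    "img_001.png12345": {
        "filename": "img_001.png",
        "size": 12345,
        "regions": [{
            "shape_attributes": {
                "name": "polygon",
                "all_points_x": [10, 50, 50, 10],
                "all_points_y": [20, 20, 60, 60],
            },
            "region_attributes": {},
        }],
        "file_attributes": {},
    }
})
via_polygons = load_via_polygons(example)
```

For a real file, the text would come from reading the JSON file in the archive instead of the inline example. Each returned polygon can then be filled into a blank image (for instance with PIL's `ImageDraw.Draw(...).polygon`) to obtain a binary mask.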

Instructions for Use:

Data Structure: The dataset is organized into two main directories: Synthetic_training_images and Real_test_images. Each directory contains subfolders for images and their corresponding masks.
Loading Data: Images can be loaded using standard image processing libraries such as OpenCV or PIL. Annotations can be parsed using JSON libraries to extract segmentation masks and convex hull coordinates.
Annotation Process: Convex hull annotations delineate each object boundary with an enclosing convex polygon, which significantly reduces manual annotation effort. This is particularly useful for large synthetic datasets, where it gives a consistent way to form boundaries and supports efficient model training.
Model Training: The dataset is designed to support transfer learning and fine-tuning of existing segmentation models. Users are encouraged to experiment with different architectures and training strategies to optimize model performance.
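
The convex hull idea mentioned under Annotation Process can be illustrated with a short sketch. This is not the authors' annotation pipeline. It only shows, assuming a mask is available as a 2D grid of 0/1 values, how the foreground pixels reduce to a small polygon using Andrew's monotone chain algorithm. The function names and the toy mask are hypothetical.

```python
def cross(o, a, b):
    """Cross product of vectors o->a and o->b; its sign gives the turn direction."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain: return hull vertices in order around the boundary."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    # Build the lower chain, dropping points that do not make a strict turn.
    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    # Build the upper chain the same way, walking the points in reverse.
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


def mask_to_hull(mask):
    """Collect (x, y) of every nonzero pixel and return their convex hull."""
    points = [(x, y)
              for y, row in enumerate(mask)
              for x, value in enumerate(row)
              if value]
    return convex_hull(points)


# Toy 5x5 mask with a 3x3 foreground block (illustrative only).
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
hull = mask_to_hull(mask)
```

Nine foreground pixels collapse to the four corner vertices here, which is the kind of reduction that makes hull-based boundaries cheap to store and review. A convex hull overestimates the area of any concave region, so it is an approximation of the pixel-level mask, not a replacement for it.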

Funding Agency:

10.13039/100006752-U.S. Department of the Army—U.S. Army Corps of Engineers (USACE)

Grant Number:

W912HZ-23-2-0004

Dataset Files

Synthetic Sand boil Dataset.zip (381.43 MB)
