ShrimpView: A Versatile Dataset for Shrimp Detection and Recognition

Citation Author(s):: Ravi Kumar Lanke

Debnath Bhattacharyya (Department of Computer Science and Engineering, Koneru Lakshmaiah Education Foundation, Vaddeswaram 522302, Guntur, Andhra Pradesh, India)

Karthick Raghunath K M
Submitted by:: Ravi Lanke
Last updated:: Fri, 09/22/2023 - 07:04
DOI:: 10.21227/fp5f-xp20
Data Format:: *.avi; *.csv; *.txt; *.zip

1667 views

Categories:

Keywords:

ACCESS DATASET CITE

Abstract

The "ShrimpView: A Versatile Dataset for Shrimp Detection and Recognition" is a meticulously curated collection of 10,000 samples (each with 11 attributes) designed to facilitate the training of deep learning models for shrimp detection and classification. Each sample in this dataset is associated with an image and accompanied by 11 categorical attributes. These attributes span a range of features including species type ("Pacific White Shrimp," "Tiger Prawn," "Ghost Shrimp"), life stage ("Larvae," "Juvenile," "Adult"), and environmental conditions ("Freshwater," "Saltwater," "Brackish Water"), among others. Additionally, the dataset incorporates variations in image orientation, background, and lighting conditions to enhance model generalizability. With resolutions ranging from 640x480 px to 1920x1080 px, the dataset is well-suited for both object detection and multi-class classification tasks.

The variability in these attributes aims to improve the model's generalizability and robustness. For instance, the "Species" and "Life Stage" attributes can aid in multi-class classification, while "Color" and "Size" add complexity for object detection. "Orientation" and "Background" introduce viewpoint and environmental variance, respectively. "Lighting" conditions can simulate different capture scenarios, and "Resolution" offers scale variance. "Grouping" prepares the model for detecting multiple instances in a single frame, and "Habitat" ensures the model is trained for different water conditions. It aims to serve as a robust foundation for developing comprehensive machine learning solutions in the domain of aquatic species recognition. This rich, multi-dimensional dataset is thus ideal for training a robust deep learning model for comprehensive shrimp detection and classification.

The computational costs involved in generating the dataset are listed as follows:

Data Augmentation Cost: Caug=20 seconds

CSV Generation Cost: Ccsv=1.1 seconds

Attribute Randomization Cost: Cattr=1.1 seconds

Total Disk Space Needed: Dtotal=550 MB

Total Time Needed: CTotal=22.2 seconds

Total Time Needed with Parallelization: Ctotal, Cparallel=55.5 seconds (4 cores)

Instructions:

The computational costs involved in generating the dataset are listed as follows:

Data Augmentation Cost: Caug=20 seconds

CSV Generation Cost: Ccsv=1.1 seconds

Attribute Randomization Cost: Cattr=1.1 seconds

Total Disk Space Needed: Dtotal=550 MB

Total Time Needed: CTotal=22.2 seconds

Total Time Needed with Parallelization: Ctotal, Cparallel=55.5 seconds (4 cores)

Benefits:

Robust Model Training: The comprehensive nature of the dataset allows for the training of robust machine learning models capable of handling various scenarios, from different shrimp species to diverse environmental conditions.

Multi-objective Learning: With attributes like species, size, and life stage, the dataset is ideal for multi-objective learning tasks such as classification, object detection, and even segmentation.

Improved Generalizability: The inclusion of image augmentations and various environmental factors makes models trained on this dataset more likely to generalize well to new, unseen data.

Resource Efficiency: The dataset alleviates the need for extensive data collection from the field, saving time and resources while ensuring a wide variety of data points.

Interdisciplinary Research: The dataset can be valuable not only for computer science research but also for marine biology, environmental science, and aquaculture, facilitating interdisciplinary collaborations.

Benchmarking: The dataset can serve as a standard benchmark for evaluating the performance of various machine learning algorithms in the domain of aquatic species recognition.

Data Availability: Having a pre-processed and well-annotated dataset accelerates the research process, as researchers can directly focus on model development and evaluation.

Future Scope:

Temporal Analysis: Future versions could include time-series data to study shrimp behavior, growth, and other temporal aspects.

Integration with IoT: The dataset could be extended to work in tandem with IoT devices for real-time monitoring of aquatic habitats.

Expand Species Range: Including more species of shrimp and even other aquatic creatures could make the dataset even more comprehensive.

High-Dimensional Data: Future versions could include additional data types like depth maps, thermal images, or even videos for more complex analyses.

Ethical and Fair AI: Future research could focus on ensuring that the dataset is balanced and free from biases, thus promoting ethical and fair AI practices.

Climate Impact Analysis: The dataset could be used to study the impact of changing environmental factors on aquatic species, offering valuable insights for climate change research.

Human-Animal Interaction: The dataset can be expanded to study the interaction between shrimps and human activities, such as fishing or pollution, providing a broader scope for environmental science research.

Automated Aquaculture: Combining this dataset with sensor data could lead to the development of automated monitoring and control systems for aquaculture.

The ShrimpView: A Versatile Dataset for Shrimp Detection and Recognition dataset seamlessly integrates a rich set of image data samples with a corresponding augmented categorical dataset, each entry of which maps to an image and contains 11 diverse attributes such as species, life stage, and environmental conditions. This dual-faceted dataset serves as a robust foundation for machine learning models, enabling multi-objective learning tasks like classification, object detection, and even segmentation. The amalgamation of visual and categorical data enhances the model's training efficacy, broadens its applicability, and significantly improves its generalizability and interpretability, making it an invaluable resource in the machine learning domain for comprehensive shrimp detection and classification.

I need the dataset ,please

Ji Ziliang Fri, 10/04/2024 - 08:22 Permalink

Hi, I need the data set... could you send it to me

best regards, Esteban

Esteban Gallegos Wed, 02/19/2025 - 18:19 Permalink