Design a classifier to classify diseases in paddy based on leaf color
- Submission Dates: 08/14/2024 to 08/22/2024
- Submitted by: VISHNU T S
- Last updated: Fri, 08/23/2024 - 10:49
- DOI: 10.21227/3emp-zs52
- License: Creative Commons Attribution
Abstract
There is an increasing demand for automated systems capable of accurately diagnosing paddy diseases, which would help lower pesticide usage and prevent yield loss. Yet, the absence of publicly available datasets with annotated disease labels has posed a challenge to the development and benchmarking of advanced deep learning models. To address this issue, we created and open-sourced the Paddy Doctor dataset, facilitating the development of reliable and effective paddy disease diagnosis systems.
"Paddy Doctor: A Visual Image Dataset for Automated Paddy Disease Classification and Benchmarking" is a specialized dataset designed to aid in the development and evaluation of machine learning models for the automated classification of diseases in paddy (rice) plants. The dataset includes a diverse collection of high-quality images of paddy leaves affected by various diseases, along with healthy samples. Each image is labeled according to the type of disease or condition it represents.
The winners will be awarded from a prize pool of 40,000 rupees.
Reference: Petchiammal A, Briskline Kiruba S, Murugan D, Pandarasamy Arjunan, November 18, 2022, "Paddy Doctor: A Visual Image Dataset for Automated Paddy Disease Classification and Benchmarking", IEEE Dataport, doi: https://dx.doi.org/10.21227/hz4v-af08.
PADDY DOCTOR: AN AUTOMATED VISUAL IMAGE SENSING MODEL FOR PADDY DISEASE CLASSIFICATION
1. Introduction
The "Paddy Doctor: A Visual Image Dataset for Automated Paddy Disease Classification and Benchmarking" dataset was created to support the development and testing of machine learning models for automated disease classification in paddy (rice) plants. It contains a diverse collection of high-quality images of paddy leaves affected by different diseases, together with healthy samples. Each image carries a label indicating the disease or condition it depicts.
The need for automated systems that can correctly identify paddy diseases is growing, since such systems would reduce pesticide use and prevent yield loss. However, the lack of publicly accessible datasets with annotated disease labels has made it difficult to develop and benchmark advanced deep learning models.
2. Image Search
Image search from a given dataset involves retrieving specific images that match certain criteria or queries from a larger collection of images stored in the dataset. This process typically uses metadata (such as tags, labels, or descriptions) or visual similarity (like color, shape, or texture) to filter and find relevant images. In research or application contexts, image search is often crucial for tasks such as training machine learning models, analyzing visual patterns, or conducting experiments where visual data is a key component.
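The visual-similarity idea above can be sketched with color histograms. This is a minimal illustration, not the competition's method: images are assumed to be already decoded into lists of (R, G, B) tuples, and the names `color_histogram`, `histogram_intersection`, and `search` are hypothetical helpers introduced here.

```python
from typing import Dict, List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # one (R, G, B) value, each channel in 0..255


def color_histogram(pixels: Sequence[Pixel], bins: int = 4) -> List[float]:
    """Return a normalized joint RGB histogram with bins**3 cells."""
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        # Map each 0..255 channel value to a bin index in 0..bins-1.
        ri, gi, bi = r * bins // 256, g * bins // 256, b * bins // 256
        hist[ri * bins * bins + gi * bins + bi] += 1.0
    total = float(len(pixels)) or 1.0  # avoid division by zero for empty input
    return [count / total for count in hist]


def histogram_intersection(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Similarity in [0, 1]; 1.0 means identical normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def search(query_pixels: Sequence[Pixel],
           index: Dict[str, List[float]],
           top_k: int = 3) -> List[Tuple[str, float]]:
    """Rank indexed images by color similarity to the query, best first."""
    query_hist = color_histogram(query_pixels)
    scored = [(name, histogram_intersection(query_hist, hist))
              for name, hist in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Tiny synthetic "images" (assumed values, for illustration only).
green_leaf = [(30, 160, 40)] * 16
brown_leaf = [(150, 100, 40)] * 16
index = {
    "green.jpg": color_histogram(green_leaf),
    "brown.jpg": color_histogram(brown_leaf),
}
mostly_green_query = [(35, 170, 45)] * 12 + [(150, 100, 40)] * 4
print(search(mostly_green_query, index, top_k=2))
```

Histogram intersection ignores where colors appear in the image, so it is a coarse filter; shape or texture features would need a different descriptor.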
3. Data Modeling
Objective: Understand and structure the Paddy Doctor dataset for effective use.
Download and Explore the Dataset:
Obtain the Paddy Doctor dataset from IEEE Dataport.
Explore the dataset to understand its structure, classes, and features.
Visualize sample images and corresponding labels to get an overview of the data distribution.
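A first look at the class distribution might be taken from the metadata file. The sketch below is an assumption-laden example: the column name `label` is a guess (check the actual header of metadata.csv), and the sample rows are made up so the snippet runs without the download.

```python
import csv
import io
from collections import Counter
from typing import Dict, TextIO


def class_distribution(csv_file: TextIO, label_column: str = "label") -> Dict[str, int]:
    """Count rows per class in a CSV stream that has a header row.

    `label_column` is an assumed column name, not confirmed by the dataset docs.
    """
    reader = csv.DictReader(csv_file)
    if reader.fieldnames is None or label_column not in reader.fieldnames:
        raise KeyError(f"column {label_column!r} not found in {reader.fieldnames}")
    return dict(Counter(row[label_column] for row in reader))


# Made-up sample rows standing in for the real metadata file.
sample = io.StringIO(
    "image_id,label\n"
    "a.jpg,normal\n"
    "b.jpg,disease_a\n"
    "c.jpg,disease_a\n"
)
for name, count in sorted(class_distribution(sample).items()):
    print(f"{name}: {count}")
```

With the real file, the same function would be called as `class_distribution(open("metadata.csv", newline=""))`, adjusting `label_column` to whatever the header actually uses.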
Data Annotation:
Ensure all images are correctly annotated with one of the 13 classes (12 paddy diseases and normal leaf).
Verify the quality of annotations and make necessary corrections.
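One simple annotation check is to flag any label that falls outside the expected class list. The helper `find_invalid_labels` and the class names in the example are hypothetical; the real class names should be taken from the dataset documentation.

```python
from typing import Iterable, List, Set, Tuple


def find_invalid_labels(annotations: Iterable[Tuple[str, str]],
                        allowed_classes: Set[str]) -> List[Tuple[str, str]]:
    """Return (image_id, label) pairs whose label is not an allowed class.

    Labels are compared after trimming whitespace and lowercasing, so
    " Normal " matches "normal" but a misspelling is still reported.
    """
    invalid = []
    for image_id, label in annotations:
        if label.strip().lower() not in allowed_classes:
            invalid.append((image_id, label))
    return invalid


# Placeholder class names (assumed, for illustration only).
allowed = {"normal", "disease_a", "disease_b"}
annotations = [
    ("a.jpg", "normal"),
    ("b.jpg", " Disease_A "),
    ("c.jpg", "diseas_b"),  # misspelled label
]
print(find_invalid_labels(annotations, allowed))
```

Flagged entries still need a manual look at the image, since a valid class name can also be attached to the wrong picture.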
Data Splitting:
Split the dataset into training, validation, and test sets (e.g., 70% training, 15% validation, 15% test).
Ensure balanced representation of all disease classes in each split.
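A stratified split keeps each class at roughly the same proportion in every subset. The following is a minimal standard-library sketch using the 70/15/15 example above; `stratified_split` is a hypothetical helper, and libraries such as scikit-learn offer equivalent functionality.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Sample = Tuple[str, str]  # (image_id, label)


def stratified_split(samples: Iterable[Sample],
                     train_frac: float = 0.70,
                     val_frac: float = 0.15,
                     seed: int = 0) -> Tuple[List[Sample], List[Sample], List[Sample]]:
    """Split samples into train/val/test, class by class.

    Whatever remains after the train and validation shares goes to test.
    """
    by_class: Dict[str, List[str]] = defaultdict(list)
    for image_id, label in samples:
        by_class[label].append(image_id)

    rng = random.Random(seed)  # fixed seed makes the split reproducible
    train: List[Sample] = []
    val: List[Sample] = []
    test: List[Sample] = []
    for label in sorted(by_class):
        ids = by_class[label]
        rng.shuffle(ids)
        n_train = round(len(ids) * train_frac)
        n_val = round(len(ids) * val_frac)
        train += [(i, label) for i in ids[:n_train]]
        val += [(i, label) for i in ids[n_train:n_train + n_val]]
        test += [(i, label) for i in ids[n_train + n_val:]]
    return train, val, test


# Synthetic, imbalanced example data (assumed, for illustration only).
samples = [(f"a_{i}.jpg", "class_a") for i in range(100)]
samples += [(f"b_{i}.jpg", "class_b") for i in range(20)]
train, val, test = stratified_split(samples)
print(len(train), len(val), len(test))
```

Very small classes can end up with an empty validation or test share because of rounding, so it is worth checking per-class counts after splitting.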
Data Cleaning and Preprocessing
Objective: Prepare the data for model training.
Data Cleaning:
Remove any corrupted or low-quality images.
Normalize image sizes to a standard dimension (e.g., 224x224 pixels).
Convert images to grayscale only if color is not a significant feature for your models. Because this task classifies diseases by leaf color, keep the RGB channels in most cases.
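A cheap first pass for spotting truncated or non-JPEG files is to check the JPEG start and end markers. This is only a heuristic sketch with hypothetical helper names; fully decoding each file with an imaging library is more reliable, and resizing to a fixed dimension would also be done with such a library.

```python
from pathlib import Path
from typing import List

JPEG_START = b"\xff\xd8"  # SOI (start of image) marker
JPEG_END = b"\xff\xd9"    # EOI (end of image) marker


def looks_like_complete_jpeg(data: bytes) -> bool:
    """Heuristic: the file begins with SOI and ends with EOI.

    Trailing zero padding is ignored. A file can pass this test and
    still fail to decode, so treat a pass as "not obviously broken".
    """
    trimmed = data.rstrip(b"\x00")
    return (len(trimmed) >= 4
            and trimmed.startswith(JPEG_START)
            and trimmed.endswith(JPEG_END))


def find_suspect_jpegs(folder: str) -> List[Path]:
    """Return .jpg files under `folder` that fail the marker check."""
    suspects = []
    for path in sorted(Path(folder).rglob("*.jpg")):
        if not looks_like_complete_jpeg(path.read_bytes()):
            suspects.append(path)
    return suspects
```

Calling `find_suspect_jpegs` on the extracted dataset folder would list the files worth inspecting or removing before training.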
Data Augmentation:
Apply data augmentation techniques such as rotation, flipping, zooming, and cropping to increase dataset variability and improve model generalization.
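The geometric augmentations named above can be illustrated on a small pixel grid. In this sketch an image is assumed to be a nested list of rows so that it runs with the standard library alone; in practice the same operations would be applied to real images with an imaging or deep learning library. All function names are hypothetical.

```python
import random
from typing import Any, List

Image = List[List[Any]]  # rows of pixel values (assumed representation)


def flip_horizontal(img: Image) -> Image:
    """Mirror the image left to right."""
    return [list(reversed(row)) for row in img]


def rotate_90_clockwise(img: Image) -> Image:
    """Rotate by 90 degrees clockwise; an HxW image becomes WxH."""
    return [list(row) for row in zip(*img[::-1])]


def center_crop(img: Image, height: int, width: int) -> Image:
    """Cut a height x width window from the middle of the image."""
    top = (len(img) - height) // 2
    left = (len(img[0]) - width) // 2
    return [row[left:left + width] for row in img[top:top + height]]


def random_augment(img: Image, rng: random.Random) -> Image:
    """Apply a random flip and a random number of quarter turns."""
    out = flip_horizontal(img) if rng.random() < 0.5 else [list(r) for r in img]
    for _ in range(rng.randrange(4)):
        out = rotate_90_clockwise(out)
    return out


image = [[1, 2, 3],
         [4, 5, 6]]
print(flip_horizontal(image))      # [[3, 2, 1], [6, 5, 4]]
print(rotate_90_clockwise(image))  # [[4, 1], [5, 2], [6, 3]]
```

Augmentation is normally applied to the training split only, so that validation and test scores reflect unmodified images.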
4. Summary
The data collection is summarized below.
Crop name: Paddy
Total number of images: 16,225
Total number of classes: 13 (12 paddy diseases and normal leaf)
Image type: Visual (RGB)
Image file type: JPEG
Image resolution: 1,080 x 1,440 pixels
Smartphone device used: CAT S62 Pro
Data collection period: February to April 2021
Data collection location: Pallamadai, Tamil Nadu, India - 627357
Additional metadata: paddy age and variety for each image
Competition Dataset Files
- metadata.csv (526.09 kB)
- paddy-doctor-diseases-medium.zip (1.23 GB)
- paddy-doctor-diseases-small-400-split.zip (104.09 MB)
- paddy-doctor-diseases-small-augmented-26k-split.zip (506.27 MB)
- paddy-doctor-diseases-small-augmented-26k.zip (505.87 MB)
- paddy-doctor-diseases-small-augmented-5x.zip (1.54 GB)
- paddy-doctor-diseases-small-augmented-65k.zip (1.24 GB)
- paddy-doctor-diseases-small-split.zip (322.40 MB)
- paddy-doctor-diseases-small.zip (322.15 MB)
- paddy-doctor-diseases.zip (4.64 GB)
Documentation
Attachment | Size |
---|---|
documentation.pdf | 1.02 MB |
Comments
Deadline to submit
What is the deadline to submit the predictions? How to submit the predictions?