Using Datasets to Develop Automated Plant Disease Detection Systems

Dr. Pandarasamy Arjunan of the Berkeley Education Alliance for Research in Singapore was eager to solve a problem facing many Asian countries: identifying disease in paddy plants accurately and efficiently. To help advance disease detection technology, Pandarasamy uploaded a dataset titled Paddy Doctor: A Visual Image Dataset for Automated Paddy Disease Classification and Benchmarking to IEEE DataPort. His dataset won the Environment category of the 2022 Dataset Upload Contest.

About the Dataset

Pandarasamy’s dataset consists of more than 16,000 labeled paddy leaf images. It is intended to advance computer vision algorithms that identify diseases and pests affecting paddy leaves. Diseases and pests have a major impact on rice farming and can cause significant crop losses, yet identifying diseases manually can be difficult. One solution is to automate the disease identification process. However, the limited availability of datasets hinders the development of disease detection systems that use advanced image processing and deep learning techniques.

Pandarasamy developed and open-sourced the dataset to enable the development of efficient and robust paddy disease diagnosis systems. Researchers can use it to experiment with and implement computer vision models that identify the type of disease present in leaf images.

The dataset accompanies a published conference paper of the same name.

Benefits of the IEEE DataPort Platform

According to Pandarasamy, “As a researcher, there are many advantages to using the IEEE DataPort platform.” These include:
  • Easy-to-use interfaces for discovering a wide range of interesting datasets across various fields
  • Ability to upload and share datasets in various standard formats with a large research community
  • Interesting competitions that help beginners gain experience working with datasets and machine learning models

Pandarasamy said the IEEE DataPort platform helped him achieve his research goals, especially in terms of sharing reproducible research datasets and code with a wider research community. This is particularly important because many scientific journals now require researchers to share reproducible materials during the submission and review process to increase transparency in research.
