LeafNet: A large-scale dataset for training image-text models in leaf disease identification

Citation Author(s):
Khang Nguyen Quoc, School of Electrical Engineering, Korea University, Seoul 02841, Republic of Korea
Lan Le Thi Thu, Software Engineering Department, FPT University, Cantho City, 910000 VN, Vietnam
Luyl-Da Quach, Software Engineering Department, FPT University, Cantho City, 910000 VN, Vietnam
Submitted by: Da Quach
Last updated: Fri, 02/21/2025 - 11:25
DOI: 10.21227/epxf-hr31

Abstract 

The PlantVillage dataset, with over 54,000 images spanning 14 plant species and 26 disease types, has been widely used for leaf disease classification. However, it is limited in both scale and diversity. To address these limitations, we developed LeafNet, a large-scale dataset designed to support foundation models for leaf disease diagnosis. LeafNet comprises over 186,000 images from 22 crop species, organized into 97 classes and covering 43 fungal diseases, 8 bacterial diseases, 2 mould (oomycete) diseases, 6 viral diseases, and 3 mite-induced diseases. The images were collected and processed to minimize intra-class variation, and a consistent imaging distance was maintained so that symptoms remain clearly visible. The disease symptom descriptions were curated from reputable sources, including UME, NIH, and published studies, to provide high-quality annotations for AI-driven plant pathology research.

Instructions: 

The LeafNet dataset consists of:

  • Image Folders: Each folder is named after a class and contains images belonging to that specific class.
  • Metadata JSON File: A JSON file providing detailed information about each class, including crop name, disease name, and a description of symptoms.
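The layout described above can be indexed with a few lines of Python. This is a minimal sketch, not the authors' loader. It assumes that the metadata JSON is an object keyed by class folder name, and that each entry has the keys `crop`, `disease`, and `description`; these key names, the function name `index_leafnet`, and the list of image extensions are all hypothetical and should be checked against the Readme file and the actual metadata file.

```python
import json
from pathlib import Path

# Assumption: images use one of these extensions; adjust to the real files.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def index_leafnet(root, metadata_file):
    """Pair every image with its class label and class-level metadata.

    Assumed (hypothetical) metadata schema:
        {"<class folder name>": {"crop": ..., "disease": ..., "description": ...}}
    """
    root = Path(root)
    with open(metadata_file, encoding="utf-8") as f:
        metadata = json.load(f)

    records = []
    # Each sub-folder of `root` is one class, named after that class.
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        info = metadata.get(class_dir.name, {})  # empty if the class is not listed
        for image_path in sorted(class_dir.iterdir()):
            if image_path.suffix.lower() not in IMAGE_EXTENSIONS:
                continue  # skip non-image files
            records.append({
                "image": image_path,
                "label": class_dir.name,
                "crop": info.get("crop"),
                "disease": info.get("disease"),
                "description": info.get("description"),
            })
    return records
```

Each returned record holds an image path together with the symptom description of its class, which is the image-text pairing that an image-text model would be trained on. Missing metadata entries yield `None` fields rather than an error, so mismatches between folder names and JSON keys can be spotted by filtering for records without a description.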

Documentation

Attachment: Readme File (2.89 KB)