Soil Quality and Nutrient Levels from Multiple Districts of Bangladesh: IoT-Based Data

Citation Author(s): Mohammod Abul Kashem (Dhaka University of Engineering & Technology), Mehedi Hasan Shuvo (Dhaka University of Engineering & Technology)
Submitted by: Mehedi Hasan Shuvo
Last updated: Wed, 05/07/2025 - 08:20
DOI: 10.21227/kpyp-ky32
Data Format: *.csv

Keywords: Soil type

Abstract

This dataset contains IoT-based soil data collected from various districts of Bangladesh, aimed at assessing soil quality for crop recommendation. The data includes key soil parameters (nitrogen, phosphorus, and potassium levels, soil conductivity, pH, humidity, and temperature) measured over multiple time intervals. It also includes a Soil Quality Index (SQI) and fuzzy classification categories, which allow for in-depth soil fertility analysis. The data was gathered using IoT sensors, providing real-time measurements from diverse soil types across different regions. This dataset is valuable for agricultural research, environmental monitoring, and the development of precision farming techniques for crop recommendation. It is expected to support further studies on soil fertility measurement using the SQI.

Instructions:

This dataset provides IoT-based soil quality data collected from different districts in Bangladesh. It includes various soil parameters, such as nitrogen, phosphorus, potassium content, soil conductivity, humidity, pH, temperature, and a Soil Quality Index (SQI). The dataset is designed for use in agricultural and environmental research to assess and improve soil quality and crop recommendations.

Researchers, scientists, and data analysts are encouraged to use this dataset for analytical tasks such as soil quality prediction, nutrient analysis, and the development of machine learning models for agricultural management and crop recommendation. The instructions and guidelines below describe how to work with the dataset.

1. Dataset Overview:

The dataset contains the following columns:

Datetime: Time of data collection (UNIX timestamp).
Nitrogen, Phosphorus, Potassium: These columns provide nutrient levels in the soil, which are critical for understanding soil fertility.
Soil Conductivity: Electrical conductivity of the soil, used as a proxy for salinity. It indicates the level of dissolved salts, which affects plant growth.
Soil Humidity: Percentage of moisture content in the soil, which is crucial for understanding irrigation needs.
Soil pH: The acidity or alkalinity of the soil.
Soil Temperature: Soil temperature measured in Celsius, affecting plant growth and microbial activity.
SQI (Soil Quality Index): A composite score derived from the soil parameters, providing an overall assessment of soil quality.
Fuzzy Category: A categorical label that classifies soil quality based on the SQI score (e.g., Low, Medium, High).
Fuzzy Degree: Represents the degree of membership of the soil to different fuzzy categories, which allows for a nuanced classification.
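
As a minimal sketch of how the columns above might be loaded with pandas, the snippet below parses a few synthetic rows and converts the UNIX timestamp to a datetime. The header names, the sample values, and the assumption that the timestamp is in seconds are illustrative only; check them against the actual CSV before reuse.

```python
import io

import pandas as pd

# Synthetic rows standing in for the real CSV. Header names and values are
# assumptions for illustration, not taken from the published file.
SAMPLE_CSV = """Datetime,Nitrogen,Phosphorus,Potassium,Soil Conductivity,Soil Humidity,Soil pH,Soil Temperature,SQI,Fuzzy Category,Fuzzy Degree
1700000000,40,25,110,350,42.5,6.4,27.1,0.62,Medium,0.80
1700003600,55,30,140,410,45.0,6.8,26.7,0.81,High,0.90
1700007200,20,12,70,290,30.2,5.6,28.3,0.35,Low,0.70
"""


def load_soil_data(source):
    """Read the CSV and convert the UNIX timestamp (assumed to be in seconds)."""
    df = pd.read_csv(source)
    df["Datetime"] = pd.to_datetime(df["Datetime"], unit="s", utc=True)
    return df.sort_values("Datetime").reset_index(drop=True)


df = load_soil_data(io.StringIO(SAMPLE_CSV))
print(df.dtypes)
print(df.head())
```

For the real file, pass its path to `load_soil_data` instead of the in-memory sample. If the timestamps turn out to be in milliseconds, `unit="ms"` would be the corresponding setting.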

2. Analysis Tools and Frameworks:

To analyze this dataset, we recommend the following tools and methods:

Data Analysis:
- Python: Popular libraries include:
  - Pandas for data manipulation and cleaning.
  - NumPy for numerical operations.
  - Matplotlib and Seaborn for visualizing trends and distributions.
- R: Useful for statistical analysis and visualization.
- Excel/Google Sheets: Suitable for simpler analyses or quick, interactive exploration.
Machine Learning and Predictive Modeling:
- Python:
  - Use Scikit-learn for regression or classification models, such as predicting soil quality based on other parameters.
  - XGBoost or LightGBM for more complex, tree-based models.
  - TensorFlow/Keras for building deep learning models.
- R: Leverage machine learning packages like caret, randomForest, or xgboost.
Statistical Analysis:
- You can apply descriptive statistics and more advanced tests to find correlations or trends using Python (Pandas, SciPy) or R.
- Statistical tests like ANOVA, Chi-square tests, and regression analysis are useful for understanding relationships between soil quality parameters.
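
A minimal sketch of the descriptive statistics and correlation step, using pandas on a small synthetic table. The column names and values are assumptions made up for illustration and say nothing about the real measurements.

```python
import pandas as pd

# Synthetic readings for illustration only; column names are assumed.
df = pd.DataFrame({
    "Nitrogen":   [20, 35, 40, 55, 60, 75],
    "Phosphorus": [10, 18, 22, 28, 33, 40],
    "Soil pH":    [5.6, 6.0, 6.4, 6.6, 6.9, 7.1],
    "SQI":        [0.30, 0.45, 0.55, 0.70, 0.78, 0.90],
})

# Descriptive statistics (count, mean, std, quartiles) for every column.
summary = df.describe()

# Pearson correlation matrix, then each parameter's correlation with SQI.
corr = df.corr(method="pearson")
sqi_corr = corr["SQI"].drop("SQI").sort_values(ascending=False)

print(summary.loc[["mean", "std", "min", "max"]])
print(sqi_corr)
```

Correlation only describes linear association. Group comparisons such as ANOVA across fuzzy categories would need a separate test, for example from SciPy.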

3. Suggested Research and Analysis Tasks:

Soil Quality Prediction:
- Use machine learning models to predict the soil quality index (SQI) based on the soil parameters. This can involve supervised learning techniques where the target variable is the SQI, and the features are the soil nutrient levels, pH, humidity, conductivity, etc.
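
One way this could look is sketched below with a scikit-learn linear regression baseline. The features and the target are synthetic: the value ranges are guesses, and the weighted sum used as the target is a made-up stand-in, not the formula behind the dataset's SQI.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 300

# Synthetic feature matrix; the value ranges are assumptions.
X = np.column_stack([
    rng.uniform(10, 80, n),    # nitrogen
    rng.uniform(5, 45, n),     # phosphorus
    rng.uniform(50, 200, n),   # potassium
    rng.uniform(5.0, 8.0, n),  # pH
    rng.uniform(20, 60, n),    # humidity (%)
    rng.uniform(200, 600, n),  # conductivity
])

# Hypothetical target: a made-up weighted sum plus noise, NOT the real SQI.
y = 0.005 * X[:, 0] + 0.008 * X[:, 1] + 0.002 * X[:, 2] + rng.normal(0, 0.02, n)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

r2 = r2_score(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
print(f"R^2 = {r2:.3f}, MAE = {mae:.3f}")
```

A linear model is used here only as a simple baseline. If the real SQI is a deterministic function of the other columns, near-perfect scores would reflect that construction rather than genuine predictive skill.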
Nutrient Deficiency Analysis:
- Investigate the relationships between nitrogen, phosphorus, and potassium levels and the soil's overall quality and fertility. The results can be used to develop fertilizer application recommendations.
Soil Classification:
- Apply clustering algorithms (e.g., K-means or DBSCAN) to identify distinct soil types or regions with similar characteristics. You can also perform classification to predict fuzzy categories (e.g., Low, Medium, High) based on the data attributes.
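
The clustering idea can be sketched with scikit-learn's K-means on two synthetic soil groups. The group centres, spreads, and the choice of two clusters are assumptions for illustration. Features are standardized first so that no single parameter dominates the distance.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# Two synthetic groups (columns: N, P, K, pH); all values are made up.
poor = rng.normal(loc=[20, 10, 60, 5.5], scale=[3, 2, 5, 0.2], size=(50, 4))
rich = rng.normal(loc=[70, 35, 160, 7.0], scale=[3, 2, 5, 0.2], size=(50, 4))
X = np.vstack([poor, rich])

# Put all features on a comparable scale before computing distances.
X_scaled = StandardScaler().fit_transform(X)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_scaled)

print("Cluster sizes:", np.bincount(labels))
```

On real data the number of clusters is not known in advance, so it would need to be chosen, for instance by comparing silhouette scores across several values of `n_clusters`.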
Temporal Analysis:
- Investigate how soil quality and parameters evolve over time. This analysis could reveal seasonal trends, optimal planting periods, or effects of different weather patterns on soil health.
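
A minimal sketch of a temporal aggregation with pandas, assuming the Datetime column holds UNIX timestamps in seconds. The hourly readings are synthetic, with SQI rising by 0.1 per day so the daily means are easy to check.

```python
import pandas as pd

START = 1704067200  # 2024-01-01 00:00:00 UTC as a UNIX timestamp

# 72 synthetic hourly readings (3 days); SQI steps up by 0.1 each day.
df = pd.DataFrame({
    "Datetime": [START + 3600 * i for i in range(72)],
    "SQI": [0.5 + 0.1 * (i // 24) for i in range(72)],
})

# Convert the timestamp and use it as the index for time-based grouping.
df["Datetime"] = pd.to_datetime(df["Datetime"], unit="s")
daily = df.set_index("Datetime")["SQI"].resample("D").mean()

print(daily)
```

The same pattern extends to weekly or monthly windows by changing the resampling rule, provided the recording period is long enough to cover them.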
Geospatial Analysis:
- Although the dataset doesn’t contain direct geospatial information (latitude/longitude), you could combine it with other external data sources to conduct geographic analysis. This could involve visualizing soil quality across different districts of Bangladesh and identifying regions that need more attention in terms of agricultural management.

4. Competition Instructions:

This dataset is intended for researchers, data scientists, and agricultural experts to apply their knowledge in soil science and machine learning. Depending on the competition or use case, participants are encouraged to:

Develop models that predict soil quality or other key parameters (e.g., nitrogen, phosphorus, potassium levels).
Create algorithms for soil classification or clustering.
Perform exploratory data analysis to uncover hidden insights from the data.
Suggest recommendations for improving soil health based on data-driven analysis.

5. Evaluation Metrics:

For model evaluation, participants can use common regression or classification evaluation metrics such as:

Root Mean Squared Error (RMSE) or Mean Absolute Error (MAE) for regression models.
Accuracy, Precision, Recall, and F1-score for classification models.
Cross-validation techniques to ensure robust and generalized models.
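
To make the metric definitions concrete, here is a small standard-library sketch of RMSE, MAE, accuracy, and per-class precision, recall, and F1. The helper names and the example values are hypothetical. In practice the equivalent functions in `sklearn.metrics` would usually be used.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def accuracy(y_true, y_pred):
    """Fraction of labels predicted exactly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall, and F1 for one class treated as the positive label."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Made-up SQI values and fuzzy labels for illustration.
sqi_true, sqi_pred = [0.5, 0.7, 0.9], [0.6, 0.7, 0.7]
cat_true = ["High", "Low", "High", "Medium", "High"]
cat_pred = ["High", "High", "High", "Medium", "Low"]

print("RMSE:", rmse(sqi_true, sqi_pred))
print("MAE:", mae(sqi_true, sqi_pred))
print("Accuracy:", accuracy(cat_true, cat_pred))
print("High (P, R, F1):", precision_recall_f1(cat_true, cat_pred, "High"))
```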

6. Further Considerations:

Data Preprocessing: Clean the data by handling missing values, normalizing or standardizing numerical features, and encoding categorical variables (such as fuzzy categories).
Feature Engineering: Try creating new features from the existing ones (e.g., interaction terms or time-based features).
Model Interpretability: For applied models, consider techniques such as SHAP or LIME to better understand model decisions, especially when the model is used in practical agricultural settings.
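
The preprocessing and feature engineering steps might be sketched as follows with pandas. The column names, the sample values, the median imputation, and the Low/Medium/High ordering are assumptions for illustration. The real file may need different choices.

```python
import pandas as pd

# Synthetic rows with one missing nitrogen reading; names are assumed.
df = pd.DataFrame({
    "Datetime": [1700000000, 1700003600, 1700007200, 1700010800],
    "Nitrogen": [40.0, None, 20.0, 60.0],
    "Soil pH": [6.4, 6.8, 5.6, 7.0],
    "Fuzzy Category": ["Medium", "High", "Low", "High"],
})

# Handle missing values: fill with the column median (one simple option).
df["Nitrogen"] = df["Nitrogen"].fillna(df["Nitrogen"].median())

# Standardize numeric features to zero mean and unit variance.
for col in ["Nitrogen", "Soil pH"]:
    df[col + "_z"] = (df[col] - df[col].mean()) / df[col].std(ddof=0)

# Encode the fuzzy category as an ordered integer (assumed ordering).
order = {"Low": 0, "Medium": 1, "High": 2}
df["fuzzy_code"] = df["Fuzzy Category"].map(order)

# Time-based feature: hour of day from the UNIX timestamp (assumed seconds).
df["hour"] = pd.to_datetime(df["Datetime"], unit="s", utc=True).dt.hour

print(df)
```

When a model is evaluated on held-out data, the imputation and scaling statistics should be computed on the training split only, to avoid leaking information from the test split.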

Funding Agency

Dhaka University of Engineering & Technology (DUET), Gazipur, under the University Grants Commission (UGC)