Citation Author(s):
Submitted by:
Erlin Erlin
Last updated:
Sun, 01/07/2024 - 10:50
Data Format:
Research Article Link:


The dataset used in this research comes from two primary sources: the Central Bureau of Statistics of Indonesia, which provides data on Harvested Area and Production, and the Meteorology, Climatology, and Geophysics Agency of Indonesia, which provides data on Rainfall, Humidity, and Temperature. The dataset covers six years of annual observations, from 2018 to 2023. Note that the data for December 2023 are predictive estimates from these agencies. The dataset consists of seven variables: one dependent variable (the target), rice production, and six independent variables, namely year, province, harvested area, rainfall, humidity, and temperature. Of these, 'province' is categorical and the rest are numerical. The data from both sources are combined into a single file and converted to CSV format for easier downstream processing.


Data dictionary for our CSV dataset:

1.      Province

-        Type: Object (String)

-        Description: The name of the province in Indonesia.

-        Example Values: Aceh, Sumatera Utara, Sumatera Barat, Riau, Jambi

2.      Year

-        Type: Integer (int64)

-        Description: The year of observation.

-        Example Values: 2018, 2019, 2020, 2021, 2022

3.      Harvested Area

-        Type: Integer (int64)

-        Description: The harvested area for rice (in hectares).

-        Example Values: 329516, 310012, 317869, 297058, 271750

4.      Production

-        Type: Integer (int64)

-        Description: Rice production quantity (in metric tons).

-        Example Values: 1861567, 1714438, 1757313, 1634640, 1509456

5.      Rainfall

-        Type: Integer (int64)

-        Description: Annual rainfall (in mm).

-        Example Values: 2336, 1437, 1790, 2293, 1834

6.      Humidity

-        Type: Integer (int64)

-        Description: Average annual humidity (percentage).

-        Example Values: 81, 82, 76, 80, 79

7.      Temperature

-        Type: Integer (int64)

-        Description: Average annual temperature (in degrees Celsius).

-        Example Values: 28, 27, 29, 26, 30
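The dictionary above can also be written down as a small schema check. The following is a minimal sketch using only the Python standard library. It assumes the CSV header uses exactly the column names listed in the dictionary (the spelling in the actual file may differ). `parse_rows` is a hypothetical helper name, and the sample row is assembled from the first example value of each column purely to illustrate the format; it is not guaranteed to match an actual record.

```python
import csv
import io

# Column name -> expected Python type, following the data dictionary.
# Assumption: the CSV header matches these names exactly.
SCHEMA = {
    "Province": str,
    "Year": int,
    "Harvested Area": int,
    "Production": int,
    "Rainfall": int,
    "Humidity": int,
    "Temperature": int,
}


def parse_rows(text):
    """Parse CSV text and cast every field to the type given in SCHEMA."""
    reader = csv.DictReader(io.StringIO(text))
    missing = set(SCHEMA) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return [
        {name: cast(row[name]) for name, cast in SCHEMA.items()}
        for row in reader
    ]


# Illustrative row built from the dictionary's example values (format only).
SAMPLE = (
    "Province,Year,Harvested Area,Production,Rainfall,Humidity,Temperature\n"
    "Aceh,2018,329516,1861567,2336,81,28\n"
)

rows = parse_rows(SAMPLE)
print(rows[0])
```

A cast that fails (for example, a non-numeric rainfall value) raises `ValueError`, which makes type problems visible before any analysis starts.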

Instructions for Using the Rice Production Dataset and Analysis Tools


A. Dataset Utilization


1. Accessing the Dataset


The dataset is provided in CSV format and can be downloaded from


Ensure you have software capable of opening CSV files, such as Microsoft Excel, Google Sheets, or programming environments like Python or R.


2. Understanding the Dataset


Refer to the provided Data Dictionary for detailed descriptions of each column.


3. Data Cleaning and Preprocessing


The dataset has already been preprocessed (e.g., imputation and removal), so it can be loaded for analysis directly.
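Since the preprocessing steps are only summarized here, it may be worth confirming their effect before analysis. The following is a minimal sketch with pandas, assuming the column names from the Data Dictionary. The inline rows are assembled column by column from the dictionary's example values as a stand-in for the real file and are not guaranteed to match actual records.

```python
import io

import pandas as pd

# Stand-in for the downloaded CSV (illustrative rows, header names assumed).
SAMPLE = io.StringIO(
    "Province,Year,Harvested Area,Production,Rainfall,Humidity,Temperature\n"
    "Aceh,2018,329516,1861567,2336,81,28\n"
    "Sumatera Utara,2019,310012,1714438,1437,82,27\n"
    "Sumatera Barat,2020,317869,1757313,1790,76,29\n"
    "Riau,2021,297058,1634640,2293,80,26\n"
    "Jambi,2022,271750,1509456,1834,79,30\n"
)
df = pd.read_csv(SAMPLE)

# Count missing values per column; zeros are expected after imputation.
missing_per_column = df.isna().sum()

# In an annual panel, each (Province, Year) pair should appear only once.
duplicate_rows = int(df.duplicated(subset=["Province", "Year"]).sum())

print(missing_per_column)
print("duplicate province-year rows:", duplicate_rows)
```

With the real file, pass its path to `pd.read_csv` in place of `SAMPLE`.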


B. Analysis Tools


1. Choosing an Analysis Tool


For statistical analysis or machine learning, Python (with libraries such as pandas, NumPy, and scikit-learn) or R is recommended.


For simpler analysis or visualizations, spreadsheet software like Excel can be used.


2. Loading the Dataset


In Python, use pandas.read_csv('file_path') to load the dataset.


In R, use read.csv('file_path') to load the dataset.


In Excel, simply open the file through the File menu.
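For the Python route, a slightly fuller loading sketch follows. It assumes the column names from the Data Dictionary and reads from an in-memory stand-in whose rows are assembled from the dictionary's example values (illustrative only). `load_rice_data` is a hypothetical helper name; with the real download, a file path would be passed instead.

```python
import io

import pandas as pd

# Numeric columns per the Data Dictionary (header names are an assumption).
NUMERIC_COLUMNS = [
    "Year", "Harvested Area", "Production",
    "Rainfall", "Humidity", "Temperature",
]


def load_rice_data(source):
    """Load the CSV and confirm the columns described in the Data Dictionary."""
    df = pd.read_csv(source)
    expected = ["Province"] + NUMERIC_COLUMNS
    missing = [name for name in expected if name not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    # Province is the only categorical variable, so store it as a category.
    df["Province"] = df["Province"].astype("category")
    return df


# In-memory stand-in for the file (illustrative rows, not verified records).
SAMPLE = io.StringIO(
    "Province,Year,Harvested Area,Production,Rainfall,Humidity,Temperature\n"
    "Aceh,2018,329516,1861567,2336,81,28\n"
    "Sumatera Utara,2019,310012,1714438,1437,82,27\n"
    "Sumatera Barat,2020,317869,1757313,1790,76,29\n"
    "Riau,2021,297058,1634640,2293,80,26\n"
    "Jambi,2022,271750,1509456,1834,79,30\n"
)

df = load_rice_data(SAMPLE)
print(df.dtypes)
```

Checking the columns at load time catches header mismatches early, before they surface as key errors deeper in an analysis.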


3. Conducting Analysis

Use Exploratory Data Analysis (EDA) to understand trends and patterns in the data.
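A first EDA pass might look like the sketch below. It assumes the column names from the Data Dictionary and uses illustrative rows assembled from the dictionary's example values rather than the real file. The `Yield` column is a derived quantity introduced here for illustration and is not part of the dataset.

```python
import io

import pandas as pd

# Stand-in for the downloaded CSV (illustrative rows, header names assumed).
SAMPLE = io.StringIO(
    "Province,Year,Harvested Area,Production,Rainfall,Humidity,Temperature\n"
    "Aceh,2018,329516,1861567,2336,81,28\n"
    "Sumatera Utara,2019,310012,1714438,1437,82,27\n"
    "Sumatera Barat,2020,317869,1757313,1790,76,29\n"
    "Riau,2021,297058,1634640,2293,80,26\n"
    "Jambi,2022,271750,1509456,1834,79,30\n"
)
df = pd.read_csv(SAMPLE)

# Summary statistics (count, mean, std, quartiles) for the numeric columns.
summary = df.describe()

# Derived yield in metric tons per hectare: production / harvested area.
df["Yield"] = df["Production"] / df["Harvested Area"]

# Pairwise Pearson correlations between the numeric variables.
correlations = df.select_dtypes("number").corr()

print(summary.loc[["mean", "min", "max"]])
print(correlations["Production"].sort_values(ascending=False))
```

The correlation column for `Production` gives a quick view of which variables move with the target, which can guide feature selection before any modelling.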

Funding Agency: 
Ministry of Education, Culture, Research, and Technology of Indonesia
Grant Number: