XAI Evaluation Multivariate Time Series Dataset
- Submitted by: Veena More
- Last updated: Tue, 02/06/2024 - 23:46
- DOI: 10.21227/mczx-d871
Abstract
This data repository comprises three datasets for multivariate time series prediction tasks. The first is a synthetic dataset designed to simulate multivariate time series patterns, incorporating both linear and non-linear dependencies among input and target features. The second, the Beijing Air Quality PM2.5 dataset, consists of hourly PM2.5 measurements alongside meteorological data such as dew point, temperature, pressure, and wind speed, with the objective of predicting PM2.5 concentrations. The third, the Beijing Multi-Site Air Quality dataset, contains hourly readings of several air pollutants from 12 monitoring sites in Beijing, with the objective of estimating PM2.5 levels. The synthetic dataset has a known input-target relationship, while the two Beijing datasets are real-world regression benchmarks containing missing values.
The repository comprises three datasets:
1) Synthetic Dataset:
The synthetic dataset is crafted to mimic patterns resembling a multivariate time series, encompassing both linear and non-linear dependencies among input and target features.
If we denote a time instance as ts={a,b,c,d,e,f,g} with g as the target column, the relationship is expressed by the equation:
g = (-44a - 32b + 0c + 8d + e^2 - f) / 100
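The relationship above can be sketched in code as follows. This is a minimal illustration, not the generator used to build the published file: it reads the `e2` term as e squared (the only non-linear term), and the sampling range, the seed, and the names `synthetic_target` and `make_synthetic` are assumptions introduced here.

```python
import numpy as np


def synthetic_target(a, b, c, d, e, f):
    """Target g computed from the six inputs, with the e term squared."""
    return (-44 * a - 32 * b + 0 * c + 8 * d + e ** 2 - f) / 100


def make_synthetic(n_steps=1000, seed=0):
    """Return an (n_steps, 7) array with columns a, b, c, d, e, f, g.

    Assumption: inputs are drawn uniformly from [0, 1). The source does
    not state how the input columns were generated.
    """
    rng = np.random.default_rng(seed)
    inputs = rng.uniform(0.0, 1.0, size=(n_steps, 6))
    g = synthetic_target(*inputs.T)  # unpack the six input columns
    return np.column_stack([inputs, g])


data = make_synthetic(n_steps=5)
print(data.shape)  # (5, 7)
```

Because c has a coefficient of zero, changing c leaves g unchanged, so a feature attribution method can be checked against the expectation that c receives no importance.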
2) Beijing PM2.5 data:
The Beijing PM2.5 dataset provides hourly PM2.5 measurements recorded at the US Embassy in Beijing, alongside meteorological data sourced from Beijing Capital International Airport.
Dataset Characteristics:
- Type: Multivariate, Time-Series
- Subject Area: Climate and Environment
- Associated Tasks: Regression
- Feature Type: Integer, Real
- Number of Instances: 43,824
Additional Information: The dataset covers the time period from January 1st, 2010, to December 31st, 2014. Missing data points are indicated by "NA" values.
Additional Variable Information
No: row number
year: year of data in this row
month: month of data in this row
day: day of data in this row
hour: hour of data in this row
pm2.5: PM2.5 concentration (ug/m^3)
DEWP: Dew Point (℃)
TEMP: Temperature (℃)
PRES: Pressure (hPa)
cbwd: Combined wind direction
Iws: Cumulated wind speed (m/s)
Is: Cumulated hours of snow
Ir: Cumulated hours of rain
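A minimal loading sketch for a file with the columns listed above, assuming it is distributed as a comma-separated file with a header row. The function name `load_pm25` is hypothetical, and the two sample rows are made-up values for illustration, not records from the dataset.

```python
import io

import pandas as pd


def load_pm25(source):
    """Read the PM2.5 table and index it by timestamp.

    `source` may be a file path or a file-like object.
    """
    df = pd.read_csv(source, na_values=["NA"])
    # Assemble one timestamp from the separate year/month/day/hour columns.
    df["datetime"] = pd.to_datetime(df[["year", "month", "day", "hour"]])
    df = df.set_index("datetime")
    return df.drop(columns=["No", "year", "month", "day", "hour"])


# Illustrative rows only (not actual measurements).
sample = io.StringIO(
    "No,year,month,day,hour,pm2.5,DEWP,TEMP,PRES,cbwd,Iws,Is,Ir\n"
    "1,2010,1,1,0,NA,-20,-10.0,1020.0,NW,2.0,0,0\n"
    "2,2010,1,1,1,100,-18,-9.0,1019.0,SE,4.0,0,0\n"
)
pm25 = load_pm25(sample)
print(pm25["pm2.5"].isna().sum())  # 1
```

Parsing "NA" as missing keeps the `pm2.5` column numeric, so rows without a target value can be dropped or imputed before model training.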
3) Beijing Multi-site Air-Quality Data
This hourly dataset contains measurements of six primary air pollutants and six meteorological variables from 12 monitoring sites in Beijing.
Dataset Characteristics:
- Type: Multivariate, Time-Series
- Subject Area: Climate and Environment
- Associated Tasks: Regression
- Feature Type: Integer, Real
- Number of Instances: 420,768
Additional Information: The dataset aggregates hourly air pollutant data from 12 nationally controlled air-quality monitoring sites, sourced from the Beijing Municipal Environmental Monitoring Center. The meteorological data for each air-quality site are taken from the nearest weather station operated by the China Meteorological Administration. The time frame spans from March 1st, 2013, to February 28th, 2017. Missing data points are marked as "NA".
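The instance counts stated for the two Beijing datasets are consistent with complete hourly coverage of their time frames. The check below is a sketch under two assumptions not stated in the source: each record series starts at hour 0 of its first day and ends at hour 23 of its last day, and the multi-site data holds one row per site per hour. The helper name `hourly_rows` is introduced here for illustration.

```python
from datetime import datetime, timedelta


def hourly_rows(first_hour, last_hour):
    """Number of hourly timestamps from first_hour to last_hour, inclusive."""
    return (last_hour - first_hour) // timedelta(hours=1) + 1


# Beijing PM2.5: January 1st, 2010 to December 31st, 2014.
pm25_rows = hourly_rows(datetime(2010, 1, 1, 0), datetime(2014, 12, 31, 23))

# Beijing Multi-site: March 1st, 2013 to February 28th, 2017, 12 sites.
per_site_rows = hourly_rows(datetime(2013, 3, 1, 0), datetime(2017, 2, 28, 23))
multi_site_rows = 12 * per_site_rows

print(pm25_rows, per_site_rows, multi_site_rows)  # 43824 35064 420768
```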
Additional Variable Information
No: row number
year: year of data in this row
month: month of data in this row
day: day of data in this row
hour: hour of data in this row
PM2.5: PM2.5 concentration (ug/m^3)
PM10: PM10 concentration (ug/m^3)
SO2: SO2 concentration (ug/m^3)
NO2: NO2 concentration (ug/m^3)
CO: CO concentration (ug/m^3)
O3: O3 concentration (ug/m^3)
TEMP: temperature (degree Celsius)
PRES: pressure (hPa)
DEWP: dew point temperature (degree Celsius)
RAIN: precipitation (mm)
wd: wind direction
WSPM: wind speed (m/s)
station: name of the air-quality monitoring site
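Because rows from all sites share the same columns, any gap filling should be done per site so that values from one station do not carry over into another. The sketch below forward-fills PM2.5 within each station. Forward filling is only one possible choice and is not a method prescribed by the dataset authors. The function name `fill_within_station`, the station labels, and the sample values are hypothetical.

```python
import io

import pandas as pd


def fill_within_station(df, column="PM2.5"):
    """Forward-fill missing values of `column` separately for each station."""
    df = df.copy()
    df["datetime"] = pd.to_datetime(df[["year", "month", "day", "hour"]])
    df = df.sort_values(["station", "datetime"])
    # groupby keeps each fill inside its own station's time series.
    df[column] = df.groupby("station")[column].ffill()
    return df


# Illustrative rows only (not actual measurements), reduced column set.
sample = io.StringIO(
    "No,year,month,day,hour,PM2.5,station\n"
    "1,2013,3,1,0,4.0,SiteA\n"
    "2,2013,3,1,1,NA,SiteA\n"
    "1,2013,3,1,0,NA,SiteB\n"
    "2,2013,3,1,1,7.0,SiteB\n"
)
filled = fill_within_station(pd.read_csv(sample, na_values=["NA"]))
print(filled[["station", "datetime", "PM2.5"]])
```

In this sample the gap at SiteA is filled with that site's previous reading, while the first SiteB row stays missing because SiteB has no earlier value to carry forward.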
Comments
Evaluation dataset for Explainable AI