XAI Evaluation Multivariate Time Series Dataset

Citation Author(s):
Veena More
Ramesh K
Nadiyah Ahmed
Submitted by:
Veena More
Last updated:
Tue, 02/06/2024 - 23:46
DOI:
10.21227/mczx-d871

Abstract 

This data repository comprises three distinct datasets tailored for different predictive modeling tasks. The first dataset is a synthetic dataset designed to simulate multivariate time series patterns, incorporating both linear and non-linear dependencies among input and target features. The second dataset, the Beijing Air Quality PM2.5 dataset, consists of PM2.5 measurements alongside meteorological data like temperature, humidity, and wind speed, with the objective of predicting PM2.5 concentrations. Lastly, the Beijing Multi-Site Air Quality dataset encompasses hourly readings of various air pollutants from 12 monitoring sites in Beijing, aiming to estimate PM2.5 pollutant levels in the atmosphere. Each dataset presents unique challenges and opportunities for developing and evaluating predictive models, offering valuable resources for research and analysis in environmental science and machine learning domains.

Instructions: 

The repository comprises three datasets:

1) Synthetic Dataset:

The synthetic dataset is crafted to mimic patterns resembling a multivariate time series, encompassing both linear and non-linear dependencies among input and target features.

Each time instance is denoted ts = {a, b, c, d, e, f, g}, where a through f are the input features and g is the target column. The target is computed from the inputs by the equation:

g = (−44a − 32b + 0c + 8d + e^2 − f) / 100
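
The relationship above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' generator: it takes the e term as squared (e^2), which is an assumption about the original notation, and it draws inputs uniformly from [0, 1), which is also an assumption, since the source does not state how the input features were sampled. The names target and make_rows are hypothetical.

```python
import random


def target(a, b, c, d, e, f):
    """Target g from the stated relationship.

    The coefficient of c is 0, so c never influences g; the e term is
    assumed to be squared (the only non-linear term).
    """
    return (-44 * a - 32 * b + 0 * c + 8 * d + e ** 2 - f) / 100


def make_rows(n, seed=0):
    """Generate n rows [a, b, c, d, e, f, g].

    Inputs are uniform on [0, 1). That distribution is an assumption
    for illustration, not the one used to build the dataset.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        features = [rng.random() for _ in range(6)]
        rows.append(features + [target(*features)])
    return rows


rows = make_rows(5)
```

Because c has a zero coefficient, a feature-attribution method evaluated on this data would be expected to assign c no importance, while e is the only input with a non-linear effect.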

2) Beijing PM2.5 data:

The Beijing PM2.5 dataset provides hourly PM2.5 measurements recorded at the US Embassy in Beijing, alongside meteorological data sourced from Beijing Capital International Airport.

Dataset Characteristics:

  • Type: Multivariate, Time-Series
  • Subject Area: Climate and Environment
  • Associated Tasks: Regression
  • Feature Type: Integer, Real
  • Number of Instances: 43,824

Additional Information: The dataset covers the time period from January 1st, 2010, to December 31st, 2014. Missing data points are indicated by "NA" values.

 

Additional Variable Information

No: row number

year: year of data in this row

month: month of data in this row

day: day of data in this row

hour: hour of data in this row

pm2.5: PM2.5 concentration (ug/m^3)

DEWP: Dew Point (℃)

TEMP: Temperature (℃)

PRES: Pressure (hPa)

cbwd: Combined wind direction

Iws: Cumulated wind speed (m/s)

Is: Cumulated hours of snow

Ir: Cumulated hours of rain
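
As a rough sketch of how these columns might be read, the snippet below parses rows with Python's standard csv module, combines year, month, day and hour into a single timestamp, and maps "NA" to None. It assumes the file is a CSV whose header matches the variable names listed above; the two sample rows are made up for illustration and are not real measurements, and parse_rows is a hypothetical helper name.

```python
import csv
import io
from datetime import datetime

# Two made-up rows in the documented column order (not real measurements).
SAMPLE = """No,year,month,day,hour,pm2.5,DEWP,TEMP,PRES,cbwd,Iws,Is,Ir
1,2010,1,1,0,NA,-21,-11,1021,NW,1.79,0,0
2,2010,1,1,1,129,-16,-4,1020,SE,1.79,0,0
"""

NUMERIC = ("pm2.5", "DEWP", "TEMP", "PRES", "Iws", "Is", "Ir")


def parse_rows(handle):
    """Yield one dict per row with a timestamp and float-or-None values."""
    for raw in csv.DictReader(handle):
        row = {
            "time": datetime(int(raw["year"]), int(raw["month"]),
                             int(raw["day"]), int(raw["hour"])),
            "cbwd": raw["cbwd"],  # categorical, kept as a string
        }
        for name in NUMERIC:
            # "NA" marks a missing value in this dataset.
            row[name] = None if raw[name] == "NA" else float(raw[name])
        yield row


rows = list(parse_rows(io.StringIO(SAMPLE)))
# Keep only rows where the regression target is present.
complete = [r for r in rows if r["pm2.5"] is not None]
```

To read an actual file, pass an open file handle to parse_rows in place of the io.StringIO sample.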

3) Beijing Multi-site Air-Quality Data

This hourly dataset encompasses measurements of six primary air pollutants and six pertinent meteorological variables across various sites in Beijing.

Dataset Characteristics:

  • Type: Multivariate, Time-Series
  • Subject Area: Climate and Environment
  • Associated Tasks: Regression
  • Feature Type: Integer, Real
  • Number of Instances: 420,768

Additional Information: The dataset aggregates hourly air pollutant data from 12 nationally-controlled air-quality monitoring sites, sourced from the Beijing Municipal Environmental Monitoring Center. Meteorological data corresponding to each air-quality site are matched with the closest weather station operated by the China Meteorological Administration. The time frame spans from March 1st, 2013, to February 28th, 2017. Missing data points are marked as "NA."
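
The stated instance counts can be checked against the stated time frames. The sketch below assumes one row per site per hour with no gaps in the timestamp grid (missing measurements being stored as "NA" rather than as missing rows); hourly_slots is a hypothetical helper name.

```python
from datetime import date


def hourly_slots(first_day, last_day):
    """Count hourly timestamps from 00:00 on first_day to 23:00 on last_day."""
    days = (last_day - first_day).days + 1  # both end days included
    return days * 24


# 12 sites, March 1st, 2013 to February 28th, 2017.
multi_site = 12 * hourly_slots(date(2013, 3, 1), date(2017, 2, 28))

# Single site, January 1st, 2010 to December 31st, 2014 (Beijing PM2.5 data).
single_site = hourly_slots(date(2010, 1, 1), date(2014, 12, 31))

print(multi_site, single_site)  # 420768 43824
```

Both results agree with the instance counts listed for the two Beijing datasets.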

Additional Variable Information

 

No: row number

year: year of data in this row

month: month of data in this row

day: day of data in this row

hour: hour of data in this row

PM2.5: PM2.5 concentration (ug/m^3)

PM10: PM10 concentration (ug/m^3)

SO2: SO2 concentration (ug/m^3)

NO2: NO2 concentration (ug/m^3)

CO: CO concentration (ug/m^3)

O3: O3 concentration (ug/m^3)

TEMP: temperature (degree Celsius)

PRES: pressure (hPa)

DEWP: dew point temperature (degree Celsius)

RAIN: precipitation (mm)

wd: wind direction

WSPM: wind speed (m/s)

station: name of the air-quality monitoring site
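
Because the station column identifies the monitoring site, rows can be grouped per site before modeling. The following is a minimal sketch of a per-station PM2.5 average that skips "NA" readings; the station names and values are placeholders made up for illustration, not entries from the dataset, and mean_by_station is a hypothetical helper name.

```python
from collections import defaultdict

# Made-up (station, PM2.5) pairs; "NA" marks a missing reading.
# Station names are placeholders, not the dataset's site names.
readings = [
    ("site_A", "10"), ("site_A", "NA"), ("site_A", "30"),
    ("site_B", "50"), ("site_B", "70"),
]


def mean_by_station(pairs):
    """Average PM2.5 per station, skipping missing values."""
    values = defaultdict(list)
    for station, pm25 in pairs:
        if pm25 != "NA":
            values[station].append(float(pm25))
    return {station: sum(v) / len(v) for station, v in values.items()}


means = mean_by_station(readings)
print(means)  # {'site_A': 20.0, 'site_B': 60.0}
```

A station whose readings are all "NA" is simply absent from the result, which avoids a division by zero.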

 

Comments

Evaluation dataset for Explainable AI

Submitted by Veena More on Tue, 02/06/2024 - 23:48