Datasets
Open Access
A Densely-Deployed, High Sampling Rate, Open-Source Air Pollution Monitoring WSN
- Submitted by: Mohammad Ghazivakili
- Last updated: Tue, 05/17/2022 - 22:21
- DOI: 10.21227/m4pb-g538
Abstract
This work contains data gathered by a series of sensors (PM 10, PM 2.5, temperature, relative humidity, and pressure) in the city of Turin in northern Italy (more precisely, at coordinates 45.041903N, 7.625850E). The data were collected over a period of 5 months, from October 2018 to February 2019. The scope of the study was to address the calibration of low-cost particulate matter sensors and to compare their readings against the official measurements provided by the Italian environmental agency (ARPA Piemonte). The proposed database has been designed to be general enough to handle not only PM measurements plus temperature and relative humidity but also almost any other quantity, such as altitude, wind speed and direction, radioactivity, electromagnetic pollution, etc.
The total size of the database is about 50 GB of time-stamped data. The directory also contains several useful scripts that can be used to perform the calibration and the analysis of the acquired data, such as plotting graphs, displaying the correlation with the reference values, printing measurement errors, etc. The scripts implement two commonly used calibration techniques, namely Multivariate Linear Regression and Random Forest, relying on the scikit-learn Python library.
The README files included in the main subdirectories report hints and comments on the data set format and the logic of the scripts. Please refer to them for further details. Please note that, following article 18.5 of Italian Decree 155/2010 on the dissemination of air quality data, which absorbs EU directive 2008/50/CE, ARPA Piemonte (http://www.arpa.piemonte.it/english-version) cannot be held responsible for any mistake in these data, which cannot be considered official, unlike the data provided by ARPA itself.
Authors can be contacted at the following addresses:
{bartolomeo.montrucchio, edoardo.giusto, mohammad.ghazivakili, stefano.quer, renato.ferrero}@polito.it, and c.fornaro@uninettunouniversity.net
A Densely-Deployed, High Sampling Rate, Open-Source Air Pollution Monitoring WSN
Documentation for the air pollution monitoring station developed at Politecnico di Torino by:
Edoardo Giusto, Mohammad Ghazi Vakili under the supervision of Prof. Bartolomeo Montrucchio.
System Overview
This section includes a description of our architecture from several points of view, going from the hardware and software architecture, to the communication protocols.
Hardware Architecture
We target the following key characteristics of our system:
- Rapid and easy prototyping capabilities,
- Flexibility in connection scenarios, and
- Low cost combined with dependability of components.
As each board has to include a limited number of modules, and to facilitate our prototype development, we select the Raspberry Pi single-board computer as a monitoring board.
Due to our constraints in terms of cost, size and power consumption, we select its Zero Wireless version, based on the ARM11 microprocessor.
The basic operating principle of the system is the following. The data gathered from the sensors are stored in the MicroSD card of the RPi. At certain time intervals the RPi tries to connect to a Wi-Fi network and, if such a connection is established, it uploads the newly acquired data to a remote server.
The Wi-Fi network is created using a mobile phone set to operate as a personal hot-spot, while the database storing all the performed measurements resides on the remote server.
Software Architecture
Wi-Fi connectivity was one of the requirements for the system, but at the same time the system should not produce unnecessary electromagnetic noise that could affect the operation of the host's appliances.
To reduce the time during which the Wi-Fi connection was active, the Linux OS was set to activate the specific interface at predefined time instants in order to connect to the portable hot-spot.
Once connected to the network, the system performed the following tasks:
- synchronization of the system and RTC clocks with a remote Network Time Protocol (NTP) server,
- synchronization of the local samples directory with the remote directory residing on the server.
The latter task is performed using the UNIX rsync utility, which has to be installed on both machines.
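The directory synchronization described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the script shipped with the project: the local path reuses the /home/alarm/ws/data directory mentioned in this documentation, while the remote user, host and target directory are hypothetical placeholders. The sketch relies only on the standard rsync options -a (archive mode) and -z (compression).

```python
import subprocess
from typing import List

LOCAL_DIR = "/home/alarm/ws/data/"             # samples directory named in this documentation
REMOTE = "user@example.org:/srv/ws/board01/"   # hypothetical remote target


def build_rsync_command(local_dir: str, remote: str) -> List[str]:
    """Return the rsync invocation that mirrors local_dir onto remote."""
    # A trailing slash on the source makes rsync copy the directory
    # contents rather than the directory itself.
    source = local_dir if local_dir.endswith("/") else local_dir + "/"
    # -a preserves timestamps and permissions, -z compresses data in transit.
    return ["rsync", "-az", source, remote]


def upload_samples(local_dir: str = LOCAL_DIR, remote: str = REMOTE) -> bool:
    """Run rsync once; return True on success, False otherwise."""
    try:
        result = subprocess.run(build_rsync_command(local_dir, remote), timeout=600)
    except (OSError, subprocess.TimeoutExpired):
        # rsync is not installed, or the transfer did not finish in time
        # (for instance because the hot-spot went away).
        return False
    return result.returncode == 0
```

Because rsync only transfers files that changed since the previous run, calling upload_samples() each time the interface comes up keeps the time spent on the Wi-Fi link short.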
To gather data from the sensors, a Python program has been implemented. It runs continuously, with a separate process reading from each physical sensor plugged into the board and writing to the MicroSD card.
Note that, since the UART communication with the PM sensors had to take place over GPIOs, the pigpiod daemon has been used to create software serial ports on the Pi's pins.
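The one-process-per-sensor layout described above can be sketched as follows. This is a minimal illustration, not the project's acquisition program: the reader function is a hypothetical stand-in for a real sensor driver, and the CSV layout (timestamp, sensor name, value) and file naming are assumptions made for the example.

```python
import csv
import time
from datetime import datetime, timezone
from multiprocessing import Process
from pathlib import Path


def sample_loop(name, read_fn, out_dir, period_s, max_samples=None):
    """Append timestamped readings from one sensor to its own CSV file."""
    out_path = Path(out_dir) / f"{name}.csv"
    count = 0
    while max_samples is None or count < max_samples:
        value = read_fn()
        stamp = datetime.now(timezone.utc).isoformat()
        # The file is re-opened in append mode for every sample, so each
        # row is flushed and closed before the next one is acquired.
        with out_path.open("a", newline="") as handle:
            csv.writer(handle).writerow([stamp, name, value])
        count += 1
        time.sleep(period_s)
    return count


def fake_pm_reader():
    """Hypothetical stand-in for a real PM sensor driver."""
    return 12.5


def start_workers(sensors, out_dir, period_s, max_samples=None):
    """Start one process per sensor and return the started processes."""
    workers = [
        Process(target=sample_loop, args=(name, fn, out_dir, period_s, max_samples))
        for name, fn in sensors.items()
    ]
    for worker in workers:
        worker.start()
    return workers
```

A call such as start_workers({"pm_0": fake_pm_reader, "pm_1": fake_pm_reader}, "/tmp", 1.0) would launch two independent samplers. Keeping each sensor in its own process means that a blocked or failing sensor does not stall the others.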
The directories on the remote server are a simple copy of the MicroSD cards mounted on the boards.
Data in these directories have been inserted in a MySQL database.
Mechanical Design and Hardware Components
In order to easily stack more than one device together, a 3D printed modular case has been designed.
Several enclosing frames can be tied together using nuts and bolts, with the use of a single cap on top.
Figure shows the 3D board design, together with the final sensor and board configurations.
Each platform is equipped with 4 PM sensors (a good trade-off between size and redundancy), 1 Temperature (T) and Relative Humidity (HT) sensor and 1 Pressure (P) sensor.
As our target was to capture significant data samples for the particulate matter, we adopt the following sensors:
- The Honeywell HPMA115S0-XXX as PM sensor. As one of our targets was to evaluate these sensors' suitability for air pollution monitoring applications, we insert 4 instances of this sensor in every single platform. This redundancy allows us to detect anomalous phenomena and to avoid several kinds of malfunctions, making the overall system more stable.
- The DHT22 as temperature and relative humidity sensor. It is very widespread in prototyping applications, with several open-source implementations of its library publicly available on the internet.
- The Bosch BME280 as a pressure sensor. This is a cheap but precise barometric pressure and temperature sensor which comes pre-soldered on a small PCB for easy prototyping.
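One way the four-fold PM sensor redundancy can be exploited is sketched below. It is an illustrative assumption, not the method used by the authors: the readings of one platform are compared with their median, and any sensor deviating by more than a chosen threshold is flagged. The threshold and the sample values are made up for the example.

```python
from statistics import median


def flag_outliers(readings, max_deviation):
    """Return the platform median and the indices of sensors far from it."""
    center = median(readings)
    suspects = [
        index
        for index, value in enumerate(readings)
        if abs(value - center) > max_deviation
    ]
    return center, suspects


# Four simultaneous PM2.5 readings (µg/m³) from one platform; made-up values.
center, suspects = flag_outliers([21.0, 22.0, 23.0, 58.0], max_deviation=10.0)
print(center, suspects)  # prints: 22.5 [3]
```

The median is used instead of the mean so that a single faulty sensor cannot pull the reference value towards itself.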
The system also includes a Real Time Clock (RTC) module for the operating system to retrieve the correct time after a sudden power loss. The chosen device is the DS3231.
The DS3231 communicates via I2C interface and has native support in the Linux kernel.
As a last comment, notice that a Printed Circuit Board (PCB) was designed to facilitate connections and soldering of the various sensors and other components.
Database
Create database
The database structure can be created using the scripts located in the SQL_Table folder of the Dataset directory.
mysql -u <user> [-h <host>] [-p] < create_db.sql
Load SQL data (SQL Format)
Data formatted in SQL can be loaded using the mysql command
mysql -u username -p WEATHER_STATION < db_whole_data.sql
The db_whole_data.sql file is available in the SQL_data/ folder of the Dataset directory, where it is distributed in compressed form as db_whole_data.sql.gz and has to be decompressed first.
Load RAW data (CSV)
Data can be loaded using the Python script sql_ins.py, available in the mysql_insertion folder of the Dataset directory.
python sql_ins.py <data_folder>
The script assumes the following folder structure:
* data_folder
|-- 01-board_table
|-- 02-unit_of_measure_table
|-- 03-param_type_table
|-- 04-board_config_table
|-- 05-physical_sensor_table
|-- 06-logical_sensor_table
|-- 07-board_sensor_connection_table
|-- 08-measure_table
|-- arpa
|-- mobility
|-- stations
Each folder contains a set of CSV files. The script automatically loads the data into the appropriate table using the correct fields, which are specified as a list of parameters in the script. It is possible to edit the script to load only a subset of the folders.
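The folder-to-table mapping described above can be sketched as follows. This is not the code of sql_ins.py: it assumes, purely for illustration, that the numeric prefix of each folder gives the loading order and that the table name is the folder name without that prefix. Folders without a numeric prefix (arpa, mobility, stations) are skipped by this sketch.

```python
import re
from pathlib import Path

# Matches names such as "01-board_table": a number, a dash, then a name.
NUMBERED = re.compile(r"^(\d+)-(.+)$")


def load_order(folder_names):
    """Return (folder, table) pairs sorted by the numeric folder prefix."""
    pairs = []
    for name in folder_names:
        match = NUMBERED.match(name)
        if match is None:
            continue  # no numeric prefix: not handled by this sketch
        pairs.append((int(match.group(1)), name, match.group(2)))
    return [(folder, table) for _, folder, table in sorted(pairs)]


def csv_files(data_folder, folder):
    """List the CSV files of one folder in a stable (sorted) order."""
    return sorted((Path(data_folder) / folder).glob("*.csv"))
```

Restricting the list passed to load_order() is one simple way to load only a subset of the folders.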
System Usage
To replicate the experiments, the user should clone the Raspberry Pi image onto a MicroSD card (16-32 GB).
On Linux, this can be done with the command
dd if=/path/to/image of=/path/of/microsd bs=4M
The sampling scripts are run automatically by a systemd unit at system startup. The same systemd unit also handles the automatic respawn of the processes if a problem occurs. The data are stored in the /home/alarm/ws/data directory, with filenames corresponding to the date of acquisition.
In order to upload these data to a database, it is possible to use the guide contained in the "database" directory.
In order to perform calibration and tests, it is recommended to take a look at the guide contained in the "analysis" directory. A Python class has been implemented to perform calibration of sensors against the ARPA reference ones. The resulting calibration can then be applied to a time window of choice.
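The idea of calibrating a low-cost sensor against a reference can be sketched as follows. This is not the class provided in the "analysis" directory, and it does not use scikit-learn: to stay dependency-free, it fits a multivariate linear regression by solving the normal equations in plain Python. The feature choice (raw PM reading and relative humidity), the function names and all numeric values are synthetic assumptions made for the example.

```python
def fit_linear(features, targets):
    """Least-squares fit of targets ~ b0 + b1*x1 + ... + bk*xk."""
    rows = [[1.0] + list(row) for row in features]  # prepend intercept column
    n = len(rows[0])
    # Augmented normal equations: (X^T X | X^T y).
    system = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * y for r, y in zip(rows, targets))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting. A zero pivot
    # (collinear features) raises ZeroDivisionError.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(system[r][col]))
        system[col], system[pivot] = system[pivot], system[col]
        for r in range(n):
            if r != col:
                factor = system[r][col] / system[col][col]
                system[r] = [a - factor * b for a, b in zip(system[r], system[col])]
    return [system[i][n] / system[i][i] for i in range(n)]


def apply_calibration(coefficients, row):
    """Map one raw feature row to a calibrated value."""
    return coefficients[0] + sum(c * x for c, x in zip(coefficients[1:], row))


# Synthetic training window: (raw PM, relative humidity) versus reference.
raw = [(10.0, 40.0), (20.0, 50.0), (30.0, 45.0), (40.0, 70.0), (25.0, 60.0)]
reference = [2.0 + 0.5 * pm - 0.1 * rh for pm, rh in raw]
coefficients = fit_linear(raw, reference)
calibrated = apply_calibration(coefficients, (20.0, 50.0))
```

Once the coefficients have been fitted on one period, apply_calibration() can be used on the readings of any other time window, which mirrors the fit-then-apply workflow described above.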
3D Model
A 3D model of the case has been developed using the SketchUp online software.
The resulting model is split into 5 different parts, each small enough to fit in our 3D printer (Makerbot Replicator 2X).
The model is stackable, meaning that several cases can be put on top of each other, with a single roof piece.
Printed Circuit Board
A PCB has been developed using the KiCad software, in order to create a HAT for the RPi0 that connects all the sensors.
WS Analysis library documentation (v0.2)
The aim of this package is to provide fast and easy access to, and analysis of, the Weather Station database. This package is located in the analysis directory, and it is compatible only with Python 3. Please follow the readme file for more information.
Directory Structure
project
├── 3D_Box
│   ├── Cap_v0_1stpart.skp
│   ├── Cap_v0_2dpart.skp
│   ├── ws_rpzero_noGPS_v1.skp
│   ├── ws_sensors_2d_half_v2.skp
│   └── ws_sensors_half_v2.skp
├── analysis
│   ├── arpa_station.json
│   ├── board.json
│   ├── example.py
│   ├── extract.py
│   ├── out.pdf
│   ├── requirements.txt
│   ├── ws_analysis
│   │   ├── __pycache__
│   │   │   └── ws_analysis.cpython-37.pyc
│   │   ├── rpt.txt
│   │   └── script_offset.py
│   ├── ws_analysis.md
│   ├── ws_analysis.pdf
│   ├── ws_analysis.py
│   └── ws_analysis.pyc
├── Dataset
│   ├── db_setup.html
│   ├── db_setup.md
│   ├── db_setup.pdf
│   ├── er_diagram.pdf
│   ├── mysql_insertion
│   │   ├── extract_to_file.py
│   │   ├── remove_duplicate.py
│   │   └── sql_ins.py
│   ├── SQL_Table
│   │   ├── create_db.sql
│   │   ├── create_measure_table.sql
│   │   └── load_data.sql
│   └── SQL_data
│       └── db_whole_data.sql.gz
├── PCB
│   └── WS_v2_output.tar.xz
├── readme.html
├── readme.md
├── readme.pdf
└── scripts
    ├── python
    │   ├── csv
    │   │   ├── arpa_retrieve.py
    │   │   ├── filemerge.py
    │   │   ├── gpx2geohash.py
    │   │   ├── parse_csv.py
    │   │   └── validation.py
    │   └── mpu9250
    │       └── gyro.py
    └── README.md
Dataset Files
- Full dataset in SQL format Dataset.gz.tar (56.33 GB)
- Full dataset in compressed SQL format Dataset_compressed.gz.tar (6.19 GB)
- WS Analysis library documentation (v0.2) with compressed dataset dense_deployed_air_pollution_monitoring_system.zip (5.43 GB)
Documentation
Attachment | Size |
---|---|
readme.pdf | 74.97 KB |