Standard Dataset
Synthetic Data for Smart Meter Attack Detection
- Submitted by: Victor Contrera...
- Last updated: Mon, 02/24/2025 - 09:12
- DOI: 10.21227/7k2f-wz30
Abstract
This dataset contains synthetic smart meter data with simulated cyber attacks, designed to support research in anomaly detection, cybersecurity, and energy consumption analysis. It is based on 159 users from the Smart Meters in London dataset, selected for their regular consumption patterns. The full source dataset is available at https://www.kaggle.com/datasets/jeanmidev/smart-meters-in-london, which is a refactored version of the data published at https://data.london.gov.uk/dataset/smartmeter-energy-use-data-in-london-....
The synthetic dataset spans one year at half-hourly resolution, for 17,520 consumption records per user (365 days × 48 readings per day). The first 11 months remain unaltered, and the attacks are introduced in the final month. Seven distinct attack strategies have been applied; together with the unaltered series, this gives eight versions per user and a total of 1,272 user datasets (159 × 8). The versions are indexed from 0 to 7: (0) reduction to the historical minimum, (1) a one-week consumption reduction within the month by a fixed percentage, (2) progressive reduction over time, (3) cut-off at a predefined threshold preventing consumption from exceeding a set limit, (4) progressive reduction during peak hours, (5) progressive reduction during peak hours with redistribution to off-peak hours, (6) swapping consumption between peak and off-peak hours, and (7) unaltered consumption, which serves as the attack-free reference. This dataset may be useful for testing anomaly detection methods and exploring different strategies for identifying attacks in smart meter data.
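To make the attack descriptions more concrete, the following Python sketch shows one possible reading of strategies 0, 2, and 3, applied to toy data. It is not the generation code used for this dataset. The function names, the 50% final reduction, the 0.8 kWh threshold, the 30-day length of the final month, and the random toy consumption values are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def attack_min(history: pd.Series, target: pd.Series) -> pd.Series:
    """Attack 0 (one interpretation): report the historical minimum for every reading."""
    return pd.Series(history.min(), index=target.index)


def attack_progressive(target: pd.Series, final_reduction: float = 0.5) -> pd.Series:
    """Attack 2 (one interpretation): scale readings by a factor that falls
    linearly from 1.0 to (1 - final_reduction) over the attack period."""
    factors = np.linspace(1.0, 1.0 - final_reduction, len(target))
    return target * factors


def attack_cutoff(target: pd.Series, threshold: float) -> pd.Series:
    """Attack 3 (one interpretation): cap every reading at a fixed threshold."""
    return target.clip(upper=threshold)


# Toy consumption series on the documented half-hourly grid (values are random, not real data).
rng = np.random.default_rng(0)
index = pd.date_range("2012-02-29 00:00:00", "2013-02-27 23:30:00", freq="30min")
kwh = pd.Series(rng.uniform(0.05, 1.5, len(index)), index=index)

# Assumption: the "final month" is taken as the last 30 days (30 * 48 readings).
split = len(kwh) - 30 * 48
history, target = kwh.iloc[:split], kwh.iloc[split:]

attacked = {
    0: attack_min(history, target),
    2: attack_progressive(target, final_reduction=0.5),
    3: attack_cutoff(target, threshold=0.8),
    7: target.copy(),  # index 7: unaltered consumption
}
```

Each entry of `attacked` replaces only the final-month readings, so concatenating `history` with one of these series would give a full-year series in the spirit of one attacked version.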
The dataset is provided in CSV format. Each row holds the consumption of one user for one half-hourly interval, giving 17,520 data points per user over the course of one year. The data spans from 2012-02-29 00:00:00 to 2013-02-27 23:30:00, with each timestamp representing a half-hour period. Each user has a unique identifier, and consumption is recorded in kilowatt-hours (kWh) for each half-hour interval. The attacks are introduced in the final month of the year; the seven attack strategies are indexed from 0 to 6, and index 7 denotes unaltered consumption. The dataset is intended for research on anomaly detection and on cybersecurity strategies for smart meter data. It can be processed with tools such as Python (with libraries such as Pandas, NumPy, and Matplotlib) or R. Researchers are encouraged to use this dataset to explore different methods for identifying and mitigating attacks in smart grid systems.
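As a starting point for processing, the sketch below checks that a single user's data matches the layout described above (17,520 half-hourly readings over the stated date range) and then separates the unaltered period from the final month. The column names `timestamp` and `kwh`, the function name, and the choice of the last 30 days as the final month are assumptions; the actual CSV headers and attack window should be checked against the files. A toy frame stands in for a real file here.

```python
import numpy as np
import pandas as pd

START = pd.Timestamp("2012-02-29 00:00:00")
END = pd.Timestamp("2013-02-27 23:30:00")
STEP = pd.Timedelta(minutes=30)
READINGS_PER_USER = 365 * 48  # 17,520 half-hourly readings


def validate_user_frame(df, time_col="timestamp", value_col="kwh"):
    """Return a time-indexed series after checking row count, range, and spacing.

    The column names are hypothetical defaults, not confirmed CSV headers.
    """
    ts = pd.DatetimeIndex(pd.to_datetime(df[time_col]))
    series = pd.Series(df[value_col].to_numpy(), index=ts).sort_index()
    if len(series) != READINGS_PER_USER:
        raise ValueError(f"expected {READINGS_PER_USER} rows, got {len(series)}")
    if series.index[0] != START or series.index[-1] != END:
        raise ValueError("series does not cover the documented date range")
    gaps = series.index.to_series().diff().iloc[1:]
    if not (gaps == STEP).all():
        raise ValueError("timestamps are not evenly spaced at 30 minutes")
    return series


# Toy frame in place of one user's CSV (random values, not real data).
rng = np.random.default_rng(42)
toy = pd.DataFrame({
    "timestamp": pd.date_range(START, END, freq="30min"),
    "kwh": rng.uniform(0.05, 1.5, READINGS_PER_USER),
})
series = validate_user_frame(toy)

# Assumption: treat the last 30 days as the final (attacked) month.
attack_len = 30 * 48
baseline, final_month = series.iloc[:-attack_len], series.iloc[-attack_len:]

# Dataset-level arithmetic from the description: 159 users x 8 versions.
total_user_datasets = 159 * 8
```

With a real file, `toy` would be replaced by something like `pd.read_csv(path)` filtered to one user, after adjusting the column names to match the file.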