Machine Learning

Please cite the following paper when using this dataset:

N. Thakur, S. Cui, K. A. Patel, N. Azizi, V. Knieling, C. Han, A. Poon, and R. Shah, “Marburg Virus Outbreak and a New Conspiracy Theory: Findings from a Comprehensive Analysis and Forecasting of Web Behavior,” Journal of Computation, Vol. 11, Issue. 11, Article. 234, Nov. 2023, DOI:



The Integrated Energy Management and Forecasting Dataset is a comprehensive data collection designed for advanced algorithmic modeling in energy management. It combines two distinct yet complementary datasets, the Energy Forecasting Data and the Energy Grid Status Data, each tailored to a different but related purpose in the energy sector.


The rapid evolution of communication networks and the ever-increasing demand for efficient data transfer have led to the development of cognitive networking, which aims to enhance network performance through intelligent and adaptive protocols. To facilitate research and development in this domain, we present a comprehensive dataset detailing the parameters of a Network Protocol Stack which can be used to develop a Cognitive Network Protocol Stack designed for efficient networking.


Poor posture is one of the most common health problems among growing adolescents and seriously affects their physical and mental health. Recognizing posture-related gait is a prerequisite for preventing and correcting poor posture. This paper proposes a gait recognition method for poor posture based on a PCA-BP neural network. Plantar pressure is measured with wearable intelligent insoles, and a gait recognition model based on the PCA-BP neural network is constructed.


The VNA dataset has three features: frequency, S21, and phase, while the MIMO dataset has an additional 'Channel' feature. The VNA dataset is larger than the MIMO dataset, with 507,709 rows compared to 164,161 rows in the MIMO dataset. This is because the VNA dataset was sampled at a 1 MHz resolution, while the MIMO dataset was sampled at a 25 MHz resolution, which is the limit set by the MATLAB API. As a result, the VNA dataset provides 4,701 samples per tag, while the MIMO dataset provides 190 samples per tag per channel for each reading.
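
The figures above can be cross-checked with a little arithmetic. The following is a minimal Python sketch, not part of the dataset's tooling: the helper names `sweeps` and `span_mhz` are hypothetical, and the span calculation assumes each sweep includes both of its end frequencies. The source does not say how the sweeps divide among tags, channels, and readings, so the code only reports how many complete sweeps fit into each row count.

```python
# Figures quoted in the dataset description.
VNA_ROWS = 507_709
MIMO_ROWS = 164_161
VNA_SAMPLES_PER_SWEEP = 4_701    # samples per tag at 1 MHz resolution
MIMO_SAMPLES_PER_SWEEP = 190     # samples per tag per channel at 25 MHz resolution


def sweeps(rows, samples_per_sweep):
    """Return (complete sweeps, leftover rows) for a given row count."""
    return divmod(rows, samples_per_sweep)


def span_mhz(samples, step_mhz):
    """Frequency span of one sweep, assuming both end points are sampled."""
    return (samples - 1) * step_mhz


vna_sweeps, vna_left = sweeps(VNA_ROWS, VNA_SAMPLES_PER_SWEEP)
mimo_sweeps, mimo_left = sweeps(MIMO_ROWS, MIMO_SAMPLES_PER_SWEEP)

print(vna_sweeps, vna_left)      # 108 1
print(mimo_sweeps, mimo_left)    # 864 1
print(span_mhz(VNA_SAMPLES_PER_SWEEP, 1))     # 4700
print(span_mhz(MIMO_SAMPLES_PER_SWEEP, 25))   # 4725
```

Both row counts leave exactly one row over after dividing by the per-sweep sample count. One possible explanation is that a header line is included in the count, but the description does not confirm this. Under the stated assumption, the two sweeps cover roughly the same span (about 4.7 GHz), which is consistent with the difference in size coming from the sampling resolution.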


As the harmful effects of climate change on human society increase, the analysis of abnormal weather is becoming an important issue. This work therefore provides a Korean weather dataset that includes anomaly scores computed with seven different methods. The dataset contains seven types of daily weather data for 64 Korean cities from 2010 to 2020, provided by the Weather Radar Center of the Korea Meteorological Administration.


Low-light images and video footage often suffer from artifacts caused by the interplay of exposure parameters such as aperture, shutter speed, and ISO setting. These interactions can introduce distortions, especially in extreme lighting conditions. The distortion arises primarily because photon noise becomes more pronounced relative to the signal as light intensity decreases, and it is amplified further by higher sensor gain. Secondary characteristics such as white balance and color rendering can also be adversely affected and may require correction in post-processing.
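
The relationship between light level, photon noise, and gain can be illustrated with a simplified model. The sketch below is an illustration rather than anything taken from the dataset: it assumes photon arrivals follow Poisson statistics (so the noise standard deviation is the square root of the mean photon count) and ignores read noise and dark current, which the description does not discuss. The function names are hypothetical.

```python
import math


def shot_noise_snr(mean_photons):
    """SNR of an ideal photon-counting pixel: mean / std = N / sqrt(N)."""
    return mean_photons / math.sqrt(mean_photons)


def apply_gain(mean_photons, gain):
    """Return (signal, noise) after gain; both are scaled by the same factor."""
    signal = gain * mean_photons
    noise = gain * math.sqrt(mean_photons)
    return signal, noise


# Fewer photons per pixel means a lower signal-to-noise ratio.
for n in (10_000, 100, 4):
    print(n, shot_noise_snr(n))
# 10000 100.0
# 100 10.0
# 4 2.0

# Gain brightens the image but leaves the SNR unchanged in this model.
signal, noise = apply_gain(100, gain=16)
print(signal, noise, signal / noise)   # 1600 160.0 10.0
```

In this model, raising the gain makes a dim frame brighter but scales the noise by the same factor, which is why the noise becomes more visible in the output without any gain in image quality.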


The data included here are the model training results associated with the paper "Distribution-Driven Augmentation of Real-World Datasets for Improved Cancer Diagnostics With Machine Learning". The paper uses kernel density estimators to curate datasets by balancing classes and filling in missing values with synthetically generated data. It also proposes a technique for joining distinct datasets so that a model can be trained on the necessary features from several datasets, as a form of transfer learning.
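
The general idea of balancing classes with a kernel density estimator can be sketched as follows. This is an illustration under stated assumptions, not the paper's method: it handles a single numeric feature, uses a Gaussian kernel with a rule-of-thumb bandwidth, and the function names and toy values are hypothetical. Drawing from a Gaussian KDE amounts to picking an observed value at random and adding Gaussian noise whose standard deviation equals the bandwidth.

```python
import random
import statistics


def kde_sample(values, n_new, rng):
    """Draw n_new synthetic values from a Gaussian KDE fitted to `values`."""
    sigma = statistics.stdev(values)
    # Rule-of-thumb bandwidth (an assumption; the paper may choose it differently).
    bandwidth = 1.06 * sigma * len(values) ** (-1 / 5)
    return [rng.choice(values) + rng.gauss(0.0, bandwidth) for _ in range(n_new)]


def balance_classes(samples_by_class, seed=0):
    """Pad every minority class with KDE samples up to the majority class size."""
    rng = random.Random(seed)
    target = max(len(values) for values in samples_by_class.values())
    balanced = {}
    for label, values in samples_by_class.items():
        synthetic = kde_sample(values, target - len(values), rng) if len(values) < target else []
        balanced[label] = list(values) + synthetic
    return balanced


# Toy, made-up feature values for two imbalanced classes.
data = {
    "benign": [1.0, 1.2, 0.9, 1.1, 1.3, 0.8],
    "malignant": [2.0, 2.4],
}
balanced = balance_classes(data)
print({label: len(values) for label, values in balanced.items()})
# {'benign': 6, 'malignant': 6}
```

The same sampling step could in principle be used to fill a missing value by drawing from the density of the observed values for that feature. Real datasets with many correlated features would need a multivariate estimator.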

