IEEE DataPort Announces Winners of Fall 2022 Dataset Upload Contest

IEEE DataPort, a global research data platform, announced the winners of its Fall 2022 Dataset Upload Contest. The dataset upload contest was designed to encourage authors and researchers to bring their research to the forefront of the global technical community. The contest was open to the public and ran from October 1st through November 30th, with more than 100 users participating. 

This year’s contest focused on four categories important to the research data community: environmental/climate change, AI/machine learning, biomedical and health sciences, and general/other. A panel of IEEE volunteers selected the top dataset in each category based on the number of unique visitors each dataset received by the end of the contest period, the quality of the metadata, and the value to the technical community.

The winner of the AI/machine learning category is Francesca Meneghello from the University of Padova in Italy, with their dataset titled “CSI Dataset for Wireless Human Sensing on 80 MHz Wi-Fi Channels.” The dataset provides data for developing wireless sensing applications, namely activity recognition, person identification, and people counting, that leverage Wi-Fi devices. Its goal is to provide a common ground for the development and comparison of Wi-Fi-enabled wireless sensing solutions. Francesca and her team collected data in seven different environments. Overall, the dataset contains more than thirteen hours of channel readings, of which six hours are for activity recognition and the remainder is evenly split between person identification and people counting.

The winner of the environmental/climate change category is Pandarasamy Arjunan from Berkeley Education Alliance for Research in Singapore, with their dataset titled “Paddy Doctor: A Visual Image Dataset for Automated Paddy Disease Classification and Benchmarking.” The dataset contains more than 16,000 paddy leaf images across 13 classes (12 different paddy diseases and healthy leaves). The visual image dataset is intended for experiments and for benchmarking computer vision algorithms, with the goal of providing enough data for algorithms to automate the early and accurate identification and diagnosis of paddy diseases and pests. Previously, the manual inspection process was inefficient, time-consuming, and error-prone. The lack of public datasets was a major bottleneck to benchmarking deep learning models and to wider adoption of automated solutions.

The winner of the biomedical and health sciences category is Julio Valdez from TecNM/Instituto Tecnologico de Mexicali, with their dataset titled “Cardiopulmonary Sounds Database.” This dataset contains cardiopulmonary signals that were recorded simultaneously. The signals are separated into two folders, one for heart sounds and the other for lung sounds. In addition, the dataset includes two MATLAB programs, one for recording the signals and another for plotting them in the time and frequency domains.

The winner of the general/other category is Solomiia Fedushko from Lviv Polytechnic National University in Ukraine, with their dataset titled “Propaganda and Fake News on the War in Ukraine.” The data collection includes posts from social media networks popular among Russian-speaking people. The posts were gathered using predefined keywords such as “war” and “special military operation,” and the content mainly concerns Ukraine's continuing conflict with Russia. Propaganda and fake news were identified following a thorough assessment and analysis of the data. The information gathered was anonymized.