Machine Learning

Histopathological characterization of colorectal polyps allows clinicians to tailor patient management and follow-up, with the ultimate aim of avoiding or promptly detecting invasive carcinoma. Colorectal polyp characterization relies on the histological analysis of tissue samples to determine polyp malignancy and dysplasia grade. Deep neural networks achieve outstanding accuracy in medical pattern recognition; however, they require large sets of annotated training images.


This dataset contains thousands of Channel State Information (CSI) samples collected using the 64-antenna KU Leuven Massive MIMO testbed. The measurements focus on four antenna array topologies: URA LoS, URA NLoS, ULA LoS, and DIS LoS. The users' channels are collected using CNC tables, resulting in a dataset where every sample carries a very accurate spatial label. The user position is swept across a 9-square-meter area, halting every 5 millimeters, resulting in a dataset size of 252,004 samples for each measured topology.


This study presents six datasets for DNA/RNA sequence alignment with one of the most common alignment algorithms, the Needleman–Wunsch (NW) algorithm. The research proposes a fast, parallel implementation of the NW algorithm that uses machine learning techniques. It extends and improves on our previous work. The current implementation achieves 99.7% accuracy using a multilayer perceptron with the ADAM optimizer, and up to 2912 giga cell updates per second on two real DNA sequences with a length of 4.1 M nucleotides.
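For readers unfamiliar with the NW algorithm and the "giga cell updates per second" (GCUPS) metric, the following is a minimal sketch of the classical sequential dynamic-programming formulation. It is not the machine-learning-based parallel implementation described above. The scoring values (match = 1, mismatch = -1, gap = -1) and the function names are assumptions chosen for illustration only.

```python
def needleman_wunsch_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Return the optimal global alignment score of sequences a and b.

    Scoring parameters are illustrative assumptions, not values taken
    from the dataset description.
    """
    # prev holds row i-1 of the DP matrix; row 0 aligns prefixes of b to gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: a[:i] aligned against gaps only
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap inserted in b
            left = curr[j - 1] + gap  # gap inserted in a
            curr.append(max(diag, up, left))  # one "cell update"
        prev = curr
    return prev[-1]


def gcups(len_a: int, len_b: int, seconds: float) -> float:
    """Giga cell updates per second for filling one len_a x len_b matrix."""
    return (len_a * len_b) / (seconds * 1e9)


if __name__ == "__main__":
    print(needleman_wunsch_score("GATTACA", "GATTACA"))  # 7: seven matches
    print(gcups(2000, 500000, 1.0))                      # 1.0
```

Each inner-loop iteration fills one matrix cell, so aligning two sequences of lengths m and n costs m × n cell updates; GCUPS divides that count by the runtime in seconds and by 10^9.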


Monitoring video feeds is a tedious and repetitive task, which makes real-time anomaly detection difficult for humans. Hence, crimes are usually detected hours or days after they occur. To mitigate this, the research community proposes using a deep learning-based anomaly detection model (ADM) to automate the monitoring process.


This dataset contains 2000 images of size 256 × 256. The dataset was created by capturing photos with a mobile phone. It covers two classes: water and wet surface.


This dataset was collected and provided during the "Car Hacking: Attack & Defense Challenge" in 2020. We are the main organizer of the competition, along with Culture Makers and the Korea Internet & Security Agency. We are very proud to release these valuable datasets to all security researchers for free.

The competition aimed to develop attack and detection techniques for the Controller Area Network (CAN), a widely used standard for in-vehicle networks. The target vehicle of the competition was the Hyundai Avante CN7.


The MIMOSigRef-SD dataset was created to support the research community in the design and development of novel multiple-input multiple-output (MIMO) transceiver architectures. It was recorded using software radios as transmitters and receivers, together with a wireless channel emulator that provides a realistic representation of a variety of channel environments and conditions.


Protein molecules are inherently dynamic and modulate their interactions with different molecular partners by accessing different tertiary structures under physiological conditions. Elucidating such structures remains challenging. Current momentum in deep learning and the powerful performance of generative adversarial networks (GANs) in complex domains, such as computer vision, inspire us to investigate the ability of GANs to generate physically realistic protein tertiary structures.


The dataset was generated by performing different MiTM attacks on the synthetic electric grid in the RESLab testbed at Texas A&M University, US. The testbed primarily consists of a dynamic power system simulator (Powerworld Dynamic Studio), a network emulator (CORE), Snort IDS, an open DNP3 master, and Elasticsearch's Packetbeat index. The dataset includes raw files, which security enthusiasts can use to develop new features, and processed files, which can be used to train an IDS using our feature space.


This dataset is for short-term spatio-temporal PV forecasting.

This dataset consists of two parts. The first part is the spatio-temporal PV dataset, which was obtained from different PV sites. The second part is the corresponding weather datasets, including temperature, wind speed, wind direction, etc.

The dataset also contains demo code illustrating the concept of a machine-learning-based PV forecasting model.

More information will be added in the future.