This dataset is released with our research paper titled “Scene-graph Augmented Data-driven Risk Assessment of Autonomous Vehicle Decisions”. In this paper, we propose a novel data-driven approach that uses scene-graphs as intermediate representations for modeling the subjective risk of driving maneuvers. Our approach includes a Multi-Relation Graph Convolution Network, a Long Short-Term Memory network, and attention layers.


As an alternative to classical cryptography, Physical Layer Security (PhySec) provides primitives to achieve fundamental security goals such as confidentiality, authentication, and key derivation. Owing to its origins in the field of information theory, these primitives are rigorously analysed and their information-theoretic security is proven. Nevertheless, practical realizations of the different approaches take certain assumptions about the physical world for granted.


The data is provided as zipped NumPy arrays with custom headers. The NumPy package is required to load a file.

The `np.load` primitive allows for straightforward loading of the datasets.

To load a file “file.npz” the following code is sufficient:

import numpy as np

measurement = np.load('file.npz', allow_pickle=False)

header, data = measurement['header'], measurement['data']

The dataset comes with a supplementary script illustrating the basic usage of the dataset.
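As a minimal, runnable sketch (the `header` and `data` arrays below are invented stand-ins, not the real measurement format), one can create such an archive and inspect its contents before unpacking:

```python
import numpy as np

# Write a small stand-in .npz file so the loading code can be run end-to-end;
# the real datasets ship their own 'header' and 'data' arrays.
np.savez('file.npz', header=np.arange(3), data=np.zeros((2, 4)))

measurement = np.load('file.npz', allow_pickle=False)
print(measurement.files)  # lists the arrays stored in the archive

header, data = measurement['header'], measurement['data']
```

The `files` attribute of the loaded archive is a convenient way to check a dataset's layout before accessing individual arrays.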


The Magnetic Resonance – Computed Tomography (MR-CT) Jordan University Hospital (JUH) dataset was collected after receiving the hospital's Institutional Review Board (IRB) approval, and consent forms were obtained from all patients. All procedures followed are consistent with the ethics of handling patients' data.




The emerging 5G services offer numerous new opportunities for networked applications. In this study, we seek to answer two key questions: i) is the throughput of mmWave 5G predictable, and ii) can we build "good" machine learning models for 5G throughput prediction? To this end, we conduct a measurement study of commercial mmWave 5G services in a major U.S. city, focusing on the throughput as perceived by applications running on user equipment (UE).




Lumos5G 1.0 is a dataset that represents the `Loop` area of the IMC'20 paper "Lumos5G: Mapping and Predicting Commercial mmWave 5G Throughput". The Loop area is a 1,300-meter loop near U.S. Bank Stadium in downtown Minneapolis that covers roads, railroad crossings, restaurants, coffee shops, and recreational outdoor parks.

This dataset is being made available to the research community.


The columns in the dataset CSV, from left to right, are described below:

- `run_num`: Indicates the run number. For each trajectory and mobility mode, we conduct several runs of experiments.
- `seq_num`: This is the sequence number. For each run, the sequence number acts like an index or a per-second timeline.
- `abstractSignalStr`: Indicates the abstract signal strength as reported by the Android API. Regardless of whether the UE was connected to 5G service, this column always reported a value associated with the LTE/4G radio. Note: readers interested in the signal strength values related to 5G-NR should refer to columns such as `nr_ssRsrp`, `nr_ssRsrq`, and `nr_ssSinr`.
- `latitude`: The latitude in degrees as reported by Android's API.
- `longitude`: The longitude in degrees as reported by Android's API.
- `movingSpeed`: The ground mobility/moving speed of the UE as reported by Android's API, in meters per second.
- `compassDirection`: The bearing in degrees as reported by Android's API. Bearing is the horizontal direction of travel of this device and is not related to the device orientation. It is guaranteed to be in the range `(0.0, 360.0]` if the device has a bearing.
- `nrStatus`: Indicates whether the UE was connected to the 5G network. When `nrStatus=CONNECTED`, the UE was connected to 5G. All other values of `nrStatus`, such as `NOT_RESTRICTED` and `NONE`, indicate the UE was not connected to 5G. `nrStatus` was obtained by parsing the raw string representation of the `ServiceState` object.
- `lte_rssi`: The Received Signal Strength Indication (RSSI) in dBm of the primary serving LTE cell. The value range is [-113, -51] inclusive, or CellInfo#UNAVAILABLE if unavailable. Reference: TS 27.007 8.5 Signal quality +CSQ.
- `lte_rsrp`: The reference signal received power (RSRP) in dBm of the primary serving LTE cell.
- `lte_rsrq`: The reference signal received quality (RSRQ) of the primary serving LTE cell.
- `lte_rssnr`: The reference signal signal-to-noise ratio (RSSNR) of the primary serving LTE cell.
- `nr_ssRsrp`: Obtained by parsing the raw string representation of the `SignalStrength` object; `nr_ssRsrp` was a field in this object's `CellSignalStrengthNr` section. In general, this value was only available when the UE was connected to 5G (i.e., when `nrStatus=CONNECTED`). Reference: 3GPP TS 38.215. Range: -140 dBm to -44 dBm.
- `nr_ssRsrq`: Obtained by parsing the raw string representation of the `SignalStrength` object; `nr_ssRsrq` was a field in this object's `CellSignalStrengthNr` section. In general, this value was only available when the UE was connected to 5G (i.e., when `nrStatus=CONNECTED`). Reference: 3GPP TS 38.215. Range: -20 dB to -3 dB.
- `nr_ssSinr`: Obtained by parsing the raw string representation of the `SignalStrength` object; `nr_ssSinr` was a field in this object's `CellSignalStrengthNr` section. In general, this value was only available when the UE was connected to 5G (i.e., when `nrStatus=CONNECTED`). Reference: 3GPP TS 38.215 Sec 5.1.*, 3GPP TS 38.133. Range: -23 dB to 40 dB.
- `Throughput`: Indicates the throughput perceived by the UE. iPerf 3.7 was used to measure the per-second TCP downlink throughput at the UE.
- `mobility_mode`: Indicates the ground truth about the mobility mode when the experiment was conducted. This value is either `walking` or `driving`.
- `trajectory_direction`: Indicates the ground truth about the trajectory direction of the experiment conducted at the Loop area. `CW` indicates the clockwise direction, while `ACW` indicates anti-clockwise. Note that the driving experiments were only conducted in the `CW` direction, as certain parts of the loop were one-way only. Walking-based experiments were conducted in both directions.
- `tower_id`: Indicates the (anonymized) tower identifier.

Note: We found that availability (and at times even the values) of `lte_rssi`, `nr_ssRsrp`, `nr_ssRsrq` and `nr_ssSinr` were not reliable. Since these values were sampled every second, at certain times (e.g., boundary cases), we might still find NR-related values when `nrStatus` is not equal to `CONNECTED`. However, in this dataset, we still include all the raw values as reported by the APIs.
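As an illustrative sketch of working with these columns (the toy rows below are invented; only the column names come from the description above), the CSV can be loaded with pandas and filtered to the seconds where the UE was actually connected to 5G:

```python
import io

import pandas as pd

# Inline stand-in for the dataset CSV; real files contain all columns
# described above, and these throughput values are made up.
sample = io.StringIO(
    "run_num,seq_num,nrStatus,mobility_mode,Throughput\n"
    "1,0,CONNECTED,walking,950.2\n"
    "1,1,NONE,walking,45.7\n"
    "2,0,CONNECTED,driving,1210.0\n"
)
df = pd.read_csv(sample)

# Keep only the seconds where the UE was on 5G (see `nrStatus` above).
df_5g = df[df["nrStatus"] == "CONNECTED"]
mean_tput = df_5g.groupby("mobility_mode")["Throughput"].mean()
```

Because NR-related values can appear even when `nrStatus` is not `CONNECTED` (see the note above), filtering on `nrStatus` first is a reasonable preprocessing step.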


@inproceedings{lumos5g-imc20,
  author = {Narayanan, Arvind and Ramadan, Eman and Mehta, Rishabh and Hu, Xinyue and Liu, Qingxu and Fezeu, Rostand A. K. and Dayalan, Udhaya Kumar and Verma, Saurabh and Ji, Peiqi and Li, Tao and Qian, Feng and Zhang, Zhi-Li},
  title = {Lumos5G: Mapping and Predicting Commercial MmWave 5G Throughput},
  year = {2020},
  isbn = {9781450381383},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {},
  doi = {10.1145/3419394.3423629},
  booktitle = {Proceedings of the ACM Internet Measurement Conference},
  pages = {176--193},
  numpages = {18},
  keywords = {bandwidth estimation, mmWave, machine learning, Lumos5G, throughput prediction, deep learning, prediction, 5G},
  location = {Virtual Event, USA},
  series = {IMC '20}
}


Please feel free to contact the FiveGophers/Lumos5G team for questions or information about the data.


The Lumos5G 1.0 dataset is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this license, visit the Creative Commons website or send a letter to Creative Commons, PO Box 1866, Mountain View, CA 94042, USA.


Histopathological characterization of colorectal polyps makes it possible to tailor patients' management and follow-up, with the ultimate aim of avoiding or promptly detecting an invasive carcinoma. Colorectal polyp characterization relies on the histological analysis of tissue samples to determine the malignancy and dysplasia grade of the polyps. Deep neural networks achieve outstanding accuracy in medical pattern recognition; however, they require large sets of annotated training images.


To load the data, we provide below an example routine working within the PyTorch framework. We provide two different resolutions, 800 and 7000 µm/px.

Within each resolution, we provide .csv files, containing all metadata information for all the included files, comprising:

  • image_id;
  • label (6 classes - HP, NORM, TA.HG, TA.LG, TVA.HG, TVA.LG);
  • type (4 classes - HP, NORM, HG, LG);
  • reference WSI;
  • reference region of interest in WSI (roi);
  • resolution (microns per pixel, mpp);
  • coordinates for the patch (x, y, w, h).
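As a sketch of working with this metadata (the header names and rows below are invented for illustration and only mirror the list above; check the shipped .csv files for the exact naming), the file can be loaded and filtered with pandas:

```python
import io

import pandas as pd

# Toy stand-in for a UNITOPatho metadata CSV; columns follow the list above,
# values are made up.
csv_text = (
    "image_id,label,type,wsi,roi,mpp,x,y,w,h\n"
    "a.png,TA.HG,HG,wsi01,roi01,0.44,0,0,1812,1812\n"
    "b.png,NORM,NORM,wsi02,roi02,0.44,100,50,1812,1812\n"
)
df = pd.read_csv(io.StringIO(csv_text))

# e.g. keep only the high-grade dysplasia patches
hg = df[df["type"] == "HG"]
```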

Below you can find the UNITOPatho dataloader class for PyTorch.

import os

import cv2
import numpy as np
import torch
import torchvision


class UNITOPatho(

    def __init__(self, df, T, path, target, subsample=-1, gray=False, mock=False):
        self.path = path
        self.df = df
        self.T = T = target
        self.subsample = subsample
        self.mock = mock
        self.gray = gray

        allowed_target = ['type', 'grade', 'top_label']
        if target not in allowed_target:
            raise ValueError(f'Target must be in {allowed_target}, got {target}')

        print(f'Loaded {len(self.df)} images')

    def __len__(self):
        return len(self.df)

    def __getitem__(self, index):
        entry = self.df.iloc[index]
        image_id = entry.image_id
        image_id = os.path.join(self.path, entry.top_label_name, image_id)

        if self.mock:
            # Return a random image instead of reading from disk
            C = 1 if self.gray else 3
            img = np.random.randint(0, 255, (224, 224, C)).astype(np.uint8)
        else:
            img = cv2.imread(image_id)

            if self.subsample != -1:
                # Halve the image repeatedly, then resize to the target size
                w = img.shape[0]
                while w // 2 > self.subsample:
                    img = cv2.resize(img, (w // 2, w // 2))
                    w = w // 2
                img = cv2.resize(img, (self.subsample, self.subsample))

            if self.gray:
                img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
                img = np.expand_dims(img, axis=2)
            else:
                img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

        if self.T is not None:
            img = self.T(img)

        return img, entry[]


This dataset contains thousands of Channel State Information (CSI) samples collected using the 64-antenna KU Leuven Massive MIMO testbed. The measurements cover four different antenna array topologies: URA LoS, URA NLoS, ULA LoS, and DIS LoS. The users' channels were collected using CNC tables, resulting in a dataset where all samples are provided with a very accurate spatial label. The user position is swept across a nine-square-meter area, halting every 5 millimeters, resulting in 252,004 samples for each measured topology.


The dataset contains Channel State Information (CSI) samples recorded with the KU Leuven Massive MIMO testbed, for many user positions lying on a grid. The Base Station (BS) is equipped with 64 antennas, each receiving a predefined pilot signal from each position. Using these pilot signals, the CSI is estimated for 100 subcarriers, evenly spaced in frequency over a 20 MHz bandwidth. As a result, the complex-valued matrix H represents the measured CSI for one location. This matrix spans N rows and K columns, with N being the number of BS antennas and K the number of subcarriers. For further details about the system, the National Instruments Massive MIMO Application Framework documentation can be consulted.

To collect the CSI from many different user locations, four single-antenna User Equipments were positioned in an office. Their antennas were moved along a predefined route using CNC XY-tables. This route zigzagged along a grid, taking steps of 5 mm; the total grid spans 1.25 m by 1.25 m. By using these XY-tables, the error on the positional label is less than 1 mm, resulting in a dataset of 252,004 CSI samples spatially labelled with an accuracy of better than 1 mm.
Furthermore, the testbed's BS is designed to be very flexible in the deployment of the antenna array. This allowed for the creation of datasets with three different antenna deployments. First, a Uniform Rectangular Array (URA) of 8 by 8 antennas was deployed, both in LoS and in NLoS using a metal blocker. Second, a Uniform Linear Array (ULA) of 64 antennas on one line was deployed. Finally, the antennas were distributed over the room in pairs of eight, making up the distributed (DIS) scenario.

The different deployments can be seen in the attached picture. In all cases, the antennas are placed 1 m from the XY-tables. The yellow rectangles in the figure depict the 1.25 m by 1.25 m areas in which the XY-tables are able to move the users. The spacing between the XY-tables is dictated by the space needed for the motors powering the movements and the cables connecting them to the controllers. The tables were synchronised with the BS over Ethernet to ensure each sampled H has a correct spatial label, enabling a highly accurate dataset.

During the measurements, the BS was configured to use a centre frequency of 2.61 GHz, giving a wavelength λ of 114.56 mm. The system used a bandwidth of 20 MHz. The origin of the space was defined as the middle of the URA. From this point in space, the x- and y-positions of the users and antennas were measured. These locations are provided in 3D in the dataset.
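To make the data layout concrete, the following sketch builds one CSI sample H with the shape convention described above (N = 64 antennas, K = 100 subcarriers); the random values stand in for measured CSI, and the normalisation step is a common, illustrative preprocessing choice rather than part of the dataset itself:

```python
import numpy as np

# One CSI sample: N = 64 BS antennas (rows) x K = 100 subcarriers (columns).
N, K = 64, 100
rng = np.random.default_rng(0)
H = rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))

# Typical preprocessing for CSI-based positioning: per-sample normalisation,
# so models see channel structure rather than absolute receive power.
H_norm = H / np.linalg.norm(H)

# Received power aggregated per BS antenna, shape (64,).
power_per_antenna = np.sum(np.abs(H) ** 2, axis=1)
```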


This study presents six datasets for DNA/RNA sequence alignment with one of the most common alignment algorithms, namely the Needleman–Wunsch (NW) algorithm. This research proposes a fast and parallel implementation of the NW algorithm using machine learning techniques. This study is an extension and improved version of our previous work. The current implementation achieves 99.7% accuracy using a multilayer perceptron with the ADAM optimizer, and up to 2912 giga cell updates per second on two real DNA sequences of length 4.1 M nucleotides.
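For readers unfamiliar with the recurrence behind the "cell updates per second" metric, here is a minimal Needleman–Wunsch scoring sketch; the scoring values (match=+1, mismatch=-1, gap=-1) are arbitrary choices for this illustration, not parameters from the study:

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Fill the NW dynamic-programming matrix and return the global alignment score."""
    n, m = len(a), len(b)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap          # prefix of `a` aligned against gaps
    for j in range(1, m + 1):
        F[0][j] = j * gap          # prefix of `b` aligned against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Each evaluation of this max() is one "cell update".
            F[i][j] = max(F[i - 1][j - 1] + s,   # (mis)match
                          F[i - 1][j] + gap,     # gap in b
                          F[i][j - 1] + gap)     # gap in a
    return F[n][m]
```

The matrix fill requires n·m cell updates, which is why accelerating it matters for megabase-scale sequences.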


The boring and repetitive task of monitoring video feeds makes real-time anomaly detection tasks difficult for humans. Hence, crimes are usually detected hours or days after the occurrence. To mitigate this, the research community proposes the use of a deep learning-based anomaly detection model (ADM) for automating the monitoring process.


This dataset contains 2000 images of size 256×256. The dataset was created by capturing photos using a mobile phone. It is applicable to two classes, namely water and wet surface.


This dataset can be used for two classes, namely water and wet surface.