Aspect Sentiment Triplet Extraction (ASTE) is an Aspect-Based Sentiment Analysis subtask (ABSA). It aims to extract aspect-opinion pairs from a sentence and identify the sentiment polarity associated with them. For instance, given the sentence ``Large rooms and great breakfast", ASTE outputs the triplet T = {(rooms, large, positive), (breakfast, great, positive)}. Although several approaches to ASBA have recently been proposed, those for Portuguese have been mostly limited to extracting only aspects without addressing ASTE tasks.


The University of Turin (UniTO) released the open-access dataset Stoke collected for the homonymous Use Case 3 in the DeepHealth project ( UniToBrain is a dataset of Computed Tomography (CT) perfusion images (CTP).


Visit to have a full companion code where a U-Net model is trained over the dataset.


The accompanying dataset for the CVSports 2021 paper: DeepDarts: Modeling Keypoints as Objects for Automatic Scoring in Darts using a Single Camera

Paper Abstract:


The recommended way to load the labels is to use the pandas Python package:

import pandas as pd

labels = pd.read_pickle("labels.pkl")

See github repository for more information:


The dermoscopic images considered in the paper "Dermoscopic Image Classification with Neural Style Transfer" are available for public download through the ISIC database (!/topWithHeader/wideContentTop/main). These are 24-bit JPEG images with a typical resolution of 768 × 512 pixels. However, not all the images in the database are in satisfactory condition.


The Objects Mosaic Hyperspectral Database contains 10,666 hyperspectral cubes of size 256x256x29 in the 420-700nm spectral range. This original hyperspectral database of real objects was experimentally acquired as described in the paper "SHS-GAN: Synthetic enhancement of a natural hyperspectral database", by J. Hauser, G. Shtendel, A. Zeligman, A. Averbuch, and M. Nathan, in the IEEE Transactions on Computational Imaging.

In addition, the database contains the SHS-GAN algorithm, which enables to generate synthetic database of hyperspectral images. 


Arbitrarily falling dices were photographed individually and monochromatically inside an Ulbricht sphere from two fixed perspectives. Overall, 11 dices with edge size 16 mm were used for 2133 falling experiments repeatedly. 5 of these dices were modified manually to have the following anomalies: drilled holes, missing dots, sawing gaps and scratches. All pictures in the uploaded pickle containers have a resolution of 400 times 400 pixels with normalized grey scale floating point values of 0 (black) through 1 (white).


The datasets contain files for training (“x_training.pickle”, w/o anomalies) and testing (“x_test.pickle”, w/ and w/o anomalies). Labels were saved in “y_test.pickle” whereas label zero correspond to non-anomalous data. Because the pose of the falling dice was not constrained the two fixed perspectives had the chance to see anomalies at all in 60 out of 100 experiments. Hence the test dataset contains 60 anomalous samples. Furthermore, data is augmented w.r.t. erased patches, changes in image constituents like brightness, and altered geometry like flipping and rotating.The shapes of the pickles are

  • w/o augmentation, x_train.pickle: (2000, 2, 400, 400)
  • w/o augmentation, x_test.pickle: (133, 2, 400, 400)
  • w/o augmentation, y_test.pickle: (133,)
  • w/ augmentation, x_train.pickle: (4000, 2, 400, 400)
  • w/ augmentation, x_test.pickle: (133, 2, 400, 400)
  • w/ augmentation, y_test.pickle: (133,)

Of late, efforts are underway to build computer-assisted diagnostic tools for cancer diagnosis via image processing. Such computer-assisted tools require capturing of images, stain color normalization of images, segmentation of cells of interest, and classification to count malignant versus healthy cells. This dataset is positioned towards robust segmentation of cells which is the first stage to build such a tool for plasma cell cancer, namely, Multiple Myeloma (MM), which is a type of blood cancer. The images are provided after stain color normalization.



If you use this dataset, please cite below publications-

  1. Anubha Gupta, Rahul Duggal, Shiv Gehlot, Ritu Gupta, Anvit Mangal, Lalit Kumar, Nisarg Thakkar, and Devprakash Satpathy, "GCTI-SN: Geometry-Inspired Chemical and Tissue Invariant Stain Normalization of Microscopic Medical Images," Medical Image Analysis, vol. 65, Oct 2020. DOI: (2020 IF: 11.148)
  2. Shiv Gehlot, Anubha Gupta and Ritu Gupta, "EDNFC-Net: Convolutional Neural Network with Nested Feature Concatenation for Nuclei-Instance Segmentation," ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 2020, pp. 1389-1393.
  3. Anubha Gupta, Pramit Mallick, Ojaswa Sharma, Ritu Gupta, and Rahul Duggal, "PCSeg: Color model driven probabilistic multiphase level set based tool for plasma cell segmentation in multiple myeloma," PLoS ONE 13(12): e0207908, Dec 2018. DOI: 10.1371/journal.pone.0207908

A fundamental building block of any computer-assisted interventions (CAI) is the ability to automatically understand what the surgeons are performing throughout the surgery. In other words, recognizing the surgical activities being performed or the tools being used by the surgeon can be deemed as an essential steps toward CAI. The main motivation for these tasks is to design efficient solutions for surgical workflow analysis. The CATARACTS dataset was proposed in this context. This dataset consists of 50 cataract surgery.


The dataset consists of 50 videos of cataract surgeries performed in Brest University Hospital. Patients were 61 years old on average (minimum: 23,maximum: 83,standard deviation: 10). Each surgery was recorded in two videos: the microscope video and the surgical tray video. The frame definition was 1920x1080 pixels (full HD resolution) for both types of videos. The frame rate was approximately 30 frames per second for the tool-tissue interaction videos and 50 frames per second for the surgical tray videos. Microscope videos had a duration of 10 minutes and 56 s on average (minimum: 6 minutes 23 s, maximum: 40 minutes 34 s, standard deviation:6 minutes 5 s). Surgical tray videos had a duration of 11 minutes and 3 s on average (minimum: 6 minutes 30 s, maximum: 40 minutes 48 s, standard deviation: 6 minutes 3 s). In total, more than nine hours of surgery (for each video type) have been video recorded. For more details about the dataset and the different tasks proposed, please refer to the links provided in the abstract.

Please note that the evaluation scripts (for the microscope test set) used in the challenges are available now. For CATARACTS 2018, in addition to the videos, we provide the images ( used in the challenge and the ground truth.

If you use this dataset, please cite the following paper:
Al Hajj, Hassan, et al. "CATARACTS: Challenge on automatic tool annotation for cataRACT surgery." Medical image analysis 52 (2019): 24-41.


The LEDNet dataset consists of image data of a field area that are captured from a mobile phone camera.

Images in the dataset contain the information of an area where a PCB board is placed, containing 6 LEDs. Each state of the LEDs on the PCB board represents a binary number, with the ON state corresponding to binary 1 and the OFF state corresponding to binary 0. All the LEDs placed in sequence represent a binary sequence or encoding of an analog value.


The emerging 5G services offer numerous new opportunities for networked applications. In this study, we seek to answer two key questions: i) is the throughput of mmWave 5G predictable, and ii) can we build "good" machine learning models for 5G throughput prediction? To this end, we conduct a measurement study of commercial mmWave 5G services in a major U.S. city, focusing on the throughput as perceived by applications running on user equipment (UE).




Lumos5G 1.0 is a dataset that represents the `Loop` area of the IMC'20 paper - "Lumos5G: Mapping and Predicting Commercial mmWave 5G Throughput". The Loop area is a 1300 meter loop near U.S. Bank Stadium in Minneapolis downtown area that covers roads, railroad crossings, restaurants, coffee shops, and recreational outdoor parks.

This dataset is being made available to the research community.


The description of the columns in the dataset CSV, from left to right, are:

- `run_num`: Indicates the run number. For each trajectory and mobility mode, we conduct several runs of experiments.
- `seq_num`: This is the sequence number. For each run, the sequence number acts like an index or a per-second timeline.
- `abstractSignalStr`: Indicates the abstract signal strength as reported by Android API ( No matter whether the UE was connected to 5G service or not, this column always reported a value associated with the LTE/4G radio. Note, if one is interested to understand the signal strength values related to 5G-NR, we refer them to other columns such as `nr_ssRsrp`, `nr_ssRsrq`, and `nr_ssSinr`.
- `latitude`: The latitude in degrees as reported by Android's API (
- `longitude`: The longitude in degrees as reported by Android's API (
- `movingSpeed`: The ground mobility/moving speed of the UE as reported by Android's API ( The unit is meters per second.
- `compassDirection`: The bearing in degrees as reported by Android's API ( Bearing is the horizontal direction of travel of this device, and is not related to the device orientation. It is guaranteed to be in the range `(0.0, 360.0]` if the device has a bearing.
- `nrStatus`: Indicates if the UE was connected to 5G network or not. When `nrStatus=CONNECTED`, the UE was connected to 5G. All other values of `nrStatus` such as `NOT_RESTRICTED` and `NONE` indicate the UE was not connected to 5G. `nrStatus` was obtained by parsing the raw string representation of `ServiceState` object (
- `lte_rssi`: Get Received Signal Strength Indication (RSSI) in dBm of the primary serving LTE cell. The value range is [-113, -51] inclusively or CellInfo#UNAVAILABLE if unavailable. Reference: TS 27.007 8.5 Signal quality +CSQ.
- `lte_rsrp`: Get reference signal received power (RSRP) in dBm of the primary serving LTE cell.
- `lte_rsrq`: Get reference signal received quality (RSRQ) of the primary serving LTE cell.
- `lte_rssnr`: Get reference signal signal-to-noise ratio (RSSNR) of the primary serving LTE cell.
- `nr_ssRsrp`: Obtained by parsing the raw string representation of `SignalStrength` object ( `nr_ssRsrp` was a field in this object's `CellSignalStrengthNr` section. In general, this value was only available when the UE was connected to 5G (i.e., when `nrStatus=CONNECTED`). Reference: 3GPP TS 38.215. Range: -140 dBm to -44 dBm.
- `nr_ssRsrq`: Obtained by parsing the raw string representation of `SignalStrength` object ( `nr_ssRsrq` was a field in this object's `CellSignalStrengthNr` section. In general, this value was only available when the UE was connected to 5G (i.e., when `nrStatus=CONNECTED`). Reference: 3GPP TS 38.215. Range: -20 dB to -3 dB.
- `nr_ssSinr`: Obtained by parsing the raw string representation of `SignalStrength` object ( `nr_ssSinr` was a field in this object's `CellSignalStrengthNr` section. In general, this value was only available when the UE was connected to 5G (i.e., when `nrStatus=CONNECTED`). Reference: 3GPP TS 38.215 Sec 5.1.*, 3GPP TS 38.133 Range: -23 dB to 40 dB
- `Throughput`: Indicates the throughput perceived by the UE. iPerf 3.7 was used to measure the per-second TCP downlink at the UE.
- `mobility_mode`: Indicates the grouth truth about the mobility mode when the experiment was conducted. This value can either be walking or driving.
- `trajectory_direction`: Indicates the ground truth about the trajectory direction of the experiment conducted at the Loop area. `CW` indicates clockwise direction, while `ACW` indicates anti-clockwise. Note, the driving experiments were only conducted in `CW` direction as certain parts of the loop were one way only. Walking-based experiments were conducted in both directions.
- `tower_id`: Indicates the (anonymized) tower identifier.

Note: We found that availability (and at times even the values) of `lte_rssi`, `nr_ssRsrp`, `nr_ssRsrq` and `nr_ssSinr` were not reliable. Since these values were sampled every second, at certain times (e.g., boundary cases), we might still find NR-related values when `nrStatus` is not equal to `CONNECTED`. However, in this dataset, we still include all the raw values as reported by the APIs.


author = {Narayanan, Arvind and Ramadan, Eman and Mehta, Rishabh and Hu, Xinyue and Liu, Qingxu and Fezeu, Rostand A. K. and Dayalan, Udhaya Kumar and Verma, Saurabh and Ji, Peiqi and Li, Tao and Qian, Feng and Zhang, Zhi-Li},
title = {Lumos5G: Mapping and Predicting Commercial MmWave 5G Throughput},
year = {2020},
isbn = {9781450381383},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {},
doi = {10.1145/3419394.3423629},
booktitle = {Proceedings of the ACM Internet Measurement Conference},
pages = {176–193},
numpages = {18},
keywords = {bandwidth estimation, mmWave, machine learning, Lumos5G, throughput prediction, deep learning, prediction, 5G},
location = {Virtual Event, USA},
series = {IMC '20}


Please feel free to contact the FiveGophers/Lumos5G team for questions or information about the data (,,,,


Lumos5G 1.0 dataset is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this license, visit or send a letter to Creative Commons, PO Box 1866, Mountain View, CA 94042, USA.