The Objects Mosaic Hyperspectral Database contains 10,666 hyperspectral cubes of size 256x256x29 in the 420-700nm spectral range. This original hyperspectral database of real objects was experimentally acquired as described in the paper "SHS-GAN: Synthetic enhancement of a natural hyperspectral database", by J. Hauser, G. Shtendel, A. Zeligman, A. Averbuch, and M. Nathan, in the IEEE Transactions on Computational Imaging.

In addition, the database includes the SHS-GAN algorithm, which enables the generation of synthetic databases of hyperspectral images.


Arbitrarily falling dice were photographed individually and monochromatically inside an Ulbricht sphere from two fixed perspectives. Overall, 11 dice with an edge length of 16 mm were used in 2133 falling experiments. Five of these dice were modified manually to exhibit the following anomalies: drilled holes, missing dots, sawing gaps, and scratches. All pictures in the uploaded pickle containers have a resolution of 400x400 pixels with normalized greyscale floating-point values from 0 (black) to 1 (white).


The datasets contain files for training (“x_train.pickle”, without anomalies) and testing (“x_test.pickle”, with and without anomalies). Labels are saved in “y_test.pickle”, where label zero corresponds to non-anomalous data. Because the pose of the falling dice was not constrained, the two fixed perspectives were able to see anomalies in only 60 out of 100 experiments; hence the test dataset contains 60 anomalous samples. Furthermore, the data is augmented with erased patches, changes in image constituents such as brightness, and altered geometry such as flipping and rotation. The shapes of the pickles are

  • w/o augmentation, x_train.pickle: (2000, 2, 400, 400)
  • w/o augmentation, x_test.pickle: (133, 2, 400, 400)
  • w/o augmentation, y_test.pickle: (133,)
  • w/ augmentation, x_train.pickle: (4000, 2, 400, 400)
  • w/ augmentation, x_test.pickle: (133, 2, 400, 400)
  • w/ augmentation, y_test.pickle: (133,)

Of late, efforts are underway to build computer-assisted diagnostic tools for cancer diagnosis via image processing. Such tools require capturing images, stain color normalization, segmentation of cells of interest, and classification to count malignant versus healthy cells. This dataset targets robust segmentation of cells, which is the first stage in building such a tool for plasma cell cancer, namely Multiple Myeloma (MM), a type of blood cancer. The images are provided after stain color normalization.



If you use this dataset, please cite the following publications:

  1. Anubha Gupta, Rahul Duggal, Shiv Gehlot, Ritu Gupta, Anvit Mangal, Lalit Kumar, Nisarg Thakkar, and Devprakash Satpathy, "GCTI-SN: Geometry-Inspired Chemical and Tissue Invariant Stain Normalization of Microscopic Medical Images," Medical Image Analysis, vol. 65, Oct 2020. DOI: (2020 IF: 11.148)
  2. Shiv Gehlot, Anubha Gupta and Ritu Gupta, "EDNFC-Net: Convolutional Neural Network with Nested Feature Concatenation for Nuclei-Instance Segmentation," ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 2020, pp. 1389-1393.
  3. Anubha Gupta, Pramit Mallick, Ojaswa Sharma, Ritu Gupta, and Rahul Duggal, "PCSeg: Color model driven probabilistic multiphase level set based tool for plasma cell segmentation in multiple myeloma," PLoS ONE 13(12): e0207908, Dec 2018. DOI: 10.1371/journal.pone.0207908

A fundamental building block of computer-assisted interventions (CAI) is the ability to automatically understand what the surgeons are performing throughout the surgery. In other words, recognizing the surgical activities being performed or the tools being used by the surgeon can be deemed an essential step toward CAI. The main motivation for these tasks is to design efficient solutions for surgical workflow analysis. The CATARACTS dataset was proposed in this context. It consists of 50 cataract surgery videos.


The dataset consists of 50 videos of cataract surgeries performed in Brest University Hospital. Patients were 61 years old on average (minimum: 23, maximum: 83, standard deviation: 10). Each surgery was recorded in two videos: the microscope video and the surgical tray video. The frame definition was 1920x1080 pixels (full HD resolution) for both types of videos. The frame rate was approximately 30 frames per second for the tool-tissue interaction videos and 50 frames per second for the surgical tray videos. Microscope videos had a duration of 10 minutes and 56 s on average (minimum: 6 minutes 23 s, maximum: 40 minutes 34 s, standard deviation: 6 minutes 5 s). Surgical tray videos had a duration of 11 minutes and 3 s on average (minimum: 6 minutes 30 s, maximum: 40 minutes 48 s, standard deviation: 6 minutes 3 s). In total, more than nine hours of surgery (for each video type) have been video recorded. For more details about the dataset and the different tasks proposed, please refer to the links provided in the abstract.

Please note that the evaluation scripts (for the microscope test set) used in the challenges are now available. For CATARACTS 2018, in addition to the videos, we provide the images used in the challenge and the ground truth.

If you use this dataset, please cite the following paper:
Al Hajj, Hassan, et al. "CATARACTS: Challenge on automatic tool annotation for cataRACT surgery." Medical image analysis 52 (2019): 24-41.


The LEDNet dataset consists of image data of a field area, captured with a mobile phone camera.

Images in the dataset capture an area where a PCB board with 6 LEDs is placed. The state of each LED on the PCB board represents a binary digit, with the ON state corresponding to binary 1 and the OFF state corresponding to binary 0. The LEDs read in sequence represent a binary encoding of an analog value.
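Once the individual LED states are recognized, recovering the encoded value is a plain binary-to-integer conversion. The sketch below assumes the leftmost LED is the most significant bit (the bit order is not specified above):

```python
def decode_leds(states):
    """Convert a sequence of LED states (truthy = ON = 1) into the
    integer value of the binary sequence, most significant bit first."""
    value = 0
    for s in states:
        value = (value << 1) | (1 if s else 0)
    return value

# Example: [ON, OFF, ON, ON, OFF, ON] encodes 0b101101 = 45
print(decode_leds([1, 0, 1, 1, 0, 1]))  # prints 45
```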


The emerging 5G services offer numerous new opportunities for networked applications. In this study, we seek to answer two key questions: i) is the throughput of mmWave 5G predictable, and ii) can we build "good" machine learning models for 5G throughput prediction? To this end, we conduct a measurement study of commercial mmWave 5G services in a major U.S. city, focusing on the throughput as perceived by applications running on user equipment (UE).




Lumos5G 1.0 is a dataset representing the `Loop` area of the IMC'20 paper "Lumos5G: Mapping and Predicting Commercial mmWave 5G Throughput". The Loop area is a 1300-meter loop near U.S. Bank Stadium in downtown Minneapolis that covers roads, railroad crossings, restaurants, coffee shops, and recreational outdoor parks.

This dataset is being made available to the research community.


The descriptions of the columns in the dataset CSV, from left to right, are:

- `run_num`: Indicates the run number. For each trajectory and mobility mode, we conduct several runs of experiments.
- `seq_num`: This is the sequence number. For each run, the sequence number acts like an index or a per-second timeline.
- `abstractSignalStr`: Indicates the abstract signal strength as reported by the Android API. Regardless of whether the UE was connected to 5G, this column always reported a value associated with the LTE/4G radio. Note: if one is interested in the signal strength values related to 5G-NR, we refer them to other columns such as `nr_ssRsrp`, `nr_ssRsrq`, and `nr_ssSinr`.
- `latitude`: The latitude in degrees as reported by Android's API.
- `longitude`: The longitude in degrees as reported by Android's API.
- `movingSpeed`: The ground mobility/moving speed of the UE as reported by Android's API, in meters per second.
- `compassDirection`: The bearing in degrees as reported by Android's API. Bearing is the horizontal direction of travel of this device, and is not related to the device orientation. It is guaranteed to be in the range `(0.0, 360.0]` if the device has a bearing.
- `nrStatus`: Indicates whether the UE was connected to a 5G network. When `nrStatus=CONNECTED`, the UE was connected to 5G. All other values of `nrStatus`, such as `NOT_RESTRICTED` and `NONE`, indicate the UE was not connected to 5G. `nrStatus` was obtained by parsing the raw string representation of the `ServiceState` object.
- `lte_rssi`: The received signal strength indication (RSSI) in dBm of the primary serving LTE cell. The value range is [-113, -51] inclusive, or CellInfo#UNAVAILABLE if unavailable. Reference: TS 27.007 8.5 Signal quality +CSQ.
- `lte_rsrp`: The reference signal received power (RSRP) in dBm of the primary serving LTE cell.
- `lte_rsrq`: The reference signal received quality (RSRQ) of the primary serving LTE cell.
- `lte_rssnr`: The reference signal signal-to-noise ratio (RSSNR) of the primary serving LTE cell.
- `nr_ssRsrp`: Obtained by parsing the raw string representation of the `SignalStrength` object; `nr_ssRsrp` was a field in this object's `CellSignalStrengthNr` section. In general, this value was only available when the UE was connected to 5G (i.e., when `nrStatus=CONNECTED`). Reference: 3GPP TS 38.215. Range: -140 dBm to -44 dBm.
- `nr_ssRsrq`: Obtained by parsing the raw string representation of the `SignalStrength` object; `nr_ssRsrq` was a field in this object's `CellSignalStrengthNr` section. In general, this value was only available when the UE was connected to 5G (i.e., when `nrStatus=CONNECTED`). Reference: 3GPP TS 38.215. Range: -20 dB to -3 dB.
- `nr_ssSinr`: Obtained by parsing the raw string representation of the `SignalStrength` object; `nr_ssSinr` was a field in this object's `CellSignalStrengthNr` section. In general, this value was only available when the UE was connected to 5G (i.e., when `nrStatus=CONNECTED`). Reference: 3GPP TS 38.215 Sec 5.1.*, 3GPP TS 38.133. Range: -23 dB to 40 dB.
- `Throughput`: Indicates the throughput perceived by the UE. iPerf 3.7 was used to measure the per-second TCP downlink throughput at the UE.
- `mobility_mode`: Indicates the ground truth about the mobility mode when the experiment was conducted. This value is either walking or driving.
- `trajectory_direction`: Indicates the ground truth about the trajectory direction of the experiment conducted at the Loop area. `CW` indicates clockwise direction, while `ACW` indicates anti-clockwise. Note, the driving experiments were only conducted in `CW` direction as certain parts of the loop were one way only. Walking-based experiments were conducted in both directions.
- `tower_id`: Indicates the (anonymized) tower identifier.

Note: We found that availability (and at times even the values) of `lte_rssi`, `nr_ssRsrp`, `nr_ssRsrq` and `nr_ssSinr` were not reliable. Since these values were sampled every second, at certain times (e.g., boundary cases), we might still find NR-related values when `nrStatus` is not equal to `CONNECTED`. However, in this dataset, we still include all the raw values as reported by the APIs.


@inproceedings{10.1145/3419394.3423629,
author = {Narayanan, Arvind and Ramadan, Eman and Mehta, Rishabh and Hu, Xinyue and Liu, Qingxu and Fezeu, Rostand A. K. and Dayalan, Udhaya Kumar and Verma, Saurabh and Ji, Peiqi and Li, Tao and Qian, Feng and Zhang, Zhi-Li},
title = {Lumos5G: Mapping and Predicting Commercial MmWave 5G Throughput},
year = {2020},
isbn = {9781450381383},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {},
doi = {10.1145/3419394.3423629},
booktitle = {Proceedings of the ACM Internet Measurement Conference},
pages = {176–193},
numpages = {18},
keywords = {bandwidth estimation, mmWave, machine learning, Lumos5G, throughput prediction, deep learning, prediction, 5G},
location = {Virtual Event, USA},
series = {IMC '20}
}


Please feel free to contact the FiveGophers/Lumos5G team for questions or information about the data.


Lumos5G 1.0 dataset is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this license, visit or send a letter to Creative Commons, PO Box 1866, Mountain View, CA 94042, USA.


Related to the above search keywords, the following tweets were extracted between 15 Nov 2020 and 10 Jan 2021:

29,499 English tweets extracted

4,628 Japanese tweets extracted

678 Hindi tweets extracted



YonseiStressImageDatabase is a database built for image-based stress recognition research. We designed an experimental scenario consisting of steps that either cause or do not cause stress: Native Language Script Reading, Native Language Interview, Non-native Language Script Reading, and Non-native Language Interview. During the experiment, the subjects were photographed with a Kinect v2. We cannot disclose the original images due to privacy issues, so we release the feature maps obtained by passing them through the network.



Database Structure

- YonseiStressImageDatabase
  - Subject Number (01~50)
    - Data acquisition phase (Native Language Script Reading, Native Language Interview, Non-native Language Script Reading, Non-native Language Interview)
      - Data (*.npy; the filename is set to the time the data was acquired: YYYYMMDD_hhmmss_ms)


The 'Non-native_Language_Interview' data of subject 26 was not acquired due to equipment problems.


Citing YonseiStressImageDatabase

If you use YonseiStressImageDatabase in a scientific publication, we would appreciate references to the following paper:

Currently under review.


Usage Policy

Copyright © 2019 AI Hub, Inc.

The AI data provided by AI Hub was built as part of the National Information Society Agency's 'Intelligent information industry infrastructure construction project' in Korea, and ownership of this database belongs to the National Information Society Agency.

Specialized-field AI data was built for artificial intelligence technology development and prototype production, and can be used for research purposes in various fields such as intelligent services and chatbots.



DREAM (Data Rang or EArth Monitoring): a multimodal database including optics, radar, DEM and OSM labels for deep machine learning purposes.

DREAM is a multimodal remote sensing database developed from open-source data.

The database has been created using the Google Earth Engine platform, the GDAL Python library, and the “pyosm” Python package developed by Alexandre Mayerowitz (Airbus, France). It includes two subsets:

France, on a 10 m x 10 m UTM grid:


The two datasets are stored in two separate zip files. After decompression, each directory contains subdirectories for different areas. Each available tile is a 1024x1024 GeoTiff.

In France:

- CoupleZZ_S2_date1_date2_XX_YY: Uint16 GeoTiff, UTM, RGB
- CoupleZZ_SRTM_V2_XX_YY: Int16 GeoTiff
- CoupleZZ_S1_date2_date1_XX_YY: Float32 GeoTiff, 2 bands, Red: VV, Green: HV
- CoupleZZ_S1moy_date2__dual_XX_YY: Float32 GeoTiff, 2 bands, Red: VV, Green: HV
- CoupleZZ_OSMraster_XX_YY: Uint8 GeoTiff, 3 bands, RGB
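For programmatic access, the tile names can be parsed with a regular expression. This is a hypothetical sketch: it assumes the `ZZ` couple number and the `XX`/`YY` tile indices are numeric; any middle fields (dates, polarization tags) are skipped without interpretation.

```python
import re

# Alternation order matters: SRTM_V2 and S1moy must be tried before S2/S1.
TILE_RE = re.compile(
    r"^Couple(?P<couple>\d+)"
    r"_(?P<modality>SRTM_V2|S1moy|S2|S1|OSMraster)"
    r"(?:_.*)?_(?P<x>\d+)_(?P<y>\d+)$"
)

def parse_tile(name):
    """Return (couple, modality, x, y) for a tile file name (without
    extension), or None if the name does not match the assumed scheme."""
    m = TILE_RE.match(name)
    if m is None:
        return None
    return m["couple"], m["modality"], m["x"], m["y"]
```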




The data-set used in the paper titled "Short-Term Load Forecasting Using an LSTM Neural Network."