This repository contains:

  • age-stratified Covid-19 case and fatality data for different countries and at different points in time, and
  • an interactive Jupyter notebook for mediation analysis of age-related causal effects on case fatality rates,

published as part of the following paper:

"Simpson's paradox in Covid-19 case fatality rates: a mediation analysis of age-related causal effects". J von Kügelgen*, L Gresele*, B Schölkopf. (*equal contribution).

We provide the following three separate datasets:

  • a dataset containing only the most recent numbers from: Argentina, China, Colombia, Italy, Netherlands, Portugal, South Africa, Spain, Sweden, Switzerland, South Korea and the Diamond Princess cruise ship (last checked: end of May 2020)
  • a longitudinal dataset containing several reports from Italy (9 March - 26 May 2020)
  • a longitudinal dataset containing several reports from Spain (22 March - 29 May 2020)

All numbers of confirmed cases and fatalities are stratified by age into 10-year groups (0-9, 10-19, 20-29, 30-39, 40-49, 50-59, 60-69, 70-79, 80+), and contain the date and country of reporting, as well as links to the corresponding sources (generally health agencies/ministries, or scientific publications).

Please consult the paper and notebook for further details.
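
The paradox analysed in the paper can be reproduced with a toy calculation (all numbers below are hypothetical, not taken from the dataset): a country can have the higher case fatality rate (CFR) in every age group yet the lower overall CFR, because the age distribution of confirmed cases mediates the effect.

```python
# Toy illustration of Simpson's paradox in CFRs; all counts are made up.
cases_a  = {"0-59": 1000, "60+": 100}    # country A: mostly young cases
deaths_a = {"0-59": 15,   "60+": 10}
cases_b  = {"0-59": 100,  "60+": 1000}   # country B: mostly elderly cases
deaths_b = {"0-59": 1,    "60+": 80}

def cfr(deaths, cases):
    """Age-stratified case fatality rates."""
    return {g: deaths[g] / cases[g] for g in cases}

def total_cfr(deaths, cases):
    """Overall (aggregated) case fatality rate."""
    return sum(deaths.values()) / sum(cases.values())

# A has the higher CFR within every age group...
assert all(cfr(deaths_a, cases_a)[g] > cfr(deaths_b, cases_b)[g] for g in cases_a)
# ...yet B has the higher overall CFR, driven by its older case demographic.
assert total_cfr(deaths_b, cases_b) > total_cfr(deaths_a, cases_a)
```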


Falls are a prominent issue due to their severe physical and mental consequences. Fall detection and prevention is a critical area of research because it can help elderly people depend less on caregivers and live and move more independently. Using electrocardiogram (ECG) signals alone for fall detection and activity classification is the novel approach explored in this paper.


See readme.pdf.


Falls are a major health problem with one in three people over the age of 65 falling each year, oftentimes causing hip fractures, disability, reduced mobility, hospitalization and death. A major limitation in fall detection algorithm development is an absence of real-world falls data. Fall detection algorithms are typically trained on simulated fall data that contain a well-balanced number of examples of falls and activities of daily living. However, real-world falls occur infrequently, making them difficult to capture and causing severe data imbalance.


Follow the instructions in the readme file.


This dataset comes from an Apache access log server. It contains: IP address, datetime, GMT offset, request, status, size, user agent, country, and label. The dataset captures malicious activity in the IP address, request, and other fields, and can be analyzed further as input for intrusion detection.


This dataset contains: IP address, datetime, GMT offset, request, status, size, user agent, country, and label. Traffic is allowed only from Indonesia, because the web server serves a local purpose, so the dataset assumes that traffic from abroad is prohibited.
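
The fields listed above correspond to the Apache "combined" log format; a hedged sketch of extracting them from one raw log line follows (the example line and regex are illustrative assumptions — the dataset itself may already be pre-parsed, e.g. as CSV, and country/label are derived fields not present in the raw log).

```python
import re

# Named-group pattern for one line of an Apache combined access log.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ '
    r'\[(?P<datetime>[^\]]+) (?P<gmt>[+-]\d{4})\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"[^"]*" "(?P<user_agent>[^"]*)"'
)

# Hypothetical log line with a suspicious (SQL-injection-like) request.
line = ('203.0.113.7 - - [10/Oct/2020:13:55:36 +0700] '
        '"GET /index.php?id=1%27%20OR%201=1-- HTTP/1.1" 200 2326 '
        '"-" "Mozilla/5.0"')

fields = LOG_PATTERN.match(line).groupdict()
# fields["ip"] -> "203.0.113.7", fields["status"] -> "200", etc.
```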


The detection of settlements without electricity challenge track (Track DSE) of the 2021 IEEE GRSS Data Fusion Contest, organized by the Image Analysis and Data Fusion Technical Committee (IADF TC) of the IEEE Geoscience and Remote Sensing Society (GRSS), Hewlett Packard Enterprise, SolarAid, and Data Science Experts, aims to promote research in automatic detection of human settlements deprived of access to electricity using multimodal and multitemporal remote sensing data.

Last Updated On: Thu, 12/03/2020 - 04:16
Citation Author(s): Colin Prieur, Hana Malha, Frederic Ciesielski, Paul Vandame, Giorgio Licciardi, Jocelyn Chanussot, Pedram Ghamisi, Ronny Hänsch, Naoto Yokoya

A medium-scale synthetic 4D Light Field video dataset for depth (disparity) estimation, based on the open-source movie Sintel. The dataset consists of 24 synthetic 4D LFVs with 1,024x436 pixels, 9x9 views, and 20–50 frames, and has ground-truth disparity values, so it can be used for training deep learning-based methods. Each scene was rendered with a clean pass after modifying the production file of Sintel with reference to the MPI Sintel dataset.



Light Field videos:
  • 24 synthetic scenes
  • 1,024x436 pixels
  • 9x9 views
  • 20–50 frames
Ground-truth disparity values:
  • Disparity values are provided for all scenes, all views, all frames, and all pixels.
  • The disparity values were obtained by transforming the depth values produced in Blender.
    • The unit of disparity is [mm]; if values in [px] are needed, multiply by 32 to convert. (Mentioned in this issue)
Light Field setup:
  • Rendering with a “clean” pass using Blender (render25 branch).
  • The Light Field was captured by moving the camera to 9x9 viewpoints with a baseline of 0.01[m] towards a common focal plane while keeping the optical axes parallel. 
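
The mm-to-px conversion noted above can be sketched as follows (the factor 32 is taken from the description; the array here is synthetic — in practice the map would come from np.load on one of the dataset's .npy files):

```python
import numpy as np

MM_TO_PX = 32.0  # conversion factor stated in the dataset description

def disparity_mm_to_px(disp_mm: np.ndarray) -> np.ndarray:
    """Convert a disparity map from [mm] to [px]."""
    return disp_mm * MM_TO_PX

# Small synthetic stand-in for a loaded disparity map.
disp_mm = np.array([[0.5, 1.0], [1.5, 2.0]])
disp_px = disparity_mm_to_px(disp_mm)  # [[16., 32.], [48., 64.]]
```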



Three types of datasets are provided on this page.

The three types exist to eliminate the need to download extra data.

All types include all scenes and all frames; they differ only in which RGB and disparity views are included.

  • Type 1: RGB sequences for 9x9 views and disparity sequences for 9x9 views.
    • The unzipped data occupy 190 GiB.
    • It can be used for a variety of depth estimation tasks, not only light field but also (multi-view) stereo, as it includes the disparity for all views.
  • Type 2: RGB sequences for 9x9 views and a disparity sequence for the center view only.
    • The unzipped data occupy 51.4 GiB.
    • It can be used for light field-based depth estimation using 9x9 views.
  • Type 3: RGB sequences for cross-hair views and a disparity sequence for the center view only.
    • The unzipped data occupy 12.1 GiB.
    • It can be used for light field-based depth estimation using cross-hair views.
    • This is the data we used in our paper. (Note: we did not use the scene named shaman_b_2 because it was not completed at that time.)

* The datasets store RGB frames as .png files and disparity maps as .npy files.


File structure.

The following shows the case of Sintel_LFV_9x9_with_all_disp.

In the other cases, some view directories or disparity files are absent.

The naming convention for the view directories is {viewpoint_y:02}_{viewpoint_x:02}, with 00_00 being the upper-left viewpoint.


  ┣━━ ambushfight_1/    ...    scene directory
  ┃          ┣━━ 00_00/ ...    view directory
  ┃          ┃         ┣━━ 000.png ...    RGB of frame 0
  ┃          ┃         ┣━━ 000.npy ...    disparity of frame 0
  ┃          ┃         ┣━━ 001.png ...    RGB of frame 1
  ┃          ┃         ┣━━ 001.npy ...    disparity of frame 1
  ┃          ┣━━ 04_04/ ...    center view directory 
  ┃          ┃         ┣━━ 000.png ...    RGB of frame 0
  ┃          ┃         ┣━━ 000.npy ...    disparity of frame 0
  ┃          ┗━━ .../
  ┣━━ ambushfight_2/
  ┣━━ ambushfight_3/
  ┗━━ .../
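
Given the layout and naming convention above, per-frame file paths can be constructed like this (the root directory and scene name below are examples from the tree, not guaranteed paths):

```python
import os

def view_dir(viewpoint_y: int, viewpoint_x: int) -> str:
    """View directory name, e.g. 00_00 for the upper-left viewpoint."""
    return f"{viewpoint_y:02d}_{viewpoint_x:02d}"

def frame_paths(root: str, scene: str, y: int, x: int, frame: int):
    """Paths to the RGB (.png) and disparity (.npy) files of one frame."""
    base = os.path.join(root, scene, view_dir(y, x), f"{frame:03d}")
    return base + ".png", base + ".npy"

# Center view (04_04 in a 9x9 grid) of frame 0 in the first scene:
rgb, disp = frame_paths("Sintel_LFV_9x9_with_all_disp", "ambushfight_1", 4, 4, 0)
# e.g. 'Sintel_LFV_9x9_with_all_disp/ambushfight_1/04_04/000.png' on POSIX
```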



This dataset contains RF signals from drone remote controllers (RCs) of different makes and models. The RF signals transmitted by the drone RCs to communicate with the drones are intercepted and recorded by a passive RF surveillance system, which consists of a high-frequency oscilloscope, directional grid antenna, and low-noise power amplifier. The drones were idle during the data capture process. All the drone RCs transmit signals in the 2.4 GHz band. There are 17 drone RCs from eight different manufacturers and ~1000 RF signals per drone RC, each spanning a duration of 0.25 ms. 


The dataset contains ~1000 RF signals in .mat format from the remote controllers (RCs) of the following drones:

  • DJI (5): Inspire 1 Pro, Matrice 100, Matrice 600*, Phantom 4 Pro*, Phantom 3 
  • Spektrum (4): DX5e, DX6e, DX6i, JR X9303
  • Futaba (1): T8FG
  • Graupner (1): MC32
  • HobbyKing (1): HK-T6A
  • FlySky (1): FS-T6
  • Turnigy (1): 9X
  • Jeti Duplex (1): DC-16.

In the dataset, there are two pairs of RCs for the drones indicated by an asterisk above, making a total of 17 drone RCs. Each RF signal contains 5 million samples and spans a time period of 0.25 ms. 

The scripts provided with the dataset define a class for creating drone RC objects and build a database of those objects, as well as a database in table format with all the available information, such as make, model, raw RF signal, and sampling frequency. The scripts also include functions to visualize the data and to extract a few example features from the raw RF signals (e.g., the transient signal start point). Instructions for using the scripts are included at the top of each script and can also be viewed by typing help scriptName in the MATLAB command window.
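
The shipped scripts are MATLAB, but the transient start-point feature mentioned above can also be sketched in Python; the energy-threshold method, constants, and synthetic signal below are illustrative assumptions, not the method used by the dataset's own scripts.

```python
import numpy as np

FS = 20e9  # implied sampling rate: 5e6 samples over 0.25 ms = 20 GS/s

def transient_start_index(signal: np.ndarray, noise_len: int = 1000,
                          k: float = 6.0) -> int:
    """First index whose magnitude exceeds k times the noise standard
    deviation, estimated from the leading noise-only segment; -1 if none."""
    noise_std = np.std(signal[:noise_len])
    above = np.abs(signal) > k * noise_std
    return int(np.argmax(above)) if above.any() else -1

# Synthetic stand-in for one captured RC signal: noise, then a burst at sample 2000.
rng = np.random.default_rng(0)
sig = 0.01 * rng.standard_normal(5000)
sig[2000:] += np.sin(2 * np.pi * 500e6 * np.arange(3000) / FS)

start = transient_start_index(sig)  # close to sample 2000
```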

The drone RC RF dataset was used in the following papers:

  • M. Ezuma, F. Erden, C. Kumar, O. Ozdemir, and I. Guvenc, "Micro-UAV detection and classification from RF fingerprints using machine learning techniques," in Proc. IEEE Aerosp. Conf., Big Sky, MT, Mar. 2019, pp. 1-13.
  • M. Ezuma, F. Erden, C. K. Anjinappa, O. Ozdemir, and I. Guvenc, "Detection and classification of UAVs using RF fingerprints in the presence of Wi-Fi and Bluetooth interference," IEEE Open J. Commun. Soc., vol. 1, no. 1, pp. 60-79, Nov. 2019.
  • E. Ozturk, F. Erden, and I. Guvenc, "RF-based low-SNR classification of UAVs using convolutional neural networks," arXiv preprint arXiv:2009.05519, Sept. 2020.

Other details regarding the dataset and data collection and processing can be found in the above papers and attached documentation.  


Author Contributions:

  • Experiment design: O. Ozdemir and M. Ezuma
  • Data collection:  M. Ezuma
  • Scripts: F. Erden and C. K. Anjinappa
  • Documentation: F. Erden
  • Supervision, revision, and funding: I. Guvenc 



This work was supported in part by NASA through the Federal Award under Grant NNX17AJ94A, and in part by NSF under CNS-1939334 (AERPAW, one of NSF's Platforms for Advanced Wireless Research (PAWR) projects).


This dataset is composed of side-channel information (e.g., temperatures, voltages, utilization rates) from computing systems executing benign and malicious code. The intent of the dataset is to allow artificial intelligence tools to be applied to malware detection using side-channel information.


The Retinal Fundus Multi-disease Image Dataset (RFMiD) consists of images covering a wide variety of pathological conditions.


Detailed instructions about this dataset are available on the challenge website.


Predicting energy consumption is currently a key challenge for the energy industry as a whole. Predicting consumption in a given area is greatly complicated by sudden changes in how energy is consumed and generated. However, such prediction is essential to minimise costs and to enable (automatically) adjusting energy production and better balancing the load between different energy sources.

Last Updated On: Wed, 12/23/2020 - 12:16
Citation Author(s): Isaac Triguero