The detection of settlements without electricity challenge track (Track DSE) of the 2021 IEEE GRSS Data Fusion Contest, organized by the Image Analysis and Data Fusion Technical Committee (IADF TC) of the IEEE Geoscience and Remote Sensing Society (GRSS), Hewlett Packard Enterprise, SolarAid, and Data Science Experts, aims to promote research in automatic detection of human settlements deprived of access to electricity using multimodal and multitemporal remote sensing data.

Last Updated On: 
Sun, 02/28/2021 - 07:59
Citation Author(s): 
Colin Prieur, Hana Malha, Frederic Ciesielski, Paul Vandame, Giorgio Licciardi, Jocelyn Chanussot, Pedram Ghamisi, Ronny Hänsch, Naoto Yokoya

A medium-scale synthetic 4D Light Field video dataset for depth (disparity) estimation, derived from the open-source movie Sintel. The dataset consists of 24 synthetic 4D LFVs with 1,024x436 pixels, 9x9 views, and 20–50 frames, and provides ground-truth disparity values, so it can be used for training deep learning-based methods. Each scene was rendered with a clean pass after modifying the production file of Sintel with reference to the MPI Sintel dataset.



Light Field videos:
  • 24 synthetic scenes
  • 1,024x436 pixels
  • 9x9 views
  • 20–50 frames
Ground-truth disparity values:
  • Disparity values are provided for all scenes, all views, all frames, and all pixels.
  • The disparity values were obtained by transforming the depth values computed in Blender.
    • The stored disparity unit is [mm]; multiply by 32 to convert to [px] (as noted in a related issue).
Light Field setup:
  • Rendering with a “clean” pass using Blender (render25 branch).
  • The Light Field was captured by moving the camera to 9x9 viewpoints with a baseline of 0.01[m] towards a common focal plane while keeping the optical axes parallel. 
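The mm-to-px conversion noted above is a single multiplication; a minimal sketch (function name illustrative):

```python
import numpy as np

def disparity_mm_to_px(disp_mm):
    # The .npy files store disparity in [mm]; multiplying by 32
    # converts the values to [px].
    return np.asarray(disp_mm) * 32.0

disp_px = disparity_mm_to_px([0.5, 1.0, 2.0])
```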



Three variants of the dataset are provided on this page.

The variants exist so that users do not need to download data they will not use.

All variants include all scenes and all frames; they differ only in which RGB and disparity views are included.

Variant 1 (Sintel_LFV_9x9_with_all_disp):
  • Includes RGB sequences for 9x9 views and disparity sequences for 9x9 views.
  • The unzipped data is 190 GiB.
  • Suitable for a variety of depth estimation tasks, e.g. not only light field but also (multi-view) stereo, since it includes disparity for all views.
Variant 2:
  • Includes RGB sequences for 9x9 views and disparity sequences for the center view only.
  • The unzipped data is 51.4 GiB.
  • Suitable for light field-based depth estimation using 9x9 views.
Variant 3:
  • Includes RGB sequences for cross-hair views and disparity sequences for the center view only.
  • The unzipped data is 12.1 GiB.
  • Suitable for light field-based depth estimation using cross-hair views.
    • This is the data we used in our paper. (Note: we did not use the scene named shaman_b_2 because it was not completed at that time.)

* The datasets contain RGB images as .png files and disparity maps as .npy files.


File structure.

The following is the case of Sintel_LFV_9x9_with_all_disp.

In the other variants, some view directories or disparity files are absent.

The naming convention for the view directory is {viewpoint_y:02}_{viewpoint_x:02} with 00_00 being the upper left viewpoint.


  ┣━━ ambushfight_1/    ...    scene directory
  ┃          ┣━━ 00_00/ ...    view directory
  ┃          ┃         ┣━━ 000.png ...    RGB of frame 0
  ┃          ┃         ┣━━ 000.npy ...    disparity of frame 0
  ┃          ┃         ┣━━ 001.png ...    RGB of frame 1
  ┃          ┃         ┣━━ 001.npy ...    disparity of frame 1
  ┃          ┣━━ 04_04/ ...    center view directory 
  ┃          ┃         ┣━━ 000.png ...    RGB of frame 0
  ┃          ┃         ┣━━ 000.npy ...    disparity of frame 0
  ┃          ┗━━ .../
  ┣━━ ambushfight_2/
  ┣━━ ambushfight_3/
  ┗━━ .../
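Given the naming convention above, file paths can be built programmatically; a small sketch (the scene and frame used in the example are taken from the tree above, everything else is illustrative):

```python
from pathlib import Path

def view_dir(viewpoint_y: int, viewpoint_x: int) -> str:
    """View directories are named {viewpoint_y:02}_{viewpoint_x:02};
    00_00 is the upper-left viewpoint and 04_04 the center view."""
    return f"{viewpoint_y:02}_{viewpoint_x:02}"

def frame_paths(root: Path, scene: str, vy: int, vx: int, frame: int):
    """Return the RGB (.png) and disparity (.npy) paths of one frame."""
    base = root / scene / view_dir(vy, vx)
    return base / f"{frame:03}.png", base / f"{frame:03}.npy"

# Center view, frame 0 of the first scene:
rgb, disp = frame_paths(Path("Sintel_LFV_9x9_with_all_disp"),
                        "ambushfight_1", 4, 4, 0)
```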



The Retinal Fundus Multi-disease Image Dataset (RFMiD) covers a wide variety of pathological conditions.


Detailed instructions about this dataset are available on the challenge website.


This dataset contains 1,944 samples scanned by the HIS-RING PACT system.

The sampling rate of the system is 40 MSa/s, and acquisition uses a 128-element, 2.5 MHz full-view ring-shaped transducer with a 30 mm radius.

The dataset is continuously being updated.


The citrus leaf dataset used in this study comes from PlantVillage [24], an open-access public resource for agriculture-related content. The dataset includes three types of citrus leaves: healthy, citrus HLB (Huanglongbing) moderate, and citrus HLB severe. The original dataset contains 4,577 citrus leaf images divided into three classes.


Computer vision for animal monitoring has become an active research application in stables and other confined conditions.

Detecting animals from a top view is challenging due to barn conditions.

This dataset, called ICV-TxLamb, provides images for monitoring lambs inside a barn.

The data are organized into two categories: the first, lamb, classifies only the lamb itself; the second comprises four lamb postures: eating, sleeping, lying down, and normal (standing or inactive).


SoftCast-based linear video coding and transmission (LVCT) schemes have been proposed as a promising alternative to traditional video coding and transmission schemes in wireless environments. Currently, the performance of LVCT schemes is evaluated by means of traditional objective scores such as PSNR or SSIM.


For more information, please refer to the following paper:

Anthony Trioux, Giuseppe Valenzise, Marco Cagnazzo, Michel Kieffer, François-Xavier Coudoux, et al., Subjective and Objective Quality Assessment of the SoftCast Video Transmission Scheme. 2020 IEEE International Conference on Visual Communications and Image Processing (VCIP), Dec 2020, Macau, China.

The SoftCast Quality-of-Experience Database consists of 5 RAW HD reference videos and 85 videos transmitted and received through the SoftCast linear video coding and transmission scheme, each 5 seconds long. Note that only the luminance is considered in this database. Furthermore, the number of frames depends on the frame rate of the video (125 frames at 25 fps, 150 frames at 30 fps).

To generate several transmission scenarios, three parameters were varied: two GoP sizes (8 and 32 frames), two compression ratios (CR = 1 and 0.25), and the channel signal-to-noise ratio (CSNR, varying from 0 to 30 dB in 3 dB steps). The database was evaluated by 21 participants (8 women and 13 men), who were asked to score the quality of each received video sequence relative to the original on a numerical impairment scale [0-100]. A training session including 10 stimuli was organized for each observer prior to the test to familiarize them with the procedure, the specific artifacts of SoftCast, and the impairment scale.

Video files are named using the following structure:

Video_filename_y_only_GoP_X_CR_Y_CSNR_ZdB.yuv, where X is either 8 or 32 frames, Y is either 1 or 0.25, and Z is one of 0, 3, 6, 9, 12, 15, 18, 21, 24, 27, or 30 dB.

The original video files are denoted: Video_filename_y_only_ori.yuv.
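The transmission parameters can be recovered from a received-video filename with a small parser; a sketch (the video name "NewsHD" in the example is illustrative, not necessarily a sequence in the database):

```python
import re

# Matches Video_filename_y_only_GoP_X_CR_Y_CSNR_ZdB.yuv as described above.
PATTERN = re.compile(
    r"(?P<name>.+)_y_only_GoP_(?P<gop>8|32)_CR_(?P<cr>1|0\.25)"
    r"_CSNR_(?P<csnr>\d+)dB\.yuv$"
)

def parse_filename(fname: str) -> dict:
    m = PATTERN.match(fname)
    if m is None:
        raise ValueError(f"not a SoftCast received-video name: {fname}")
    return {"name": m["name"], "gop": int(m["gop"]),
            "cr": float(m["cr"]), "csnr_db": int(m["csnr"])}

info = parse_filename("NewsHD_y_only_GoP_8_CR_0.25_CSNR_12dB.yuv")
# info == {"name": "NewsHD", "gop": 8, "cr": 0.25, "csnr_db": 12}
```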

Each video file is in *.yuv (4:2:0) format with the chrominance planes set to a constant 128. (This allows VMAF computation, since VMAF requires a yuv420p, yuv422p, yuv444p, yuv420p10le, yuv422p10le, or yuv444p10le input format.)
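Since only the luminance carries information, each frame's Y plane can be read directly from the raw file (each 4:2:0 frame occupies width*height*3/2 bytes). A minimal sketch, demonstrated on a tiny synthetic file standing in for an HD sequence:

```python
import os
import tempfile

import numpy as np

def read_luma_frame(path, k, width, height):
    """Read the Y plane of frame k from a planar 4:2:0 .yuv file.

    In this database the chroma planes are constant (128), so only the
    first width*height bytes of each frame carry information.
    """
    frame_bytes = width * height * 3 // 2  # Y + U/4 + V/4
    with open(path, "rb") as f:
        f.seek(k * frame_bytes)
        y = np.frombuffer(f.read(width * height), dtype=np.uint8)
    return y.reshape(height, width)

# Tiny synthetic example (one 4x2 frame) instead of a real HD video:
w, h = 4, 2
luma = np.arange(w * h, dtype=np.uint8)
chroma = np.full(w * h // 2, 128, dtype=np.uint8)
with tempfile.NamedTemporaryFile(suffix=".yuv", delete=False) as f:
    f.write(luma.tobytes() + chroma.tobytes())
    tmp = f.name
y = read_luma_frame(tmp, 0, w, h)
os.remove(tmp)
```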

The subjective scores are available in the MOS_scores.xls file.

The frame-by-frame objective scores for each video are available in the file.