One of the weak points of most denoising algorithms (particularly deep learning based ones) is the training data. Because little or no ground-truth data is available, these algorithms are often evaluated using synthetic noise models such as additive zero-mean Gaussian noise. The downside of this approach is that such simple models do not represent the noise present in natural imagery. To evaluate the performance of denoising algorithms in poor light conditions, we need either representative noise models or real noisy images paired with images we can treat as ground truth.


We present Retail Gaze, a dataset for remote gaze estimation in real-world retail environments. Retail Gaze is composed of 3,922 images of individuals looking at products in a retail environment, captured from 12 camera angles. Each image captures the third-person view of the customer and shelves. The location of the gaze point, the bounding box of the person's head, and segmentation masks of the gazed-at product areas are provided as annotations.


A dataset with more comprehensive category labels, richer data scenes, and more diverse image sizes was constructed. All images have been labeled, for a total of 8,232 annotations.
This dataset is openly accessible to all future researchers for rapid deployment of mask-detection subtasks during the COVID-19 outbreak and in all possible future scenarios.


In this paper, we propose a framework for 3D human pose estimation using a single 360° camera mounted on the user's wrist. Perceiving a 3D human pose with such a simple setup has remarkable potential for various applications (e.g., daily-living activity monitoring, motion analysis for sports training). However, no existing method has tackled this task due to the difficulty of estimating a human pose from a single camera image in which only a part of the human body is captured, and because of a lack of training data.


Document layout analysis (DLA) plays an important role in identifying and classifying the different regions of digital documents in the context of document-understanding tasks. In light of this, SciBank seeks to provide a considerable amount of data covering text (abstract, text blocks, caption, keywords, reference, section, subsection, title), tables, figures, and equations (isolated and inline equations) from 74,435 scientific article pages. Human curators validated that these 12 regions were properly labeled.

  1. Datasheet_for_SciBank_Dataset.pdf. The Datasheet for this Dataset includes all the relevant details of the composition, collection, preprocessing, cleaning and labeling process used to construct SciBank.
  2. METADATA_FINAL.csv. Each row represents the metadata for one region, according to the following fields:
    1. Folder: the name of the folder within the main folder PAPER_TAR
    2. Page: png filename of the image where the region is located
    3. Height_Page, Width_Page: dimensions in pixels of the png image page
    4. CoodX, CoodY, Width, Height: position and size of the region in pixels
    5. Class: region label
    6. Page_in_pdf: page number within the PDF containing the page of the region
  3. PAPER_TAR folder includes the PNG images from all paper pages and the PDF papers in hierarchical subdirectories, both referenced by METADATA_FINAL.csv. 
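As a sketch of how these fields fit together, the snippet below parses a hypothetical METADATA_FINAL.csv row (the example values are illustrative, not taken from the dataset) and derives the pixel bounding box of a region:

```python
import csv
import io

# Illustrative row; the field names follow the datasheet, the values are made up.
sample_csv = """Folder,Page,Height_Page,Width_Page,CoodX,CoodY,Width,Height,Class,Page_in_pdf
paper_0001,page_3.png,1122,793,56,410,680,220,table,3
"""

regions = []
for row in csv.DictReader(io.StringIO(sample_csv)):
    # Convert the pixel fields to integers and build an (x0, y0, x1, y1) box.
    x, y = int(row["CoodX"]), int(row["CoodY"])
    w, h = int(row["Width"]), int(row["Height"])
    regions.append({
        "image": row["Page"],
        "label": row["Class"],
        "box": (x, y, x + w, y + h),
    })

print(regions[0]["box"])  # (56, 410, 736, 630)
```

The same loop works unchanged on the real file by replacing the in-memory buffer with `open("METADATA_FINAL.csv")`.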



This dataset was prepared to aid in the creation of a machine learning algorithm that would classify the white blood cells in thin blood smears of juvenile Visayan warty pigs. The creation of this dataset was deemed imperative because of the limited availability of blood smear images collected from the critically endangered species on the internet. The dataset contains 3,457 images of various types of white blood cells (JPEG) with accompanying cell type labels (XLSX).


Automated driving in public traffic still faces many technical and legal challenges. However, automating vehicles at low speeds in controlled industrial environments is already achievable today. Reliable obstacle detection is mandatory to prevent accidents. Recent advances in convolutional neural network-based algorithms have made it conceivable to replace distance-measuring laser scanners with common monocular cameras.


The Active-Passive SimStereo dataset is a simulated dataset created with Blender, containing high-quality images with both realistic and abstract-looking content. Each image pair is rendered in the classic RGB domain as well as in near-infrared with an active pattern. It is meant to be used for studying domain transfer between active and passive stereo vision, as well as to provide a high-quality active stereo dataset; such datasets are far less common than passive stereo datasets.


Dataset content

The dataset contains 528 image pairs, pre-split into a test set of 103 image pairs and a training set of 425 image pairs.

The images have a standard resolution of 640x480 pixels.

The provided ground truth is precise up to floating point precision.
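Since the ground truth is stored as floating-point disparities, a common first step is converting them to metric depth via the standard pinhole stereo relation. The sketch below uses hypothetical camera parameters (the focal length and baseline here are assumptions for illustration; substitute the values from your own calibration) and a toy disparity map:

```python
import numpy as np

# Hypothetical calibration values for illustration only.
focal_px = 525.0     # focal length in pixels (assumed)
baseline_m = 0.10    # stereo baseline in metres (assumed)

# Toy disparity map; real maps come from the dataset's EXR/npy/pfm files.
disparity = np.array([[10.0, 20.0],
                      [40.0, 80.0]], dtype=np.float32)

# Pinhole stereo relation: depth = f * B / d, with invalid pixels set to inf.
with np.errstate(divide="ignore"):
    depth = np.where(disparity > 0, focal_px * baseline_m / disparity, np.inf)

print(depth)
```

Because the disparities are float-precise, the resulting depth maps have no quantization steps, unlike datasets with integer disparity labels.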

Raw images and ground truth disparities

The raw images in linear color space, as well as the ground truth disparities, are provided as EXR images with 32-bit floating-point precision.

To read the different layers, any EXR library should work fine. We are using the Python module provided by the exr-tools project by french-paragon.

You can also inspect the content of the images with the embedded image viewer in Blender, or with any software capable of reading EXR images.

  • The left and right images are contained in the Left and Right layers, respectively.
  • The color, NIR, and disparity images are contained in the Color, Nir, and Disp passes.
  • The Color pass is made up of the standard R, G, and B channels.
  • The Nir pass is made up of a single A channel.
  • The Disp pass is made up of a single D channel.
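Under the usual multi-layer EXR convention, this layout means each channel is addressed by a dotted "Layer.Pass.Channel" name (e.g. "Left.Disp.D"); this naming is inferred from the description above, not confirmed by the dataset. A small pure-Python helper can group a header's flat channel list back into that hierarchy:

```python
from collections import defaultdict

def group_channels(channel_names):
    """Group flat EXR channel names of the form 'Layer.Pass.Channel'
    into a nested {layer: {pass: [channels]}} dict."""
    layers = defaultdict(lambda: defaultdict(list))
    for name in channel_names:
        layer, pass_name, channel = name.split(".")
        layers[layer][pass_name].append(channel)
    return {layer: dict(passes) for layer, passes in layers.items()}

# Channel names as implied by the layer/pass description above (assumed).
names = ["Left.Color.R", "Left.Color.G", "Left.Color.B",
         "Left.Nir.A", "Left.Disp.D",
         "Right.Color.R", "Right.Color.G", "Right.Color.B",
         "Right.Nir.A", "Right.Disp.D"]

print(group_channels(names)["Left"])
# {'Color': ['R', 'G', 'B'], 'Nir': ['A'], 'Disp': ['D']}
```

With an actual file, the channel list would come from the EXR header reported by whichever EXR library you use.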

Additional Disparity formats

To make the ground truth disparities easier to use in a variety of situations, we also provide them in npy (Python NumPy) and pfm (portable float map) formats.
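PFM is a simple header-plus-raw-floats format, so if your toolchain lacks a reader, a minimal one is easy to write. The sketch below follows the common PFM conventions (a "Pf" grayscale header, a scale line whose sign encodes endianness, and rows stored bottom-to-top); it is a generic helper, not code shipped with the dataset:

```python
import io
import numpy as np

def write_pfm(buf, data):
    """Write a single-channel float32 array as a grayscale PFM."""
    h, w = data.shape
    buf.write(b"Pf\n")
    buf.write(f"{w} {h}\n".encode())
    buf.write(b"-1.0\n")                           # negative scale = little-endian
    buf.write(data[::-1].astype("<f4").tobytes())  # rows stored bottom-to-top

def read_pfm(buf):
    """Read a grayscale PFM back into a float32 array."""
    assert buf.readline().strip() == b"Pf"
    w, h = map(int, buf.readline().split())
    scale = float(buf.readline())
    dtype = "<f4" if scale < 0 else ">f4"
    data = np.frombuffer(buf.read(), dtype=dtype).reshape(h, w)
    return data[::-1].astype(np.float32)           # flip back to top-to-bottom

# Round-trip a toy disparity map through an in-memory buffer.
disp = np.arange(12, dtype=np.float32).reshape(3, 4)
buf = io.BytesIO()
write_pfm(buf, disp)
buf.seek(0)
assert np.array_equal(read_pfm(buf), disp)
```

For real files, replace the `BytesIO` buffer with a file opened in binary mode; the npy variants load directly with `np.load`.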

Color managed images

We also exported color-managed perceptual images (sRGB instead of linear RGB color space) in standard JPG format, if you prefer to use those.

The images are in the rgbColormanaged and nirColormanaged folders. Each image pair is made up of two images with the same name but a left or right suffix. You can open them with any software or library able to read .jpg images.


[EDIT 04-26-2022] Simulation toolkit

We made the simulation toolkit we used to produce the dataset available, so that you can create your own images if you need to. See the file.

[EDIT 06-15-2022] Additional formats for the disparities

We added more image formats for the ground truth disparities to make them easier to use in a variety of scenarios.