The dataset was created as part of a joint effort by two research groups from the University of Novi Sad, aimed at developing vision-based systems for automatic identification of insect species (in particular hoverflies) from characteristic venation patterns in images of the insects' wings. The set of wing images consists of high-resolution microscopic wing images of several hoverfly species. There is a total of 868 wing images of eleven selected hoverfly species from two different genera, Chrysotoxum and Melanostoma.



## University of Novi Sad (UNS), Hoverflies classification dataset - ReadMe file


Version 1.0

Published: December, 2014


## Dataset authors:

* Zorica Nedeljković    (zoricaned14 a_t, A1

* Jelena Ačanski    (jelena.acanski a_t, A1

* Marko Panić    (mpanic a_t, A2

* Ante Vujić    (ante.vujic a_t, A1

* Branko Brkljač    (brkljacb a_t, A2, *corr. auth.


The dataset was created as part of a joint effort by two research groups from the University of Novi Sad, aimed at developing vision-based systems for automatic identification of insect species (in particular hoverflies) from characteristic venation patterns in images of the insects' wings. At the time of the dataset's development, the authors' affiliations were:

 * A1: Department of Biology and Ecology, Faculty of Sciences, University of Novi Sad, Trg Dositeja Obradovića 2, 21000 Novi Sad, Republic of Serbia


* A2: Department of Power, Electronic and Telecommunication Engineering, Faculty of Technical Sciences, University of Novi Sad, Trg Dositeja Obradovića 6, 21000 Novi Sad, Republic of Serbia

University of Novi Sad:


# Dataset description:

The set of wing images consists of high-resolution microscopic wing images of several hoverfly species. There is a total of 868 wing images of eleven selected hoverfly species from two different genera, Chrysotoxum and Melanostoma. 

The wings were collected from many different geographic locations in the Republic of Serbia over a period of more than two decades. Wing images were obtained from wing specimens mounted on glass microscope slides, using a microscope equipped with a digital camera with an image resolution of 2880 × 1550 pixels, and were originally stored in the TIFF image format.

Each wing specimen was uniquely numbered and associated with the taxonomic group it belongs to. The association of each wing with a particular species was based on the classification of the insect at the time it was collected, before the wings were detached. This classification was done after examination by a skilled expert.

In the next step, digital images were acquired by biologists under relatively uncontrolled conditions of nonuniform background illumination and variable scene configuration, and without camera calibration. In that sense, the originally obtained digital images were not particularly suitable for exact measurements. Other shortcomings of the samples in the initial image dataset resulted from variable quality of the wing specimens, damaged or badly mounted wings, artifacts, variable wing positions during image acquisition, and dust.

To overcome these limitations and make the images amenable to automatic discrimination of hoverfly species, they were first preprocessed. The preprocessing of each image consisted of image rotation to a unified horizontal position, wing cropping, and subsequent scaling of the cropped wing image. Cropping eliminated unnecessary background containing artifacts, while the aspect-ratio-preserving image scaling addressed the problem of variable size among the wings of the same species. The scaling was performed after computing the average width and average height of all cropped images, which were then interpolated to the same width of 1680 pixels using bicubic interpolation. This width was selected based on the prevailing image width among the wing images of different species.
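The aspect-ratio-preserving scaling step can be illustrated with a small sketch. Only the target width of 1680 pixels comes from the description above; the helper name `scaled_size` and the choice of rounding to the nearest pixel are assumptions made for illustration and are not taken from the authors' code.

```python
TARGET_WIDTH = 1680  # common width stated in the dataset description


def scaled_size(width, height, target_width=TARGET_WIDTH):
    """Return (new_width, new_height) that preserves the aspect ratio.

    Hypothetical helper: rounding to the nearest pixel is an assumption.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    scale = target_width / width
    # Height is scaled by the same factor as width, so proportions are kept.
    return target_width, max(1, round(height * scale))


# Example: size of a cropped wing image of 2100 x 700 pixels after scaling.
print(scaled_size(2100, 700))
```

The resulting size would then be passed to the bicubic resize routine of whichever imaging library is in use.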

Wing images obtained in this way formed the final wing images dataset used for the sliding-window detector training, its performance evaluation, and subsequent hoverfly species discrimination using the trained landmark points detector, described in [1, 2].

* Besides images of the whole wings (in the folder "Wing images"), the "UNS_Hoverflies" dataset also contains small image patches (64x64 pixels) corresponding to 18 predetermined landmark points in each wing, which were systematically collected and organized inside the second root folder named "Training - test set". Each patch among the "Patch_positives" was manually cropped from the preprocessed wing image (i.e. rotated, cropped and scaled to the same predefined image width). However, the images of the whole wings stored in the folder "Wing images" are provided without the additional scaling step of the preprocessing procedure, and correspond to wing images that were only rotated and cropped.

"Wing images" are organized in two subfolders named "disk_1" and "disk_2", which correspond to the two DVD disks where they were initially stored. Each folder also comes with an additional .xml file containing some metadata. In "Wing images", the .xml files contain the average spatial size of the images in the given folder, while in the "Training - test set", individual .xml files contain additional data about the created image patches. In the case of patches corresponding to landmark points, "Patch_positives", each .xml contains the image intrinsic spatial coordinates of each landmark point, as well as additional data about the corresponding specimen (who created it, when and where it was gathered, taxonomy, etc.). Landmark points are uniquely numbered from 1 to 18, as also shown in the figures in [1, 2]. In the case of "Patch_negatives", each subfolder named after a wing identifier, e.g. "W0034_neg", contains 40 randomly selected image patches that correspond to any part of the preprocessed image excluding the 18 landmark points and their closest surroundings. Although image patches were generated for all species, only a subset of images corresponding to the species with the highest number of specimens was used in the original classification studies described in [1, 2]. However, in its present form the "UNS_Hoverflies" dataset contains all initially processed wing images and image patches.

Besides previously described data, which are the main part of the dataset, repository also contains the original microscopic images of insects' wings, stored without any additional processing after acquisition. These files are available in the second .zip archive denoted by the suffix "unprocessed".


Directory structure:

UNS_Hoverflies_Dataset
├── Training - test set
│   ├── Patch_negatives
│   └── Patch_positives
└── Wing images
    ├── disk_1
    └── disk_2


UNS_Hoverflies_Dataset_unprocessed
└── Unprocessed wing images
    ├── disk_1
    └── disk_2


# How to cite:

We would be glad if you decide to use this dataset. In that case, please consider citing our work as:


@article{UNShoverfliesDataset2019,
  author = {Zorica Nedeljković and Jelena Ačanski and Marko Panić and Ante Vujić and Branko Brkljač},
  title = {University of Novi Sad (UNS), Hoverflies classification dataset},
  journal = {{IEEE} DataPort},
  year = {2019}
}

and/or any of the corresponding original publications:

## References:

[1] Branko Brkljač, Marko Panić, Dubravko Ćulibrk, Vladimir Crnojević, Jelena Ačanski, and Ante Vujić, “Automatic hoverfly species discrimination,” in Proceedings of the 1st International Conference on Pattern Recognition Applications and Methods, vol. 2, pp. 108–115, SciTePress, Vilamoura, 2012.

[2] Vladimir Crnojević, Marko Panić, Branko Brkljač, Dubravko Ćulibrk, Jelena Ačanski, and Ante Vujić, “Image Processing Method for Automatic Discrimination of Hoverfly Species,” Mathematical Problems in Engineering, vol. 2014, Article ID 986271, 12 pages, 2014.


** This dataset is published on IEEE DataPort repository under CC BY-NC-SA 4.0 license by the authors (for more information please visit:


We introduce a new robotic RGB-D dataset with difficult luminosity conditions: ONERA.ROOM. It comprises RGB-D data (as pairs of images) and corresponding annotations in PASCAL VOC format (xml files).

It is aimed at people detection in (mostly) indoor and outdoor environments. People in the field of view can be standing, but also lying on the ground, as after a fall.


To facilitate use with some deep learning software, a folder tree of relative symbolic links (which avoids using extra disk space) will gather all the sequences into three folders:

.
├── image
│   ├── sequenceName0_imageNumber_timestamp0.jpg
│   ├── sequenceName0_imageNumber_timestamp1.jpg
│   ├── sequenceName0_imageNumber_timestamp2.jpg
│   ├── sequenceName0_imageNumber_timestamp3.jpg
│   └── …
├── depth_8bits
│   ├── sequenceName0_imageNumber_timestamp0.png
│   ├── sequenceName0_imageNumber_timestamp1.png
│   ├── sequenceName0_imageNumber_timestamp2.png
│   ├── sequenceName0_imageNumber_timestamp3.png
│   └── …
└── annotations
    ├── sequenceName0_imageNumber_timestamp0.xml
    ├── sequenceName0_imageNumber_timestamp1.xml
    ├── sequenceName0_imageNumber_timestamp2.xml
    ├── sequenceName0_imageNumber_timestamp3.xml
    └── …
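Because the three folders share file stems, matching an RGB image with its depth map and annotation reduces to a name lookup. The following is a minimal sketch under that assumption; the function name `list_triplets` and the decision to skip incomplete samples are illustrative choices, and the root folder is whatever path the tree was extracted to.

```python
from pathlib import Path


def list_triplets(root):
    """Yield (image, depth, annotation) paths that share the same file stem."""
    root = Path(root)
    for image_path in sorted((root / "image").glob("*.jpg")):
        depth_path = root / "depth_8bits" / (image_path.stem + ".png")
        xml_path = root / "annotations" / (image_path.stem + ".xml")
        # Skipping incomplete samples instead of failing is an assumption.
        if depth_path.exists() and xml_path.exists():
            yield image_path, depth_path, xml_path
```

Iterating over `list_triplets("path/to/root")` then gives one tuple per fully annotated RGB-D pair.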


The recent interest in using deep learning for seismic interpretation tasks, such as facies classification, has been facing a significant obstacle, namely the absence of large publicly available annotated datasets for training and testing models. As a result, researchers have often resorted to annotating their own training and testing data. However, different researchers may annotate different classes, or use different train and test splits.


# Basic instructions for usage

Make sure you have the following folder structure in the data directory after you unzip the file:


├── splits

├── test_once

│   ├── test1_labels.npy

│   ├── test1_seismic.npy

│   ├── test2_labels.npy

│   └── test2_seismic.npy

└── train

    ├── train_labels.npy

    └── train_seismic.npy

The train and test data are in NumPy .npy format, which is well suited for Python. You can open these files in Python as follows:

import numpy as np

train_seismic = np.load('data/train/train_seismic.npy')

Make sure the testing data is only used once after all models are trained. Using the test set multiple times makes it a validation set.
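As a slightly fuller sketch of the loading step, the helper below loads a seismic volume together with its labels from the folder structure listed above. The function name `load_volume_pair` is hypothetical, and the shape check assumes that the label array has one entry per seismic sample, which is not stated explicitly in this description.

```python
from pathlib import Path

import numpy as np


def load_volume_pair(data_dir, split, name):
    """Load `<name>_seismic.npy` and `<name>_labels.npy` from `data_dir/split`."""
    folder = Path(data_dir) / split
    seismic = np.load(folder / f"{name}_seismic.npy")
    labels = np.load(folder / f"{name}_labels.npy")
    # Assumption: labels are given per sample, so both arrays share a shape.
    if seismic.shape != labels.shape:
        raise ValueError(
            f"shape mismatch: seismic {seismic.shape} vs labels {labels.shape}"
        )
    return seismic, labels
```

For example, `load_volume_pair("data", "train", "train")` would load the training volume, and `load_volume_pair("data", "test_once", "test1")` the first test volume, which should be evaluated only once.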

We also provide fault planes, and the raw horizons that were used to generate the data volumes in addition to the processed data volumes before splitting to training and testing.

# References:

1- Netherlands Offshore F3 block. [Online]. Available: OffshoreF3BlockComplete4GB

2- Alaudah, Yazeed, et al. "A machine learning benchmark for facies classification." Interpretation 7.3 (2019): 1-51.



This dataset was developed at the School of Electrical and Computer Engineering (ECE) at the Georgia Institute of Technology as part of the ongoing activities at the Center for Energy and Geo-Processing (CeGP) at Georgia Tech and KFUPM. LANDMASS stands for “LArge North-Sea Dataset of Migrated Aggregated Seismic Structures”. This dataset was extracted from the North Sea F3 block under the Creative Commons license (CC BY-SA 3.0).


The LANDMASS database includes two different datasets. The first, denoted LANDMASS-1, contains 17667 small “patches” of size 99x99 pixels. It includes 9385 Horizon patches, 5140 Chaotic patches, 1251 Fault patches, and 1891 Salt Dome patches. The images in this dataset have values in the range [-1,1]. The second dataset, denoted LANDMASS-2, contains 4000 images. Each image is of size 150x300 pixels and normalized to values in the range [0,1]. Each one of the four classes has 1000 images. Sample images from each dataset for each class can be found under the /samples file.
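Since the two datasets use different value ranges, it may be convenient to bring LANDMASS-1 patches into the [0,1] range of LANDMASS-2 before using them together. The sketch below applies the linear map x' = (x + 1) / 2; the function name `to_unit_range` is hypothetical, and whether such rescaling is appropriate for a given experiment is left to the user.

```python
import numpy as np


def to_unit_range(patch):
    """Map values from [-1, 1] to [0, 1] via x' = (x + 1) / 2."""
    patch = np.asarray(patch, dtype=np.float64)
    return (patch + 1.0) / 2.0
```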


We present a dataset of human visual attention on 2D images during free viewing of scenes. This dataset includes 1900 images, which are corrupted by various image transformations. The dataset is annotated with human eye-movement data recorded by a Tobii X120 eye-tracker. It provides a new benchmark to measure the robustness of saliency prediction models on various transformed scenes.


This is a dataset for indoor depth estimation that contains 1803 synchronized image triples (left color image, right color image, and depth map) from 6 different scenes: a library, some bookshelves, a conference room, a cafe, a study area, and a hallway. Among these images, 1740 are marked as high-quality imagery. The left view and the depth map are aligned and synchronized and can be used to evaluate monocular depth estimation models. Standard training/testing splits are provided.


Please refer to the README file for detailed instructions.

Dataset usage must comply with the LICENSE provided.


PRECIS HAR is an RGB-D dataset for human activity recognition, captured with the Orbbec Astra Pro 3D camera. It consists of 16 different activities (stand up, sit down, sit still, read, write, cheer up, walk, throw paper, drink from a bottle, drink from a mug, move hands in front of the body, move hands close to the body, raise one hand up, raise one leg up, fall from bed, and faint), performed by 50 subjects.


The dataset consists of RGB data (.mp4 files) and depth data (.oni files). We provide both cropped and raw versions. The cropped videos are shorter, containing only the seconds of interest, i.e. where the activity is performed. The raw videos are longer, containing all the video that we captured while filming the dataset. We included both variants, because they can all be useful for different applications.

Video names follow the pattern <subject_id>_<activity_id>.<extension>, where:

  • <subject_id> is an integer between 1 and 50;

  • <activity_id> is an integer between 1 and 16, with the following mapping: 1 = stand up, 2 = sit down, 3 = sit still, 4 = read, 5 = write, 6 = cheer up, 7 = walk, 8 = throw paper, 9 = drink from a bottle, 10 = drink from a mug, 11 = move hands in front of the body, 12 = move hands close to the body, 13 = raise one hand up, 14 = raise one leg up, 15 = fall from bed, 16 = faint;

  • <extension> is .mp4 or .oni, depending on the type of data (RGB or depth).
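The naming pattern above can be decoded with a short helper. This is only a sketch: the function name `parse_video_name` and the "rgb"/"depth" modality strings are illustrative, and the regular expression assumes that the ids are written as plain decimal integers.

```python
import re

# Activity mapping copied from the list above.
ACTIVITIES = {
    1: "stand up", 2: "sit down", 3: "sit still", 4: "read", 5: "write",
    6: "cheer up", 7: "walk", 8: "throw paper", 9: "drink from a bottle",
    10: "drink from a mug", 11: "move hands in front of the body",
    12: "move hands close to the body", 13: "raise one hand up",
    14: "raise one leg up", 15: "fall from bed", 16: "faint",
}

NAME_PATTERN = re.compile(r"^(\d+)_(\d+)\.(mp4|oni)$")


def parse_video_name(filename):
    """Return (subject_id, activity_name, modality) for a video file name."""
    match = NAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename!r}")
    subject_id = int(match.group(1))
    activity_id = int(match.group(2))
    if not 1 <= subject_id <= 50 or activity_id not in ACTIVITIES:
        raise ValueError(f"ids out of range in: {filename!r}")
    # .mp4 files hold RGB data, .oni files hold depth data.
    modality = "rgb" if match.group(3) == "mp4" else "depth"
    return subject_id, ACTIVITIES[activity_id], modality
```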

 In order to manipulate .oni files, we recommend using pyoni.


The dataset consists of 60285 character image files, which have been randomly divided into a training set of 54239 images (90%) and a test set of 6046 images (10%). The collection of data samples was carried out in two phases. The first phase consisted of distributing a tabular form and asking people to write the characters five times each. Filled-in forms were collected from around 200 different individuals in the age group 12-23 years. The second phase was the collection of handwritten sheets, such as answer sheets and classroom notes, from students in the same age group.


Water meter dataset. Contains 1244 water meter images. Assembled using a crowdsourcing platform Yandex.Toloka.


The dataset consists of 1244 images.

File name consists of:

1) water meter id

2) water meter readings


As one of the research directions at OLIVES Lab @ Georgia Tech, we focus on the robustness of data-driven algorithms under diverse challenging conditions where trained models may be deployed. To achieve this goal, we introduced a large-scale (1M images) object recognition dataset (CURE-OR), which is among the most comprehensive datasets with controlled synthetic challenging conditions. In CURE




Image name format : 




1: White 2: Texture 1 - living room 3: Texture 2 - kitchen 4: 3D 1 - living room 5: 3D 2 – office




1: Front (0 º) 2: Left side (90 º) 3: Back (180 º) 4: Right side (270 º) 5: Top








01: No challenge 02: Resize 03: Underexposure 04: Overexposure 05: Gaussian blur 06: Contrast 07: Dirty lens 1 08: Dirty lens 2 09: Salt & pepper noise 10: Grayscale 11: Grayscale resize 12: Grayscale underexposure 13: Grayscale overexposure 14: Grayscale gaussian blur 15: Grayscale contrast 16: Grayscale dirty lens 1 17: Grayscale dirty lens 2 18: Grayscale salt & pepper noise


A number in the range [0, 5], where 0 indicates no challenge, 1 the least severe and 5 the most severe challenge. Challenge types 1 (no challenge) and 10 (grayscale) have a level of 0 only. Challenge types 2 (resize) and 11 (grayscale resize) have 4 levels (1 through 4). All other challenge types have levels 1 to 5.
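The level rules above can be written down as a small lookup, which may help when validating file names or enumerating the dataset. The function name `valid_levels` is hypothetical; the rules themselves are taken directly from the description.

```python
def valid_levels(challenge_type):
    """Return the list of valid challenge levels for a challenge type (1-18)."""
    if not 1 <= challenge_type <= 18:
        raise ValueError("challenge type must be between 1 and 18")
    if challenge_type in (1, 10):   # no challenge, grayscale
        return [0]
    if challenge_type in (2, 11):   # resize, grayscale resize
        return [1, 2, 3, 4]
    return [1, 2, 3, 4, 5]          # all remaining challenge types
```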