The Data Fusion Contest 2016: Goals and Organization

The 2016 IEEE GRSS Data Fusion Contest, organized by the IEEE GRSS Image Analysis and Data Fusion Technical Committee, aimed at promoting progress on fusion and analysis methodologies for multisource remote sensing data.

New multi-source, multi-temporal data including Very High Resolution (VHR) multi-temporal imagery and video from space were released. First, VHR images (DEIMOS-2 standard products) acquired at two different dates, before and after orthorectification:



After unzipping, each directory contains:

  • original GeoTiff for panchromatic (VHR) and multispectral (4 bands) images,

  • quick-view image for both in png format,

  • capture parameters (RPC file).



Wide varieties of scripts are used in writing languages throughout the world. In a multiscript and multi-language environment, it is necessary to know the different scripts used in every part of a document to apply the appropriate document analysis algorithm. Consequently, several approaches for automatic script identification have been proposed in the literature, and can be broadly classified under two categories of techniques: those that are structure and visual appearance-based and those that are deep learning-based.



The database consists of printed and handwritten documents. We realized that the documents from each script contain some sort of watermark owing to the fact that each script’s documents came from a different original native location. Therefore, the sheets and some layouts were different, depending on their origins. This poses a risk of the document watermark, rather than the script, being recognized, which could be the case with a deep learning-based classifier.

Segmenting text from the backgrounds of some documents was challenging. Even with state-of-the-art segmentation techniques, the result was not satisfactory: it included a lot of salt-and-pepper noise or black patches, or was missing some parts of the text.

To avoid these drawbacks and provide a dataset for script recognition, all the documents were preprocessed and converted to a white background, while the foreground text ink was equalized. Furthermore, all documents were manually revised. Both original and processed documents are included in the database.

To allow for script recognition at different levels (i.e., document, line and word), each document was divided into lines and each line into words. In the division, a line is defined as an image with 2 or more words, and a word is defined as an image with 2 or more characters.


The printed part of the database was recorded from a wide range of local newspapers and magazines to ensure that the samples would be as realistic as possible. The newspaper samples were collected mainly from India (as a wide variety of scripts are used there), Thailand, Japan, the United Arab Emirates and Europe. The database includes 13 different scripts: Arabic, Bengali, Gujarati, Gurmukhi, Devanagari, Japanese, Kannada, Malayalam, Oriya, Roman, Tamil, Telugu and Thai.

The newspapers were scanned at a 300 dpi resolution. Paragraphs with only one script were selected for the database (paragraph here means the headline and body text). Thus, different text sizes, fonts, and styles are included in the database. Further, we tried to ensure that none of the text lines were skewed. All images were saved in PNG format using the script_xxx.png naming convention, where script is an abbreviation or mnemonic for each script and xxx is the file number, starting at 001 for each script.


Similar to the printed part, in the handwritten database we also included 13 different scripts: Persian as Arabic, Bengali, Gujarati, Punjabi, Gurmukhi, Devanagari, Japanese, Kannada, Malayalam, Oriya, Roman, Tamil, Telugu and Thai.

Most of the documents were provided by native volunteers capable of writing documents in their respective scripts. Each volunteer wrote a document, scanned it at 300 dpi, and then sent it to us by email. Consequently, the documents had large ink, sheet and scanner quality variations. Some of the Roman sheets came from the IAM handwritten database.


Due to the broad quality range of the documents, a two-step preprocessing was performed. In the first step, images are binarized by transforming the background into white, while in the second step, an ink equalization is performed.

Because the background texture, noise and illumination conditions are primary factors degrading document image binarization performance, we used an iterative refinement framework in this paper to support robust binarization. In the process, the input image is initially transformed into a Bhattacharyya similarity matrix with a Gaussian kernel, which is subsequently converted into a binary image using a maximum entropy classifier. Then, the run-length histogram is used to estimate the character stroke width. After noise elimination, the output image is used for the next round of refinement, and the process terminates when the estimated stroke width is stable. However, some documents are not correctly binarized, and in such cases, a manual binarization is performed using local thresholds. All the documents were revised and some noise was removed manually.

For ink equalization, we used an ink deposition model.  All the black pixels on the binarized images were considered as ink spots and correlated with a Gaussian of width 0.2 mm.  Finally, the image was equalized to duplicate fluid ink.
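As a small worked example, the 0.2 mm Gaussian width can be expressed in pixels at the 300 dpi scan resolution mentioned above. This is only a unit conversion; whether the 0.2 mm refers to the standard deviation or to another width measure is not stated in the text, and the function name is hypothetical.

```python
MM_PER_INCH = 25.4


def mm_to_pixels(length_mm, dpi):
    """Convert a physical length in millimetres to pixels at a given scan resolution."""
    return length_mm * dpi / MM_PER_INCH


# 0.2 mm at 300 dpi is roughly 2.36 pixels.
kernel_width_px = mm_to_pixels(0.2, 300)
print(round(kernel_width_px, 2))
```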


For the lines from a document to be segmented, they must be horizontal; otherwise a skew correction algorithm must be used.

For the line segmentation, each connected object/component of the image is detected, and its convex hull obtained. The result is dilated horizontally in order to connect the objects belonging to the same line, and each connected object is labeled. The next step is a line-by-line extraction, performed as follows:

1.     Select the top object of the dilated lines and determine its horizontal histogram.

2.     If its histogram has a single maximum, then it should be a single line, and the object is used as a mask to segment the line (see Figure 4).

3.     If the object has several peaks, we assume that there are several lines. To separate them, we follow the next steps:

a.     The object is horizontally eroded until the top object contains a single peak.

b.     The new top object is dilated to recover the original shape and is used as a mask to segment the top line.

4.     The top line is deleted, and the process is repeated from step 1 to the end.
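Steps 1 to 3 hinge on deciding whether the horizontal histogram of an object has one peak or several. The sketch below illustrates that test only; it does not implement the erosion and dilation steps. The peak criterion (runs of rows above a fraction of the maximum) and the names `horizontal_histogram`, `count_peaks` and `rel_threshold` are assumptions made for illustration, not the procedure used to build the database.

```python
def horizontal_histogram(mask):
    """Count ink pixels (value 1) in each row of a binary mask given as a list of rows."""
    return [sum(row) for row in mask]


def count_peaks(hist, rel_threshold=0.5):
    """Count runs of consecutive rows whose value exceeds rel_threshold * max(hist).

    Assumption: each such run approximates one peak of the histogram.
    """
    if not hist or max(hist) == 0:
        return 0
    cutoff = rel_threshold * max(hist)
    peaks = 0
    inside = False
    for value in hist:
        above = value > cutoff
        if above and not inside:
            peaks += 1  # a new run above the cutoff starts here
        inside = above
    return peaks


# One text line: a single peak, so the object can be used directly as a mask.
single = [[0, 0, 0, 0], [1, 1, 1, 1], [1, 1, 1, 0], [0, 0, 0, 0]]
# Two lines joined by a thin bridge: two peaks, so further separation is needed.
merged = [[1, 1, 1, 1], [0, 1, 0, 0], [1, 1, 1, 1]]
print(count_peaks(horizontal_histogram(single)), count_peaks(horizontal_histogram(merged)))
```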


The segmentation results were manually reviewed, and lines that had been wrongly segmented were manually repaired. The lines were saved as image files and named using the script_xxx_yyy.png format, where yyy is the line number, xxx is the document number and script is the abbreviation for the script, as previously mentioned. Figure 3 presents an example of a segmented line for handwriting. These images are saved in grayscale format.


The words were segmented from the lines in two steps, with the first step being completely automatic. Each line was converted to a black and white component, a vertical histogram was obtained, and points where the value of the histogram was found to be zero were identified as the gaps or the intersection. Gaps wider than one-third of the line height were labeled as word separations.
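The automatic step can be sketched as follows, assuming the line is a binary image stored as a list of rows with 1 for ink and 0 for background. The function names and the data layout are illustrative assumptions; only the rule "gaps wider than one-third of the line height separate words" comes from the description above.

```python
def vertical_histogram(line):
    """Count ink pixels (value 1) in each column of a binary line image (list of rows)."""
    return [sum(column) for column in zip(*line)]


def split_words(line):
    """Return (start, end) column ranges, end exclusive, of the words in a line image."""
    min_gap = len(line) / 3  # one-third of the line height
    words = []
    start = None     # first column of the current word
    last_ink = None  # most recent column that contains ink
    for x, count in enumerate(vertical_histogram(line)):
        if count == 0:
            continue  # empty column: part of a gap
        if start is None:
            start = x
        elif x - last_ink - 1 > min_gap:
            # The run of empty columns before x is wide enough to separate words.
            words.append((start, last_ink + 1))
            start = x
        last_ink = x
    if start is not None:
        words.append((start, last_ink + 1))
    return words


# Height 3, so gaps wider than 1 column split words: the 1-column gap is kept
# inside the first word, the 2-column gap separates the second word.
row = [1, 1, 0, 1, 0, 0, 1]
print(split_words([row, row, row]))
```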

In the second step, failed word segmentations were manually corrected. Each word was saved individually as a black and white image. The files were named using the script_xxx_yyy_zzz.png format, with zzz being the word number of the line script_xxx_yyy. For instance, a file named roma_004_012_004.png contains the black and white image of the fourth word on the 12th line of the 4th document in Roman script.
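A file name in this convention can be decoded with a short parser. This is a minimal sketch: it assumes script abbreviations contain only letters and that every number has exactly three digits, as in the roma_004_012_004.png example; the function name and the returned dictionary layout are hypothetical.

```python
import re

# script_xxx.png (document), script_xxx_yyy.png (line), script_xxx_yyy_zzz.png (word)
_NAME = re.compile(
    r"^(?P<script>[A-Za-z]+)_(?P<doc>\d{3})"
    r"(?:_(?P<line>\d{3}))?(?:_(?P<word>\d{3}))?\.png$"
)


def parse_name(filename):
    """Split a database file name into script, document, line and word numbers."""
    match = _NAME.match(filename)
    if match is None:
        raise ValueError("unexpected file name: " + filename)
    line = match.group("line")
    word = match.group("word")
    level = "word" if word else "line" if line else "document"
    return {
        "script": match.group("script"),
        "document": int(match.group("doc")),
        "line": int(line) if line else None,
        "word": int(word) if word else None,
        "level": level,
    }


print(parse_name("roma_004_012_004.png"))
```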

In Thai and Japanese, word segmentation is done heuristically because their lines consist of two or three long sequences of characters separated by a greater space. This is because in these scripts, there is generally no gap between two words, and contextual meaning is generally used to decide which characters comprise a word. Since we do not use contextual meaning in the present database, we used the following approach for pseudo-segmentation of Thai and Japanese scripts: for each sequence of characters, the first two characters are the first pseudo-word; the third to the fifth characters are the second pseudo-word; the sixth to the ninth character are the third pseudo-word, and so on, up to the end of the sequence.
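The pseudo-segmentation rule can be sketched as below. The text gives group sizes of two, three and four characters followed by "and so on"; the sketch assumes the size keeps growing by one for each further pseudo-word, and that a shorter final group is kept as is. Both points are assumptions, as is the function name.

```python
def pseudo_words(sequence):
    """Split a character sequence into groups of 2, 3, 4, ... characters.

    Assumption: the group size keeps increasing by one, and the last group
    may be shorter than its nominal size.
    """
    groups = []
    start = 0
    size = 2
    while start < len(sequence):
        groups.append(sequence[start:start + size])
        start += size
        size += 1
    return groups


# Characters 1-2, 3-5, 6-9, then the remainder.
print(pseudo_words("abcdefghijk"))
```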


It should be noted that in this work, our intention is not to develop a new line/word segmentation system. We only use this simple procedure to segment lines and words in a bid to build our database. We thus use a semi-automatic approach, with human verification and correction in case of erroneous segmentation.





Dataset was created as part of joint efforts of two research groups from the University of Novi Sad, which were aimed towards development of vision based systems for automatic identification of insect species (in particular hoverflies) based on characteristic venation patterns in the images of the insects' wings. The set of wing images consists of high-resolution microscopic wing images of several hoverfly species. There is a total of 868 wing images of eleven selected hoverfly species from two different genera, Chrysotoxum and Melanostoma.



## University of Novi Sad (UNS), Hoverflies classification dataset - ReadMe file


Version 1.0

Published: December, 2014


## Dataset authors:

* Zorica Nedeljković    (zoricaned14 a_t, A1

* Jelena Ačanski    (jelena.acanski a_t, A1

* Marko Panić    (mpanic a_t, A2

* Ante Vujić    (ante.vujic a_t, A1

* Branko Brkljač    (brkljacb a_t, A2, *corr. auth.


Dataset was created as part of joint efforts of two research groups from the University of Novi Sad, which were aimed towards development of vision based systems for automatic identification of insect species (in particular hoverflies) based on characteristic venation patterns in the images of the insects' wings. At the time of dataset's development, authors affiliations were:

 * A1: Department of Biology and Ecology, Faculty of Sciences, University of Novi Sad, Trg Dositeja Obradovića 2, 21000 Novi Sad, Republic of Serbia


* A2: Department of Power, Electronic and Telecommunication Engineering, Faculty of Technical Sciences, University of Novi Sad, Trg Dositeja Obradovića 6, 21000 Novi Sad, Republic of Serbia

University of Novi Sad:


# Dataset description:

The set of wing images consists of high-resolution microscopic wing images of several hoverfly species. There is a total of 868 wing images of eleven selected hoverfly species from two different genera, Chrysotoxum and Melanostoma. 

The wings have been collected from many different geographic locations in the Republic of Serbia during a relatively long period of time of more than two decades. Wing images were obtained from the wing specimens mounted in the glass microscopic slides by a microscopic device equipped with a digital camera with image resolution of 2880 × 1550 pixels and were originally stored in the TIFF image format.

Each wing specimen was uniquely numbered and associated with the taxonomy group it belongs to. Association of each wing with a particular species was based on the classification of the insect at the time when it was collected and before the wings were detached. This classification was done after examination by a skilled expert.

In the next step, digital images were acquired by biologists under relatively uncontrolled conditions of nonuniform background illumination and variable scene configuration, and without camera calibration. In that sense, the originally obtained digital images were not particularly suitable for exact measurements. Other shortcomings of the samples in the initial image dataset were the result of variable wing specimen quality, damaged or badly mounted wings, existence of artifacts, variable wing positions during image acquisition, and dust.

In order to overcome these limitations and make images amenable to automatic discrimination of hoverfly species, they were first preprocessed. The preprocessing of each image consisted of image rotation to a unified horizontal position, wing cropping, and subsequent scaling of the cropped wing image. Cropping eliminated unnecessary background containing artifacts, while the aspect ratio-preserving image scaling overcame the problem of variable size among the wings of the same species. The described scaling was performed after computing the average width and average height of all cropped images, which were then interpolated to the same width of 1680 pixels using bicubic interpolation. The given width value was selected based on the prevailing image width among the wing images of different species.
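The aspect ratio-preserving step amounts to a single scale factor applied to both dimensions. The sketch below only computes the target size for a fixed width of 1680 pixels; the input size in the example is hypothetical, the function name is an assumption, and the actual resampling (bicubic interpolation in the original work) would be done by an image library.

```python
def scaled_size(width, height, target_width=1680):
    """Return the (width, height) of an image resized to target_width, keeping aspect ratio."""
    scale = target_width / width
    return target_width, round(height * scale)


# A hypothetical 2000 x 900 cropped wing image becomes 1680 x 756.
print(scaled_size(2000, 900))
```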

Wing images obtained in this way formed the final wing images dataset used for the sliding-window detector training, its performance evaluation, and subsequent hoverfly species discrimination using the trained landmark points detector, described in [1, 2].

* Besides images of the whole wings (in the folder "Wing images"), provided "UNS_Hoverflies" dataset also consists of the small image patches (64x64 pixels) corresponding to 18 predetermined landmark points in each wing, which were systematically collected and organized inside the second root folder named "Training - test set". Each patch among the "Patch_positives" was manually cropped from the preprocessed wing image (i.e. rotated, cropped and scaled to the same predefined image width). However, images of the whole wings that were stored in the folder "Wing images", are provided without additional scaling step in the preprocessing procedure, and correspond to wing images that were only rotated and cropped.

"Wing images" are organized in two subfolders named "disk_1" and "disk_2", which correspond to two DVD drives where they were initially stored. Each folder also comes with additional .xml file containing some metadata. In "Wing images", .xml files contain average spatial size of the images in the given folder, while in the "Training - test set", individual .xml files contain additional data about created image patches (in case of patches corresponding to landmark points, "Patch_positives", each .xml contains image intrinsic spatial coordinates of each landmark point, as well as additional data about the corresponding specimen - who created it, when and where it was gathered, taxonomy, etc. Landmark points have unique numeration from 1 to 18, also provided by figures in [1,2]. In case of "Patch_negatives", each subfolder named after wing identifier, e.g. "W0034_neg", contains 40 randomly selected image patches that correspond to any part of the preprocessed image excluding one of the 18 landmark points and their closest surrounding. Although image patches were generated for all species, only a subset of images corresponding to the species with the highest number of specimens was used in the original classification studies described in [1, 2]. However, in the present form "UNS_Hoverflies" dataset contains all initially processed wing images and image patches.

Besides previously described data, which are the main part of the dataset, repository also contains the original microscopic images of insects' wings, stored without any additional processing after acquisition. These files are available in the second .zip archive denoted by the suffix "unprocessed".


Directory structure:

UNS_Hoverflies_Dataset
├── Training - test set
│   ├── Patch_negatives
│   └── Patch_positives
└── Wing images
    ├── disk_1
    └── disk_2


UNS_Hoverflies_Dataset_unprocessed
└── Unprocessed wing images
    ├── disk_1
    └── disk_2


# How to cite:

We would be glad if you decide to use this dataset. In that case, please consider citing our work as:


@article{UNShoverfliesDataset2019,
  author = {Zorica Nedeljković and Jelena Ačanski and Marko Panić and Ante Vujić and Branko Brkljač},
  title = {University of Novi Sad (UNS), Hoverflies classification dataset},
  journal = {{IEEE} DataPort},
  year = {2019}
}

and/or any of the corresponding original publications:

## References:

[1] Branko Brkljač, Marko Panić, Dubravko Ćulibrk, Vladimir Crnojević, Jelena Ačanski, and Ante Vujić, “Automatic hoverfly species discrimination,” in Proceedings of the 1st International Conference on Pattern Recognition Applications and Methods, vol. 2, pp. 108–115, SciTePress, Vilamoura, 2012.

[2] Vladimir Crnojević, Marko Panić, Branko Brkljač, Dubravko Ćulibrk, Jelena Ačanski, and Ante Vujić, “Image Processing Method for Automatic Discrimination of Hoverfly Species,” Mathematical Problems in Engineering, vol. 2014, Article ID 986271, 12 pages, 2014.


** This dataset is published on IEEE DataPort repository under CC BY-NC-SA 4.0 license by the authors (for more information please visit:


We introduce a new robotic RGB-D dataset with difficult luminosity conditions: ONERA.ROOM. It comprises RGB-D data (as pairs of images) and corresponding annotations in PASCAL VOC format (xml files).

It targets people detection in (mostly) indoor and outdoor environments. People in the field of view can be standing, but also lying on the ground as after a fall.


To facilitate use of some deep learning software, a folder tree of relative symbolic links (thus avoiding extra space) gathers all the sequences in three folders:

|
|— image
|        | — sequenceName0_imageNumber_timestamp0.jpg
|        | — sequenceName0_imageNumber_timestamp1.jpg
|        | — sequenceName0_imageNumber_timestamp2.jpg
|        | — sequenceName0_imageNumber_timestamp3.jpg
|        | — …
|
|— depth_8bits
|        | — sequenceName0_imageNumber_timestamp0.png
|        | — sequenceName0_imageNumber_timestamp1.png
|        | — sequenceName0_imageNumber_timestamp2.png
|        | — sequenceName0_imageNumber_timestamp3.png
|        | — …
|
|— annotations
|        | — sequenceName0_imageNumber_timestamp0.xml
|        | — sequenceName0_imageNumber_timestamp1.xml
|        | — sequenceName0_imageNumber_timestamp2.xml
|        | — sequenceName0_imageNumber_timestamp3.xml
|        | — …
|
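Since the annotations are described as PASCAL VOC xml files, their bounding boxes can be read with the Python standard library. The sketch below assumes the usual VOC layout (`object`, `name`, `bndbox` with `xmin`, `ymin`, `xmax`, `ymax`); the sample annotation, its file name and the class label `person` are made up for illustration and may differ from the labels actually used in the dataset.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation in the common PASCAL VOC layout.
SAMPLE = """
<annotation>
  <filename>sequenceName0_imageNumber_timestamp0.jpg</filename>
  <object>
    <name>person</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>220</ymax></bndbox>
  </object>
</annotation>
"""


def read_voc_boxes(xml_text):
    """Return a list of (label, xmin, ymin, xmax, ymax) tuples from a VOC annotation string."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        # Coordinates are parsed via float first in case they are stored as decimals.
        coords = [int(float(box.findtext(tag))) for tag in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append((obj.findtext("name"), *coords))
    return boxes


print(read_voc_boxes(SAMPLE))
```

For a file on disk, `ET.parse(path).getroot()` can replace `ET.fromstring`.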


The recent interest in using deep learning for seismic interpretation tasks, such as facies classification, has been facing a significant obstacle, namely the absence of large publicly available annotated datasets for training and testing models. As a result, researchers have often resorted to annotating their own training and testing data. However, different researchers may annotate different classes, or use different train and test splits.


# Basic instructions for usage

Make sure you have the following folder structure in the data directory after you unzip the file:


├── splits

├── test_once

│   ├── test1_labels.npy

│   ├── test1_seismic.npy

│   ├── test2_labels.npy

│   └── test2_seismic.npy

└── train

    ├── train_labels.npy

    └── train_seismic.npy

The train and test data are in NumPy .npy format, ideally suited for Python. You can open these files in Python as follows:

import numpy as np

train_seismic = np.load('data/train/train_seismic.npy')

Make sure the testing data is only used once after all models are trained. Using the test set multiple times makes it a validation set.
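One way to keep the test set untouched is to hold out part of the training volume for validation. The sketch below computes index ranges for a contiguous hold-out, so that neighbouring sections do not end up on both sides of the split. The function name, the 20% fraction, the section count in the example and the idea of splitting along one axis are all assumptions; the provided splits folder may already define a preferred partition.

```python
def contiguous_split(n_sections, val_fraction=0.2):
    """Return (train_range, val_range) index ranges, holding out the last sections."""
    n_val = max(1, int(n_sections * val_fraction))
    cut = n_sections - n_val
    return range(0, cut), range(cut, n_sections)


# With a hypothetical 10 sections, the last 2 are held out for validation.
train_idx, val_idx = contiguous_split(10)
print(list(train_idx), list(val_idx))
```

With the array loaded as shown earlier, a slice such as `train_seismic[val_idx.start:val_idx.stop]` would select the held-out sections along the first axis, if that is the axis chosen for splitting.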

We also provide fault planes, and the raw horizons that were used to generate the data volumes in addition to the processed data volumes before splitting to training and testing.

# References:

1- Netherlands Offshore F3 block. [Online]. Available: OffshoreF3BlockComplete4GB

2- Alaudah, Yazeed, et al. "A machine learning benchmark for facies classification." Interpretation 7.3 (2019): 1-51.



This dataset was developed at the School of Electrical and Computer Engineering (ECE) at the Georgia Institute of Technology as part of the ongoing activities at the Center for Energy and Geo-Processing (CeGP) at Georgia Tech and KFUPM. LANDMASS stands for “LArge North-Sea Dataset of Migrated Aggregated Seismic Structures”. This dataset was extracted from the North Sea F3 block under the Creative Commons license (CC BY-SA 3.0).


The LANDMASS database includes two different datasets. The first, denoted LANDMASS-1, contains 17667 small “patches” of size 99x99 pixels. It includes 9385 Horizon patches, 5140 Chaotic patches, 1251 Fault patches, and 1891 Salt Dome patches. The images in this database have values in the range [-1,1]. The second dataset, denoted LANDMASS-2, contains 4000 images. Each image is of size 150x300 pixels and normalized to values in the range [0,1]. Each one of the four classes has 1000 images. Sample images from each database for each class can be found under the /samples file.


We present a dataset of human visual attention on 2D images during free viewing of scenes. The dataset includes 1900 images, which are corrupted by various image transformations. It is annotated with human eye-movement data recorded by a Tobii X120 eye-tracker, and provides a new benchmark to measure the robustness of saliency prediction models on various transformed scenes.


This is a dataset for indoor depth estimation that contains 1803 synchronized image triples (left and right color images and a depth map) from 6 different scenes, including a library, some bookshelves, a conference room, a cafe, a study area, and a hallway. Among these images, 1740 are marked as high-quality imagery. The left view and the depth map are aligned and synchronized and can be used to evaluate monocular depth estimation models. Standard training/testing splits are provided.


Please refer to the README file for detailed instructions.

Dataset usage must comply with the LICENSE provided.


PRECIS HAR is an RGB-D dataset for human activity recognition, captured with the Orbbec Astra Pro 3D camera. It consists of 16 different activities (stand up, sit down, sit still, read, write, cheer up, walk, throw paper, drink from a bottle, drink from a mug, move hands in front of the body, move hands close to the body, raise one hand up, raise one leg up, fall from bed, and faint), performed by 50 subjects.


The dataset consists of RGB data (.mp4 files) and depth data (.oni files). We provide both cropped and raw versions. The cropped videos are shorter, containing only the seconds of interest, i.e. where the activity is performed. The raw videos are longer, containing all the video that we captured while filming the dataset. We included both variants, because they can all be useful for different applications.

Video names follow the pattern <subject_id>_<activity_id>.<extension>, where:

  • <subject_id> is an integer between 1 and 50;

  • <activity_id> is an integer between 1 and 16, with the following mapping: 1 = stand up, 2 = sit down, 3 = sit still, 4 = read, 5 = write, 6 = cheer up, 7 = walk, 8 = throw paper, 9 = drink from a bottle, 10 = drink from a mug, 11 = move hands in front of the body, 12 = move hands close to the body, 13 = raise one hand up, 14 = raise one leg up, 15 = fall from bed, 16 = faint;

  • <extension> is .mp4 or .oni, depending on the type of data (RGB or depth).
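The naming pattern above can be decoded with a small helper. This is a sketch under the assumption that identifiers are plain decimal integers (with or without leading zeros); the function name and the returned tuple layout are hypothetical.

```python
import re

# Activity names in the order given by the <activity_id> mapping (1 to 16).
ACTIVITIES = [
    "stand up", "sit down", "sit still", "read", "write", "cheer up", "walk",
    "throw paper", "drink from a bottle", "drink from a mug",
    "move hands in front of the body", "move hands close to the body",
    "raise one hand up", "raise one leg up", "fall from bed", "faint",
]
MODALITY = {"mp4": "RGB", "oni": "depth"}
_NAME = re.compile(r"^(\d+)_(\d+)\.(mp4|oni)$")


def parse_video_name(filename):
    """Return (subject_id, activity_name, modality) for a PRECIS HAR video file name."""
    match = _NAME.match(filename)
    if match is None:
        raise ValueError("unexpected file name: " + filename)
    subject = int(match.group(1))
    activity = int(match.group(2))
    if not 1 <= subject <= 50 or not 1 <= activity <= len(ACTIVITIES):
        raise ValueError("identifier out of range: " + filename)
    return subject, ACTIVITIES[activity - 1], MODALITY[match.group(3)]


print(parse_video_name("12_7.mp4"))
```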

 In order to manipulate .oni files, we recommend using pyoni.


The dataset consists of 60285 character image files, which have been randomly divided into a training set of 54239 images (90%) and a test set of 6046 images (10%). The data samples were collected in two phases. The first phase consisted of distributing a tabular form and asking people to write the characters five times each. Filled-in forms were collected from around 200 different individuals in the age group 12-23 years. The second phase was the collection of handwritten sheets such as answer sheets and classroom notes from students in the same age group.