This dataset includes UWB range measurements performed with Pozyx devices. The measurements were collected between two tags placed at several distances and in two different conditions: with Line of Sight (LOS) and Non-Line of Sight (NLOS). The measurements include the range estimated by the Pozyx tag, the actual distance between devices, the timestamp of each measurement and the values corresponding to the samples of the Channel Impulse Response (CIR) after each transmission.


The dataset contains two zip files. One contains the raw rosbag records; the other contains two MATLAB files (one for the LOS scenario, one for the NLOS scenario) that hold the final data after the actual distance has been added and the CIR measurements have been processed.

The rosbag files contain messages of type PozyxRangingWithCir. This message type can be found in the following repository:

Each of the MATLAB files contains an array of structs. Each struct has these fields:

  • range: Pozyx estimation of distance.
  • distance: Actual distance between devices.
  • rss: Estimation of received power.
  • seq: A ranging sequence number, between 0 and 255.
  • timestamp: Timestamp of the measure. 
  • cirPower: CIR power calculated using this formula: 10*log10(abs(cirRealPart.^2)).
  • cir: Samples of the CIR. 1016 complex values.
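
As a minimal sketch, the cirPower formula can be reproduced in Python exactly as it is written in the field description above, that is, applied to the real part of each CIR sample. The function name cir_power_db and the handling of zero-valued samples are assumptions made for this illustration and are not part of the dataset.

```python
import math


def cir_power_db(cir_real_part):
    """Apply 10*log10(abs(x^2)) to each sample, mirroring the cirPower description."""
    power = []
    for x in cir_real_part:
        squared = abs(x ** 2)
        # log10(0) is undefined; returning -inf for zero samples is a choice made here.
        power.append(10.0 * math.log10(squared) if squared > 0 else float("-inf"))
    return power


if __name__ == "__main__":
    # Illustrative values only, not taken from the dataset.
    print(cir_power_db([10.0, -1.0, 0.0]))
```

If the cir field is loaded as complex numbers, the real parts would first be extracted (for example with x.real) before calling this function.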

This data was used for the paper: J. Wang, P. Aubry, and A. Yarovoy, "Three-Dimensional Short-Range Imaging With Irregular MIMO Arrays Using NUFFT-Based Range Migration Algorithm", IEEE Transactions on Geoscience and Remote Sensing. It includes two synthetic electromagnetic datasets and one experimentally measured dataset acquired with multiple-input-multiple-output (MIMO) arrays.


Detailed instructions for the dataset can be found in the readme file.


Time Scale Modification (TSM) is a well-researched field; however, no effective objective measure of quality exists. This paper details the creation, subjective evaluation, and analysis of a dataset for use in the development of an objective measure of quality for TSM. The dataset comprises two parts: the training component contains 88 source files processed using six TSM methods at 10 time scales, while the testing component contains 20 source files processed using three additional methods at four time scales.


When using this dataset, please use the following citation:

author = {Roberts, Timothy and Paliwal, Kuldip K.},
title = {A time-scale modification dataset with subjective quality labels},
journal = {The Journal of the Acoustical Society of America},
volume = {148},
number = {1},
pages = {201-210},
year = {2020},
doi = {10.1121/10.0001567},
URL = {},
eprint = {}


Audio files are named using the following structure: SourceName_TSMmethod_TSMratio_per.wav and split into multiple zip files.

For 'TSMmethod', PV is the Phase Vocoder algorithm, PV_IPL is the Identity Phase Locking Phase Vocoder algorithm, WSOLA is the Waveform Similarity Overlap-Add algorithm, FESOLA is the Fuzzy Epoch Synchronous Overlap-Add algorithm, HPTSM is the Harmonic-Percussive Separation Time-Scale Modification algorithm and uTVS is the Mel-Scale Sub-Band Modelling Filterbank algorithm. Elastique is the z-Plane Elastique algorithm, NMF is the Non-Negative Matrix Factorization algorithm and FuzzyPV is the Phase Vocoder algorithm using Fuzzy Classification of Spectral Bins.

TSM ratios range from 33% to 192% for training files, 20% to 200% for testing files and 22% to 220% for evaluation files.
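
The naming structure can be unpacked with a short Python sketch such as the one below. It is only an illustration: the function name parse_tsm_filename is hypothetical, the method list contains only the abbreviations named in this description (the Eval set uses further labels), and the ratio is assumed to be written as a plain integer or decimal number.

```python
import re

# Method abbreviations named in the description; not guaranteed to be complete.
KNOWN_METHODS = ["PV_IPL", "PV", "WSOLA", "FESOLA", "HPTSM", "uTVS",
                 "Elastique", "NMF", "FuzzyPV"]

# SourceName may itself contain underscores, so the method is matched against
# the known abbreviations rather than by splitting on "_".
_PATTERN = re.compile(
    r"^(?P<source>.+)_(?P<method>"
    + "|".join(re.escape(m) for m in KNOWN_METHODS)
    + r")_(?P<ratio>\d+(?:\.\d+)?)_per\.wav$"
)


def parse_tsm_filename(name):
    """Return source, method and ratio (percent) for a file name, or None if it does not match."""
    match = _PATTERN.match(name)
    if match is None:
        return None
    return {
        "source": match.group("source"),
        "method": match.group("method"),
        "ratio_percent": float(match.group("ratio")),
    }


if __name__ == "__main__":
    # Made-up file name used purely for illustration.
    print(parse_tsm_filename("Example_Source_PV_IPL_50_per.wav"))
```

Matching against a fixed list keeps labels that contain underscores, such as PV_IPL, in one piece; any label missing from the list makes the function return None rather than guess.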

  • Train: Contains 5280 processed files for training neural networks
  • Test: Contains 240 processed files for testing neural networks
  • Ref_Train: Contains the 88 reference files for the processed training files
  • Ref_Test: Contains the 20 reference files for the processed testing files
  • Eval: Contains 6000 processed files for evaluating TSM methods.  The 20 reference test files were processed at 20 time-scales using the following methods:
    • Phase Vocoder (PV)
    • Identity Phase-Locking Phase Vocoder (IPL)
    • Scaled Phase-Locking Phase Vocoder (SPL)
    • Phavorit IPL and SPL
    • Phase Vocoder with Fuzzy Classification of Spectral Bins (FuzzyPV)
    • Waveform Similarity Overlap-Add (WSOLA)
    • Epoch Synchronous Overlap-Add (ESOLA)
    • Fuzzy Epoch Synchronous Overlap-Add (FESOLA)
    • Driedger's Identity Phase-Locking Phase Vocoder (DrIPL)
    • Harmonic Percussive Separation Time-Scale Modification (HPTSM)
    • uTVS used in Subjective testing (uTVS_Subj)
    • updated uTVS (uTVS)
    • Non-Negative Matrix Factorization Time-Scale Modification (NMFTSM)
    • Elastique.


TSM_MOS_Scores.mat is a version 7 MATLAB save file and contains a struct called data that has the following fields:

  • test_loc: Legacy folder location of the test file.
  • test_name: Name of the test file.
  • ref_loc: Legacy folder location of reference file.
  • ref_name: Name of the reference file.
  • method: The method used for processing the file.
  • TSM: The time-scale ratio (in percent) used for processing the file. 100(%) is unity processing. 50(%) is half speed, 200(%) is double speed.
  • MeanOS: Normalized Mean Opinion Score.
  • MedianOS: Normalized Median Opinion Score.
  • std: Standard Deviation of MeanOS.
  • MeanOS_RAW: Mean Opinion Score before normalization.
  • MedianOS_RAW: Median Opinion Scores before normalization.
  • std_RAW: Standard Deviation of MeanOS before normalization.


TSM_MOS_Scores.csv is a CSV file containing the same fields as columns.
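
As a hedged example of working with the CSV version, the sketch below averages MeanOS per processing method using only the Python standard library. It assumes that the CSV header uses the field names listed above (method, MeanOS) verbatim; the function name mean_score_by_method is hypothetical.

```python
import csv
from collections import defaultdict


def mean_score_by_method(lines):
    """Average the MeanOS column per method.

    `lines` is any iterable of CSV text lines, for example an open file handle.
    """
    totals = defaultdict(lambda: [0.0, 0])  # method -> [sum of scores, row count]
    for row in csv.DictReader(lines):
        entry = totals[row["method"]]
        entry[0] += float(row["MeanOS"])
        entry[1] += 1
    return {method: total / count for method, (total, count) in totals.items()}


if __name__ == "__main__":
    # Assumes the file sits in the current working directory.
    with open("TSM_MOS_Scores.csv", newline="") as handle:
        for method, score in sorted(mean_score_by_method(handle).items()):
            print(method, round(score, 3))
```

Passing an iterable of lines instead of a path keeps the function easy to try on a few hand-written rows before pointing it at the full file.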

Source Code and method implementations are available at

Please Note: Labels for the files will be uploaded after paper publication.


The target scene consists of a black card holding six cocoa beans at three different fermentation levels (high, correct, and low), two beans per class. The provided figure shows a false-color composite of the scene (a), the ground-truth map (b), and representative spectral signatures (c). The spectral image was acquired with an AVT Stingray F-080B camera, capturing one band at a time from 350 to 950 nm. The acquired image has a spatial resolution of 1096x712 pixels and 300 spectral bands, each 2 nm wide.



The bearing dataset was acquired by the electrical engineering laboratory of Case Western Reserve University and published on the Bearing Data Center website. The gearbox dataset is from the IEEE PHM Challenge Competition in 2009.


A wide variety of scripts is used to write languages throughout the world. In a multi-script and multi-language environment, it is necessary to know which scripts are used in each part of a document in order to apply the appropriate document analysis algorithm. Consequently, several approaches for automatic script identification have been proposed in the literature. They can be broadly classified into two categories: those based on structure and visual appearance, and those based on deep learning.



The database consists of printed and handwritten documents. We realized that the documents from each script contain some sort of watermark owing to the fact that each script’s documents came from a different original native location. Therefore, the sheets and some layouts were different, depending on their origins. This poses a risk of the document watermark, rather than the script, being recognized, which could be the case with a deep learning-based classifier.

Segmenting text from the backgrounds of some documents was challenging. Even with state-of-the-art segmentation techniques, the results were not satisfactory: they included a lot of salt-and-pepper noise or black patches, or were missing parts of the text.

To avoid these drawbacks and provide a dataset for script recognition, all the documents were preprocessed and converted to a white background, while the foreground text ink was equalized. Furthermore, all documents were manually revised. Both original and processed documents are included in the database.

To allow for script recognition at different levels (i.e., document, line and word), each document was divided into lines and each line into words. In the division, a line is defined as an image with 2 or more words, and a word is defined as an image with 2 or more characters.


The printed part of the database was recorded from a wide range of local newspapers and magazines to ensure that the samples would be as realistic as possible. The newspaper samples were collected mainly from India (as a wide variety of scripts are used there), Thailand, Japan, the United Arab Emirates and Europe. The database includes 13 different scripts: Arabic, Bengali, Gujarati, Gurmukhi, Devanagari, Japanese, Kannada, Malayalam, Oriya, Roman, Tamil, Telugu and Thai.

The newspapers were scanned at a 300 dpi resolution. Paragraphs with only one script were selected for the database (paragraph here means the headline and body text). Thus, different text sizes, fonts, and styles are included in the database. Further, we tried to ensure that the text lines were not skewed from the horizontal. All images were saved in PNG format using the script_xxx.png naming convention, where script is an abbreviation or mnemonic for each script and xxx is the file number, starting at 001 for each script.


As in the printed part, the handwritten part of the database includes 13 different scripts: Arabic (represented by Persian documents), Bengali, Gujarati, Gurmukhi (Punjabi), Devanagari, Japanese, Kannada, Malayalam, Oriya, Roman, Tamil, Telugu and Thai.

Most of the documents were provided by native volunteers capable of writing documents in their respective scripts. Each volunteer wrote a document, scanned it at 300 dpi, and then sent it to us by email. Consequently, the documents had large ink, sheet and scanner quality variations. Some of the Roman sheets came from the IAM handwritten database.


Due to the broad quality range of the documents, a two-step preprocessing was performed. In the first step, images are binarized by transforming the background into white, while in the second step, an ink equalization is performed.

Because background texture, noise and illumination conditions are the primary factors degrading document image binarization performance, we used an iterative refinement framework in this paper to support robust binarization. In the process, the input image is initially transformed into a Bhattacharyya similarity matrix with a Gaussian kernel, which is subsequently converted into a binary image using a maximum entropy classifier. Then, the run-length histogram is used to estimate the character stroke width. After noise elimination, the output image is used for the next round of refinement, and the process terminates when the estimated stroke width is stable. However, some documents are not correctly binarized, and in such cases, a manual binarization is performed using local thresholds. All the documents were revised and some noise was removed manually.

For ink equalization, we used an ink deposition model.  All the black pixels on the binarized images were considered as ink spots and correlated with a Gaussian of width 0.2 mm.  Finally, the image was equalized to duplicate fluid ink.


For the lines of a document to be segmented, they must be horizontal; otherwise, a skew correction algorithm must be applied.

For the line segmentation, each connected object/component of the image is detected, and its convex hull is obtained. The result is dilated horizontally in order to connect the objects belonging to the same line, and each connected object is labeled. The next step is a line-by-line extraction, performed as follows:

1. Select the top object of the dilated lines and determine its horizontal histogram.

2. If its histogram has a single maximum, then it should be a single line, and the object is used as a mask to segment the line (see Figure 4).

3. If the histogram has several peaks, we assume that there are several lines. To separate them, we follow the next steps:

   a. The object is horizontally eroded until the top object contains a single peak.

   b. The new top object is dilated to recover the original shape and is used as a mask to segment the top line.

4. The top line is deleted, and the process is repeated from step 1 to the end.


The segmentation results were manually reviewed, and lines that had been wrongly segmented were manually repaired. The lines were saved as image files and named using the script_xxx_yyy.png format, where yyy is the line number, xxx is the document number and script is the abbreviation for the script, as previously mentioned. Figure 3 presents an example of a segmented handwritten line. These images are saved in grayscale format.


The words were segmented from the lines in two steps, with the first step being completely automatic. Each line was converted to a black and white image, its vertical histogram was computed, and the columns where the histogram was zero were identified as gaps. Gaps wider than one-third of the line height were labeled as word separations.
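
The automatic step can be illustrated with a small Python sketch. It is a simplified reading of the description above, not the code used to build the database: the line is assumed to be given as a list of rows of 0 (background) and 1 (ink), empty margins at either end of the line are ignored, and the function name find_word_gaps is hypothetical.

```python
def find_word_gaps(line_image):
    """Return (start, end) column ranges of gaps wider than one-third of the line height.

    `line_image` is a list of rows; each row is a list of 0 (background) or 1 (ink).
    The end column is exclusive.
    """
    height = len(line_image)
    width = len(line_image[0]) if height else 0
    # Vertical histogram: number of ink pixels in each column.
    histogram = [sum(row[x] for row in line_image) for x in range(width)]
    min_gap = height / 3.0

    gaps = []
    start = None
    for x in range(width + 1):
        empty = x < width and histogram[x] == 0
        if empty and start is None:
            start = x  # a run of empty columns begins
        elif not empty and start is not None:
            # The run ends at x; keep it only if it lies between two ink columns.
            interior = start > 0 and x < width
            if interior and (x - start) > min_gap:
                gaps.append((start, x))
            start = None
    return gaps


if __name__ == "__main__":
    # Toy 3-pixel-high line: two ink columns, a 2-column gap, ink, a 1-column gap, ink.
    toy = [[1, 1, 0, 0, 1, 0, 1, 0]] * 3
    print(find_word_gaps(toy))
```

With a line height of 3 pixels the threshold is 1 column, so only the two-column gap in the toy input qualifies as a word separation.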

In the second step, failed word segmentations were manually corrected. Each word was saved individually as a black and white image. The files were named using the script_xxx_yyy_zzz.png format, with zzz being the word number of the line script_xxx_yyy. For instance, a file named roma_004_012_004.png contains the black and white image of the fourth word on the 12th line of the 4th document in Roman script.
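
The naming convention lends itself to a simple parser. The following Python sketch assumes that the script abbreviation consists of letters only and that each number is written with digits; the function name parse_image_name is hypothetical.

```python
import re

# script_xxx.png (document), script_xxx_yyy.png (line), script_xxx_yyy_zzz.png (word)
_NAME = re.compile(
    r"^(?P<script>[A-Za-z]+)_(?P<doc>\d+)(?:_(?P<line>\d+))?(?:_(?P<word>\d+))?\.png$"
)


def parse_image_name(name):
    """Split a database file name into script, document, line and word numbers."""
    match = _NAME.match(name)
    if match is None:
        return None
    parts = match.groupdict()
    return {
        "script": parts["script"],
        "document": int(parts["doc"]),
        # Line and word are absent for document-level (and line-level) images.
        "line": int(parts["line"]) if parts["line"] else None,
        "word": int(parts["word"]) if parts["word"] else None,
    }


if __name__ == "__main__":
    print(parse_image_name("roma_004_012_004.png"))
```

For the example from the text, roma_004_012_004.png, this yields document 4, line 12 and word 4 of the Roman script.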

In Thai and Japanese, word segmentation is done heuristically because their lines consist of two or three long sequences of characters separated by a greater space. This is because in these scripts, there is generally no gap between two words, and contextual meaning is generally used to decide which characters comprise a word. Since we do not use contextual meaning in the present database, we used the following approach for pseudo-segmentation of Thai and Japanese scripts: for each sequence of characters, the first two characters are the first pseudo-word; the third to the fifth characters are the second pseudo-word; the sixth to the ninth character are the third pseudo-word, and so on, up to the end of the sequence.
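
The pseudo-segmentation rule can be sketched in a few lines of Python. The sketch assumes that "and so on" means the group size keeps growing by one character (2, 3, 4, 5, ...), and that any leftover characters at the end of a sequence form a final, shorter pseudo-word; both points are interpretations, and the function name pseudo_words is hypothetical.

```python
def pseudo_words(sequence):
    """Split a character sequence into pseudo-words of sizes 2, 3, 4, ... in order."""
    words = []
    start, size = 0, 2
    while start < len(sequence):
        # The final slice may be shorter than `size` if the sequence runs out.
        words.append(sequence[start:start + size])
        start += size
        size += 1  # assumed: each pseudo-word is one character longer than the last
    return words


if __name__ == "__main__":
    # Placeholder letters stand in for segmented character images.
    print(pseudo_words("abcdefghijk"))
```

In practice the sequence elements would be character images rather than letters; the function works on any sliceable sequence, such as a list of image crops.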


It should be noted that in this work, our intention is not to develop a new line/word segmentation system. We only use this simple procedure to segment lines and words in a bid to build our database. We thus use a semi-automatic approach, with human verification and correction in case of erroneous segmentation.




This dataset was developed at the School of Electrical and Computer Engineering (ECE) at the Georgia Institute of Technology as part of the ongoing activities at the Center for Energy and Geo-Processing (CeGP) at Georgia Tech and KFUPM. LANDMASS stands for “LArge North-Sea Dataset of Migrated Aggregated Seismic Structures”. This dataset was extracted from the North Sea F3 block under the Creative Commons license (CC BY-SA 3.0).


The LANDMASS database includes two different datasets. The first, denoted LANDMASS-1, contains 17667 small “patches” of size 99x99 pixels. It includes 9385 Horizon patches, 5140 Chaotic patches, 1251 Fault patches, and 1891 Salt Dome patches. The images in this database have values in the range [-1,1]. The second dataset, denoted LANDMASS-2, contains 4000 images. Each image is of size 150x300 pixels and normalized to values in the range [0,1]. Each one of the four classes has 1000 images. Sample images from each database for each class can be found under the /samples file.


Cluster analysis, which focuses on the grouping and categorization of similar elements, is widely used in various fields of research. Inspired by the phenomenon of atomic fission, this paper proposes a novel density-based clustering algorithm, called fission clustering (FC). It focuses on mining the dense families of clusters in the dataset and uses the information in the distance matrix to split the dataset into subsets.


Modern technologies have made the capture and sharing of digital video commonplace; the combination of modern smartphones, cloud storage, and social media platforms has enabled video to become a primary source of information for many people and institutions. As a result, it is important to be able to verify the authenticity and source of this information, including identifying the source camera model that captured it. While a variety of forensic techniques have been developed for digital images, less research has been conducted on the forensic analysis of videos.


This dataset provides digital images and videos of surface ice conditions that were collected from two Alberta rivers, the North Saskatchewan River and the Peace River, in the 2016-2017 winter seasons.

Images from the North Saskatchewan River were collected using both Reconyx PC800 Hyperfire Professional game cameras mounted on two bridges in Edmonton and a Blade Chroma UAV equipped with a CGO3 4K camera at the Genesee boat launch.

Data for the Peace River was collected using only the UAV, at the Dunvegan Bridge boat launch and the Shaftesbury Ferry crossing.


Python code and instructions for using the dataset are available in this repository: