Reverse transcription-polymerase chain reaction (RT-PCR) is currently the gold standard in COVID-19 diagnosis. It can, however, take days to return a result, and its false-negative rate is relatively high. Imaging, in particular chest computed tomography (CT), can assist with the diagnosis and assessment of this disease. Nevertheless, standard-dose CT imposes a significant radiation burden on patients, especially those who need multiple scans.

Instructions: 

 

“Dataset-S1” contains two folders of COVID-19 and normal DICOM images, named “COVID-S1” and “Normal-S1”, respectively. The same folder also holds three CSV files. The first, named “Radiologist-S1.csv”, contains the labels assigned to the corresponding cases by three experienced radiologists. The second, “Clinical-S1.csv”, includes the clinical information as well as the result of the RT-PCR test, where available. The third, “LDCT-SL-Labels-S1.csv”, contains the slice-level labels for the COVID-19 cases; in other words, the slices demonstrating infection are specified in this file.

Each row in this CSV file corresponds to a specific case, and each column represents a slice number in the volumetric CT scan. Label 1 indicates a slice with evidence of infection, while 0 is assigned to slices with no evidence of infection.
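As a quick illustration, the labels can be read with pandas. This is a minimal sketch; the assumption that the first column holds the case identifier is ours, so check the released file's actual header layout.

```python
# Minimal sketch of reading the slice-level labels (column layout assumed).
import pandas as pd

labels = pd.read_csv("Dataset-S1/LDCT-SL-Labels-S1.csv", index_col=0)

case_id = labels.index[0]                          # a hypothetical first case
infected = labels.columns[labels.loc[case_id] == 1]
print(f"Case {case_id}: {len(infected)} slices show evidence of infection")
```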

Note that the slices in each case should be sorted by their “Slice Location” value to match the labels provided in the CSV file. The Slice Location values are stored in the DICOM files and are accessible through the following DICOM tag: (0020,1041) – DS – Slice Location
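A minimal sketch of this sorting step using pydicom follows; the case folder path and the .dcm extension are assumptions, not part of the dataset specification.

```python
# Minimal sketch: sort one case's slices by Slice Location, tag (0020,1041),
# so the i-th slice lines up with the i-th label column in the CSV.
from pathlib import Path
import pydicom

case_dir = Path("Dataset-S1/COVID-S1/case_001")  # hypothetical case folder
slices = [pydicom.dcmread(p) for p in case_dir.glob("*.dcm")]

# SliceLocation is pydicom's keyword for DICOM tag (0020,1041).
slices.sort(key=lambda ds: float(ds.SliceLocation))
```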

“Dataset-S2” contains 100 COVID-19-positive cases confirmed by RT-PCR. 68 cases have related imaging findings, whereas 32 reveal no signs of infection. These two groups are placed in the folders “PCP-Lung-Positive” and “PCP-Lung-Negative”, respectively. “Dataset-S2” also includes a CSV file, “Clinical-S2.csv”, presenting the clinical information.
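The two groups can be enumerated with a short sketch like the one below, assuming one sub-folder per case; the mapping of “PCP-Lung-Positive” to the 68 cases with imaging findings is inferred from the description above.

```python
# Minimal sketch of listing Dataset-S2 cases by group (layout assumed).
from pathlib import Path

root = Path("Dataset-S2")
with_findings = sorted((root / "PCP-Lung-Positive").iterdir())
without_findings = sorted((root / "PCP-Lung-Negative").iterdir())
print(len(with_findings), "cases with imaging findings")        # expected: 68
print(len(without_findings), "cases without imaging findings")  # expected: 32
```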

 

The AOLAH databases are contributions from the Aswan Faculty of Engineering to help researchers in the field of online handwriting recognition build a powerful system for recognizing Arabic handwritten script. AOLAH stands for Aswan On-Line Arabic Handwritten: “Aswan” is the small, beautiful city in the south of Egypt; “On-Line” means the data are collected at the same time as they are written; “Arabic” because the databases are collected for Arabic characters only; and “Handwritten” because the samples are written by the natural human hand.

Instructions: 

* There are two databases. The first is for Arabic characters and consists of 2,520 sample files written by 90 writers using a simulation of a stylus pen and a touch screen. The second is for Arabic character strokes and consists of 1,530 sample files covering 17 strokes; it was derived from the first database by extracting the strokes from the characters.
* Writers are volunteers from the Aswan Faculty of Engineering, aged 18 to 20 years.
* Writing is natural, with unrestricted writing styles.
* Each volunteer writes the 28 characters of the Arabic script using the GUI.
* The databases can be used for online Arabic character recognition.
* The data-collection tool is code that simulates a stylus pen and a touch screen; pre-processed data samples of the characters are also available to researchers.
* The databases are available to researchers free of charge (for academic and research purposes).
* The databases available here are the training databases.

The images containing honey bees were extracted from a video recorded in the Botanic Garden of the University of Ljubljana, where a beehive with a colony of the Carniolan Grey, the native Slovene subspecies, is placed. We set the camera above the beehive entrance and recorded the honey bees on the shelf in front of the entrance as well as the bees entering and exiting the hive. With this setup, we ensured non-invasive recording of the honey bees in their natural environment. The dataset contains 65 images of size 2688 × 1504 pixels.

The dataset consists of two classes: COVID-19 cases and Healthy cases.

Instructions: 

Unzip the dataset archive.
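For example, a minimal extraction sketch in Python; the archive name below is hypothetical, so substitute the actual download:

```python
# Minimal sketch: extract the dataset archive (file name assumed).
import zipfile

with zipfile.ZipFile("covid19_healthy_dataset.zip") as zf:
    zf.extractall("dataset")
```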

This dataset consists of 2,579 image pairs (5,158 images in total) of wood veneers before and after drying. The high-resolution .png images (generally over 4000 × 4000 pixels) have a white background. The data were collected from a real plywood factory. Raute Corporation is acknowledged for making this dataset public. The manufacturing process is well visualized here: https://www.youtube.com/watch?v=tjkIYCEVXko.

Instructions: 

There are two folders: "Dry" and "Wet". The "Wet" folder contains wet-veneer images and the "Dry" folder contains dry-veneer images. The files are numbered so that, e.g., Wet_10 is an image of the same veneer as Dry_10, but the veneer has been dried in between.
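Based on this naming scheme, the wet/dry pairs can be collected with a short sketch such as the following:

```python
# Minimal sketch: pair Wet_N and Dry_N images by their shared number.
from pathlib import Path

wet_dir, dry_dir = Path("Wet"), Path("Dry")
pairs = []
for wet_path in sorted(wet_dir.glob("Wet_*.png")):
    number = wet_path.stem.split("_")[1]          # e.g. "10" from "Wet_10"
    dry_path = dry_dir / f"Dry_{number}.png"
    if dry_path.exists():
        pairs.append((wet_path, dry_path))
print(f"{len(pairs)} wet/dry pairs found")        # expected: 2579
```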

This dataset contains three benchmark datasets as part of the scholarly output of an ICDAR 2021 paper: 

Meng Ling, Jian Chen, Torsten Möller, Petra Isenberg, Tobias Isenberg, Michael Sedlmair, Robert S. Laramee, Han-Wei Shen, Jian Wu, and C. Lee Giles, “Document Domain Randomization for Deep Learning Document Layout Extraction,” 16th International Conference on Document Analysis and Recognition (ICDAR 2021), September 5-10, Lausanne, Switzerland.

The data are annotated with nine class labels: abstract, algorithm, author, body text, caption, equation, figure, table, and title.

Instructions: 

Image files are in PNG format, and the metadata files are in plain text.
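For training, the nine classes listed above can be mapped to integer ids, for example as in the sketch below; the ordering is our assumption, not part of the released metadata:

```python
# Minimal sketch: map the nine layout classes to integer ids (order assumed).
CLASSES = ["abstract", "algorithm", "author", "body text", "caption",
           "equation", "figure", "table", "title"]
CLASS_TO_ID = {name: i for i, name in enumerate(CLASSES)}
print(CLASS_TO_ID["figure"])  # -> 6
```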

Without publicly available datasets, specifically in handwritten document recognition (HDR), we cannot make fair and/or reliable comparisons between methods. Within HDR, document recognition for Indic scripts is still at an early stage compared to scripts such as Roman and Arabic. In this paper, we present a page-level handwritten document image dataset (PHDIndic_11) of 11 official Indic scripts: Bangla, Devanagari, Roman, Urdu, Oriya, Gurumukhi, Gujarati, Tamil, Telugu, Malayalam, and Kannada.

Instructions: 

See the attached PDF in the documentation for more details about the dataset and benchmark results. Please cite the following paper if you use the dataset for research purposes:

Obaidullah, S.M., Halder, C., Santosh, K.C. et al. PHDIndic_11: page-level handwritten document image dataset of 11 official Indic scripts for script identification. Multimed Tools Appl 77, 1643–1678 (2018). https://doi.org/10.1007/s11042-017-4373-y

An offline handwritten signature dataset from the two most popular scripts in India, namely Roman and Devanagari, is proposed here.

Instructions: 

The availability of writer-identification datasets for Indic scripts is a major bottleneck for research in this domain. Devanagari and Roman are the two most popular and widely used scripts in India. We have a total of 5,433 signatures from 126 writers: 3,929 signatures from 80 writers in Roman script and 1,504 signatures from 46 writers in Devanagari script. Script-wise, 49 signatures per writer were considered for Roman and 32 for Devanagari, giving an average of 43 signatures per writer over the whole dataset. We report benchmark results on this dataset for the writer-identification task using a lightweight CNN architecture. Our proposed method is compared with state-of-the-art handcrafted-feature-based methods such as the gray-level co-occurrence matrix (GLCM), Zernike moments, histogram of oriented gradients (HOG), local binary patterns (LBP), the Weber local descriptor (WLD), and the Gabor wavelet transform (GWT), and it outperforms them. In addition, a few well-known CNN architectures are also compared with the proposed method, and it shows comparable performance.
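As an illustration only, and not the architecture from the paper, a lightweight CNN for this task might look like the sketch below, assuming grayscale signatures resized to 64 × 128 pixels and the dataset's 126 writer classes.

```python
# Illustrative lightweight CNN for writer identification; NOT the
# architecture from the cited paper. Input size and preprocessing assumed.
import torch
import torch.nn as nn

class SignatureCNN(nn.Module):
    def __init__(self, num_writers: int = 126):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, num_writers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x).flatten(1))

model = SignatureCNN()
logits = model(torch.randn(4, 1, 64, 128))  # batch of 4 dummy signatures
print(logits.shape)                         # torch.Size([4, 126])
```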

User guidance: The images are available in .jpg format with 24-bit color. The dataset is freely available for research work. Cite the following paper when using the dataset:

Sk Md Obaidullah, Mridul Ghosh, Himadri Mukherjee, Kaushik Roy and Umapada Pal, “Automatic Signature-based Writer Identification in Mixed-script Scenarios,” 16th International Conference on Document Analysis and Recognition (ICDAR 2021), Lausanne, Switzerland, 2021.

The proposed dataset, termed PC-Urban (Urban Point Cloud), is captured with a 64-channel Ouster LiDAR sensor. The sensor is installed on an SUV that drives through downtown Perth, Western Australia (WA), Australia. The dataset comprises over 4.3 billion points captured across 66K sensor frames. The labelled data are organized as registered and raw point-cloud frames, where each registered frame aggregates a varying number of consecutive raw frames. We provide 25 class labels in the dataset, covering 23 million points and 5K instances.
