This release contains three benchmark datasets published as part of the scholarly output of an ICDAR 2021 paper:
Meng Ling, Jian Chen, Torsten Möller, Petra Isenberg, Tobias Isenberg, Michael Sedlmair, Robert S. Laramee, Han-Wei Shen, Jian Wu, and C. Lee Giles, Document Domain Randomization for Deep Learning Document Layout Extraction, 16th International Conference on Document Analysis and Recognition (ICDAR) 2021. September 5-10, Lausanne, Switzerland.
This dataset contains nine class labels: abstract, algorithm, author, body text, caption, equation, figure, table, and title.
Image files are in PNG format, and the metafiles are in plain text.
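For readers who want to use the nine class labels listed above in training code, the following is a minimal sketch of a label lookup table. The integer ids are an assumption (list order, starting at 0) and the names CLASS_LABELS, LABEL_TO_ID, ID_TO_LABEL, and encode_label are hypothetical; the numbering used in the dataset's own metafiles may differ and should be checked before use.

```python
# The nine class labels, in the order the description lists them.
CLASS_LABELS = [
    "abstract", "algorithm", "author", "body text", "caption",
    "equation", "figure", "table", "title",
]

# Assumption: integer ids follow list order starting at 0; the dataset's
# metafiles may use a different numbering.
LABEL_TO_ID = {name: idx for idx, name in enumerate(CLASS_LABELS)}
ID_TO_LABEL = {idx: name for name, idx in LABEL_TO_ID.items()}


def encode_label(name: str) -> int:
    """Return the assumed integer id for a label, ignoring case and outer whitespace."""
    return LABEL_TO_ID[name.strip().lower()]
```

An unknown label raises KeyError, which makes mismatches between this table and the metafiles visible early.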
A collection of about 30K images representing figures and tables from each track of the IEEE Visualization conference series (Vis, SciVis, InfoVis, VAST).
These files are in PNG format. Because of the upload size limit, they are divided into five zip files organized by year.
The full collection as a single file is about 21.2 GB and is also available online at http://www.cse.osu.edu/~chen.8028/VIS30K/VIS30K.tar.gz.
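Since the images are split across five zip files, one way to unpack them into a single folder is sketched below using Python's standard zipfile module. The function name extract_archives and the directory arguments are hypothetical, and the sketch assumes all downloaded zip files sit in one local directory; it makes no assumption about the archive names or their internal layout.

```python
import zipfile
from pathlib import Path


def extract_archives(archive_dir: str, out_dir: str) -> int:
    """Extract every .zip file in archive_dir into out_dir; return the PNG count."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for archive in sorted(Path(archive_dir).glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(out)
    # Count the PNG images now present anywhere under out_dir.
    return sum(1 for p in out.rglob("*") if p.suffix.lower() == ".png")
```

Comparing the returned count against the expected number of images (about 30K) gives a quick check that no archive was missed or truncated during download.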