Felipe Grijalva, Carla Parra, Marco Gallardo, Erick Santos, Byron Acuña, Juan Carlos Rodríguez, Julio Larco

Document layout analysis (DLA) plays an important role in identifying and classifying the different regions of digital documents in Document Understanding tasks. To support this, SciBank provides annotated data for text regions (abstract, text blocks, caption, keywords, reference, section, subsection, title), tables, figures, and equations (isolated and inline) drawn from 74,435 scientific article pages. Human curators validated that these 12 region types were properly labeled.


[1] Felipe Grijalva, Carla Parra, Marco Gallardo, Erick Santos, Byron Acuña, Juan Carlos Rodríguez, Julio Larco, "SciBank: A Large Dataset of Annotated Scientific Paper Regions for Document Layout Analysis", IEEE Dataport, 2022. [Online]. Available: https://doi.org/10.1109/ACCESS.2021.3125913. Accessed: Dec. 14, 2024.