SHIBR—The Swedish Historical Birth Records: a semi-annotated dataset

Citation Author(s):
Abbas Cheddad, Associate Professor, Blekinge Institute of Technology, Sweden
Submitted by:
Abbas Cheddad
Last updated:
Tue, 11/22/2022 - 08:03
DOI:
10.21227/0dsh-8x30

Abstract 

This paper presents SHIBR (the Swedish Historical Birth Records), a digital image dataset of historical handwritten birth
records held in the archives of several parishes across Sweden, together with metadata that supports evaluating the
performance of document analysis algorithms. The contribution of this paper is twofold. First, we believe SHIBR is the
first and largest open-access Swedish dataset of its kind (15,000 high-resolution colour images covering the period
between 1800 and 1840). We also mine the dataset to uncover statistics and facts that may be of interest and use to
genealogists. Second, we provide a comprehensive survey of contemporary, publicly available datasets in the field, along
with a compact review of word spotting techniques. The word transcription file contains 17 columns of information
pertaining to each image (e.g., child’s first name, birth date, date of baptism, father’s first/last name, mother’s
first/last name, death records, town, and job title of the father/mother). Moreover, we evaluate several deep learning
models, pre-trained on two other renowned datasets, for word spotting in SHIBR. The dataset proved challenging for these
models because of its unique handwriting style. It could therefore also be used in competitions dedicated to a wide range
of document analysis problems, including word spotting.

Funding Agency: 
STINT, the Swedish Foundation for International Cooperation in Research and Higher Education
Grant Number: 
AF2020-8892