Dental Digital Scribe Dataset

Citation Author(s): Fabian Villena
Submitted by: Jocelyn Dunstan
Last updated: Thu, 06/20/2024 - 02:53
DOI: 10.21227/r3tf-xz68

Categories:

Artificial Intelligence

Keywords:

Clinical Notes

Audio

Abstract

Broad access to large and diverse data sources in healthcare has boosted the application of novel computational techniques that extract meaningful information to improve patients' prognoses and support other important medical uses. However, current systems require professionals to type the information manually, which increases the risk of transcription errors and cross-contamination. We propose an automated system that allows healthcare professionals to dictate clinical information that can then be transcribed and analyzed. Since most existing systems have been developed for English, we propose a unified system to automatically record, transcribe, and identify key information in Spanish audio. The system consists of two stages: a commercial Speech-to-Text API and an in-house Named Entity Recognition (NER) model trained on Spanish clinical narratives. To assess the capacity of our system, we performed a detailed error analysis from both a linguistic and a computational point of view, using word error rate (WER) and F1 score metrics. Our results show a mean WER of 10.44%, 9.98%, and 9.06% for the dental, medical, and general domains, respectively. The mean F1 score for automatic entity recognition is 0.86 for texts in the medical domain and 0.80 for the dental domain. Typical transcription errors include words changed from plural to singular or vice versa, and verbs transcribed with a different person (pronoun) or tense.
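The two metrics reported in the abstract can be illustrated with a short sketch. WER is the word-level edit distance between a reference transcript and the Speech-to-Text output, (S + D + I) / N, where S, D, and I are substitutions, deletions, and insertions and N is the number of reference words. Entity-level F1 is the harmonic mean of precision and recall over predicted entities. This is a minimal sketch under stated assumptions, not the authors' evaluation code: it assumes whitespace tokenization with lowercasing for WER and strict exact-match of (start, end, label) spans for F1. The span format, the entity labels ("Finding", "Body_Part"), and the example sentences are hypothetical.

```python
from typing import Set, Tuple


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N, computed as a word-level Levenshtein distance.

    Assumption: lowercasing and whitespace tokenization only; the actual
    evaluation may normalize punctuation or accents differently.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dp[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # match or substitution
            )
    return dp[len(ref)][len(hyp)] / len(ref)


# Hypothetical span format: (start_token, end_token, label)
Entity = Tuple[int, int, str]


def entity_f1(gold: Set[Entity], predicted: Set[Entity]) -> float:
    """Strict entity-level F1: a prediction counts only if span AND label match."""
    true_pos = len(gold & predicted)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(predicted)
    recall = true_pos / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # A plural-to-singular error ("molares" -> "molar") is one substitution
    # out of 7 reference words.
    ref = "el paciente presenta caries en los molares"
    hyp = "el paciente presenta caries en los molar"
    print(round(word_error_rate(ref, hyp), 4))  # 0.1429

    # One exact match out of two gold and two predicted entities.
    gold = {(0, 2, "Finding"), (5, 7, "Body_Part")}
    pred = {(0, 2, "Finding"), (5, 6, "Body_Part")}
    print(entity_f1(gold, pred))  # 0.5
```

Because insertions are counted against the reference length, WER can exceed 1.0 when the hypothesis is much longer than the reference. Strict matching is also the harshest choice for F1: a prediction whose span is off by one token earns no credit, whereas a partial-overlap criterion would score the same output higher.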