Standard Dataset

Thai Deaf Corpus

Citation Author(s):
Supachan Traitruengsakul (Department of Computer Engineering, Faculty of Engineering, Chulalongkorn University)
Ekapol Chuangsuwanich (Department of Computer Engineering, Faculty of Engineering, Chulalongkorn University)
Submitted by:
Supachan Traitruengsakul
Last updated:
DOI:
10.21227/53w2-1k42
Data Format:

Abstract

The Thai Deaf Corpus (TDC) was built from a writing activity in which deaf students randomly selected picture words with an image picker wheel and then wrote sentences using those words on a writing sheet. The handwritten sentences were then transcribed and corrected manually to create the TDC.

  • It contains 22,719 sentences written by deaf students, with their corresponding corrections separated by "|||".
  • A sentence may have more than one possible correction. For example, sentence x may be corrected as sentence y1, sentence y2, and so on. Each correction forms its own sentence pair: sentence x ||| sentence y1, sentence x ||| sentence y2, ...

Instructions:

In the TDC, each line contains (1) an original sentence written by a deaf student, (2) the "|||" separator, and (3) a correction of that sentence written by a native speaker.
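
The line format described above could be read with a short script along the following lines. This is a minimal sketch, not part of the dataset's own tooling: the function names parse_line and group_corrections are hypothetical, and it assumes that any whitespace around "|||" is incidental and that an original sentence with several corrections appears on several lines, one per correction.

```python
from collections import defaultdict

SEPARATOR = "|||"


def parse_line(line):
    """Split one corpus line into an (original, correction) pair.

    Assumption: whitespace around the separator carries no meaning,
    so both sides are stripped.
    """
    original, sep, correction = line.partition(SEPARATOR)
    if not sep:
        raise ValueError(f"missing {SEPARATOR!r} separator: {line!r}")
    return original.strip(), correction.strip()


def group_corrections(lines):
    """Map each original sentence to the list of its corrections."""
    grouped = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        original, correction = parse_line(line)
        grouped[original].append(correction)
    return dict(grouped)


# Placeholder lines in the format described above (not real corpus data).
sample = [
    "sentence x ||| sentence y1",
    "sentence x ||| sentence y2",
]
print(group_corrections(sample))
# {'sentence x': ['sentence y1', 'sentence y2']}
```

Grouping by the original sentence keeps all corrections of one sentence together, which may be convenient when a sentence has more than one reference correction. To read the actual corpus, the lines of the downloaded file (opened with a Thai-capable encoding such as UTF-8, which is also an assumption here) would be passed to group_corrections in place of the sample list.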

Funding Agency
Foundation for the Deaf under the Royal Patronage of Her Majesty the Queen