Text to Speech

Recent speaker-adaptive TTS approaches have shown a remarkable capacity to reproduce a speaker’s voice on commonly used TTS datasets. However, mimicking voices with strong accents, such as those of non-native English speakers, remains challenging. The absence of a dedicated TTS dataset for heavily accented speakers hinders both research on and evaluation of speaker-adaptive TTS models under such conditions. To address this gap, we developed a corpus of English utterances from non-native speakers.


This paper describes a digital signal processing software implementation of a text-to-speech synthesis technology that has been available as a range of hardware products for more than ten years. The technology was initially created as a replacement for character-cell terminals and for telephony applications, but it is now also used to give people who are visually impaired access to information. A digital formant synthesizer models the human vocal tract, and the resulting speech is rated highly for both intelligibility and naturalness.
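
To illustrate the general idea behind a digital formant synthesizer, the following minimal sketch passes a glottal-like impulse train through a cascade of two-pole resonators, one per formant. It is not the implementation described in the abstract above, which gives no such details. The sample rate, the function names, the Klatt-style resonator form, and the bandwidth values are all assumptions chosen for illustration, and the formant frequencies are approximate textbook values for the vowel /a/.

```python
import math

SAMPLE_RATE = 16000  # Hz; assumed value for illustration


def resonator_coeffs(freq_hz, bandwidth_hz, sample_rate=SAMPLE_RATE):
    """Coefficients (a, b, c) of a two-pole resonator with unity gain at DC.

    Difference equation: y[n] = a*x[n] + b*y[n-1] + c*y[n-2]
    """
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)  # pole radius
    c = -r * r
    b = 2.0 * r * math.cos(2.0 * math.pi * freq_hz / sample_rate)
    a = 1.0 - b - c
    return a, b, c


def resonate(signal, freq_hz, bandwidth_hz, sample_rate=SAMPLE_RATE):
    """Filter a list of samples through one formant resonator."""
    a, b, c = resonator_coeffs(freq_hz, bandwidth_hz, sample_rate)
    y1 = y2 = 0.0  # previous two output samples
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def impulse_train(f0_hz, duration_s, sample_rate=SAMPLE_RATE):
    """Crude voiced excitation: one unit impulse per pitch period."""
    period = round(sample_rate / f0_hz)
    n = round(duration_s * sample_rate)
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]


def synthesize_vowel(formants, f0_hz=120.0, duration_s=0.3,
                     sample_rate=SAMPLE_RATE):
    """Shape the excitation with a cascade of formant resonators."""
    signal = impulse_train(f0_hz, duration_s, sample_rate)
    for freq_hz, bandwidth_hz in formants:
        signal = resonate(signal, freq_hz, bandwidth_hz, sample_rate)
    peak = max(abs(s) for s in signal) or 1.0
    return [s / peak for s in signal]  # normalize to the range [-1, 1]


# Approximate (frequency, bandwidth) pairs in Hz for /a/; bandwidths are assumed.
VOWEL_A = [(730.0, 60.0), (1090.0, 90.0), (2440.0, 120.0)]
samples = synthesize_vowel(VOWEL_A)
```

The resulting samples can be scaled to 16-bit integers and written to a file with Python's standard wave module for listening. A practical synthesizer would additionally vary the pitch and formant targets over time, use a more realistic glottal source, and add a noise source for unvoiced sounds.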