Most Facial Expression Recognition (FER) systems rely on machine learning approaches that require large databases (DBs) for effective training. As these are not easily available, a good solution is to augment the DBs with appropriate techniques, which are typically based on either geometric transformations or deep learning (e.g., Generative Adversarial Networks (GANs)). Whereas the first category of techniques has been widely adopted in the past, studies that use GAN-based techniques for FER systems remain limited.
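
As an illustration of the geometric category, a minimal sketch of augmentation by horizontal flipping and small translations is given below. It assumes grayscale images stored as nested lists of pixel intensities; the function names (`horizontal_flip`, `translate`, `augment`) and the choice of transformations are hypothetical and are not taken from the study itself.

```python
from typing import List

# Assumption: a grayscale image is a list of rows of pixel intensities.
Image = List[List[int]]


def horizontal_flip(img: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def translate(img: Image, dx: int, dy: int, fill: int = 0) -> Image:
    """Shift the image by (dx, dy) pixels; exposed pixels are set to `fill`."""
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            # Pixels shifted outside the frame are discarded.
            if 0 <= ny < h and 0 <= nx < w:
                out[ny][nx] = img[y][x]
    return out


def augment(img: Image) -> List[Image]:
    """Return the original image plus three transformed copies."""
    return [
        img,
        horizontal_flip(img),
        translate(img, 1, 0),
        translate(img, 0, 1),
    ]
```

Each call to `augment` turns one labelled face image into four training samples that share the same expression label, which is the sense in which such transformations enlarge a DB.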


The Magnetic Resonance – Computed Tomography (MR-CT) Jordan University Hospital (JUH) dataset was collected after receiving Institutional Review Board (IRB) approval from the hospital, and consent forms were obtained from all patients. All procedures followed are consistent with the ethics of handling patients’ data.


We focus on one-shot voice conversion, where the target speaker is unseen in the training dataset or both the source and target speakers are unseen in the training dataset. In our work, StarGAN is employed to carry out voice conversion between speakers. An embedding vector is used to represent the speaker ID. This work relies on two datasets in English and one dataset in Chinese, involving 38 speakers. A user study is conducted to validate our framework in terms of reconstruction quality and conversion quality.


This is the supporting content for my ICASSP 2020 paper.

Paper number: 5581.