The aim of the discussed study was to develop and evaluate a 3D virtual speaker that could use pre-recorded audio material and could possibly be used for audio-visual speech testing in the future. The authors developed a terminal-analogue method that allows direct control of the virtual speaker's facial parameters in order to produce visual speech cues. Two experiments were performed: a speech-reading experiment with five trained adult speech readers, and an audio-visual speech intelligibility experiment with 36 normal-hearing adults. The speech-reading experiment was performed in complete silence, while in the second experiment the audible sentences were presented in speech-weighted noise or multi-talker babble noise. Lists of Flemish/Dutch sentences and words were used in both experiments. In the first experiment, inaudible speech was presented by two virtual speakers (one female, one male) and by a video-recorded speech therapist; in the second, speech was presented by a female virtual speaker.

The speech-reading experiment showed significantly better speech-reading scores when speech was presented by the video-recorded speaker than by the virtual speakers, which is in accordance with previous research. In general, the raw speech-reading scores obtained in this study for the virtual speakers were higher than in other studies, although, as the authors noted, the different methods used may bias this comparison. The second experiment showed a reduction in speech reception threshold (SRT) when speech was presented by the virtual speaker compared with speech presented without visual cues. Although there is still room for improvement, particularly with regard to speech intelligibility, the software used in this study supported full-body virtual humans and 3D scenarios, which is innovative and could be developed further to assess audio-visual processing.

Speech intelligibility of virtual humans.
Devesse A, Dudek A, van Wieringen A, Wouters J.
INTERNATIONAL JOURNAL OF AUDIOLOGY
2018;57(12):908-16.
CONTRIBUTOR
Joanna Lemanska

De Montfort University, Leicester, UK.
