Visual presentation of automatically recognized speech to improve speech comprehension by hearing impaired listeners

The presentation of extra visual information is expected to support speech understanding by individuals who have difficulty understanding speech in daily listening situations. These difficulties can be caused by background noise and/or a hearing loss. Text generated by automatic speech recognition (ASR) can serve as this additional visual information.
However, today's ASR systems still have problems in correctly converting speech into text, owing to large variations in pronunciation and to vocabulary size. At present, ASR systems may reach a best practical accuracy in the order of 80% correct words for clearly spoken language and a medium-sized vocabulary. By combining the imperfect ASR text output with the partly intelligible speech, the two modalities may complement each other (bimodal presentation), such that a normal conversation becomes possible.
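Accuracy figures like the "80% correct" above are conventionally derived from the word error rate (WER): the number of word substitutions, insertions, and deletions needed to turn the recognizer's output into the reference transcript, divided by the reference length. As an illustration only (this is not HearCom code), a minimal WER computation via word-level edit distance might look like this:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance divided by
    the number of words in the reference transcript."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i          # delete all remaining reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j          # insert all remaining hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution / match
    return dp[len(ref)][len(hyp)] / len(ref)

# One deleted word out of six: WER = 1/6, i.e. about 83% "correct".
print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Note that a system described informally as "80% correct" thus corresponds to a WER of roughly 20%, i.e. about one word in five is substituted, dropped, or spuriously inserted.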

Implemented in an assistive listening device based on a PDA (Personal Digital Assistant) or a smartphone, the ASR system (running as a remote Internet service) will recognize speech and display its output as text on the screen of the device. The displayed text will thus support the listener's understanding of speech (see Figure 1).

[Image: PDA with output from the speech recognizer on its display]

Figure 1. Example of the assistive listening device developed in the HearCom project: a Personal Digital Assistant displays the automatically recognized speech as text. Hearing impaired listeners can use this text to improve their speech comprehension.

It has not yet been proven that visually displayed text from an ASR system, containing several recognition errors, will really assist speech understanding. In particular, background noise and unclear pronunciation will increase the ASR system's error rate. Additionally, the speech recognizer needs some time for processing, which leads to a delay in the text presentation.

The HearCom project therefore includes a study that investigates to what degree visual information that is incomplete and presented with some delay may contribute to speech comprehension. We will specifically focus on how the number of recognition errors made by the ASR system and the delay of the displayed text influence the benefit obtained from the text. Both young and older participants will be included, to examine whether the listener's age influences the ability to use the text to support speech comprehension. We will also examine whether hearing impaired participants receive a similar benefit from the text as normal hearing participants. Additionally, we will measure subjective listening effort, to examine the effort listeners require to process additional and partly incorrect visual information.