ASR Client-Server for telephone conversations

The ability to send and receive phone calls is one of the most important accessibility scenarios for people with hearing impairments. The HearCom Project is researching a new assistive solution for this scenario based on speech-to-text conversion using Automatic Speech Recognition (ASR). In this way, hearing is assisted by textual information that accompanies the speech, or may even replace it.

This solution may become feasible thanks to the growing processing and communication capabilities of mobile devices (a Personal Communication System (PCS) based on a mobile phone or PDA). In addition, ASR performance keeps improving, and the high-speed processing capabilities offered by a remote server allow near-real-time speech conversion for normal conversations.

Project Scenario:

The hearing-impaired user receives a phone call on his or her PCS; the call is transferred in parallel to a powerful remote server. Communication takes place over wireless networks that support broadband Internet and mobile telephony. The remote server analyses the incoming audio stream, converts it into text and presents it to the user in real time on the PCS device (client) as a rolling text display.
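The client-server exchange described above needs some way to carry audio chunks to the server and text or control messages back over IP. As a minimal sketch, one common approach is length-prefixed framing over a byte stream; the frame layout and names below are illustrative assumptions, not HearCom's actual protocol.

```python
import struct

def encode_frame(payload: bytes) -> bytes:
    """Prefix a payload (audio chunk or text message) with a 4-byte
    big-endian length so the receiver can split the byte stream."""
    return struct.pack(">I", len(payload)) + payload

def decode_frames(stream: bytes):
    """Split a received byte stream back into the framed payloads."""
    frames, offset = [], 0
    while offset + 4 <= len(stream):
        (length,) = struct.unpack_from(">I", stream, offset)
        offset += 4
        frames.append(stream[offset:offset + length])
        offset += length
    return frames

# Two audio chunks framed and concatenated as they would appear on the
# wire, then recovered intact on the other side.
wire = encode_frame(b"chunk-1") + encode_frame(b"chunk-2")
print(decode_frames(wire))  # [b'chunk-1', b'chunk-2']
```

The same framing can carry the server's text and control messages back to the PCS client.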
ASR Client/Server Architecture

Speech to text

Text corresponding to the other partner's voice is presented as a rolling display. Text appearing on the screen is delayed with respect to the voice, because it is generated at the end of separate sentences or utterances. The server also provides control information for structuring the conversation into short sentences and for handling the delay of the speech conversion.
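The behaviour above can be sketched with a simple segmenter: the running recognizer hypothesis is released to the display only at utterance boundaries, and everything after the last boundary stays pending (which models the delay). Splitting at sentence-final punctuation is an assumption for illustration; a real server would use acoustic and timing cues as well.

```python
import re

def split_utterances(hypothesis: str):
    """Return (finished utterances, pending tail) for a running hypothesis.
    Finished utterances end in sentence-final punctuation and can be
    shown on the rolling display; the tail is still being spoken."""
    parts = re.split(r'(?<=[.!?])\s+', hypothesis.strip())
    if parts and not re.search(r'[.!?]$', parts[-1]):
        return parts[:-1], parts[-1]
    return parts, ""

done, pending = split_utterances("Hello doctor. I would like an appointment")
print(done)     # ['Hello doctor.']
print(pending)  # 'I would like an appointment'
```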

Current commercial state-of-the-art ASR reaches up to 95% accuracy (5% errors) on good-quality, noise-free audio if the system has been trained for the caller. This could prove useful for a number of scenarios. The quality can be improved further by limiting the vocabulary, for example in a medical appointment scenario.
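Error figures like the 5% above are conventionally measured as word error rate (WER): the word-level edit distance (substitutions, insertions and deletions) between the recognizer output and a reference transcript, divided by the number of reference words. A small self-contained illustration, with made-up example strings:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Classic dynamic-programming edit distance over words.
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i
    for j in range(len(hyp) + 1):
        dist[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,         # deletion
                             dist[i][j - 1] + 1,         # insertion
                             dist[i - 1][j - 1] + cost)  # substitution
    return dist[len(ref)][len(hyp)] / len(ref)

# One substituted word out of three gives a WER of about 33%;
# a 5% WER would correspond to one error per twenty reference words.
print(word_error_rate("the cat sat", "the cat sit"))
```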

Display for ASR system

PCS Display

The PCS Display presents the content of the call partner's voice utterances in a format where older text fades out at the top (rolling text). Caller identification helps the system reuse its previous experience with this partner's Voice Profile.
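The rolling behaviour can be sketched as a fixed-capacity line buffer: new utterances are appended at the bottom and the oldest lines drop off (fade out at) the top. The 2-line capacity and the plain-text rendering here are illustrative assumptions, not the actual PCS interface.

```python
from collections import deque

class RollingDisplay:
    """Keeps only the most recent utterances, like the PCS rolling text."""

    def __init__(self, capacity: int = 4):
        # deque with maxlen silently discards the oldest entry when full.
        self.lines = deque(maxlen=capacity)

    def append(self, utterance: str) -> None:
        self.lines.append(utterance)

    def render(self) -> str:
        return "\n".join(self.lines)

display = RollingDisplay(capacity=2)
for text in ["Hello.", "How are you?", "Fine, thanks."]:
    display.append(text)
print(display.render())  # "Hello." has already faded off the top
```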


Main development activities in HearCom:

The HearCom Project focuses its activities on a few critical aspects for demonstration and evaluation purposes.
 
  • Application control and text presentation for the user are being researched within the human-machine interface running on the PCS and the server.
  • Voice communication via the PCS, optimized for the hearing-impaired profile (developed by a separate HearCom team).
  • Data and control communication is IP-based and takes place between the different elements in a client-server configuration.
  • Integration and evaluation of the project's own ASR system (from partner ILSP) and a commercial audio platform (i.e. Loquendo Voxnauta).
 
The Project aims to produce a demonstration and research platform that will facilitate further development, not only by using the most advanced available technology but also by adapting it to the needs of the hearing-impaired world.
 
Various stages of this platform are envisaged, starting with a simple one-way communication platform, in which very little feedback to the caller is implemented in the voice communication loop, and progressing towards a system that, through interaction between the caller and the machine, will overcome the current limitations of ASR performance in noisy, natural voice environments.