Predicting Binaural Speech Intelligibility

A Binaural Speech Intelligibility Model (BSIM) based on the Speech Intelligibility Index (SII)

Introduction

One of the items developed within the HearCom project is a binaural extension of the Speech Intelligibility Index (SII; ANSI, 1997), a standardized model for predicting speech intelligibility. Until now, the SII was intended only for single-channel input and was not specifically applicable to typical "cocktail-party" situations (Cherry, 1953; Bronkhorst, 2000), which involve, among other factors, multiple sound source locations and room reflections. The binaural extension of the SII (Beutelmann and Brand, 2006) evaluates binaural speech and noise signals and predicts the speech intelligibility benefit for spatially separated speech and noise sources in anechoic conditions as well as in realistic rooms. Predictions can be made for both normal-hearing and hearing-impaired listeners, based on the audiogram.

BSIM Demonstrator

Two demo versions of this Binaural Speech Intelligibility Model (BSIM) are available for download. One is a standalone executable; the other is started from within Matlab®. Apart from that, the two demonstrators are identical, and both include a graphical user interface for the model back-end. Compared to the full model, the demonstrators have the following restrictions:

  • The model accepts input signals only at a sampling rate of 44.1 kHz.
  • If the input signals are longer than 0.5 s, only the first 0.5 s are used.
  • Only the provided example audiogram (in addition to “normally-hearing”) can be used.
  • Internal model settings and output parameters of the binaural stage are not accessible.

The input signals need to be provided as separate wave files for speech and noise. Although it is in principle possible to use an actual target speech signal, it is recommended to replace it with stationary noise that has the same long-term spectrum and an identical binaural configuration. This avoids unwanted deviations in the result caused by the relatively small sample of speech statistics contained in 0.5 seconds.
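One possible way to obtain such a stationary replacement signal is phase randomization, sketched below in Python with NumPy. This is an illustration only and not part of the BSIM demonstrators. The function name `binaural_stationary_noise` and the array layout (samples in rows, left and right ear in columns) are assumptions made for this example. Applying the same random phase to both ears at each frequency keeps the per-ear magnitude spectra and the interaural level and phase differences of the input. The long-term spectrum is best estimated from a speech recording much longer than 0.5 s; the result can then be cut to the required length.

```python
import numpy as np

def binaural_stationary_noise(speech, rng=None):
    """Return stationary noise with the spectrum and binaural cues of `speech`.

    speech: array of shape (n_samples, 2), columns = left and right ear
            (assumed layout for this sketch).
    """
    rng = np.random.default_rng() if rng is None else rng
    speech = np.asarray(speech, dtype=float)
    n = speech.shape[0]

    # Spectrum of each ear signal (one column per ear).
    spectrum = np.fft.rfft(speech, axis=0)

    # One random phase per frequency bin, shared by both ears, so that
    # interaural phase and level differences are left untouched.
    phase = rng.uniform(0.0, 2.0 * np.pi, size=spectrum.shape[0])
    phase[0] = 0.0              # keep the DC bin real
    if n % 2 == 0:
        phase[-1] = 0.0         # keep the Nyquist bin real

    rotated = spectrum * np.exp(1j * phase)[:, None]
    return np.fft.irfft(rotated, n=n, axis=0)
```

The output has the same shape as the input and can be written to a two-channel wave file at 44.1 kHz for use as the speech input.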

BSIM Principle

The binaural extension does not, in principle, change the SII method itself; it acts as a front-end that determines the additional signal-to-noise ratio (SNR) improvement due to better-ear listening and binaural interaction. It was developed on the basis of the work by vom Hövel (1984). The binaural speech and noise signals are divided into ERB-wide frequency bands by an auditory (gammatone) filter bank (Hohmann, 2002). In each frequency band, the maximum achievable SNR is computed using the Equalization-Cancellation (EC) principle (Durlach, 1963). The EC process aims to eliminate the noise signal by destructive interference: one channel is subtracted from the other after equalizing a potential interaural time delay and level difference. To match the model performance to human data, the equalization operations contain artificial inaccuracies. The audiogram is incorporated in the form of a hypothetical internal noise, which sets an upper limit for the SNR in each frequency band. The SNRs in each frequency band are passed to the SII, from which the speech intelligibility or a speech reception threshold (SRT, the speech level or overall SNR at which 50% intelligibility is reached) can be calculated.
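The EC step for a single frequency band can be illustrated with a brute-force search, sketched below in Python with NumPy. This is a simplified illustration under stated assumptions, not the BSIM implementation: the signals are assumed to be band-filtered already, the delay is restricted to whole samples and applied circularly, the gain is searched on a fixed dB grid, and the artificial equalization inaccuracies and the audiogram-based internal noise are left out. All function names, parameters and default values are hypothetical.

```python
import numpy as np

def residual_power(x, delay, gains):
    """Mean power of left - g * delayed(right) for every gain g in `gains`."""
    left = x[:, 0]
    right = np.roll(x[:, 1], delay)   # circular shift, whole samples only
    p_left = np.mean(left ** 2)
    p_right = np.mean(right ** 2)
    cross = np.mean(left * right)
    # mean((L - g R)^2) = P_L - 2 g C + g^2 P_R
    return p_left - 2.0 * gains * cross + gains ** 2 * p_right

def ec_snr_db(speech, noise, max_delay=40, gains_db=None, floor=1e-12):
    """Best SNR (dB) after equalization and cancellation in one band."""
    if gains_db is None:
        gains_db = np.arange(-20.0, 20.5, 0.5)
    gains = 10.0 ** (np.asarray(gains_db) / 20.0)
    best = -np.inf
    for delay in range(-max_delay, max_delay + 1):
        # The same delay and gain are applied to speech and noise.
        p_speech = np.maximum(residual_power(speech, delay, gains), floor)
        p_noise = np.maximum(residual_power(noise, delay, gains), floor)
        best = max(best, float(np.max(10.0 * np.log10(p_speech / p_noise))))
    return best

def better_ear_snr_db(speech, noise):
    """SNR (dB) at the ear with the more favourable SNR."""
    p_speech = np.mean(speech ** 2, axis=0)
    p_noise = np.mean(noise ** 2, axis=0)
    return float(np.max(10.0 * np.log10(p_speech / p_noise)))

def band_snr_db(speech, noise):
    """SNR handed on for this band: the better of EC and better-ear listening."""
    return max(ec_snr_db(speech, noise), better_ear_snr_db(speech, noise))
```

Without processing inaccuracies and internal noise, a perfectly coherent noise source would be cancelled completely; the `floor` argument only prevents division by zero in that case and has no perceptual meaning.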



References

ANSI (1997). “Methods for the Calculation of the Speech Intelligibility Index,” American National Standard S3.5–1997, Standards Secretariat, Acoustical Society of America. www.sii.to
Beutelmann, R. and Brand, T. (2006). “Prediction of speech intelligibility in spatial noise and reverberation for normal-hearing and hearing-impaired listeners,” J. Acoust. Soc. Am. 120, 331–342.
Bronkhorst, A. W. (2000). “The Cocktail Party Phenomenon: A Review of Research on Speech Intelligibility in Multiple Talker Conditions,” Acust. Acta Acust. 86, 117–128.
Cherry, E. C. (1953). “Some experiments on the recognition of speech, with one and with two ears,” J. Acoust. Soc. Am. 25, 975–979.
Durlach, N. I. (1963). “Equalization and Cancellation Theory of Binaural Masking-Level Differences,” J. Acoust. Soc. Am. 35, 1206–1218.
Hohmann, V. (2002). “Frequency analysis and synthesis using a Gammatone filterbank,” Acust. Acta Acust. 88, 433–442.
vom Hövel, H. (1984). “Zur Bedeutung der Übertragungseigenschaften des Außenohrs sowie des binauralen Hörsystems bei gestörter Sprachübertragung,” Dissertation, RWTH Aachen.
