One of the items developed within the HearCom project is a binaural extension of the Speech Intelligibility Index (SII, ANSI, 1997), a standardized model for predicting speech intelligibility. Until now, the SII was intended only for single-channel input and was not specifically applicable to typical "cocktail-party" situations (Cherry, 1953; Bronkhorst, 2000), which involve, among other factors, multiple sound source locations and room reflections. The binaural extension of the SII (Beutelmann, 2006) evaluates binaural speech and noise signals and predicts the speech intelligibility benefit for spatially separated speech and noise sources in anechoic conditions as well as in realistic rooms. Predictions can be made for both normal-hearing and hearing-impaired subjects, based on the audiogram.
Two demo versions of this Binaural Speech Intelligibility Model (BSIM) are available for download. One is a standalone executable; the other runs within Matlab®. The two demonstrators are otherwise identical, and both include a graphical user interface for the model back end. The restrictions of the demonstrators compared to the full model are:
The input signals need to be provided as separate wave files for speech and noise. Although it is in principle possible to use an actual target speech signal, it is recommended to replace it with stationary noise that has the same long-term spectrum and an identical binaural configuration. This avoids unwanted deviations in the result caused by the relatively small sample of speech statistics within 0.5 seconds.
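The recommendation above can be sketched in Python as follows. This is a minimal illustration, not the procedure used by the demonstrators: it assumes NumPy, a hypothetical function name `speech_shaped_noise`, and phase randomization as the technique for producing noise with the same magnitude spectrum as the speech. Applying the same random phase rotation to both ear signals leaves the interaural level and phase differences in every frequency bin unchanged, which is one way to keep the binaural configuration identical. Reading and writing the wave files is omitted.

```python
import numpy as np


def speech_shaped_noise(speech, seed=0):
    """Return noise with the same magnitude spectrum as `speech`.

    `speech` is an array of shape (n_samples, 2) holding the left and
    right ear signals. The function name and interface are hypothetical.
    """
    speech = np.asarray(speech, dtype=float)
    n = speech.shape[0]
    spectrum = np.fft.rfft(speech, axis=0)  # shape (n_bins, 2)

    # One random phase per frequency bin, shared by both ears so that
    # interaural phase and level differences are preserved.
    rng = np.random.default_rng(seed)
    phase = rng.uniform(0.0, 2.0 * np.pi, size=spectrum.shape[0])
    phase[0] = 0.0  # keep the DC bin real
    if n % 2 == 0:
        phase[-1] = 0.0  # keep the Nyquist bin real

    rotated = spectrum * np.exp(1j * phase)[:, None]
    return np.fft.irfft(rotated, n=n, axis=0)
```

Whether the result is stationary enough in practice depends on the length of the speech recording used as input; a longer recording gives a better estimate of the long-term spectrum.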
The binaural extension of the SII does not, in principle, change the SII method, but acts as a front end that determines the additional signal-to-noise ratio (SNR) improvement due to better-ear listening and binaural interaction. It was developed on the basis of the work by vom Hövel (1984). The binaural speech and noise signals are divided into ERB-wide frequency bands by an auditory (gammatone) filter bank (Hohmann, 2002). In each frequency band, the maximum achievable SNR is computed using the Equalization-Cancellation (EC) principle (Durlach, 1963). The EC process aims to eliminate the noise signal by destructive interference: one channel is subtracted from the other after any interaural time delay and level difference have been equalized. To match the model performance to human data, the equalization operations contain artificial inaccuracies. The audiogram is incorporated in the form of a hypothetical internal noise, which sets an upper limit for the SNR in each frequency band. The SNRs in each frequency band are passed to the SII, from which the speech intelligibility or a speech reception threshold (SRT, the speech level or overall SNR at which 50% intelligibility is reached) can be calculated.
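As a rough illustration of the EC step in a single frequency band, the following Python sketch searches a grid of interaural delays and gains for the combination that yields the highest SNR after subtracting one ear signal from the other, and compares it with the SNR at each ear alone. It is a simplified sketch under stated assumptions, not the BSIM implementation: the function names, the grid search, the circular integer-sample delay, and the constant `floor` (a crude stand-in for the internal noise that bounds the SNR) are all hypothetical, and the artificial equalization inaccuracies and the gammatone filter bank are left out.

```python
import math


def power(x):
    """Mean power of a sequence of samples."""
    return sum(v * v for v in x) / len(x)


def ec_residual(left, right, delay, gain):
    """Equalize the right channel (circular delay in samples, linear gain)
    and subtract it from the left channel."""
    n = len(left)
    return [left[i] - gain * right[(i - delay) % n] for i in range(n)]


def snr_db(speech, noise, floor=1e-9):
    """SNR in dB; `floor` is a hypothetical noise floor that bounds the SNR."""
    s = power(speech)
    if s == 0.0:
        return float("-inf")
    return 10.0 * math.log10(s / (power(noise) + floor))


def band_snrs(speech_l, speech_r, noise_l, noise_r, delays, gains):
    """Return the left-ear, right-ear, best EC, and overall best SNR in dB."""
    left = snr_db(speech_l, noise_l)
    right = snr_db(speech_r, noise_r)
    # The same equalization is applied to speech and noise, because the
    # EC process cannot treat the two components differently.
    ec = max(
        snr_db(ec_residual(speech_l, speech_r, d, g),
               ec_residual(noise_l, noise_r, d, g))
        for d in delays
        for g in gains
    )
    return {"left": left, "right": right, "ec": ec,
            "binaural": max(left, right, ec)}
```

Called with, for example, `delays=range(-5, 6)` and `gains=[0.5, 1.0, 2.0]`, the function returns the per-band SNRs that a subsequent SII computation would take as input. When the speech is identical at both ears and the noise is delayed at one ear, the `ec` value exceeds both monaural values, which corresponds to the benefit of spatially separating speech and noise.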