Predicting Binaural Speech Intelligibility

Binaural model based on the Speech Transmission Index (STI)

Introduction

Over the past decades, the STI model has gradually evolved from a very simple procedure suitable for a limited set of applications into a widely applicable model that is representative of most practical situations in which speech communication occurs (Steeneken and Houtgast, 1980; IEC, 2003).

Originally, the STI was designed to predict intelligibility in diotic listening conditions (i.e., the same signal at the left and right ears) based on measurements with a single microphone. Within HearCom, we extended the STI into a binaural intelligibility prediction model by adding algorithms that simulate binaural interaction. The extension was designed so that the relation between STI and subjective intelligibility remains unchanged for existing diotic situations, preserving continuity for engineers and consultants who already use the STI. In practice, STI measurements are carried out in the same way as currently standardised, but with two microphones (or an artificial head) instead of one.

Bin-STI principle and validation

The STI method assumes that the intelligibility of a transmitted speech signal is related to how well the original spectrotemporal differences between speech sounds (their modulations) are preserved. These differences may be reduced by, among other things, band-pass limiting, masking noise, and reverberation. The STI model quantifies this reduction through the modulation transfer in a number of frequency (octave) bands.
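To illustrate the single-channel computation, the following Python sketch converts a matrix of modulation transfer values (octave bands by modulation frequencies) into an index by way of the apparent signal-to-noise ratio, clipped to ±15 dB and mapped linearly to a transmission index between 0 and 1. The seven octave bands (125 Hz to 8 kHz) and the 14 nominal modulation frequencies (0.63 to 12.5 Hz) are those of IEC 60268-16. The equal band weights are a hypothetical simplification: the band weighting and other corrections specified in the standard are omitted. The helper names `mtf_reverb_noise` and `sti_from_mtf` are our own; the first uses the well-known theoretical approximation for exponential reverberation plus stationary noise, only to generate example input.

```python
import numpy as np

# Octave-band centre frequencies (Hz) and nominal modulation frequencies (Hz)
# of the full STI method, as listed in IEC 60268-16.
OCTAVE_BANDS_HZ = np.array([125, 250, 500, 1000, 2000, 4000, 8000])
MOD_FREQS_HZ = np.array([0.63, 0.8, 1.0, 1.25, 1.6, 2.0, 2.5,
                         3.15, 4.0, 5.0, 6.3, 8.0, 10.0, 12.5])


def mtf_reverb_noise(f_mod, t60, snr_db):
    """Hypothetical helper: theoretical modulation transfer for exponential
    reverberation (reverberation time t60 in s) plus stationary noise (SNR in dB)."""
    f_mod = np.asarray(f_mod, dtype=float)
    reverb = 1.0 / np.sqrt(1.0 + (2.0 * np.pi * f_mod * t60 / 13.8) ** 2)
    noise = 1.0 / (1.0 + 10.0 ** (-snr_db / 10.0))
    return reverb * noise


def sti_from_mtf(m, band_weights=None):
    """Hypothetical helper: combine modulation transfer values m[band, mod_freq]
    into one index (simplified; standard corrections are omitted)."""
    # Keep m strictly inside (0, 1) so the logarithm below stays finite.
    m = np.clip(np.asarray(m, dtype=float), 1e-6, 1.0 - 1e-6)
    snr_app = 10.0 * np.log10(m / (1.0 - m))   # apparent SNR per (band, mod. freq.)
    snr_app = np.clip(snr_app, -15.0, 15.0)     # limit to the +/-15 dB range
    ti = (snr_app + 15.0) / 30.0                # transmission index, 0..1
    mti = ti.mean(axis=1)                       # modulation transfer index per band
    if band_weights is None:
        # Assumption: equal weights; the standard prescribes its own weighting.
        band_weights = np.ones(m.shape[0])
    w = np.asarray(band_weights, dtype=float)
    return float(np.dot(w / w.sum(), mti))


# Example: the same reverberation and noise in every octave band.
m = np.array([mtf_reverb_noise(MOD_FREQS_HZ, t60=1.0, snr_db=10.0)
              for _ in OCTAVE_BANDS_HZ])
print(f"STI (simplified): {sti_from_mtf(m):.2f}")
```

Longer reverberation or a lower signal-to-noise ratio lowers the modulation transfer and therefore the resulting index; clipping at ±15 dB keeps every band's contribution between 0 and 1.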

In developing a binaural version of the STI, we aimed to improve prediction in cases where the sources of speech and interference (noise, reverberation) are spatially separated. A binaural version of the STI (Bin-STI; van Wijngaarden and Drullman, 2008) was developed based on interaural cross-correlograms (Jeffress, 1948): the signals at the left and right ears are measured and divided into octave bands. The interaural cross-correlation is calculated in each band, yielding a number of internal (time-delayed) spectral representations. Each representation is processed as if it were a single-channel STI measurement. By selecting, per octave band, the representation with the maximum modulation transfer, an overall binaural STI can be computed.
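The cross-correlation and selection steps can be sketched in Python as follows. The hypothetical function `interaural_ccf` computes a normalised interaural cross-correlation for one band-filtered signal pair over a limited range of interaural delays (±1 ms here, an assumed range), and the hypothetical function `select_best_delay` keeps, per octave band, the delay whose modulation transfer is highest. How a modulation transfer function is derived from each time-delayed representation is not shown; its output is assumed to be given, and ranking each representation by its mean modulation transfer across modulation frequencies is our own assumption about the selection criterion.

```python
import numpy as np


def interaural_ccf(left, right, fs, max_lag_ms=1.0):
    """Hypothetical helper: normalised cross-correlation of one band at both ears.

    Positive lags mean the right-ear signal lags the left-ear signal.
    max_lag_ms = 1.0 is an assumed delay range, not a documented parameter.
    Returns (lags in seconds, correlation values).
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    if left.shape != right.shape:
        raise ValueError("left and right signals must have equal length")
    n = len(left)
    max_lag = int(round(max_lag_ms * 1e-3 * fs))
    lags = np.arange(-max_lag, max_lag + 1)
    norm = np.sqrt(np.sum(left ** 2) * np.sum(right ** 2))
    ccf = np.empty(len(lags))
    for i, lag in enumerate(lags):
        if lag >= 0:
            # sum over t of left[t] * right[t + lag]
            ccf[i] = np.dot(left[: n - lag], right[lag:])
        else:
            # sum over t of left[t - lag] * right[t], i.e. right leads left
            ccf[i] = np.dot(left[-lag:], right[: n + lag])
    return lags / fs, ccf / norm


def select_best_delay(mtf_by_delay):
    """Hypothetical helper: per band, keep the delay with maximum modulation transfer.

    mtf_by_delay has shape (n_bands, n_delays, n_mod_freqs).
    Returns the selected values, shape (n_bands, n_mod_freqs), and the chosen
    delay index for each band.
    """
    m = np.asarray(mtf_by_delay, dtype=float)
    score = m.mean(axis=2)              # assumption: rank by mean modulation transfer
    best = np.argmax(score, axis=1)     # best delay index per band
    selected = m[np.arange(m.shape[0]), best, :]
    return selected, best


# Example: white noise standing in for one band, delayed by 20 samples at the right ear.
rng = np.random.default_rng(0)
fs = 48000
band_left = rng.standard_normal(4800)
delay = 20
band_right = np.concatenate([np.zeros(delay), band_left[:-delay]])
lags, ccf = interaural_ccf(band_left, band_right, fs)
print(f"peak lag: {lags[np.argmax(ccf)] * 1e3:.3f} ms")
```

With a pure 20-sample interaural delay at 48 kHz, the correlation peak should fall at a lag of about 0.42 ms. The per-band values returned by `select_best_delay` would then enter the ordinary single-channel STI calculation.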

We validated the new model for a range of dichotic listening conditions (i.e., different signals at the left and right ears), featuring anechoic, classroom, listening room, and strongly echoic (cathedral) environments. Comparison of subjective speech intelligibility measurements (CVC, consonant-vowel-consonant, word scores) with predicted scores showed good correspondence for these binaural conditions, much better than that obtained with the standard STI. For monaural conditions, the outcome is identical to that of the standard STI.
 

References

IEC (2003). IEC 60268-16 (3rd edition). “Sound system equipment. Part 16: objective rating of speech intelligibility by speech transmission index” (International Electrotechnical Commission, Geneva, Switzerland).
Jeffress, L.A. (1948). “A place theory of sound localization,” J. Comp. Physiol. Psychol. 41, 35–39.
Steeneken, H.J.M. and Houtgast, T. (1980). “A physical method for measuring speech transmission quality,” J. Acoust. Soc. Am. 67, 318–326.
Wijngaarden, S.J. van and Drullman, R. (2008). “Binaural intelligibility prediction based on the speech transmission index,” J. Acoust. Soc. Am. 123, 4514–4523.