Skip to main content


Challenge entrants are supplied with a fully functioning baseline system.

Figure 2 shows a detailed schematic of the baseline system:

Figure 2, Detailed schematic of the baseline system.


  • Green boxes represent audio signals.
  • Blue boxes represent operations applied to the audio signals.
  • Blue database box is the anechoic HRTF dataset (audio signals).
  • Red database box is the listener characteristics dataset (metadata information).
  • Yellow database box is the gains dataset (metadata information).
  • White 'Weight and sum' box is the downmix operation.
  • Solid lines are signals transferred from one state of process to the next.
  • Dashed lines are metadata information.

The Pre-Process blocks

The system starts by applying HRTFs to the music of MUSDB18-HQ dataset, simulating the music as it is picked up by the hearing aids microphones. This stage is illustrated by the "pre-process enhancement" and "pre-process evaluation" boxes. However, in practice both boxes correspond to the output of the script.

  • First, it takes the Scene details:
    • MUSDB18-HQ music (mixture, vocal, drums, bass, other).
    • Subject head and loudspeaker-position (HRTFs).
  • It applies the HRTFs to the left and right side of all signals (mixture and VDBO components) [Figure 2]
  • The mixture with HRTF applied corresponds to the output of the "pre-process enhancement" block.
  • The VDBO signals with HRTF applied correspond to the output of the "pre-process evaluation" block.
Figure 3, The scenario.

The Enhancement block

The enhancement takes a mixture signal as it is picked up by the hearing aids microphones and attempts to output a personalized rebalanced stereo rendition of the music.

  • First, it takes stereo tracks ("mixture at the hearing aid mics") and demixes them into their VDBO (vocal, drums, bass and other) representation. This is done by using an out-of-the-box audio source separation system.
  • Then, using the gains provided, the music is downmixed to stereo after changing the level of the different elements of the music.
  • Next, the downmixed signal is normalised to match the LUFS level of the input mixture.
  • NAL-R amplification is applied to the normalised downmixed signal, allowing for a personalised amplification for the listener using a standard hearing aid algorithm.
  • This amplified signal is the output of the system: Processed signal

The Evaluation block

The evaluation generates the reference and processed signals and computes the HAAQI score.

  • First, it takes the VDBO signals at the hearing aid microphones (these are the VDBO components provided by MUSDB18-HQ with the HRTF applied to them) and remixes the signals using the same gains as applied in the enhancement.
  • Then, it normalises the remix to the same LUFS level as the "mixture at the hearing aid mics".
  • Next, it applies the NAL-R amplification.
  • This process results in the Reference signal for HAAQI, which simulates a "listener preferred mixture". The reference signal corresponds to an ideal rebalanced signal when we have access to the clean VDBO components.
  • As HAAQI is an intrusive metric, the score is computed by comparing the Processed signal (downmixed music) with the Reference signal, focussing on changes to time-frequency envelope modulation, temporal fine structure and long-term spectrum.
  • In the Enhancement and Evaluation blocks, we apply a loudness normalisation (in LUFS) after applying the gains. This is to keep the loudness of the remix at the same levels as the mixture at the hearing aid mics.
  • As required by HAAQI, we resample both the reference and processed signal before computing the score.

Baseline Scores

Two baseline systems are proposed by employing two out-of-the-box audio source separation systems in the enhancement block.

  • Hybrid Demucs [2] distributed on TorchAudio
  • Open-Unmix [3] distributed through Pytorch hub.

The average HAAQI scores are:

SystemLeft HAAQIRight HAAQIOverall


[1] Kates, J.M. and Arehart, K.H., 2016. The Hearing-Aid Audio Quality Index (HAAQI), in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 24, no. 2, pp. 354-365, doi: 10.1109/TASLP.2015.2507858

[2] Défossez, A. "Hybrid Spectrogram and Waveform Source Separation". Proceedings of the ISMIR 2021 Workshop on Music Source Separation. doi:10.48550/arXiv.2111.03600

[3] Stöter, F. R., Liutkus, A., Ito, N., Nakashika, T., Ono, N., & Mitsufuji, Y. (2019). "Open-Unmix: A Reference Implementation for Music Source Separation". Journal of Open Source Software, 4(41), 1667. doi:10.21105/joss.01667