Core Software
The code is provided as a Clarity_Cadenza GitHub repository containing individual Python tools and a complete baseline system.
1. Task 1: Headphones
The key elements of the Task 1 baseline system are:
- Baseline demixer
- HAAQI music quality model
1.1 Baseline Demixer
The demixing stage uses the off-the-shelf Hybrid Demucs music source separation model, which is publicly available in the TorchAudio library, to obtain stereo VDBO (vocals, drums, bass, other) stems. It then applies NAL-R amplification [1] and compression to each stem to personalize the output for the listener.
Inputs: stereo songs and listener characteristics. Outputs: 8 stems (left and right VDBO stems) for each song-listener pair, and 1 remixed signal for the listener panel.
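As a rough illustration of the output layout described above, the sketch below arranges four stereo VDBO sources into eight mono stems and sums them into one stereo remix. It uses NumPy only. The function names, the stem naming scheme and the plain-summation remix are assumptions made for illustration, not the baseline's actual code, and the separation and NAL-R/compression steps are left out.

```python
import numpy as np

# The four source types produced by a VDBO separation.
VDBO = ("vocals", "drums", "bass", "other")


def split_into_stems(sources):
    """Turn {source_name: array of shape (2, n_samples)} into 8 mono stems.

    Keys of the result look like 'left_vocals' or 'right_bass'
    (a hypothetical naming scheme).
    """
    stems = {}
    for name in VDBO:
        stereo = np.asarray(sources[name], dtype=float)
        if stereo.ndim != 2 or stereo.shape[0] != 2:
            raise ValueError(f"{name}: expected shape (2, n_samples)")
        stems[f"left_{name}"] = stereo[0]
        stems[f"right_{name}"] = stereo[1]
    return stems


def remix(stems):
    """Sum the stems per side into one stereo signal of shape (2, n_samples).

    Plain summation is an assumption; a real remix may apply gains first.
    """
    left = sum(stems[f"left_{name}"] for name in VDBO)
    right = sum(stems[f"right_{name}"] for name in VDBO)
    return np.stack([left, right])
```

In this sketch, one song-listener pair yields a dictionary of 8 arrays from `split_into_stems` plus one stereo array from `remix`.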
2. Task 2: Car stereo
The key elements of the Task 2 baseline system are:
- Car acoustics model
- HAAQI music quality model
2.1 Car acoustics model
The scene metadata contains several parameters that are used by the car acoustics model.
- Using the car parameters, a car noise signal is generated. For details of the Car Noise generator, see additional tools. Anechoic HRTFs at the front microphone of the hearing aid (HA) are then applied to this signal.
- Using the head rotation angle, the model applies car HRTFs at the front microphone of the HA to the enhanced music.
- These two signals are then added together at a given SNR at the front microphone of the HA.
- The resulting signal is passed through a fixed HA consisting of a simple NAL-R amplification.
- The HA output is the signal used for evaluation.
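The step that adds the two signals together at a given SNR can be sketched as follows. This is a minimal NumPy example that scales the noise so the music-to-noise power ratio matches the target SNR. Whether the baseline scales the noise or the music, and how it measures signal power, are assumptions here; `mix_at_snr` is a hypothetical name, and the HRTF and NAL-R stages are not shown.

```python
import numpy as np


def mix_at_snr(music, noise, snr_db):
    """Add noise to music so that the music-to-noise power ratio is snr_db.

    Power is taken as the mean square over all samples (an assumption).
    Returns the mixture and the scaled noise that was added.
    """
    music = np.asarray(music, dtype=float)
    noise = np.asarray(noise, dtype=float)
    if music.shape != noise.shape:
        raise ValueError("music and noise must have the same shape")

    music_power = np.mean(music**2)
    noise_power = np.mean(noise**2)
    if noise_power == 0:
        raise ValueError("noise is silent, so the SNR is undefined")

    # SNR (dB) = 10 * log10(music_power / scaled_noise_power)
    target_noise_power = music_power / (10 ** (snr_db / 10))
    gain = np.sqrt(target_noise_power / noise_power)

    scaled_noise = gain * noise
    return music + scaled_noise, scaled_noise
```

Scaling only the noise keeps the music level unchanged, which makes the effect of different SNR values easier to compare across scenes.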
3. HAAQI Audio Quality model
This is a Python implementation of the Hearing Aid Audio Quality Index (HAAQI) model, which is used for objective estimation of audio quality. It will be used in the stage 1 evaluation of entrants (see Rules).
Note that HAAQI is not a binaural metric; each channel must be processed separately. We average the left and right scores to produce the final overall score.
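The per-channel scoring and averaging described above could look like the sketch below. `channel_metric` is a hypothetical stand-in for a single-channel HAAQI call; the real function in the repository and its arguments (for example the listener's audiogram) are not shown here.

```python
def overall_score(reference_lr, processed_lr, channel_metric):
    """Score left and right channels separately, then average the two scores.

    reference_lr and processed_lr are (left, right) pairs of signals.
    channel_metric(reference, processed) must return a float for one channel.
    """
    left_score = channel_metric(reference_lr[0], processed_lr[0])
    right_score = channel_metric(reference_lr[1], processed_lr[1])
    return (left_score + right_score) / 2.0
```

Passing the metric in as a callable keeps the averaging logic independent of how the single-channel score is computed.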
The reference signals for HAAQI are:
- Task 1 (Headphones): the original left and right channels of the music tracks.
- Task 2 (Car stereo): the left and right signals at the ear canal of a listener auditioning the music over a stereo set-up of two loudspeakers in an anechoic room.
4. References
[1] Byrne, Denis, and Harvey Dillon. "The National Acoustic Laboratories' (NAL) new procedure for selecting the gain and frequency response of a hearing aid." Ear and Hearing 7.4 (1986): 257-265.