Task 2: Car
Data and baseline code can be downloaded from the download page, according to the timeline.
1 Training/development
1.1 Music Data
The music dataset is based on the small split of the FMA dataset (FMA-small) and on the MTG-Jamendo dataset. FMA-small is a balanced dataset for genre classification. From the eight genres available in FMA-small, we selected five:
- Hip-Hop
- Instrumental
- International
- Pop
- Rock
However, people with hearing loss are more likely to be older adults, who tend to listen to classical and orchestral music [1]. Therefore, we included samples from these two genres, sourced from the MTG-Jamendo dataset:
- Classical
- Orchestral
Each genre contains 900 30-second samples, divided into 800 for training and 100 for development.
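If you want to sanity-check these counts, they can be recovered from the music metadata described in Section 3.3. A minimal sketch, assuming a placeholder metadata filename:

import json
from collections import Counter

# Placeholder path; substitute the actual metadata file location.
with open("music_metadata.json") as fp:
    music = json.load(fp)

# Count tracks per (genre, split), taking the split from the first
# component of the path field (e.g., "training"); the name of the
# development directory is an assumption.
counts = Counter((t["genre"], t["path"].split("/")[0]) for t in music.values())
print(counts)  # expect 800 training and 100 development tracks per genre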
1.2 HRTF Data
To allow for the 'room' acoustics in the car, we use anechoic and car binaural room impulse responses from the eBrIRD ELOSPHERES dataset.
2 Evaluation
- The evaluation set contains 700 30-second samples.
- You should process all the music.
- All the music will be used for HAAQI evaluation.
- We will then select a random 10-second sample from some of this music for listening panel evaluation.
3 Data file formats and naming conventions
3.1 Enhanced signals
The baseline generates one output per scene:
<Dataset Split>/<Listener ID>_<Song ID>.wav
Where:
- Dataset Split - the split you are evaluating: train, valid or test.
- Listener ID - ID of the listener panel member, e.g., L001 to L100 for the initial pseudo-listeners, etc.
- Song ID - ID of the song.
For example:
valid
├───L5000_fma_041020.wav
├───L5000_fma_058333.wav
├───L5007_mtg_00539764.wav
├─── ...
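A minimal sketch of building an output path that follows this convention (the values are taken from the listing above; the code is illustrative, not part of the baseline):

from pathlib import Path

# Illustrative values taken from the example listing above.
split = "valid"          # dataset split being evaluated
listener_id = "L5000"    # listener panel member
song_id = "fma_041020"   # song identifier

# <Dataset Split>/<Listener ID>_<Song ID>.wav
output_path = Path(split) / f"{listener_id}_{song_id}.wav"
print(output_path)  # valid/L5000_fma_041020.wav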
3.2 Evaluation signals
In the evaluation stage, several intermediate signals can be generated for inspection. For example, you might want to examine the output of the hearing aid to ensure that samples are not clipped. These additional signals are generated by setting the parameter evaluate.save_intermediate_wavs to True in config.yaml.
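Assuming the dotted parameter name corresponds to a nested block in config.yaml (a sketch of the layout, not a copy of the baseline file):

evaluate:
  save_intermediate_wavs: True   # save intermediate signals for inspection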
When evaluate.save_intermediate_wavs is False, the evaluation generates:
- ha_processed_signal.wav - output of the HA, used in the HAAQI evaluation.
- ref_signal_for_eval - reference signal used in the HAAQI evaluation.
When evaluate.save_intermediate_wavs is True, the evaluation also generates:
- car_noise_anechoic.wav - car noise with anechoic HRTFs at the front HA microphones.
- car_noise_anechoic_scaled.wav - car noise with anechoic HRTFs at the front HA microphones, scaled to the scene SNR.
- enh_signal_hrtf.wav - enhanced music signal with car HRTFs at the front HA microphones.
- enh_signal_hrtf_plus_car_noise_anechoic.wav - enhanced music with car noise added at the scene SNR; this is the signal passed through the HA.
- ref_signal_anechoic.wav - reference signal with anechoic HRTFs at the eardrums.
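For example, a minimal clipping check on the HA output, assuming the soundfile and numpy packages are available:

import numpy as np
import soundfile as sf

# Read the HA output saved by the evaluation stage.
signal, sample_rate = sf.read("ha_processed_signal.wav")

# Samples at or beyond full scale indicate clipping.
peak = np.max(np.abs(signal))
print(f"Peak amplitude: {peak:.4f}")
if peak >= 1.0:
    print("Warning: the HA output clips.")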
3.3 Music Metadata
Music metadata is stored in a single JSON file with the following format:
{
"fma_000002": {
"track_id": "000002",
"path": "training/Hip-Hop/000002.mp3",
"artist": "AWOL",
"album": "AWOL - A Way Of Life",
"title": "Food",
"license": "Attribution-NonCommercial-ShareAlike 3.0 International",
"genre": "Hip-Hop",
"bit_rate": 256000,
"duration": 168,
"channels": 2,
"source": "fma"
},
  ...
}
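A minimal sketch of reading a track entry, assuming a placeholder metadata filename:

import json

# Placeholder path; substitute the actual metadata file location.
with open("music_metadata.json") as fp:
    music_metadata = json.load(fp)

track = music_metadata["fma_000002"]
print(track["genre"], track["path"], track["duration"])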
3.4 HRTF Metadata
HRTF metadata is stored in a single JSON file with the following format:
{
"train": {
"-57.5": {
"car": {
"left_speaker": {
"left_side": "HR13_E03_CH1_Left",
"right_side": "HR13_E03_CH1_Right"
},
"right_speaker": {
"left_side": "HR13_E04_CH1_Left",
"right_side": "HR13_E04_CH1_Right"
}
},
"anechoic": {
"left_speaker": {
"left_side": "HR5_E02_CH0_Left",
"right_side": "HR5_E02_CH0_Right"
},
"right_speaker": {
"left_side": "HR21_E02_CH0_Left",
"right_side": "HR21_E02_CH0_Right"
}
}
},
...
...
}
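A minimal sketch of looking up a set of impulse response names, assuming a placeholder metadata filename and that the numeric keys are angles in degrees:

import json

# Placeholder path; substitute the actual metadata file location.
with open("hrtf_metadata.json") as fp:
    hrtf_metadata = json.load(fp)

# Car responses for the left loudspeaker at the "-57.5" key (an angle
# in degrees, assumed), for the train split.
car_left = hrtf_metadata["train"]["-57.5"]["car"]["left_speaker"]
print(car_left["left_side"], car_left["right_side"])
# HR13_E03_CH1_Left HR13_E03_CH1_Right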
3.5 Scenes Metadata
Scene metadata is stored in a single JSON file with the following format:
{
"S100000": {
"scene": "S100000",
"song": "fma_081613",
"song_path": "training/Instrumental/081613.mp3",
"hr": 25.0,
"car_noise_parameters": {
"speed": 114.0,
"gear": 6,
"reference_level_db": 30.9,
"engine_num_harmonics": 12,
"rpm": 1915.2,
"primary_filter": {"order": 1, "btype": "lowpass", "cutoff_hz": 20.3632},
"secondary_filter": {"order": 2, "btype": "lowpass", "cutoff_hz": 314.2048},
"bump": {"order": 2, "btype": "bandpass", "cutoff_hz": [77, 110]},
"dip_low": {"order": 1, "btype": "lowpass", "cutoff_hz": 170},
"dip_high": {"order": 1, "btype": "highpass", "cutoff_hz": 455}
},
"snr": 7.8386,
"split": "train"
},
  ...
}
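A minimal sketch of reading a scene entry, assuming a placeholder metadata filename:

import json

# Placeholder path; substitute the actual metadata file location.
with open("scenes.json") as fp:
    scenes = json.load(fp)

scene = scenes["S100000"]
print(scene["song"], scene["snr"])
print(scene["car_noise_parameters"]["rpm"])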
4 References
[1] Bonneville-Roussy, A., Rentfrow, P. J., Xu, M. K., & Potter, J. (2013). Music through the ages: Trends in musical engagement and preferences from adolescence through middle adulthood. Journal of Personality and Social Psychology, 105(4), 703.