Task 2: Car
Data and baseline code can be downloaded from the download page, according to the timeline.
1 Training/development
1.1 Music Data
The music dataset is based on the small split of the FMA dataset (FMA-small) and on the MTG-Jamendo dataset. FMA-small is a balanced dataset for genre classification. From the eight genres available in FMA-small, we selected five:
- Hip-Hop
- Instrumental
- International
- Pop
- Rock
However, people with hearing loss are more likely to be older adults, who tend to listen to classical and orchestral music [1]. Therefore, we included samples from these two genres, sourced from the MTG-Jamendo dataset:
- Classical
- Orchestral
Each genre contains 900 30-second samples, divided into 800 for training and 100 for development.
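If you want to sanity-check these counts, they can be recovered from the music metadata described in Section 3.3. A minimal sketch, assuming a placeholder metadata filename:

import json
from collections import Counter

# Placeholder path; substitute the actual metadata file location.
with open("music_metadata.json") as fp:
    music = json.load(fp)

# Count tracks per (genre, split), taking the split from the first
# component of the path field (e.g., "training"); the name of the
# development directory is an assumption.
counts = Counter((t["genre"], t["path"].split("/")[0]) for t in music.values())
print(counts)  # expect 800 training and 100 development tracks per genre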
1.2 HRTF Data
To allow for the 'room' acoustics in the car, we use anechoic and car binaural room impulse responses from the eBrIRD ELOSPHERES dataset.
2 Evaluation
- The evaluation set contains 700 30-second samples.
- You should process all the music.
- All the music will be used for HAAQI evaluation.
- We will then select a random 10-second sample from some of this music for listening panel evaluation.
3 Data file formats and naming conventions
3.1 Enhanced signals
The baseline generates one output per scene:
<Dataset Split>/<Listener ID>_<Song ID>.wav
Where:
- Dataset Split - the split you are evaluating: train, valid or test.
- Listener ID - ID of the listener panel member, e.g., L001 to L100 for the initial pseudo-listeners, etc.
- Song ID - ID of the song.
For example:
valid
├───L5000_fma_041020.wav
├───L5000_fma_058333.wav
├───L5007_mtg_00539764.wav
├─── ...
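A minimal sketch of building an output path that follows this convention (the values are taken from the listing above; the code is illustrative, not part of the baseline):

from pathlib import Path

# Illustrative values taken from the example listing above.
split = "valid"          # dataset split being evaluated
listener_id = "L5000"    # listener panel member
song_id = "fma_041020"   # song identifier

# <Dataset Split>/<Listener ID>_<Song ID>.wav
output_path = Path(split) / f"{listener_id}_{song_id}.wav"
print(output_path)  # valid/L5000_fma_041020.wav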
3.2 Evaluation signals
In the evaluation stage, several intermediate signals can be generated for inspection. For example, you might want to examine the output of the hearing aid to ensure that samples are not clipped. These additional signals are generated by setting the parameter evaluate.save_intermediate_wavs to True in config.yaml.
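Assuming the dotted parameter name corresponds to a nested block in config.yaml (a sketch of the layout, not a copy of the baseline file):

evaluate:
  save_intermediate_wavs: True   # save intermediate signals for inspection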
When evaluate.save_intermediate_wavs is False, the evaluation generates:
- ha_processed_signal.wav - output of the HA, used in the HAAQI evaluation.
- ref_signal_for_eval - reference signal used in the HAAQI evaluation.
When evaluate.save_intermediate_wavs is True, the evaluation also generates:
- car_noise_anechoic.wav - car noise with anechoic HRTFs at the front HA microphones.
- car_noise_anechoic_scaled.wav - car noise with anechoic HRTFs at the front HA microphones, scaled to the scene SNR.
- enh_signal_hrtf.wav - enhanced music signal with car HRTFs at the front HA microphones.
- enh_signal_hrtf_plus_car_noise_anechoic.wav - enhanced music with car noise added at the scene SNR; this is the signal passed through the HA.
- ref_signal_anechoic.wav - reference signal with anechoic HRTFs at the eardrums.
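For example, a minimal clipping check on the HA output, assuming the soundfile and numpy packages are available:

import numpy as np
import soundfile as sf

# Read the HA output saved by the evaluation stage.
signal, sample_rate = sf.read("ha_processed_signal.wav")

# Samples at or beyond full scale indicate clipping.
peak = np.max(np.abs(signal))
print(f"Peak amplitude: {peak:.4f}")
if peak >= 1.0:
    print("Warning: the HA output clips.")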
3.3 Music Metadata
Music metadata is stored in a single JSON file with the following format:
{
"fma_000002": {
"track_id": "000002",
"path": "training/Hip-Hop/000002.mp3",
"artist": "AWOL",
"album": "AWOL - A Way Of Life",
"title": "Food",
"license": "Attribution-NonCommercial-ShareAlike 3.0 International",
"genre": "Hip-Hop",
"bit_rate": 256000,
"duration": 168,
"channels": 2,
"source": "fma"
},
  ...
}
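A minimal sketch of reading a track entry, assuming a placeholder metadata filename:

import json

# Placeholder path; substitute the actual metadata file location.
with open("music_metadata.json") as fp:
    music_metadata = json.load(fp)

track = music_metadata["fma_000002"]
print(track["genre"], track["path"], track["duration"])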
3.4 HRTF Metadata
HRTF metadata is stored in a single JSON file with the following format:
{
"train": {
"-57.5": {
"car": {
"left_speaker": {
"left_side": "HR13_E03_CH1_Left",
"right_side": "HR13_E03_CH1_Right"
},
"right_speaker": {
"left_side": "HR13_E04_CH1_Left",
"right_side": "HR13_E04_CH1_Right"
}
},
"anechoic": {
"left_speaker": {
"left_side": "HR5_E02_CH0_Left",
"right_side": "HR5_E02_CH0_Right"
},
"right_speaker": {
"left_side": "HR21_E02_CH0_Left",
"right_side": "HR21_E02_CH0_Right"
}
}
},
...
...
}
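A minimal sketch of looking up a set of impulse response names, assuming a placeholder metadata filename and that the numeric keys are angles in degrees:

import json

# Placeholder path; substitute the actual metadata file location.
with open("hrtf_metadata.json") as fp:
    hrtf_metadata = json.load(fp)

# Car responses for the left loudspeaker at the "-57.5" key (an angle
# in degrees, assumed), for the train split.
car_left = hrtf_metadata["train"]["-57.5"]["car"]["left_speaker"]
print(car_left["left_side"], car_left["right_side"])
# HR13_E03_CH1_Left HR13_E03_CH1_Right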
3.5 Scenes Metadata
Scene metadata is stored in a single JSON file with the following format:
{
"S100000": {
"scene": "S100000",
"song": "fma_081613",
"song_path": "training/Instrumental/081613.mp3",
"hr": 25.0,
"car_noise_parameters": {
"speed": 114.0,
"gear": 6,
"reference_level_db": 30.9,
"engine_num_harmonics": 12,
"rpm": 1915.2,
"primary_filter": {"order": 1, "btype": "lowpass", "cutoff_hz": 20.3632},
"secondary_filter": {"order": 2, "btype": "lowpass", "cutoff_hz": 314.2048},
"bump": {"order": 2, "btype": "bandpass", "cutoff_hz": [77, 110]},
"dip_low": {"order": 1, "btype": "lowpass", "cutoff_hz": 170},
"dip_high": {"order": 1, "btype": "highpass", "cutoff_hz": 455}
},
"snr": 7.8386,
"split": "train"
},
  ...
}
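A minimal sketch of reading a scene entry, assuming a placeholder metadata filename:

import json

# Placeholder path; substitute the actual metadata file location.
with open("scenes.json") as fp:
    scenes = json.load(fp)

scene = scenes["S100000"]
print(scene["song"], scene["snr"])
print(scene["car_noise_parameters"]["rpm"])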
4 References
[1] Bonneville-Roussy, A., Rentfrow, P. J., Xu, M. K., & Potter, J. (2013). Music through the ages: Trends in musical engagement and preferences from adolescence through middle adulthood. Journal of Personality and Social Psychology, 105(4), 703.