Task 1 Headphones

Data and baseline code can be downloaded from the download page following this timeline.

1 Training/development

The main training/development database is MUSDB18-HQ, which has 86 training songs and 14 validation songs.

You can supplement the training and validation data from the following sources:

  • Bach10
  • FMA-small
  • MedleyDB version 1 and version 2

We leave it to you to decide how to use these as part of the training and validation sets. Note that some songs from MedleyDB are already part of the MUSDB18-HQ training set. For more information on augmenting and supplementing the training data, please see the rules.


2 Evaluation

  • We will use the MUSDB18-HQ evaluation set, which is made up of 50 songs.
  • You must process all of these songs in full.
  • All the music will be used for HAAQI evaluation.
  • We will then select a random 10-second sample from some of the pieces of music for listening panel evaluation.
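
To illustrate what a random 10-second excerpt means in terms of sample indices, here is a minimal sketch. It is not the organisers' selection procedure: the function name `random_segment`, the uniform choice of start position, and the 44.1 kHz sample rate used in the usage lines are all assumptions made for illustration.

```python
import random


def random_segment(num_samples, sample_rate, duration_s=10.0, rng=None):
    """Return (start, end) sample indices of a random excerpt.

    Hypothetical helper: the start position is drawn uniformly so that
    the excerpt always fits inside the signal.
    """
    rng = rng or random.Random()
    length = int(round(duration_s * sample_rate))
    if num_samples < length:
        raise ValueError("signal is shorter than the requested excerpt")
    # randint is inclusive at both ends, so the last valid start is allowed.
    start = rng.randint(0, num_samples - length)
    return start, start + length


# Usage: a 3-minute signal, assuming a 44.1 kHz sample rate.
start, end = random_segment(num_samples=44100 * 180, sample_rate=44100)
excerpt_seconds = (end - start) / 44100
```

Passing a seeded `random.Random` instance as `rng` makes the choice reproducible, which can be useful when checking processed excerpts during development.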

3 Data file formats and naming conventions

3.1 Enhanced signals

There are nine output signals generated by the baseline enhancement algorithm:

  • Eight enhanced output signals corresponding to the left and right channels of each stem (i.e., as submitted by the challenge entrants)

<Listener ID>/<Song Name>/<Listener ID>_<Song Name>_<Channel>_<Stem>.wav

  • One enhanced output signal corresponding to the final remix

<Listener ID>/<Song Name>/<Listener ID>_<Song Name>_remix.wav

Where:

  • Listener ID - ID of the listener panel member, e.g., L001 to L100 for initial pseudo-listeners, etc.
  • Song Name - Track name from MUSDB18, e.g., One Minute Smile.
  • Channel - left or right channel
  • Stem - vocals, bass, drums or other

For example, for development listener ID L5011 and development song Actions - One Minute Smile, the enhanced outputs are:

L5011
└───One Minute Smile
    ├───L5011_Actions - One Minute Smile_left_bass.wav
    ├───L5011_Actions - One Minute Smile_right_bass.wav
    ├───L5011_Actions - One Minute Smile_left_drums.wav
    ├───L5011_Actions - One Minute Smile_right_drums.wav
    ├───L5011_Actions - One Minute Smile_left_other.wav
    ├───L5011_Actions - One Minute Smile_right_other.wav
    ├───L5011_Actions - One Minute Smile_left_vocals.wav
    ├───L5011_Actions - One Minute Smile_right_vocals.wav
    └───L5011_Actions - One Minute Smile_remix.wav
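
The naming pattern above can be sketched in a few lines of Python. This is an illustration only, not part of the baseline code: the function name `enhanced_paths` and the `root` parameter are hypothetical, and the sketch uses the same song name for the directory and the file names, exactly as the `<Listener ID>/<Song Name>/...` pattern is written.

```python
from pathlib import Path

CHANNELS = ("left", "right")
STEMS = ("bass", "drums", "other", "vocals")


def enhanced_paths(listener_id, song_name, root=Path(".")):
    """Build the nine expected output paths for one listener and one song.

    Hypothetical helper following the pattern
    <Listener ID>/<Song Name>/<Listener ID>_<Song Name>_<Channel>_<Stem>.wav
    plus the final remix file.
    """
    song_dir = Path(root) / listener_id / song_name
    prefix = f"{listener_id}_{song_name}"
    # Eight stem files: left and right channel for each of the four stems.
    paths = [
        song_dir / f"{prefix}_{channel}_{stem}.wav"
        for stem in STEMS
        for channel in CHANNELS
    ]
    # One remix file.
    paths.append(song_dir / f"{prefix}_remix.wav")
    return paths


for path in enhanced_paths("L5011", "Actions - One Minute Smile"):
    print(path)
```

Generating the paths from one function, rather than formatting them by hand in several places, makes it easier to check that all nine files exist before submitting.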

3.2 Music metadata

Music metadata is stored in a single JSON file per dataset, with the following format:

[
  {
    "Track Name": "A Classic Education - NightOwl",
    "Genre": "Singer/Songwriter",
    "Source": "MedleyDB",
    "License": "CC BY-NC-SA",
    "Split": "train"
  },
  ...
]
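
As a minimal sketch of how this metadata might be read, the Python below parses a list of records in the format shown and groups them by the "Split" and "Source" fields. The helper names `tracks_in_split` and `count_by_source` are hypothetical, and the inline sample contains only the one record shown above; in practice the list would come from `json.load` on the dataset's metadata file.

```python
import json
from collections import Counter

# Inline sample in the documented format; a real file would hold many records.
SAMPLE = """
[
  {
    "Track Name": "A Classic Education - NightOwl",
    "Genre": "Singer/Songwriter",
    "Source": "MedleyDB",
    "License": "CC BY-NC-SA",
    "Split": "train"
  }
]
"""


def tracks_in_split(tracks, split):
    """Return the track names whose "Split" field equals `split`."""
    return [t["Track Name"] for t in tracks if t["Split"] == split]


def count_by_source(tracks):
    """Count tracks per "Source" field."""
    return Counter(t["Source"] for t in tracks)


tracks = json.loads(SAMPLE)
print(tracks_in_split(tracks, "train"))
print(count_by_source(tracks))
```

Grouping by "Source" in this way is one option for spotting tracks that originate from MedleyDB when deciding how to combine supplementary data with the MUSDB18-HQ training set.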