Datasets and Pretrained models

If participants want to use a specific dataset not included in the list of the official datasets for the challenge, or want to use a pretrained model, they can submit a request by filling out the form at the end of this page. Before submitting the request, please ensure that it is not already included in the current lists and that it complies with the following rules.

Cannot be private.
Must be freely available to all participants.
Must be easily accessible to all participants.
Must not contain copyright restrictions that prevent its use.
Datasets must not derive from the evaluation datasets.
Pretrained models must not include the evaluation dataset in any form.

Datasets

The next table shows a list with the datasets that are or aren't allowed for the specific tasks.

Description of the columns:

Dataset: indicates the name of the dataset.
Task 1: indicates if the dataset can be used for training in Task 1 - Lyrics Intelligibility task.
Task 2: indicates if the dataset can be used for training in Task 2 - Rebalancing levels of instruments in a Classical Music.
Comments: Extra comments.

Yes : Dataset can be used for the task.
No : Dataset cannot be used for the task.
- : Dataset may not be suitable for the task.

Dataset	Task 1	Task 2	Comments
MUSDB18-HQ train split	Yes	Yes
MUSDB18-HQ test split	No	Yes
DALI	Yes	Yes	https://github.com/gabolsgabs/DALI
JamendoLyrics	No	Yes	In Task 1, this includes all datasets that are derived from it, such as Jam-ALT.
CCMixter	Yes	Yes	https://members.loria.fr/ALiutkus/kam/
EnsembleSet	-	Yes	We mirror one remix but the rest are allowed https://zenodo.org/records/6519024
CadenzaWoodwind	-	Yes
AAM	-	Yes	https://zenodo.org/records/5794629
URMP	-	No	Including any dataset derived from it, such as Sub-URMP.
BACH10	-	No	Including any dataset derived from it, such as BACH10 Sibelius.
TRIOS dataset	-	Yes	https://zenodo.org/records/6797837
FMA	Yes	Yes	https://github.com/mdeff/fma

Pretrained models

The next table shows a list with the pretrained models that are or aren't allowed for the specific tasks.

Description of the columns:

Models: indicates the name of the pretrained models.
Task 1: indicates if the dataset can be used for training in Task 1 - Lyrics Intelligibility task.
Task 2: indicates if the dataset can be used for training in Task 2 - Rebalancing levels of instruments in a Classical Music.
Comments: Extra comments.

Dataset	Task 1	Task 2	Comments
Whisper	No	No	Because there is no clarity of what datasets were used for training Whisper
OWSM	Yes	Yes	https://www.wavlab.org/activities/2024/owsm/
Models from DNS-Challenge-4	Yes	Yes	https://www.microsoft.com/en-us/research/academic-program/deep-noise-suppression-challenge-icassp-2022/

info

Note that although we do not allow the use of Whisper as a pretrained model in your system’s Music Enhancer, it is used during the evaluation stage of the challenge.

Request a Dataset or Pretrained model

If you want to use a dataset not listed above or want to work with a pretrained model, please submit a request in the form below so we can check it is OK. We want to make sure all teams have access to the same data/pre-trained models for a fair challenge.

Datasets​

Pretrained models​

Request a Dataset or Pretrained model​

Datasets

Pretrained models

Request a Dataset or Pretrained model