Dataset Specification
Data Distribution and Installation
The data is distributed as two gzipped tar archives:
- cadenza_clip1_data.train.v1.0.tar.gz [4.5 GB]: labelled training data.
- cadenza_clip1_data.valid.v1.0.tar.gz [576 MB]: unlabelled validation data.
If you want a quick look at the data structure, we also provide a demo package:

- cadenza_clip1_data.demo.v1.0.tar.gz [4 MB]

This package contains 5 samples from the training data and 5 from the validation data.
Installation Instructions

1. Download the .tar.gz files.
2. Unpack each archive using the following commands:
tar -xvzf cadenza_clip1_data.train.v1.0.tar.gz # For training data
tar -xvzf cadenza_clip1_data.valid.v1.0.tar.gz # For validation data
Directory Structure after unpacking:
cadenza_data/
├── metadata/
├── train/
│   ├── signals/        # Audio (1) to predict intelligibility
│   └── unprocessed/    # Audio (2) without hearing loss
├── valid/
│   ├── signals/
│   └── unprocessed/
└── Manifest/
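As a quick sanity check after unpacking, the short Python sketch below counts the FLAC files in each folder. It assumes the archives were extracted into the current directory so that cadenza_data/ sits alongside your script.

from pathlib import Path

root = Path("cadenza_data")
for split in ("train", "valid"):
    for sub in ("signals", "unprocessed"):
        # Count the FLAC files in each split/subfolder.
        n_files = len(list((root / split / sub).glob("*.flac")))
        print(f"{split}/{sub}: {n_files} FLAC files")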
Audio signals
- Stored as 16-bit stereo FLAC files at 44100 Hz.
- Filenames:
  - train/signals/<HASH_NUMBER>.flac: audio (1) signal to predict intelligibility.
  - train/unprocessed/<HASH_NUMBER>_unproc.flac: the unprocessed (without hearing loss) audio (2) signal.
- Notes:
  - Audios (1) and unprocessed audios (2) have matching <HASH_NUMBER>.
  - Slight misalignment and variations in the number of frames may occur between the signal audio and the unprocessed audio due to the hearing loss simulation. A sketch for handling this follows below.
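For illustration, this minimal sketch loads a matched signal/unprocessed pair and truncates both to a common length. It assumes the third-party soundfile package and the unpacked layout above; the hash is the example id from the training metadata shown later.

import soundfile as sf

hash_number = "b2486a8e181f84acd2db4d61"  # example id from the training metadata

# Audio (1): the signal whose intelligibility is to be predicted.
signal, fs = sf.read(f"cadenza_data/train/signals/{hash_number}.flac")

# Audio (2): the matching unprocessed (no hearing loss) reference.
reference, fs_ref = sf.read(f"cadenza_data/train/unprocessed/{hash_number}_unproc.flac")

assert fs == fs_ref == 44100  # 16-bit stereo FLAC at 44100 Hz

# Frame counts may differ slightly due to the hearing loss simulation,
# so truncate both to the shorter length before any sample-wise comparison.
n_frames = min(len(signal), len(reference))
signal, reference = signal[:n_frames], reference[:n_frames]
print(signal.shape)  # (n_frames, 2) stereo array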
Training Metadata
The train metadata is saved in metadata/train_metadata.json.
The metadata contains a list of dictionaries, each representing a listener's response.
Fields:
- signal: audio filename.
- fma: Free Music Archive track id. Same as the one in the FMA metadata.
- original_prompt: original target sentence.
- prompt: text-normalised target sentence (ground truth) used for correctness computation.
- original_response: text transcription from Prolific listening tests ('#' indicates no words entered).
- response: text-normalised transcription used for correctness computation.
- n_words: number of words in the prompt after expanding contractions (see data construction).
- words_correct: number of correctly identified words.
- correctness: intelligibility score, i.e. the rate of correctly identified words (the target variable).
- hearing_loss: indicates whether the signal audio was not processed (No Loss), had Mild simulated hearing loss, or had Moderate simulated hearing loss.
[
{
"signal":"b2486a8e181f84acd2db4d61",
"fma":"001750",
"original_prompt":"in my mind I can see is fine",
"prompt":"in my mind i can see is fine",
"original_response":"#",
"response":"",
"n_words":8,
"words_correct":0,
"correctness":0.0,
"hearing_loss":"Moderate"
}
]
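To make the fields concrete, this minimal sketch loads the metadata and recomputes the intelligibility score from words_correct and n_words. That correctness equals words_correct / n_words on a 0-1 scale is an assumption consistent with the field descriptions and the example entry, not a guarantee.

import json

with open("cadenza_data/metadata/train_metadata.json") as f:
    records = json.load(f)

for rec in records[:3]:
    # Assumption: correctness == words_correct / n_words on a 0-1 scale.
    recomputed = rec["words_correct"] / rec["n_words"]
    print(rec["signal"], rec["hearing_loss"], rec["correctness"], recomputed)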
Validation/Evaluation Metadata
The validation metadata is saved in metadata/valid_metadata.json.
The metadata contains a list of dictionaries, each representing a listener's response; the response and correctness fields are withheld for evaluation.
Fields:
- signal: name of the signal to predict intelligibility from.
- original_prompt: original target sentence.
- prompt: text-normalised target sentence (ground truth) used for correctness computation.
- n_words: number of words in the prompt after expanding contractions.
- hearing_loss: indicates whether the signal audio was not processed (No Loss), had Mild simulated hearing loss, or had Moderate simulated hearing loss.
[
{
"signal":"c657dc0bd1f822f8ff7104a1",
"original_prompt":"do you know i am truly alone",
"prompt":"do you know i am truly alone",
"n_words":7,
"hearing_loss":"Moderate"
}
]
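A typical use of this file is to drive a prediction loop over the validation signals, as in the sketch below. Here predict_intelligibility is a hypothetical placeholder for your own model, and no particular submission format is implied.

import json
import soundfile as sf

def predict_intelligibility(audio, fs, hearing_loss):
    # Hypothetical stand-in: a real system would analyse the audio.
    return 0.5

with open("cadenza_data/metadata/valid_metadata.json") as f:
    records = json.load(f)

predictions = {}
for rec in records:
    # Load the signal named in the metadata entry and predict its score.
    audio, fs = sf.read(f"cadenza_data/valid/signals/{rec['signal']}.flac")
    predictions[rec["signal"]] = predict_intelligibility(audio, fs, rec["hearing_loss"])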