Dataset Specification
Data Distribution and Installation
The data is distributed as two gzipped tar archives:
- cadenza_clip1_data.train.v1.0.tar.gz [4.5GB]: labelled training data.
- cadenza_clip1_data.valid.v1.0.tar.gz [576MB]: unlabelled validation data.
For a quick look at the data structure, we also provide a demo package:
- cadenza_clip1_data.demo.v1.0.tar.gz [4MB]: contains 5 samples from the training data and 5 from the validation data.
- Installation Instructions
  - Download the .tar.gz files.
  - Unpack each archive using the following commands:
      tar -xvzf cadenza_clip1_data.train.v1.0.tar.gz  # For training data
      tar -xvzf cadenza_clip1_data.valid.v1.0.tar.gz  # For validation data
- Directory Structure after unpacking
    cadenza_data/
    ├── metadata/
    ├── train/
    │   ├── signals/       # Audio (1) to predict intelligibility
    │   └── unprocessed/   # Audio (2) without hearing loss
    ├── valid/
    │   ├── signals/
    │   └── unprocessed/
    └── Manifest/
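For readers who prefer to script the installation, the unpacking step and a check of the resulting layout can be sketched in Python. This is a minimal sketch, not an official tool: the helper names `unpack` and `missing_dirs` are hypothetical, and it assumes the archives sit in the current directory and extract into a top-level `cadenza_data/` folder as shown in the directory structure.

```python
import tarfile
from pathlib import Path

# Sub-directories listed in the directory structure of this page.
EXPECTED_DIRS = (
    "metadata",
    "train/signals",
    "train/unprocessed",
    "valid/signals",
    "valid/unprocessed",
    "Manifest",
)


def unpack(archive: Path, dest: Path) -> None:
    """Extract a gzipped tar archive into dest (like `tar -xzf`)."""
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, mode="r:gz") as tar:
        tar.extractall(path=dest)


def missing_dirs(root: Path) -> list[str]:
    """Return the expected sub-directories that are absent under root."""
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    # Assumption: both archives were downloaded into the current directory.
    for name in (
        "cadenza_clip1_data.train.v1.0.tar.gz",
        "cadenza_clip1_data.valid.v1.0.tar.gz",
    ):
        archive = Path(name)
        if archive.exists():
            unpack(archive, Path("."))
    print("missing directories:", missing_dirs(Path("cadenza_data")))
```

An empty list from `missing_dirs` suggests both archives were unpacked into the same root. If only one archive has been unpacked, the other split's folders will be reported as missing.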
Audio signals
- Stored as 16-bit stereo FLAC files at 44100 Hz.
- Filenames:
  - train/signals/<HASH_NUMBER>.flac: audio (1), the signal whose intelligibility is to be predicted.
  - train/unprocessed/<HASH_NUMBER>_unproc.flac: audio (2), the unprocessed signal (without hearing loss).
- Notes:
  - Audio (1) and unprocessed audio (2) files have matching <HASH_NUMBER> values.
  - Slight misalignment and differences in the number of frames may occur between the signal audio and the unprocessed audio due to the hearing loss simulation.
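The naming convention and the frame-count note above can be illustrated with a short Python sketch. The helper names `audio_paths` and `trim_to_common_length` are hypothetical, and the sketch assumes the valid/ split follows the same naming pattern as train/, which this page only spells out for train/. Truncating to the shorter signal is just one simple way to handle the differing frame counts; it does not correct any time offset between the two signals.

```python
from pathlib import Path


def audio_paths(root: Path, split: str, signal: str) -> tuple[Path, Path]:
    """Return the (signal, unprocessed) FLAC paths for one <HASH_NUMBER>."""
    processed = root / split / "signals" / f"{signal}.flac"
    unprocessed = root / split / "unprocessed" / f"{signal}_unproc.flac"
    return processed, unprocessed


def trim_to_common_length(a, b):
    """Truncate two frame sequences to the length of the shorter one.

    Works on anything that supports len() and slicing along the frame
    axis (for example lists, or arrays shaped (frames, channels)).
    """
    n = min(len(a), len(b))
    return a[:n], b[:n]


if __name__ == "__main__":
    sig, unproc = audio_paths(
        Path("cadenza_data"), "train", "b2486a8e181f84acd2db4d61"
    )
    print(sig)
    print(unproc)
```

Reading the FLAC files themselves needs an audio library, which is left out here to keep the sketch dependency-free.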
Training Metadata
The training metadata is saved in metadata/train_metadata.json.
The metadata contains a list of dictionaries, each representing one listener's response.
Fields:
- signal: audio filename.
- prompt: original target sentence (ground truth).
- response: text transcription from the Prolific listening tests ('#' indicates no words entered).
- n_words: number of words in the prompt after expanding contractions (see data construction).
- words_correct: number of correctly identified words.
- correctness: intelligibility score, i.e. the rate of correctly identified words (the target variable).
- hearing_loss: indicates whether the signal audio was not processed (No Loss), has Mild simulated hearing loss, or has Moderate simulated hearing loss.
[
  {
    "signal": "b2486a8e181f84acd2db4d61",
    "prompt": "in my mind I can see is fine",
    "response": "#",
    "n_words": 8,
    "words_correct": 0,
    "correctness": 0.0,
    "hearing_loss": "Moderate"
  }
]
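As an illustration of how these records might be consumed, the following Python sketch loads the JSON list and averages the target variable per hearing-loss condition. The function names are hypothetical and the path in the usage section assumes the directory layout described earlier. The sketch makes no assumption about the scale of `correctness`; it simply averages whatever values are stored.

```python
import json
from collections import defaultdict
from pathlib import Path


def load_metadata(path: Path) -> list[dict]:
    """Load a metadata file, which holds a JSON list of dictionaries."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def mean_correctness_by_loss(records: list[dict]) -> dict[str, float]:
    """Average the `correctness` field per `hearing_loss` value."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for rec in records:
        totals[rec["hearing_loss"]] += rec["correctness"]
        counts[rec["hearing_loss"]] += 1
    return {level: totals[level] / counts[level] for level in totals}


def no_words_entered(rec: dict) -> bool:
    """True when the listener entered no words (response is '#')."""
    return rec["response"].strip() == "#"


if __name__ == "__main__":
    # Assumed location, following the directory structure on this page.
    path = Path("cadenza_data/metadata/train_metadata.json")
    if path.exists():
        records = load_metadata(path)
        print(mean_correctness_by_loss(records))
        print(sum(no_words_entered(r) for r in records), "empty responses")
```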
Validation Metadata
The validation metadata is saved in metadata/valid_metadata.json.
The metadata contains a list of dictionaries, each describing one signal to be predicted. Because the validation data is unlabelled, the listener response and score fields are not included.
Fields:
- signal: name of the signal whose intelligibility is to be predicted.
- prompt: original target sentence (ground truth).
- n_words: number of words in the prompt after expanding contractions.
- hearing_loss: indicates whether the signal audio was not processed (No Loss), has Mild simulated hearing loss, or has Moderate simulated hearing loss.
[
  {
    "signal": "c657dc0bd1f822f8ff7104a1",
    "prompt": "do you know i am truly alone",
    "n_words": 7,
    "hearing_loss": "Moderate"
  }
]
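Since the training and validation records share some fields but not others, a small schema check can catch mix-ups between the two files. The sketch below is illustrative only: `check_record` is a hypothetical helper, the field sets are copied from the field lists on this page, and the `hearing_loss` labels are assumed to be spelled exactly "No Loss", "Mild" and "Moderate".

```python
TRAIN_FIELDS = {
    "signal", "prompt", "response", "n_words",
    "words_correct", "correctness", "hearing_loss",
}
VALID_FIELDS = {"signal", "prompt", "n_words", "hearing_loss"}

# Assumption: labels appear in the files exactly as written on this page.
HEARING_LOSS_LEVELS = {"No Loss", "Mild", "Moderate"}


def check_record(rec: dict, expected_fields: set[str]) -> list[str]:
    """Return human-readable problems found in one metadata record."""
    problems = []
    missing = expected_fields - set(rec)
    extra = set(rec) - expected_fields
    if missing:
        problems.append(f"missing fields: {sorted(missing)}")
    if extra:
        problems.append(f"unexpected fields: {sorted(extra)}")
    level = rec.get("hearing_loss")
    if level is not None and level not in HEARING_LOSS_LEVELS:
        problems.append(f"unknown hearing_loss value: {level!r}")
    return problems


if __name__ == "__main__":
    record = {
        "signal": "c657dc0bd1f822f8ff7104a1",
        "prompt": "do you know i am truly alone",
        "n_words": 7,
        "hearing_loss": "Moderate",
    }
    print(check_record(record, VALID_FIELDS))
```

Checking the same record against `TRAIN_FIELDS` reports the three training-only fields (`response`, `words_correct`, `correctness`) as missing, which is the expected difference between the two files.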