Rules

What information can I use?

Training and development

We have provided teams with signals and listener responses for training and validation. This includes:

1. The stereo audio that our listeners heard during the intelligibility tests. This audio may have no, mild or moderate hearing loss simulated.
2. The audio without hearing loss simulation (where (1) has hearing loss simulation).
3. The listener's hearing impairment severity.
4. The ground-truth text of the lyrics.
5. The transcription by our listeners during the intelligibility tests and the intelligibility scores.

See Data for more details on how these were generated.

In addition, teams can use their own data for training or expand the training data through simple automated modifications. Additional pre-training data could be generated by existing speech intelligibility, lyric intelligibility and hearing loss models. The FAQ gives links to some models that might be used for this.

Evaluation

The evaluation set cannot be used for training or development of models. Audio samples must be processed independently of each other.

The only data that can be used by the prediction model during evaluation are:

1. The stereo audio that our listeners heard during the intelligibility tests.
2. The audio without hearing loss simulation (in a third of cases, this is the same as (1)).
3. The listener's hearing impairment severity.
4. The ground-truth text of the lyrics.

We will have separate ranking lists for intrusive and non-intrusive systems. Intrusive methods are ones that make use of (4), the ground-truth text. Non-intrusive ones don't use (4). (This definition differs slightly from the one commonly used in speech intelligibility, because of our scenario.)
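As an illustration of the distinction, the sketch below bundles the four permitted evaluation inputs into one record and treats a system as intrusive when the lyrics text, item (4), is supplied. The names `EvalInputs`, `is_intrusive` and `without_lyrics` are hypothetical and are not part of the challenge software; the audio is represented as plain nested sequences only to keep the example self-contained.

```python
from dataclasses import dataclass, replace
from typing import Optional, Sequence


@dataclass(frozen=True)
class EvalInputs:
    """Hypothetical container for the data a model may see at evaluation."""
    heard_audio: Sequence[Sequence[float]]        # (1) stereo audio the listener heard
    unsimulated_audio: Sequence[Sequence[float]]  # (2) audio without hearing loss simulation
    severity: str                                 # (3) listener's hearing impairment severity
    lyrics: Optional[str] = None                  # (4) ground-truth text; None if unused


def is_intrusive(inputs: EvalInputs) -> bool:
    """A system counts as intrusive here if it is given the lyrics text (4)."""
    return inputs.lyrics is not None


def without_lyrics(inputs: EvalInputs) -> EvalInputs:
    """Return a copy with (4) removed, as a non-intrusive system would receive it."""
    return replace(inputs, lyrics=None)
```

Building one such record per audio sample/listener pair, and passing records to the model one at a time, is one way to keep samples processed independently of each other.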

Baseline models and computational restrictions

The use of pre-trained foundational models is allowed but must be fully declared.
Teams may choose to use all or some of the provided baseline.
There is no limit on computational cost, but we expect entrants to report model size.
Models can be non-causal.

What sort of model do I create?

Your model should report the lyric intelligibility for the whole sentence for each audio sample/listener combination, i.e. a single score that represents a prediction of the proportion of words that would be recognised correctly.
The model architecture is entirely up to you, e.g. you can create a model that attempts to recognise individual words and then reduces this down to a proportion, or you can estimate an intelligibility score directly from the audio.
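To make the target quantity concrete, here is a minimal sketch of a "proportion of words recognised correctly" computed from a ground-truth lyric and a listener transcription. It is an assumption for illustration only: the tokenisation rule and the order-insensitive matching are simplifications, the function names are hypothetical, and the scoring actually used for the challenge data is described under Data.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_correctness(reference: str, response: str) -> float:
    """Proportion of reference words that also appear in the response.

    Each response word can match at most one reference word, and word
    order is ignored (a simplifying assumption).
    """
    ref_words = tokenize(reference)
    if not ref_words:
        return 0.0
    remaining = Counter(tokenize(response))
    hits = 0
    for word in ref_words:
        if remaining[word] > 0:
            remaining[word] -= 1
            hits += 1
    return hits / len(ref_words)
```

For example, `word_correctness("the cat sat on the mat", "the cat sat on a mat")` counts five of the six reference words as matched, because the response contains "the" only once.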

Submitting multiple entries for a task

If you wish to submit multiple entries:

Your systems must have significant differences in their approach.
You must contact the organisers to discuss your plans.
If accepted, you will be issued with multiple Team IDs to distinguish your entries.
In your documentation, you must make it clear how the submissions differ.

Evaluation of systems

Entries will be ranked according to their performance in predicting measured intelligibility scores.
The system score will be taken to be the RMSE between the predicted and measured intelligibility scores across the complete test set.
Separate rankings will be made for intrusive and non-intrusive methods.
Systems will only be considered if the technical report has been submitted and the system is judged to be compliant with the challenge rules.
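The RMSE used for ranking can be sketched as follows. This is a minimal illustration, not the organisers' evaluation script: the function name is hypothetical, and it assumes predictions and measured scores are paired in the same order and expressed on the same scale.

```python
import math
from typing import Sequence


def rmse(predicted: Sequence[float], measured: Sequence[float]) -> float:
    """Root mean squared error between predicted and measured scores."""
    if len(predicted) != len(measured):
        raise ValueError("predicted and measured must have the same length")
    if len(predicted) == 0:
        raise ValueError("at least one score pair is required")
    squared_errors = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

Lower values indicate better agreement with the listener scores. Because the errors are squared before averaging, a few large misses raise the score more than many small ones.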

Teams

Teams must pre-register and nominate a contact person.
Teams can be from one or more institutions.
Note: under ICASSP rules, the organisers and their PhD students are not permitted to enter.

Transparency

Teams must provide a technical document of up to 2 pages describing the system/model and any external data and pre-existing tools, software and models used.
We will publish all technical documents on the challenge website (anonymous or otherwise).
Teams are encouraged – but not required – to provide us with access to the system(s)/model(s) and to make their code open source.
Anonymous entries are allowed.
All teams will be referred to using anonymous codenames if the rank ordering is published before the final results are announced.

Intellectual property

The entrants' “Submission” will consist of a set of intelligibility predictions and an accompanying technical report. Entrants retain ownership of all intellectual and industrial property rights (including moral rights) in and to Submissions. As a condition of submission, entrants grant the Challenge Organisers, its subsidiaries, agents and partner companies, a perpetual, irrevocable, worldwide, royalty-free, and non-exclusive license to use, reproduce, adapt, modify, publish, distribute, publicly perform, create a derivative work from, and publicly display the Submission. Entrants provide Submissions on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE.
