Rules
The following rules are provisional.
1. Teams
- Teams must pre-register and nominate a contact person.
- Teams can be from one or more institutions.
- Anonymous entries are allowed.
- You must provide a technical document of up to 2 pages describing the system/model and any data, pre-existing tools, software and models used.
- We will publish all technical documents (anonymous or otherwise).
- You are encouraged to make your code open source so others can build on your work.
2. What information can I use?
2.1. Training and development
Teams should use the signals and listener responses provided.
In addition, teams can use their own data for training or expand the training data through simple automated modifications. Additional pre-training data could be generated by existing speech intelligibility, lyric intelligibility and hearing loss models. The FAQ gives links to some models that might be used for this.
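As one illustration of a "simple automated modification", the sketch below applies a small random gain and silences a short random segment of a waveform. The function name `augment` and its parameters are hypothetical, and the rules do not say which modifications are appropriate; in particular, whether a listener score remains valid for a modified signal is a judgement left to each team.

```python
import random


def augment(signal, rng, max_gain_db=3.0, max_mask_fraction=0.05):
    """Return a modified copy of `signal` (a list of float samples).

    Assumptions (not taken from the challenge rules): a gain of up to
    +/- max_gain_db and one zeroed segment of up to max_mask_fraction
    of the signal length.
    """
    # Random gain in decibels, converted to a linear amplitude factor.
    gain_db = rng.uniform(-max_gain_db, max_gain_db)
    gain = 10 ** (gain_db / 20)
    out = [sample * gain for sample in signal]

    # Zero out one randomly placed segment (a simple time mask).
    mask_len = int(len(out) * rng.uniform(0, max_mask_fraction))
    if mask_len > 0:
        start = rng.randrange(0, len(out) - mask_len + 1)
        out[start:start + mask_len] = [0.0] * mask_len
    return out


rng = random.Random(0)  # seeded so the augmentation is reproducible
original = [1.0] * 1000
modified = augment(original, rng)
```

Passing a seeded `random.Random` instance keeps the augmentation reproducible, which makes it easier to describe in the technical document.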
2.2. Evaluation
The only data that can be used by the prediction model(s) during evaluation are described below.
For non-intrusive methods:
- The output of the hearing aid processor/system.
- The listener's hearing impairment severity as indicated by the metadata.
Additionally, for intrusive methods:
- The target reference signal.
- The target transcript, i.e. the sentence sung in the target signal.
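The permitted inputs listed above could be gathered into a single container, which makes it harder to pass anything else to the model by accident. This is only a sketch: the class name, field names and the example severity label are hypothetical and do not reflect the format of the challenge metadata.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class EvaluationInput:
    """Data a prediction model may see at evaluation time (names assumed)."""

    # Available to all methods.
    processed_signal: Sequence[float]  # output of the hearing aid processor/system
    hearing_loss_severity: str         # severity label taken from the metadata

    # Available to intrusive methods only.
    reference_signal: Optional[Sequence[float]] = None  # target reference signal
    transcript: Optional[str] = None                    # sentence sung in the target

    def is_intrusive(self) -> bool:
        """True if any intrusive-only input has been supplied."""
        return self.reference_signal is not None or self.transcript is not None


# "moderate" is a placeholder label, not necessarily one used in the metadata.
non_intrusive = EvaluationInput(
    processed_signal=[0.0, 0.1, -0.1],
    hearing_loss_severity="moderate",
)
intrusive = EvaluationInput(
    processed_signal=[0.0, 0.1, -0.1],
    hearing_loss_severity="moderate",
    reference_signal=[0.0, 0.2, -0.2],
    transcript="an example sung sentence",
)
```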
3. Baseline models and computational restrictions
- Teams may choose to use all or some of the provided baseline models.
- There is no limit on computational cost.
- Models can be causal or non-causal.
4. What sort of model do I create?
- Your model should report the lyric intelligibility for the whole sentence for each audio sample/listener combination, i.e. a single score that represents a prediction of the proportion of words that would be recognised correctly.
- The model architecture is entirely up to you, e.g. you can create a model that attempts to recognise individual words and then reduces this down to a proportion, or you can estimate an intelligibility score directly from the audio. Models may have explicit hearing loss model stages or be trained directly to map signals and audiograms to predictions.
- The use of pre-trained foundational models is allowed, but the name of the model and how it was used should be clear in the report.
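For the first architecture mentioned above (recognise individual words, then reduce to a proportion), the reduction step might look like the sketch below. It assumes an intrusive method, since it uses the target transcript, and it assumes the recognised words come from some recogniser of your choosing. The order-insensitive word matching is an illustrative choice and is not stated to be how the listener responses were scored.

```python
from collections import Counter


def proportion_correct(transcript: str, hypothesis: str) -> float:
    """Fraction of transcript words that also appear in the hypothesis.

    Matching is case-insensitive and ignores word order; each hypothesis
    word can account for at most one transcript word.
    """
    target_words = transcript.lower().split()
    if not target_words:
        raise ValueError("transcript must contain at least one word")

    available = Counter(hypothesis.lower().split())
    hits = 0
    for word in target_words:
        if available[word] > 0:
            available[word] -= 1  # consume the matched hypothesis word
            hits += 1
    return hits / len(target_words)


# Four of the six target words are matched, giving a score of about 0.667.
score = proportion_correct("the cat sat on the mat", "the cat on a mat")
```

A model that estimates intelligibility directly from the audio would skip this step and output the single score itself.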
5. Submitting multiple entries for a task
If you wish to submit multiple entries:
- Your systems must have significant differences in their approach.
- You must contact the organisers to discuss your plans.
- If accepted, you will be issued with multiple Team IDs to distinguish your entries.
- In your documentation, you must make it clear how the submissions differ.
6. Evaluation of systems
- Entries will be ranked according to their performance in predicting measured intelligibility scores.
- The system score will be taken to be the root mean squared error (RMSE) between the predicted and measured intelligibility scores across the complete test set.
- Separate rankings will be made for intrusive and non-intrusive methods.
- Systems will only be considered if the technical report has been submitted and the system is judged to be compliant with the challenge rules.
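The RMSE used for ranking can be computed as in the sketch below, which may be useful for checking a system on development data. The function name is hypothetical, and the code assumes the predicted and measured scores are on the same scale and listed in the same order; it is not the organisers' evaluation script.

```python
import math


def rmse(predicted, measured):
    """Root mean squared error between two equal-length score sequences."""
    if len(predicted) != len(measured):
        raise ValueError("predicted and measured must have the same length")
    if len(predicted) == 0:
        raise ValueError("at least one score pair is required")

    squared_errors = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Errors of 0.0 and 1.0 give sqrt((0 + 1) / 2), roughly 0.707.
example = rmse([0.5, 1.0], [0.5, 0.0])
```

Lower values are better, and a perfect prediction gives an RMSE of 0.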
7. Transparency
- Teams must provide a technical document of up to 2 pages describing the system/model and any external data and pre-existing tools, software and models used.
- We will publish all technical documents on the challenge website (anonymous or otherwise).
- Teams are encouraged, but not required, to provide us with access to the system(s)/model(s) and to make their code open source.
- Anonymous entries are allowed.
- If a group of people submits multiple entries.
- All teams will be referred to using anonymous codenames if the rank ordering is published before the final results are announced.
- Teams are strongly encouraged to submit their report for presentation at the Cadenza Workshop organised for these purposes.
8. Intellectual property
The following terms apply to participation in this machine learning challenge (Challenge). Entrants may create original solutions, prototypes, datasets, scripts, or other content, materials, discoveries or inventions. Entrants retain ownership of all intellectual and industrial property rights (including moral rights) in and to these.
The Submission constitutes a set of intelligibility predictions and an accompanying technical report.
The Challenge Organiser is the Cadenza Project team.
As a condition of submission, Entrant grants the Challenge Organiser, its subsidiaries, agents and partner companies, a perpetual, irrevocable, worldwide, royalty-free, and non-exclusive licence to use, reproduce, adapt, modify, publish, distribute, publicly perform, create a derivative work from, and publicly display the Submission.
Entrants provide Submissions on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE.