Baseline System

A baseline non-intrusive lyrics intelligibility system has been provided to help you get started.

🚧 Under Construction 🚧

This section is under construction and some information may change in the final version.

A. Overview

The baseline makes its predictions with the singing-adapted STOI and vocal-specific features (SA-STOI) system [1]. SA-STOI is a self-reference metric: its reference is generated from the processed signal itself, in the form of vocals estimated from that signal. It also uses 12 vocal features, computed from the processed signal and the estimated vocals, to account for singing styles and expressions. The final score is obtained with a support vector machine (SVM)-based regression model trained on a 13-dimensional feature vector (the STOI score combined with the 12 vocal features), using human intelligibility scores as the regression targets.
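As a minimal sketch of how the 13-dimensional regressor input described above might be assembled, the code below concatenates one STOI score with the 12 vocal-feature values of a segment. The helper names (`build_feature_vector`, `build_feature_matrix`) are hypothetical and not part of pyclarity; the STOI score (computed with the estimated vocals as the reference) and the vocal features are assumed to be computed elsewhere.

```python
import numpy as np

N_VOCAL_FEATURES = 12  # number of vocal-specific features used by SA-STOI


def build_feature_vector(stoi_score: float, vocal_features) -> np.ndarray:
    """Return [STOI, f1, ..., f12] for one segment (13 values in total).

    Hypothetical helper: `stoi_score` is assumed to be STOI computed with the
    estimated vocals as the reference and the processed signal as the input.
    """
    features = np.asarray(vocal_features, dtype=float).ravel()
    if features.size != N_VOCAL_FEATURES:
        raise ValueError(
            f"expected {N_VOCAL_FEATURES} vocal features, got {features.size}"
        )
    return np.concatenate(([float(stoi_score)], features))


def build_feature_matrix(stoi_scores, vocal_feature_rows) -> np.ndarray:
    """Stack one 13-value row per audio segment into an (n_segments, 13) array."""
    rows = [
        build_feature_vector(score, feats)
        for score, feats in zip(stoi_scores, vocal_feature_rows)
    ]
    return np.vstack(rows)


# Usage with placeholder values: one segment with STOI 0.72 and all-zero features.
x = build_feature_vector(0.72, np.zeros(N_VOCAL_FEATURES))
print(x.shape)  # (13,)
```

Keeping the STOI score in column 0 and the vocal features in a fixed order matters because the regressor treats each column position as a distinct input.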

Figure 1. Framework of SA-STOI computation (Figure 6 in [1])

A1. Differences between the baseline implementation and the original SA-STOI

The original SA-STOI model uses a U-Net music source separation (MSS) model and MATLAB functions to compute the vocal features. For our baseline, we replaced the U-Net MSS model with the HDemucs model, which reports higher performance on the MUSDB18 demixing benchmark. The system was ported to Python, using available Python functions to compute the same vocal features as the original SA-STOI.
The original SA-STOI model was trained on 140 audio segments, each scored by 17 participants; the target intelligibility score for each segment is the average of its 17 scores. Our ported model was trained on more than 8,000 audio segments, each scored by a single participant. For details of the dataset, please refer to the dataset description.
For the SVM model, the original SA-STOI system employs the libsvm-3.24 library. In our baseline system, we used the SVM model from the Python scikit-learn package.
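The regression stage can be sketched with scikit-learn's `SVR` class (an epsilon-support vector regressor that wraps libsvm). This is a minimal sketch under stated assumptions: the data are random placeholders rather than CLIP1 features or listener scores, and the feature standardization, RBF kernel, and `C`/`epsilon` values are illustrative choices, not the settings used by the baseline.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Placeholder data only: random 13-dimensional vectors stand in for the
# (STOI + 12 vocal features) inputs, and a toy target stands in for listener scores.
rng = np.random.default_rng(seed=0)
X_train = rng.random((200, 13))
y_train = 0.8 * X_train[:, 0] + 0.1  # toy target driven by the STOI column

# Standardize each feature, then fit an RBF-kernel epsilon-SVR.
# Kernel, C and epsilon are illustrative, not the baseline's tuned values.
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1.0, epsilon=0.05))
model.fit(X_train, y_train)

# Predict intelligibility for five new (placeholder) segments.
X_new = rng.random((5, 13))
predictions = model.predict(X_new)
print(predictions.shape)  # (5,)
```

With real data, each training row would be one scored audio segment, and the hyperparameters would normally be tuned on held-out segments; with single-listener targets, as in the ported model, the targets are noisier than the 17-listener averages used originally.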

B. How to use the baseline

The baseline system is included in the pyclarity Python package (pyclarity >= 0.8), which is available on GitHub. The relevant scripts are located in the recipes/clip1/baseline directory. To use the baseline system:

1. Download the code: clone or download the repository from GitHub.
2. Follow the instructions: refer to the README file in the recipes/clip1/baseline directory for detailed steps to run the baseline on the CLIP1 dataset.
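Before running the recipe, a quick way to confirm that the installed package satisfies the pyclarity >= 0.8 requirement is sketched below. It assumes the package is installed under the distribution name `pyclarity`; the helper names are hypothetical, and the version parsing is deliberately simple (the `packaging` library handles pre-release and post-release tags more rigorously).

```python
from importlib import metadata

MIN_VERSION = (0, 8)  # the baseline requires pyclarity >= 0.8


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '0.8.1' into (0, 8, 1); trailing non-numeric tags such as 'rc1' are dropped."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def pyclarity_ready(min_version: tuple[int, ...] = MIN_VERSION) -> bool:
    """Return True if an installed 'pyclarity' distribution is at least min_version."""
    try:
        installed = metadata.version("pyclarity")
    except metadata.PackageNotFoundError:
        return False
    return parse_version(installed) >= min_version


if __name__ == "__main__":
    print("pyclarity >= 0.8 installed:", pyclarity_ready())
```

Tuple comparison makes (0, 10, 0) compare greater than (0, 8), which a plain string comparison of "0.10.0" and "0.8" would get wrong.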

C. Baseline Performance

The baseline system achieves the following performance on the validation set:

Metric	Value
RMSE	TBC
Correlation	TBC
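The table reports RMSE and correlation between predicted and listener intelligibility scores. A minimal sketch of both metrics follows; the source does not say which correlation coefficient is used, so Pearson's linear correlation is an assumption here, and the function names are hypothetical.

```python
import numpy as np


def rmse(predicted, observed) -> float:
    """Root-mean-square error between predicted and observed intelligibility scores."""
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    if predicted.shape != observed.shape:
        raise ValueError("predicted and observed must have the same shape")
    return float(np.sqrt(np.mean((predicted - observed) ** 2)))


def pearson(predicted, observed) -> float:
    """Pearson linear correlation (assumed to be the reported 'Correlation')."""
    return float(np.corrcoef(predicted, observed)[0, 1])


# Usage with placeholder scores for four segments.
pred = [0.55, 0.70, 0.20, 0.90]
true = [0.50, 0.80, 0.10, 1.00]
print(rmse(pred, true), pearson(pred, true))
```

The two metrics are complementary: RMSE penalizes absolute errors in the predicted scores, while correlation only measures how well the predictions track the ranking and spread of the listener scores.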

References

[1] B. Sharma and Y. Wang, "Automatic Evaluation of Song Intelligibility Using Singing Adapted STOI and Vocal-Specific Features," in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 28, pp. 319-331, 2020, doi: 10.1109/TASLP.2019.2955253.
