Baseline System
A baseline non-intrusive lyrics intelligibility system has been provided to help you get started.
This section is under construction and some information may change in the final version.
A. Overview
The baseline makes its predictions with the singing-adapted STOI and vocal-specific features (SA-STOI) system [1]. SA-STOI is a self-referenced metric: instead of requiring a separate clean reference, it uses vocals estimated from the processed signal as the reference. It also computes 12 vocal features from the processed signal and the estimated vocals to account for singing styles and expressions. The final prediction comes from a support vector machine (SVM) regression model trained on a 13-dimensional feature vector (the STOI score combined with the 12 vocal features), with human intelligibility scores as the training targets.
![Figure 6 in [1]](/assets/images/sastoi-7e3510fd9f53ae2f65e630480492669a.gif)
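The regression stage described above can be illustrated with a minimal sketch. It is not the baseline code: it assumes scikit-learn's `SVR` as a stand-in regressor with default hyperparameters, uses random placeholder values in place of real STOI scores, vocal features and listener scores, and the helper name `build_feature_vector` is hypothetical.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

N_VOCAL_FEATURES = 12  # number of vocal features stated in the overview


def build_feature_vector(stoi_score, vocal_features):
    """Concatenate one STOI score with the 12 vocal features (hypothetical helper)."""
    vocal_features = np.asarray(vocal_features, dtype=float)
    if vocal_features.shape != (N_VOCAL_FEATURES,):
        raise ValueError(f"expected {N_VOCAL_FEATURES} vocal features")
    return np.concatenate(([float(stoi_score)], vocal_features))


# Placeholder data: random numbers standing in for real features and scores.
rng = np.random.default_rng(0)
n_segments = 200
stoi_scores = rng.uniform(0.0, 1.0, size=n_segments)
vocal_feats = rng.normal(size=(n_segments, N_VOCAL_FEATURES))
X = np.stack(
    [build_feature_vector(s, v) for s, v in zip(stoi_scores, vocal_feats)]
)
# Fake "listener" scores, loosely tied to the STOI value for illustration only.
y = np.clip(stoi_scores + 0.05 * rng.normal(size=n_segments), 0.0, 1.0)

# Assumption: standardise the features, then fit an RBF-kernel SVR.
model = make_pipeline(StandardScaler(), SVR(kernel="rbf"))
model.fit(X[:150], y[:150])
predictions = model.predict(X[150:])
```

Each row of `X` holds the 13 values for one audio segment, and `predictions` holds one predicted intelligibility score per held-out segment. The kernel, scaling step and train/test split are illustrative choices, not settings taken from [1] or from the baseline.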
A1. Differences between the baseline implementation and the original SA-STOI
- The original SA-STOI model is based on a U-Net music source separation (MSS) model and on MATLAB functions that compute the vocal features. For our baseline, we replaced the U-Net MSS model with the HDemucs model, which reported higher performance on the MUSDB-18 demixing benchmark. The model was ported to Python using available functions that compute the same features as the original SA-STOI.
- The original SA-STOI model was trained on 140 audio segments, each scored by 17 participants. The final intelligibility score corresponds to the average across all 17 scores. Our ported model was trained on more than 8000 audio segments, each scored by a single participant. For details of the dataset, please refer to the dataset description.
- For the SVM model, the original SA-STOI system employs the `libsvm-3.24` library. In our baseline system, we used the SVM model from the Python `scipy` module.
B. How to use the baseline
The baseline system is included in the `pyclarity` Python package (version `pyclarity >= 0.8`), which is available on GitHub.
The relevant scripts are located in the `recipes/clip1/baseline` directory. To use the baseline system:
- Download the code: clone or download the repository from GitHub.
- Follow the instructions: refer to the README file in the `recipes/clip1/baseline` directory for detailed steps to run the baseline on the CLIP1 dataset.
C. Baseline Performance
The baseline system achieves the following performance on the validation set:
Metric | Value |
---|---|
RMSE | TBC |
Correlation | TBC |
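For readers who want to compute the two metrics in the table on their own predictions, here is a minimal standard-library sketch. It assumes that "Correlation" means the Pearson correlation coefficient, which the table does not specify; the function names and the toy values are illustrative only.

```python
import math


def rmse(predicted, target):
    """Root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(target) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - t) ** 2 for p, t in zip(predicted, target)]
    return math.sqrt(sum(squared) / len(squared))


def pearson_correlation(predicted, target):
    """Pearson correlation (assumed here; the metric type is not stated)."""
    n = len(predicted)
    if n != len(target) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_p = sum(predicted) / n
    mean_t = sum(target) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(predicted, target))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_t = sum((t - mean_t) ** 2 for t in target)
    if var_p == 0 or var_t == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_p * var_t)


# Toy values, not challenge data.
predicted_scores = [0.2, 0.5, 0.9, 0.4]
listener_scores = [0.1, 0.6, 0.8, 0.5]
print(rmse(predicted_scores, listener_scores))
print(pearson_correlation(predicted_scores, listener_scores))
```

RMSE is expressed in the same units as the intelligibility scores, whereas the correlation only reflects how well the predictions track the listener scores linearly, so the two metrics complement each other.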
References
[1] B. Sharma and Y. Wang, "Automatic Evaluation of Song Intelligibility Using Singing Adapted STOI and Vocal-Specific Features," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 28, pp. 319-331, 2020, doi: 10.1109/TASLP.2019.2955253.