⚠️ This is a beta version. Please report any issues or request a new tutorial by opening an issue on the GitHub repository. ⚠️

Interacting with Metadata#

This tutorial walks through the process of interacting with metadata samples from CAD1. However, the same process can be applied to other challenges.

Load our tutorial environment

!pip install gdown --quiet
from pprint import pprint

Obtaining the sample data#

To demonstrate basic functionality, we will download a small demo dataset for CAD1.

!gdown 10SfuZR7yVlVO6RwNUc3kPeJHGiwpN3VS
!mv cadenza_task1_data_demo.tar.xz ../cadenza_task1_data_demo.tar.xz
%cd ..
!tar -xvf cadenza_task1_data_demo.tar.xz
Downloading...
From: https://drive.google.com/uc?id=10SfuZR7yVlVO6RwNUc3kPeJHGiwpN3VS
To: /home/gerardoroadabike/Extended/Projects/cadenza_tutorials/getting_started/cadenza_task1_data_demo.tar.xz
100%|█████████████████████████████████████████| 102M/102M [00:00<00:00, 109MB/s]
/home/gerardoroadabike/Extended/Projects/cadenza_tutorials
cadenza_data_demo/
cadenza_data_demo/cad1/
cadenza_data_demo/cad1/task1/
cadenza_data_demo/cad1/task1/metadata/
cadenza_data_demo/cad1/task1/metadata/musdb18.valid.json
cadenza_data_demo/cad1/task1/metadata/listeners.valid.json
cadenza_data_demo/cad1/task1/audio/
cadenza_data_demo/cad1/task1/audio/musdb18hq/
cadenza_data_demo/cad1/task1/audio/musdb18hq/train/
cadenza_data_demo/cad1/task1/audio/musdb18hq/train/Actions - One Minute Smile/
cadenza_data_demo/cad1/task1/audio/musdb18hq/train/Actions - One Minute Smile/bass.wav
cadenza_data_demo/cad1/task1/audio/musdb18hq/train/Actions - One Minute Smile/drums.wav
cadenza_data_demo/cad1/task1/audio/musdb18hq/train/Actions - One Minute Smile/other.wav
cadenza_data_demo/cad1/task1/audio/musdb18hq/train/Actions - One Minute Smile/vocals.wav
cadenza_data_demo/cad1/task1/audio/musdb18hq/train/Actions - One Minute Smile/mixture.wav
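
If you want to confirm that the archive unpacked correctly, the optional sketch below uses Python's pathlib to list the metadata files the rest of this tutorial relies on (assuming the demo data was extracted into the current working directory, as above).

from pathlib import Path

# Root of the extracted demo dataset
demo_root = Path("cadenza_data_demo/cad1/task1")

# List the JSON metadata files used in this tutorial
for metadata_file in sorted((demo_root / "metadata").glob("*.json")):
    print(metadata_file.name)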

The Structure of the metadata#

In CAD1 there are two metadata files, both in JSON format. This structure is similar across all challenges, although some challenges may include additional metadata files.

  • listeners.json - the listeners’ characteristics, in the form of audiograms for the left and right ears.

  • musdb18hq.json - the list of audio tracks to process.

Dataset     Structure       Index
listener    dict of dicts   LISTENER_ID
musdb18hq   list of dicts   Track Name


Reading the metadata files#

The challenge metadata are stored in JSON format. Python’s built-in json library loads these files and parses them into Python objects, as demonstrated in the cell below.

import json

# Load the listener audiograms (a dict keyed by listener ID)
with open("cadenza_data_demo/cad1/task1/metadata/listeners.valid.json", encoding="utf-8") as f:
    listeners = json.load(f)

# Load the track metadata (a list of dicts, one per track)
with open("cadenza_data_demo/cad1/task1/metadata/musdb18.valid.json", encoding="utf-8") as f:
    song_data = json.load(f)
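
As summarised in the table above, the two objects are indexed differently: listeners is a dictionary keyed by listener ID, while song_data is a list of dictionaries indexed by position. The short sketch below illustrates both access patterns without assuming any particular listener ID.

# Access a listener record by its ID (here, simply the first key in the dict)
first_listener_id = next(iter(listeners))
print(first_listener_id, listeners[first_listener_id]["audiogram_cfs"])

# Access a track record by its position in the list
print(song_data[0]["Track Name"])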

Listeners#

The next cell shows two sample listeners from the validation set. These are anonymized real audiograms.

  • audiogram_cfs is a list of center frequencies in Hz.

  • audiogram_levels_l and audiogram_levels_r are lists of hearing thresholds in dB HL for the left and right ear, respectively.

  • name is the listener’s ID.

pprint(listeners)
{'L5040': {'audiogram_cfs': [250, 500, 1000, 2000, 3000, 4000, 6000, 8000],
           'audiogram_levels_l': [30, 25, 25, 70, 80, 80, 80, 80],
           'audiogram_levels_r': [20, 15, 20, 45, 55, 70, 80, 80],
           'name': 'L5040'},
 'L5076': {'audiogram_cfs': [250, 500, 1000, 2000, 3000, 4000, 6000, 8000],
           'audiogram_levels_l': [15, 20, 30, 30, 45, 50, 60, 75],
           'audiogram_levels_r': [15, 25, 30, 35, 40, 40, 60, 70],
           'name': 'L5076'}}
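
Audiograms are usually easier to interpret when plotted. The sketch below assumes matplotlib is available in the tutorial environment and draws the left- and right-ear thresholds for listener L5040, with the level axis inverted as is conventional for audiograms.

import matplotlib.pyplot as plt

listener = listeners["L5040"]

plt.figure(figsize=(6, 4))
plt.plot(listener["audiogram_cfs"], listener["audiogram_levels_l"], "x-", label="left ear")
plt.plot(listener["audiogram_cfs"], listener["audiogram_levels_r"], "o-", label="right ear")
plt.xscale("log")
plt.gca().invert_yaxis()  # larger hearing loss is drawn further down
plt.xlabel("Frequency (Hz)")
plt.ylabel("Hearing level (dB)")
plt.title(f"Audiogram for listener {listener['name']}")
plt.legend()
plt.show()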

Music#

The next cell shows one sample music track from the validation set. This file contains general information about the track, such as its name, split and licence.

pprint(song_data)
[{'Genre': 'Pop/Rock',
  'License': 'Restricted',
  'Source': 'DSD',
  'Split': 'valid',
  'Track Name': 'Actions - One Minute Smile'}]

Let’s load a 10-second sample of the mixture signal from the first song in the validation set.

import pandas as pd
from IPython.display import Audio, display
from pathlib import Path
from scipy.io import wavfile

# Load song_data as pandas DataFrame
songs_valid = pd.DataFrame.from_dict(song_data)
# MUSDB18-HQ only provides "train" and "test" directories;
# validation tracks are stored under "train"
split_directory = "test" if songs_valid.loc[0, "Split"] == "test" else "train"

sample_rate, mixture_signal = wavfile.read(
    Path("cadenza_data_demo/cad1/task1/audio/musdb18hq")
    / split_directory
    / songs_valid.loc[0, "Track Name"]
    / "mixture.wav"
)

# Listen to the first 10 seconds of the stereo mixture
Audio(mixture_signal[:int(10 * sample_rate), :].T, rate=sample_rate)
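
When building a full processing pipeline, a typical next step is to combine the two metadata files, for example by enhancing each track for each listener. The sketch below is illustrative only, not the official baseline loop; it simply enumerates every track/listener combination in the demo metadata.

# Enumerate every (track, listener) combination in the demo metadata
for song in song_data:
    for listener_id in listeners:
        print(f"Would process '{song['Track Name']}' for listener {listener_id}")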