Interacting with Metadata#
This tutorial walks through the process of interacting with metadata samples from CAD1. However, the same process can be applied to other challenges.
Load our tutorial environment
!pip install gdown --quiet
from pprint import pprint
Obtaining the sample data#
In order to demonstrate basic functionality, we will download a small demo dataset for CAD1.
!gdown 10SfuZR7yVlVO6RwNUc3kPeJHGiwpN3VS
!mv cadenza_task1_data_demo.tar.xz ../cadenza_task1_data_demo.tar.xz
%cd ..
!tar -xvf cadenza_task1_data_demo.tar.xz
Downloading...
From: https://drive.google.com/uc?id=10SfuZR7yVlVO6RwNUc3kPeJHGiwpN3VS
To: /home/gerardoroadabike/Extended/Projects/cadenza_tutorials/getting_started/cadenza_task1_data_demo.tar.xz
100%|█████████████████████████████████████████| 102M/102M [00:00<00:00, 109MB/s]
/home/gerardoroadabike/Extended/Projects/cadenza_tutorials
cadenza_data_demo/
cadenza_data_demo/cad1/
cadenza_data_demo/cad1/task1/
cadenza_data_demo/cad1/task1/metadata/
cadenza_data_demo/cad1/task1/metadata/musdb18.valid.json
cadenza_data_demo/cad1/task1/metadata/listeners.valid.json
cadenza_data_demo/cad1/task1/audio/
cadenza_data_demo/cad1/task1/audio/musdb18hq/
cadenza_data_demo/cad1/task1/audio/musdb18hq/train/
cadenza_data_demo/cad1/task1/audio/musdb18hq/train/Actions - One Minute Smile/
cadenza_data_demo/cad1/task1/audio/musdb18hq/train/Actions - One Minute Smile/bass.wav
cadenza_data_demo/cad1/task1/audio/musdb18hq/train/Actions - One Minute Smile/drums.wav
cadenza_data_demo/cad1/task1/audio/musdb18hq/train/Actions - One Minute Smile/other.wav
cadenza_data_demo/cad1/task1/audio/musdb18hq/train/Actions - One Minute Smile/vocals.wav
cadenza_data_demo/cad1/task1/audio/musdb18hq/train/Actions - One Minute Smile/mixture.wav
The Structure of the metadata#
In CAD1, there are two metadata files, both in JSON format. This structure is similar across all challenges, though some challenges may include additional metadata files.

- `listeners.json` - listeners' characteristics in the form of audiograms for the left and right ear.
- `musdb18hq.json` - the list of audio tracks to process.
| Dataset | Structure | Index |
|---|---|---|
| `listeners.json` | dict of dicts | LISTENER_ID |
| `musdb18hq.json` | list of dicts | Track Name |
Reading the metadata files#
The challenge metadata are stored in JSON format. Python's built-in `json`
library loads JSON files and parses them into Python objects.
This is demonstrated in the cell below.
import json
with open("cadenza_data_demo/cad1/task1/metadata/listeners.valid.json") as f:
listeners = json.load(f)
with open("cadenza_data_demo/cad1/task1/metadata/musdb18.valid.json", "r", encoding="utf-8") as file:
song_data = json.load(file)
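To see how JSON maps onto Python types without touching the downloaded files, here is a minimal sketch that parses a miniature listeners-style document from an in-memory string (the listener ID `L0001` is invented for illustration):

```python
import json

# A miniature listeners-style JSON document, parsed in memory
raw = '{"L0001": {"audiogram_cfs": [250, 500], "name": "L0001"}}'
mini_listeners = json.loads(raw)

# JSON objects become dicts, JSON arrays become lists
print(type(mini_listeners))               # <class 'dict'>
print(mini_listeners["L0001"]["name"])    # L0001
```

The real metadata files parse the same way: `listeners.json` yields a dict keyed by listener ID, while `musdb18hq.json` yields a list of dicts.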
Listeners#
The next cell shows two sample listeners from the validation set. These are anonymized real audiograms.

- `audiogram_cfs` is a list of center frequencies in Hz.
- `audiogram_levels_l` and `audiogram_levels_r` are lists of hearing thresholds in dB SPL for the left and right ear, respectively.
- `name` is the listener's ID.
pprint(listeners)
{'L5040': {'audiogram_cfs': [250, 500, 1000, 2000, 3000, 4000, 6000, 8000],
'audiogram_levels_l': [30, 25, 25, 70, 80, 80, 80, 80],
'audiogram_levels_r': [20, 15, 20, 45, 55, 70, 80, 80],
'name': 'L5040'},
'L5076': {'audiogram_cfs': [250, 500, 1000, 2000, 3000, 4000, 6000, 8000],
'audiogram_levels_l': [15, 20, 30, 30, 45, 50, 60, 75],
'audiogram_levels_r': [15, 25, 30, 35, 40, 40, 60, 70],
'name': 'L5076'}}
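Once parsed, the listener entries are ordinary Python dicts. As a quick sanity check, the sketch below (using the `L5040` entry copied from the output above) pairs each center frequency with its thresholds and computes a mean threshold per ear:

```python
# Listener entry copied from the output shown above
listener = {
    "audiogram_cfs": [250, 500, 1000, 2000, 3000, 4000, 6000, 8000],
    "audiogram_levels_l": [30, 25, 25, 70, 80, 80, 80, 80],
    "audiogram_levels_r": [20, 15, 20, 45, 55, 70, 80, 80],
    "name": "L5040",
}

# Pair each center frequency with its left/right thresholds
for cf, left, right in zip(
    listener["audiogram_cfs"],
    listener["audiogram_levels_l"],
    listener["audiogram_levels_r"],
):
    print(f"{cf:>5} Hz: L={left} dB, R={right} dB")

# Mean threshold per ear gives a rough severity indicator
mean_left = sum(listener["audiogram_levels_l"]) / len(listener["audiogram_levels_l"])
mean_right = sum(listener["audiogram_levels_r"]) / len(listener["audiogram_levels_r"])
print(mean_left, mean_right)  # 58.75 48.125
```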
Music#
The next cell shows one sample music track from the validation set. This file contains general information about the track, such as its name, split, and license.
pprint(song_data)
[{'Genre': 'Pop/Rock',
'License': 'Restricted',
'Source': 'DSD',
'Split': 'valid',
'Track Name': 'Actions - One Minute Smile'}]
Let’s load a 10-second sample of the mixture signal from the first song in the validation set.
import pandas as pd
from IPython.display import Audio, display
from pathlib import Path
from scipy.io import wavfile
# Load song_data as pandas DataFrame
songs_valid = pd.DataFrame.from_dict(song_data)
split_directory = (
"test"
if songs_valid.loc[0, "Split"] == "test"
else "train"
)
sample_rate, mixture_signal = wavfile.read(
    Path('cadenza_data_demo/cad1/task1/audio/musdb18hq')
    / split_directory
    / songs_valid.loc[0, "Track Name"]
    / "mixture.wav"
)
Audio(mixture_signal[:int(10 * sample_rate), :].T, rate=sample_rate)
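The same slicing can be used to save an excerpt back to disk with `scipy.io.wavfile.write`. The sketch below stands in a synthetic 30-second stereo signal for the loaded mixture (the filename `excerpt.wav` is arbitrary):

```python
import numpy as np
from scipy.io import wavfile

# Synthetic stereo signal standing in for the loaded mixture
sample_rate = 44100
t = np.linspace(0, 30, 30 * sample_rate, endpoint=False)
mixture_signal = np.stack(
    [np.sin(2 * np.pi * 440 * t), np.sin(2 * np.pi * 660 * t)], axis=1
).astype(np.float32)

# Keep only the first 10 seconds, as in the cell above
excerpt = mixture_signal[: int(10 * sample_rate), :]
wavfile.write("excerpt.wav", sample_rate, excerpt)
print(excerpt.shape)  # (441000, 2)
```

Note that `wavfile.read` returns samples as `(n_samples, n_channels)`, which is why the `Audio` call above transposes with `.T` to get one row per channel.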