Generating Intelligibility Scores#

Before computing the intelligibility scores, we text-normalised both the ground-truth prompts and the listeners’ transcriptions. For the ground truth, we kept the longest version obtained after normalisation. For the transcriptions, we kept all versions obtained after normalisation. The scoring matches every transcription alternative against the ground truth and returns the maximum score.

1. Text Normalisation#

We ran all prompts and responses through a text normalisation process to ensure consistency and improve the robustness of our scoring. This process includes the following steps, applied in this order (a small end-to-end sketch follows the list):

  1. Fix Misspellings: Common misspellings are corrected using a predefined dictionary to ensure that spelling errors do not negatively affect intelligibility scoring.

    • I dont know if Ill go at 3 P M becomes I don't know if I'll go at 3 P M

  2. Number to Text: All numeric values are converted to their textual representation to avoid mismatches between digits and words.

    • I don't know if I'll go at 3 P M becomes I don't know if I'll go at three P M

  3. Remove Punctuation: Punctuation marks are removed from the sequences to focus on the actual words and avoid misalignment issues during scoring.

    • Why don't you write your own songs, invisible? becomes why don't you write your own songs invisible

  4. Use Alternative Words: Phonetically similar or interchangeable words are replaced with all possible alternatives to account for variations in pronunciation and listener perception.

    • cause your finally alone I guess you'll never know becomes cause you're finally alone i guess you'll never know

  5. Expand Contractions: Contractions such as you're or I'll are expanded to their full forms (you are, I will) to standardise the text and improve comparison between ground truth and transcription.

    • I don't know if I'll go at three P M becomes I do not know if I will go at three P M
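
To make the ordering concrete, here is a minimal, self-contained sketch that applies the five steps to a toy sentence. The small dictionaries and the regexes below are illustrative stand-ins; the actual classes (CheckSpellings, NormalizeNumbers, MyRemovePunctuation, AlternativeWords) are introduced in the following subsections.

import re
import inflect

p = inflect.engine()

# Toy lookup tables, purely illustrative (the real lists live in CSV files)
misspellings = {"dont": "don't", "ill": "I'll"}
contractions = {"don't": "do not", "i'll": "i will"}

text = "I dont know if Ill go at 3 P M"

# 1. Fix misspellings
text = " ".join(misspellings.get(w.lower(), w) for w in text.split())

# 2. Numbers to text
text = re.sub(
    r"\b\d+\b",
    lambda m: p.number_to_words(m.group(0), andword="").replace("-", " "),
    text,
)

# 3. Remove punctuation (apostrophes are kept)
text = re.sub(r"[,.?!;]", "", text)

# 4./5. Alternative words and contractions both branch into several candidate
#       strings; here we only expand contractions and keep both variants
expanded = text.lower()
for short, full in contractions.items():
    expanded = expanded.replace(short, full)

for candidate in sorted({text.lower(), expanded}):
    print(candidate)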


1.1 Misspelling Corrections#

Both the original ground truth and the original transcriptions contain several misspellings: the ground truths were produced by human annotators, the transcriptions were typed by a listener panel, and misspellings naturally occur when typing.

To correct these misspellings, we created a class that takes a CSV file containing a series of spelling corrections and processes all sequences, replacing incorrect spellings with the correct forms.

First, to identify misspellings in the dataset, we compared all words from the sequences against a pronunciation dictionary. Words not found in the dictionary were classified either as obvious misspellings (e.g. DIDNT → DIDN'T or AWNSER → ANSWER) or as words for which a pronunciation needed to be generated.
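
For illustration, this out-of-vocabulary check can be done with a simple set lookup; the word list below is a toy stand-in for the pronunciation dictionary's vocabulary.

# Toy vocabulary standing in for the pronunciation dictionary's word list
known_words = {"I", "DIDN'T", "KNOW", "THE", "ANSWER"}

sentence = "I DIDNT KNOW THE AWNSER"

# Words missing from the vocabulary are candidates for the misspelling CSV
oov = [w for w in sentence.split() if w not in known_words]
print(oov)  # ['DIDNT', 'AWNSER']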

The class CheckSpellings takes a CSV file containing common misspellings and their corrections in its constructor. The method fix_misspellings accepts a string sequence and returns a corrected version of the text, replacing all words that match entries in the dictionary.

import re


class CheckSpellings:
    """
    Class to check and correct common misspellings in text based on a provided dictionary.
    """

    def __init__(self, spellings_file: str):
        """
        Constructor

        Args:
            spellings_file (str): Path to the CSV file with common misspellings.
        """
        self.spellings = {}
        with open(spellings_file, "r") as f:
            for line in f:
                parts = line.strip().split(",", 1)
                if len(parts) == 2:
                    self.spellings[parts[0].lower()] = parts[1].lower()
                else:
                    self.spellings[parts[0].lower()] = ""

    def fix_misspellings(self, text: str) -> str:
        """
        Fix common misspellings in the text based on a provided dictionary.
        Replace whole words, even when surrounded by punctuation.
        Args:
            text (str): Input text to be corrected.
        Returns:
            str: Corrected text.
        """

        def replace_word(match):
            word = match.group(0)
            return self.spellings.get(word.lower(), word)

        # Match word-like tokens including punctuation like (), [], etc.
        # This will match (something), 'hello', etc.
        pattern = r"[^\s]+"
        corrected_text = re.sub(pattern, replace_word, text)
        return corrected_text

Example for Correcting Spelling
In this example, we will correct the spelling of useless and wasn't. For this, let’s create a spelling_corrections.csv file with the corrections.

!echo -e "wasnt,wasn't\nusless,useless\ndont,don't" > spelling_corrections.csv
!cat spelling_corrections.csv
wasnt,wasn't
usless,useless
dont,don't
# CheckSpellings object using the spelling_corrections.csv file
spellings = CheckSpellings("spelling_corrections.csv")
# The input sequence
input_sequence = "correcting words usless and wasnt"
# generate version with corrected spelling
corrected_text = spellings.fix_misspellings(input_sequence)
from IPython.display import display, Markdown

display(Markdown("**Original form:**"))
display(Markdown(f"- `{input_sequence}`"))
display(Markdown("**Corrected form:**"))
display(Markdown(f"- `{corrected_text}`"))

Original form:

  • correcting words usless and wasnt

Corrected form:

  • correcting words useless and wasn't


1.2 Handling Contractions and Alternative Words#

As part of the normalisation process, contractions were expanded and words with possible alternatives were replaced.

Before computing intelligibility scores, we expanded all contractions (e.g. you're → you are). Additionally, we accounted for phonetically identical alternatives, including those arising from misspellings (e.g. your and you're).

For this purpose, we created a class that takes a CSV file containing word alternatives. The class uses these alternatives, together with an input string sequence, to generate a list of possible string variations.

The class AlternativeWords takes a CSV file (without a header) in its constructor and compiles a regular expression from the listed words and phrases.
The method make_sentence_forms accepts a string sequence (or a list of sequences) and returns a list of all possible sentence alternatives.

from __future__ import annotations

import re
from collections import defaultdict
from itertools import product


class AlternativeWords:
    """
    Class to handle alternative spellings. The class takes a CSV file
    with two columns:

    - Column 1: word or phrase to be replaced
    - Column 2: alternative spelling or phrase

    The class can generate all possible sentence forms by replacing words/phrases
    with their alternatives.
    """

    def __init__(self, alternative_file: str):
        """Constructor

        Args:
            alternative_file (str): Path to the CSV file with alternative words.
        """
        self.alternative_dict = defaultdict(list)
        with open(alternative_file, "r") as f:
            for line in f:
                parts = [x.strip() for x in line.strip().split(",", 1)]
                if len(parts) == 1:
                    k, v = parts[0], ""
                else:
                    k, v = parts
                self.alternative_dict[k.lower()].append(v.lower())

        # Create regex pattern
        pattern = "|".join(
            rf"\b{re.escape(k)}\b" if "'" not in k else rf"(?<!\w){re.escape(k)}(?!\w)"
            for k in self.alternative_dict.keys()
        )

        self.contra_re = re.compile(f"({pattern})", re.IGNORECASE)

    def make_sentence_forms(self, sentence: str | list) -> list[str]:
        """ Generate all possible forms of a sentence by expanding using alternatives.

        Args:
            sentence (str or list): Input sentence or list of sentences.
        Returns:
            list: List of all possible sentence forms.
        """
        if isinstance(sentence, str):
            sentence = [sentence]

        APOST = r"['\u2019]"
        token_re = re.compile(
            rf"[a-z]+(?:{APOST}[a-z]+)*(?:{APOST})?|{APOST}[a-z]+|[^\w\s]",
            re.IGNORECASE,
        )

        sentence_forms = []
        for s in sentence:
            parts = token_re.findall(s.lower())

            # For each part, list all possible variants
            options = [
                self.alternative_dict[p] + [p] if p in self.alternative_dict else [p]
                for p in parts
            ]

            sentence_forms += [" ".join(s).strip() for s in product(*options)]
        return list(set(sentence_forms))

Example for Contractions
In the following example, we expand the contractions in the sequence I don't know if I'll go to the party and generate all possible alternatives. For this, we need to create a contraction.csv file with a list of possible contractions.

!echo -e "don't,do not\nI'll,I will\nthey're,they are" > contraction.csv
!cat contraction.csv
don't,do not
I'll,I will
they're,they are
# AlternativeWords object using the contraction.csv file
contractions = AlternativeWords("contraction.csv")
# The input sequence
input_sequence = "I don't know if I'll go to the party"
# generate alternatives by expanding contractions
alternative_forms = contractions.make_sentence_forms(input_sequence)
from IPython.display import display, Markdown

display(Markdown("**Original form:**"))
display(Markdown(f"- `{input_sequence}`"))

# Alternatives
alt_md = "**Possible forms:**\n"
alt_md += "\n".join(f"{i}. `{alt}`" for i, alt in enumerate(alternative_forms, 1))

display(Markdown(alt_md))

Original form:

  • I don't know if I'll go to the party

Possible forms:

  1. i don't know if i'll go to the party

  2. i don't know if i will go to the party

  3. i do not know if i'll go to the party

  4. i do not know if i will go to the party


1.3 Other Support Classes#

Numbers to Text NormalizeNumbers
To standardize numeric values in the text sequences, we created a Jiwer transform by implementing the AbstractTransform class. This transform converts all numbers, both integers and decimals, into their corresponding words, ensuring consistency when comparing transcriptions with the ground truth.

  • How it works

    • Decimals: Numbers with a decimal point (e.g. 6.4) are split into the whole and fractional parts, converted to words, and joined with the word point (e.g. 6.4 → six point four).

    • Integers: Remaining whole numbers are converted to words (e.g. 42 → forty two).

This transform ensures that numeric sequences are consistently represented in words, which improves alignment and intelligibility scoring.
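
The conversion itself relies on the inflect package; the snippet below (with illustrative values) shows the two cases described above.

import inflect

p = inflect.engine()

# Integers: spell out and replace hyphens with spaces, e.g. 42 -> "forty two"
print(p.number_to_words("42", andword="").replace("-", " "))

# Decimals: whole part, the word "point", then each fractional digit
whole, frac = "6.4".split(".")
print(p.number_to_words(whole, andword=""), "point",
      " ".join(p.number_to_words(d) for d in frac))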

Remove Punctuation MyRemovePunctuation

To standardize text sequences, we created a Jiwer transform by implementing jiwer.AbstractTransform to remove punctuation. This allows more control than the default Jiwer punctuation removal.

  • How it works: You provide a string of punctuation symbols to remove (e.g. !@#$%&*()_+-=[]{};:”,.<>?/). The transform uses a regular expression to remove all occurrences of these symbols from the text. This step ensures that punctuation does not interfere with word alignment or intelligibility scoring.
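
As a quick illustration (the symbol set below is an arbitrary example, not the exact set used by the class), a character class built from the symbol string removes every listed character in one pass:

import re

symbols = ";!*#,.?_()"          # illustrative symbol set
pattern = f"[{re.escape(symbols)}]"

print(re.sub(pattern, "", "why don't you write your own songs, invisible?"))
# -> why don't you write your own songs invisible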


1.4 Class Text Cleaner#

To normalize and clean string sequences, we implemented a TextCleaner class that executes each preprocessing step in sequence. This class combines the transforms for numbers, punctuation, contractions, alternative words, and spellings into a single workflow.

Constructor __init__#

Explanation

  • NormalizeNumbers converts numeric values to text.

  • MyRemovePunctuation removes specified punctuation symbols.

  • AlternativeWords handles phonetic or spelling alternatives.

  • CheckSpellings corrects common misspellings.

  • Contractions, spellings, and alternative words are optional and loaded only if the respective CSV files are provided.

Call Method __call__#

How it works

  • Spellings: Corrects misspellings before and after punctuation removal to catch errors that may appear after transformations.

  • Numbers to Text: Standardizes numeric sequences.

  • Punctuation Removal: Eliminates characters that could interfere with word alignment.

  • Alternative Words: Generates variations for phonetically or contextually interchangeable words.

  • Contractions: Expands contractions to their full forms.

  • Output Sorting: Optionally sorts the resulting sequences by length to prioritize longer variants.

This class depends on the Jiwer module to implement normalisation transforms.

!pip install jiwer
Requirement already satisfied: jiwer in /home/gerardoroadabike/anaconda3/envs/clarity39/lib/python3.10/site-packages (3.0.4)
Requirement already satisfied: click<9.0.0,>=8.1.3 in /home/gerardoroadabike/anaconda3/envs/clarity39/lib/python3.10/site-packages (from jiwer) (8.1.7)
Requirement already satisfied: rapidfuzz<4,>=3 in /home/gerardoroadabike/anaconda3/envs/clarity39/lib/python3.10/site-packages (from jiwer) (3.12.1)
import inflect
import jiwer
import re

p = inflect.engine()

class NormalizeNumbers(jiwer.AbstractTransform):
    def process_string(self, s: str):
        def replace_decimal(match):
            num_str = match.group(0)
            whole, decimal = num_str.split(".")
            whole_spoken = p.number_to_words(whole, andword="").replace("-", " ")
            decimal_spoken = " ".join(p.number_to_words(d) for d in decimal)
            return f"{whole_spoken} point {decimal_spoken}"

        def replace_integer(match):
            num_str = match.group(0)
            return p.number_to_words(num_str, andword="").replace("-", " ")

        # Replace decimals first (to avoid partial integer matches)
        text = re.sub(r"\d+\.\d+", replace_decimal, s)

        # Then replace remaining integers
        text = re.sub(r"\b\d+\b", replace_integer, text)
        return text.lower()

class MyRemovePunctuation(jiwer.AbstractTransform):
    """Replacement fo%pip listr jiwer's remove punctuation that allows more control."""

    def __init__(self, symbols):
        self.substitutions = f"[{symbols}]"

    def process_string(self, s):
        return re.sub(self.substitutions, "", s)

class TextCleaner:
    def __init__(self, contractions_file: str = None, spellings_file: str = None, alternative_words_file: str = None):
        """
        Initialize the TextCleaner with optional files for contractions and spellings.

        Args:
            contractions_file (str): Path to the contractions file.
            spellings_file (str): Path to the spellings file.
            alternative_words_file (str): Path to the alternative words file.

        """
        self.numbers = NormalizeNumbers()
        self.punctuation = MyRemovePunctuation(";!*#,.′’‘_()")

        self.contraction = None
        if contractions_file:
            self.contraction = AlternativeWords(contractions_file)

        self.spelling = None
        if spellings_file:
            self.spelling = CheckSpellings(spellings_file)

        self.alternative_words = None
        if alternative_words_file:
            self.alternative_words = AlternativeWords(alternative_words_file)

    def __call__(self, text: str, descending: bool = False) -> list[str]:
        """
        Clean the text: correct spellings, convert numbers, remove punctuation,
        then expand alternative words and contractions.

        Returns a list of normalised text alternatives sorted by word count.
        """

        # Correct spellings
        if self.spelling:
            text = self.spelling.fix_misspellings(text)

        # Number to text
        text = self.numbers.process_string(text)

        # Remove punctuation
        text = self.punctuation.process_string(text)

        # Correct spellings again to recover some errors with contractions
        if self.spelling:
            text = self.spelling.fix_misspellings(text)

        # Handle alternative words
        if self.alternative_words:
            text = self.alternative_words.make_sentence_forms(text)

        # Expand contractions (make_sentence_forms accepts a str or a list)
        if self.contraction:
            text = self.contraction.make_sentence_forms(text)
        elif isinstance(text, str):
            # No contraction expansion: keep a single-element list
            text = [text]

        # return sorted
        return sorted(text, key=lambda s: len(s.split()), reverse=descending)

Cleaning Ground Truth#

When processing the ground truth, we assume that the words are correct, so we do not generate multiple versions with alternative words. We apply the TextCleaner class to normalize the text and keep only the longest cleaned version.

After the transformations, we obtain a single, cleaned version of the ground truth sequence suitable for comparison with listener transcriptions.

from IPython.display import display, Markdown

# Initialize the TextCleaner with contractions and spelling corrections
prompt_cleaner = TextCleaner(
    contractions_file='contraction.csv',
    spellings_file='spelling_corrections.csv'
)

# Example ground truth sequence
input_sequence = "and you don't know all they're saying"

# Clean the sequence (ground truth assumed correct, so only one version is taken)
output_sequence = prompt_cleaner(input_sequence, descending=True)[0]

# Display results neatly in Jupyter
display(Markdown(f"**Original sequence:**\n- `{input_sequence}`"))
display(Markdown(f"**Cleaned sequence:**\n- `{output_sequence}`"))

Original sequence:

  • and you don't know all they're saying

Cleaned sequence:

  • and you do not know all they are saying

Cleaning Transcriptions#

For the transcriptions, we generate all possible alternatives from contractions and spelling corrections. Note that in this case we keep all alternatives, even those in which the contractions are not expanded, to ensure that borderline cases are not excluded.

from IPython.display import display, Markdown

# Initialize the TextCleaner
response_cleaner = TextCleaner(
    contractions_file='contraction.csv',
    spellings_file='spelling_corrections.csv'
)

# Example transcription sequence
input_sequence = "and you don't know all they're saying"

# Generate cleaned sequences (may contain multiple alternatives)
output_sequences = response_cleaner(input_sequence)

# Display results neatly in Jupyter
display(Markdown(f"**Original sequence:**\n- `{input_sequence}`"))
display(Markdown("**Cleaned sequences:**"))

for seq in output_sequences:
    display(Markdown(f"- `{seq}`"))

Original sequence:

  • and you don't know all they're saying

Cleaned sequences:

  • and you don't know all they're saying

  • and you do not know all they're saying

  • and you don't know all they are saying

  • and you do not know all they are saying


2. Generating Intelligibility Scores#

To compute intelligibility scores, we assume that both the ground truth and the listener transcription are already normalized using the TextCleaner. This ensures that punctuation, numbers, contractions, and spelling inconsistencies do not affect the scoring.

The scoring process uses a Levenshtein alignment (edit distance) to determine how many words in the hypothesis match the reference. It also incorporates phonemic representations when a pronunciation dictionary is provided, allowing for more robust scoring against homophones or alternative spellings.
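
The value of the phonemic forms is easiest to see with homophones. The toy dictionary below is made up for illustration (the real pronunciations come from the BEEP dictionary introduced later): once each word is replaced by a pronunciation string, their and there become identical and can therefore be counted as hits.

# Made-up toy pronunciations, for illustration only
toy_pron = {"their": "dh-eh", "there": "dh-eh", "going": "g-ow-ih-ng"}

def to_phonemes(sentence: str) -> str:
    # Unknown words fall back to a <WORD> placeholder
    return " ".join(toy_pron.get(w, f"<{w}>") for w in sentence.lower().split())

print(to_phonemes("their going"))   # dh-eh g-ow-ih-ng
print(to_phonemes("there going"))   # dh-eh g-ow-ih-ng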

Step-by-Step Explanation#

  1. Initialization (__init__)

    • self.transformation defines a series of text preprocessing steps using Jiwer:

      • Remove nonwords (Kaldi-style).

      • Strip leading/trailing spaces.

      • Convert to uppercase for case-insensitive matching.

      • Remove multiple spaces and reduce whitespace.

      • Reduce to a single sentence.

    • self.transformation_towords converts sentences to lists of words, which is required for the Levenshtein alignment.

    • pron_dict is an optional pronunciation dictionary that allows generating phonetic variants of words.

  2. Prepare Word Sequences (get_word_sequence)

    • Applies the main transformation pipeline to a sentence.

    • Returns a cleaned sequence of words ready for scoring.

  3. Scoring (score)

    • Accepts ref (reference) and hyp (hypothesis), which can each be a string or a list of strings.

    • Normalizes each string using the transformation pipeline.

    • If a pronunciation dictionary is provided:

      • Generates phonemic alternatives for both reference and hypothesis.

      • Keeps track of the original hypothesis corresponding to each phonemic form.

    • Creates all possible pairs between hypothesis forms and reference forms.

  4. Compute Alignment and Hits

    • For each pair of reference and hypothesis sequences, the function uses process_words (based on Jiwer) to compute the number of hits (correctly matched words).

    • Chooses the pair with the maximum number of hits as the best alignment.

    • Optionally prints a visual alignment of the best hypothesis against the reference.

  5. Return Metrics

    • prompt: the original reference (lowercased)

    • response: the best hypothesis matched to the reference (lowercased)

    • total_words: number of words in the reference

    • hits: number of correctly matched words

    • correctness: proportion of correctly matched words (hits / total_words)

This class depends on the Jiwer module to perform sequence alignment using the Levenshtein algorithm and to compute the number of matches.
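
Below is a minimal sketch of this hit-based scoring, using example strings and plain Jiwer: each (reference, hypothesis) pair is scored with process_words and the pair with the most hits wins. The SentenceScorer below adds the normalisation transforms and the phonemic forms on top of this idea.

import jiwer

ref = "i do not know if i will go to the party"
hyps = [
    "i don't know if i will to the party",
    "i do not know if i will to the party",
]

# Score every hypothesis alternative and keep the one with the most hits
outputs = [jiwer.process_words(ref, h) for h in hyps]
best = max(outputs, key=lambda o: o.hits)

total_words = len(best.references[0])
print(best.hits, total_words, round(best.hits / total_words, 2))  # 10 11 0.91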

Note
The Levenshtein alignment attempts to minimize the Levenshtein distance between two sequences. This means that in some cases—where the ground truth and the transcription share a matching word—the alignment may result in zero hits because it can prefer substitutions over insertions.

Example
In the following example, the ground truth and the transcription share the word nuclear, yet the Levenshtein alignment yields zero hits. This is because the algorithm minimizes the total edit distance rather than maximizing matches. Since substitutions cost less than a combination of insertions and deletions, the alignment prefers to substitute mismatched words one-to-one instead of skipping words to align the common “nuclear”. As a result, the shared word is not aligned, and no matches are recorded.

import jiwer 

ground_truth = "now a nuclear animal rules"
transcription = "nuclear ay man you"

output = jiwer.process_words(ground_truth, transcription)
print(jiwer.visualize_alignment(output))
sentence 1
REF:     now  a nuclear animal rules
HYP: nuclear ay     man    you *****
           S  S       S      S     D

number of sentences: 1
substitutions=4 deletions=1 insertions=0 hits=0

mer=100.00%
wil=100.00%
wip=0.00%
wer=100.00%

Now, let’s check the SentenceScorer class

import jiwer
import logging
import inflect

from itertools import product
from jiwer import process_words

p = inflect.engine()
logger = logging.getLogger(__name__)


class SentenceScorer:
    def __init__(self, pron_dict=None):
        self.transformation = jiwer.Compose(
            [
                jiwer.RemoveKaldiNonWords(),
                jiwer.Strip(),
                jiwer.ToUpperCase(),
                jiwer.RemoveMultipleSpaces(),
                jiwer.RemoveWhiteSpace(replace_by_space=True),
                jiwer.ReduceToSingleSentence(),
            ]
        )
        self.transformation_towords = jiwer.Compose(
            [jiwer.ReduceToListOfListOfWords(word_delimiter=" ")]
        )

        self.pron_dict = pron_dict

    def get_word_sequence(self, sentence):
        return self.transformation(sentence)

    def lcs_length(self, ref, hyp):
        ref = ref.lower().split()
        hyp = hyp.lower().split()
        n, m = len(ref), len(hyp)
        dp = [[0] * (m + 1) for _ in range(n + 1)]

        for i in range(n):
            for j in range(m):
                if ref[i] == hyp[j]:
                    dp[i + 1][j + 1] = dp[i][j] + 1
                else:
                    dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j])
        return dp[n][m]

    def score(self, ref, hyp, show_alignment=False):
        if isinstance(ref, str):
            _ref_form = [ref]
        else:
            _ref_form = ref.copy()

        if isinstance(hyp, str):
            _hyp_forms = [hyp]
        else:
            _hyp_forms = hyp.copy()

        hyp_forms, ref_forms = [], []
        hyp_match_pron_ori = {}
        for hyp_form in _hyp_forms:
            hyp_form = self.transformation(hyp_form)
            if self.pron_dict:
                prons = self.pron_dict.get_pronunciations(hyp_form, ref=False)
            else:
                prons = [hyp_form]
            hyp_forms.extend(prons)
            # Map each phonemic form back to the hypothesis text it came from
            for pron in prons:
                hyp_match_pron_ori[pron] = hyp_form

        for ref_form in _ref_form:
            ref_form = self.transformation(ref_form)
            if self.pron_dict:
                ref_forms.extend(self.pron_dict.get_pronunciations(ref_form, ref=True))
            else:
                ref_forms.append(ref_form)

        alternatives = [(x, y) for x, y in product(hyp_forms, ref_forms)]

        measures = [
            process_words(
                ref,
                hyp,
                reference_transform=self.transformation_towords,
                hypothesis_transform=self.transformation_towords,
            )
            for hyp, ref in alternatives
        ]

        hits = [m.hits for m in measures]
        best_index = hits.index(max(hits))

        if show_alignment:
            print("Alignment:")
            print(jiwer.visualize_alignment(measures[best_index], show_measures=False))



        return {
            "prompt": ref.lower(),
            "response": hyp_match_pron_ori[alternatives[best_index][0]].lower(),
            "total_words": len(measures[best_index].references[0]),
            "hits": measures[best_index].hits,
            "correctness": measures[best_index].hits
            / len(measures[best_index].references[0]),
        }
from IPython.display import display, Markdown

2.1 Normalise Sequences#

The process starts with the normalisation of both reference and hypothesis sequences.

ref = "I don't know if I'll go to the party"
hyp = "I dont know if I will to the party"
# Normalise sequences
cleaner = TextCleaner(
    contractions_file='contraction.csv',
    spellings_file='spelling_corrections.csv'
)
ref = cleaner(ref, descending=True)[0]
hyp = cleaner(hyp)

display(Markdown(f"**Reference:**\n- `{ref}`"))
display(Markdown("**Hypothesis:**"))
for h in hyp:
    display(Markdown(f"- `{h}`"))

Reference:

  • i do not know if i will go to the party

Hypothesis:

  • i don't know if i will to the party

  • i do not know if i will to the party


2.2 Each sequence is transformed to its phonemic form.#

For this, we implemented a class that takes a pronunciation dictionary as input.

from __future__ import annotations

from collections import defaultdict
from itertools import product
from pathlib import Path
import re
import logging

logger = logging.getLogger(__name__)


class PronDictionary:
    def __init__(self, filename: str | Path | list | None = None):
        if isinstance(filename, str):
            filename = [Path(filename)]
        elif isinstance(filename, Path):
            filename = [filename]
        elif filename is None:
            filename = []

        self.pron_dict = defaultdict(set)
        for file in filename:
            self.load_from_file(file)

    def add_dict(self, filename):
        self.load_from_file(filename)

    def load_from_file(self, filename: str | Path):
        """Load pronunciations from a file."""
        if isinstance(filename, str):
            filename = Path(filename)
        if not filename.exists():
            raise FileNotFoundError(f"File {filename} does not exist.")

        with open(filename, "r") as file:
            for line in file:
                if line.strip() == "" or line.startswith("#"):
                    continue
                word, *phones = line.strip().split()
                pron = "-".join(phones)
                self.add_pronunciation(word.upper(), pron)

    def add_pronunciation(self, word, pronunciation):
        """Add a word and its pronunciation to the dictionary."""
        self.pron_dict[word].add(pronunciation)

    def lookup(self, word):
        """Get the pronunciation of a word."""
        word = word.upper()
        if self.pron_dict.get(word, None):
            return list(self.pron_dict[word])
        else:
            # Optionally return the word itself as a fallback
            self.add_pronunciation(word, f"<{word}>")

        return [f"<{word}>"]

    def get_pronunciations(self, phrase: str, sep=" ", ref=True):
        """Return all phonemic forms of a phrase as pronunciation strings."""
        words = phrase.upper().split()

        # Get list of pronunciation options for each word
        # (lookup always returns at least the <WORD> fallback)
        options = []
        for word in words:
            prons = self.lookup(word)
            if not ref:
                # Hypothesis: keep every known pronunciation
                options.append(prons)
            else:
                # Reference: keep only the first pronunciation
                options.append([prons[0]])

        # Cartesian product of all word pronunciations
        combinations = product(*options)

        # Join each combination into a transcription string
        return [sep.join(comb) for comb in combinations]

We used the British English Pronunciation Dictionary (BEEP).

!wget https://openslr.elda.org/resources/14/beep.tar.gz
--2025-10-07 16:44:55--  https://openslr.elda.org/resources/14/beep.tar.gz
Resolving openslr.elda.org (openslr.elda.org)... 141.94.109.138, 2001:41d0:203:ad8a::
Connecting to openslr.elda.org (openslr.elda.org)|141.94.109.138|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2740236 (2.6M) [application/x-gzip]
Saving to: ‘beep.tar.gz.1’

beep.tar.gz.1       100%[===================>]   2.61M  --.-KB/s    in 0.09s   

2025-10-07 16:44:55 (29.9 MB/s) - ‘beep.tar.gz.1’ saved [2740236/2740236]
!tar -xvzf beep.tar.gz
beep/
beep/README
beep/ACKNOWLEDGEMENTS
beep/addparan
beep/lexicode.doc
beep/phoncode.doc
beep/phone45.tab
beep/beep-1.0
beep/sayTimit.doc
beep/sayTimit.pl
beep/ANNOUNCE-1.0
beep/case.txt
pron_dict = PronDictionary('beep/beep-1.0')
scorer = SentenceScorer(pron_dict=pron_dict)
score = scorer.score(ref, hyp, show_alignment=True)

from IPython.display import display, Markdown
display(Markdown(f"**Reference:** `{score['prompt']}`"))
display(Markdown(f"**Hypothesis:** `{score['response']}`"))
display(Markdown(f"**Total words:** {score['total_words']}"))
display(Markdown(f"**Hits:** {score['hits']}"))
display(Markdown(f"**Score:** {score['correctness']:.2f}"))
Alignment:
sentence 1
REF: ay d-uw n-oh-t n-ow ih-f ay w-ih-l g-ow t-ax dh-iy p-aa-t-iy
HYP: ay d-uw n-oh-t n-ow ih-f ay w-ih-l **** t-ax dh-iy p-aa-t-iy
                                           D                     

Reference: i do not know if i will go to the party

Hypothesis: i do not know if i will to the party

Total words: 11

Hits: 10

Score: 0.91

In this last example, the final intelligibility score is 0.91 (10 out of 11 matched words).