Glossary

| Term | Meaning |
| --- | --- |
| SFT | Simple fine-tuning: Whisper fine-tune on monolingual ASR data with no vocab swap |
| FFT | Full fine-tuning: 3-stage pipeline with tokenizer replacement + multitask fine-tuning |
| VFT | Vanilla fine-tuning: internal naming for what's called SFT in the paper |
| VFTMTL | Vanilla fine-tuning + multi-task learning: SFT with text-MTL added |
| MTL | Multitask learning: interleave ASR examples with text-only examples |
| Strategy C | The FFT tokenizer-replacement + warm-start embedding init strategy |
| Whisper-zs | Whisper-large-v3 zero-shot (the baseline) |
| cfg A/B/C/D | LR-multiplier configurations for FFT (see Hyperparameter selection) |
| `ASR_RATIO` (α) | Fraction of batches that are ASR vs. text-MTL. Default α=0.5 |
| Rep-trap | A training failure mode where loss keeps dropping but WER explodes due to autoregressive token repetition |
| Combined-gate | Save criterion that fires iff (WER improves OR loss improves) AND neither has drifted too far |
| Group A / B / C | The three per-lang save-criterion buckets in the deployed code |
| `frankenstein/` | Original per-lang BPE tokenizers (broken Split, pre-Regex fix) |
| `frankenstein_fix/` | Per-lang BPE tokenizers after the `Regex()` fix |
| goldfish | Multilingual text-pretraining corpus from Chang et al. used for text-MTL |
| FLEURS | Few-shot Learning Evaluation of Universal Representations of Speech: 102-language test bench |
| CV / CV25 | Mozilla CommonVoice v25, used for additional ASR training data |
| `matrix.json` | Experiment manifest (list of all jobs to run) |
| `matrix_status.json` | Per-job runtime state (queued, running, done, failed) |
| IID dataset | Pre-combined FLEURS+CV training data, shuffled IID |
| tokfix | The 102-lang retrain using the fixed tokenizers (`job_id` suffix `_tokfix`) |
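
The Combined-gate and `ASR_RATIO` entries describe small pieces of logic that may be easier to read as code. The following is a minimal sketch under stated assumptions, not the deployed implementation: the names `GateState`, `combined_gate`, and `pick_batch_kind` are hypothetical, "drifted too far" is modelled as an absolute tolerance above the best value seen so far (the tolerance values are made up), and the ASR/text choice is assumed to be an independent random draw per batch.

```python
import random
from dataclasses import dataclass


@dataclass
class GateState:
    """Best metrics seen so far (hypothetical bookkeeping for the gate)."""
    best_wer: float = float("inf")
    best_loss: float = float("inf")


def combined_gate(wer, loss, state, wer_tol=0.05, loss_tol=0.10):
    """Return True when a checkpoint should be saved.

    Fires iff (WER improves OR loss improves) AND neither metric has
    drifted too far. The drift test and both tolerances are assumptions
    made for illustration only.
    """
    improved = wer < state.best_wer or loss < state.best_loss
    wer_ok = wer <= state.best_wer + wer_tol
    loss_ok = loss <= state.best_loss + loss_tol
    save = improved and wer_ok and loss_ok
    if save:
        # Only update the bests when the checkpoint is actually saved.
        state.best_wer = min(state.best_wer, wer)
        state.best_loss = min(state.best_loss, loss)
    return save


def pick_batch_kind(rng, asr_ratio=0.5):
    """Choose the next batch type: ASR with probability asr_ratio (α)."""
    return "asr" if rng.random() < asr_ratio else "text"


if __name__ == "__main__":
    state = GateState()
    # (WER, loss) per evaluation; the third step mimics a Rep-trap:
    # loss keeps dropping while WER explodes, so the gate stays closed.
    for wer, loss in [(0.30, 1.20), (0.28, 1.25), (0.90, 0.80)]:
        print(wer, loss, combined_gate(wer, loss, state))

    rng = random.Random(0)
    kinds = [pick_batch_kind(rng, asr_ratio=0.5) for _ in range(1000)]
    print("ASR fraction:", kinds.count("asr") / len(kinds))
```

Under these assumptions, a loss-only save criterion would accept the third evaluation above, while the combined gate rejects it because WER has moved well past its best value.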