| SFT | Simple fine-tuning: Whisper fine-tune on monolingual ASR data with no vocab swap |
| FFT | Full fine-tuning: 3-stage pipeline with tokenizer replacement + multitask fine-tuning |
| VFT | Vanilla fine-tuning: internal naming for what's called SFT in the paper |
| VFTMTL | Vanilla fine-tuning + multi-task learning: SFT with text-MTL added |
| MTL | Multitask learning: interleave ASR examples with text-only examples |
| Strategy C | The FFT tokenizer-replacement + warm-start embedding init strategy |
| Whisper-zs | Whisper-large-v3 zero-shot (the baseline) |
| cfg A/B/C/D | LR-multiplier configurations for FFT (see Hyperparameter selection) |
| ASR_RATIO (α) | Fraction of batches that are ASR vs. text-MTL. Default α=0.5 |
| Rep-trap | A training failure mode where loss keeps dropping but WER explodes due to autoregressive token repetition |
| Combined-gate | Save criterion that fires iff (WER improves OR loss improves) AND neither has drifted too far |
| Group A / B / C | The three per-lang save-criterion buckets in the deployed code |
| frankenstein/ | Original per-lang BPE tokenizers (broken Split, before the Regex() fix) |
| frankenstein_fix/ | Per-lang BPE tokenizers after the Regex() fix |
| goldfish | Multilingual text-pretraining corpus from Chang et al. used for text-MTL |
| FLEURS | Few-shot Learning Evaluation of Universal Representations of Speech: 102-language test bench |
| CV / CV25 | Mozilla CommonVoice v25, used for additional ASR training data |
| matrix.json | Experiment manifest (list of all jobs to run) |
| matrix_status.json | Per-job runtime state (queued, running, done, failed) |
| IID dataset | Pre-combined FLEURS+CV training data, shuffled IID |
| tokfix | The 102-lang retrain using the fixed tokenizers (job_id suffix _tokfix) |
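
The Combined-gate and ASR_RATIO entries above describe small decision rules that may be easier to follow in code. The following is a minimal sketch, not the deployed implementation: the names `GateState`, `combined_gate`, and `pick_task`, the relative drift tolerances `wer_tol` and `loss_tol`, and the choice to update the best values only when the gate fires are all assumptions made for illustration. Only the rule "(WER improves OR loss improves) AND neither has drifted too far" and the default α=0.5 come from the glossary.

```python
import random
from dataclasses import dataclass


@dataclass
class GateState:
    """Best validation metrics seen so far (hypothetical helper)."""
    best_wer: float = float("inf")
    best_loss: float = float("inf")


def combined_gate(state, wer, loss, wer_tol=0.05, loss_tol=0.10):
    """Return True if a checkpoint should be saved.

    Fires iff (WER improves OR loss improves) AND neither metric has
    drifted more than its relative tolerance above its best value.
    The tolerance values here are assumed, not taken from the source.
    """
    improved = wer < state.best_wer or loss < state.best_loss
    wer_ok = wer <= state.best_wer * (1.0 + wer_tol)
    loss_ok = loss <= state.best_loss * (1.0 + loss_tol)
    fire = improved and wer_ok and loss_ok
    if fire:
        # Assumption: bests are only updated when a checkpoint is saved.
        state.best_wer = min(state.best_wer, wer)
        state.best_loss = min(state.best_loss, loss)
    return fire


def pick_task(rng, asr_ratio=0.5):
    """Choose the task for the next batch: ASR with probability asr_ratio."""
    return "asr" if rng.random() < asr_ratio else "text"


if __name__ == "__main__":
    state = GateState()
    # (WER, loss) pairs; the third mimics a Rep-trap step:
    # loss keeps dropping while WER explodes, so the gate stays closed.
    for wer, loss in [(0.30, 2.0), (0.29, 2.1), (0.90, 1.5), (0.31, 2.05)]:
        print(wer, loss, combined_gate(state, wer, loss))

    rng = random.Random(0)
    print([pick_task(rng) for _ in range(8)])
```

Under these assumed tolerances, a Rep-trap step (lower loss, much higher WER) counts as "improved" but fails the WER drift check, which is the situation the drift condition appears designed to block.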