Hyperparameter selection

Placeholder: write up the per-language cfg selection rule.

The four configs

cfg	Embed LR mult	Decoder LR mult	When used
A	3.0×	3.0×	Aggressive — for langs where Whisper has a weak prior
B	1.0×	1.0×	Conservative — for langs where Whisper is near-state-of-the-art
C	3.0×	1.0×	Embed-heavy — for langs where the vocab change is the main bottleneck
D	2.0×	2.0×	Middle — between A and B, for scaling matrix langs in the mid-WER range

For 102-lang scaling (post-17-lang pilot):

The per-language base learning rate η for the 17 core langs is taken from the Bayesian sweep (see paper Appendix B.2, Table 2).
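As a rough illustration of how the cfg table and the base rate η might combine, the sketch below assumes each multiplier scales η multiplicatively for its parameter group. That reading is inferred from the column names rather than stated on this page. The names `LRConfig`, `CONFIGS`, and `group_learning_rates` are hypothetical, and the base rate in the usage line is a placeholder, not a value from the sweep.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class LRConfig:
    """Learning-rate multipliers for one cfg row (hypothetical container)."""
    embed_mult: float
    decoder_mult: float


# Multipliers copied from the cfg table above.
CONFIGS: Dict[str, LRConfig] = {
    "A": LRConfig(embed_mult=3.0, decoder_mult=3.0),  # aggressive
    "B": LRConfig(embed_mult=1.0, decoder_mult=1.0),  # conservative
    "C": LRConfig(embed_mult=3.0, decoder_mult=1.0),  # embed-heavy
    "D": LRConfig(embed_mult=2.0, decoder_mult=2.0),  # middle
}


def group_learning_rates(cfg: str, base_lr: float) -> Dict[str, float]:
    """Return the learning rate per parameter group for one language.

    Assumption: effective rate = base rate (eta) * multiplier.
    Raises KeyError if cfg is not one of A, B, C, D.
    """
    c = CONFIGS[cfg]
    return {
        "embed": base_lr * c.embed_mult,
        "decoder": base_lr * c.decoder_mult,
    }


# Placeholder base rate; real values come from the per-language sweep.
rates = group_learning_rates("C", base_lr=1e-5)
```

The resulting dictionary could then be turned into optimizer parameter groups, one per key. How any parameters outside the embedding and decoder groups are treated is not covered by the table and is left out of this sketch.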