Running a single FFT job

This page shows how to run one full fine-tune (FFT) end to end, without the dispatcher or matrix infrastructure.

Direct invocation

python train_fft.py \
  --lang spanish \
  --fleurs-code es_419 \
  --whisper-lang spanish \
  --config-id B \
  --seed 1337 \
  --asr-ratio 0.5 \
  --max-text-lines 500000 \
  --text-mtl-path /path/to/text_pretraining/goldfish/spa_latn.txt \
  --output-dir runs/spanish-cfgB-s1337 \
  --init-strategy C

What each flag means:

| Flag | Meaning |
| --- | --- |
| `--lang` | Language name (must match a `train_strategy_c_<lang>.py` template) |
| `--fleurs-code` | FLEURS dataset code (e.g. `es_419`) |
| `--whisper-lang` | Whisper's language token name (e.g. `spanish`) |
| `--config-id` | Which LR config to use (A=aggressive, B=conservative, C=embed-heavy, D=middle) |
| `--seed` | Random seed |
| `--asr-ratio` | Fraction of batches that are ASR (the rest are text-MTL) |
| `--max-text-lines` | Cap on the number of text-MTL lines used |
| `--text-mtl-path` | Path to the per-language goldfish corpus |
| `--output-dir` | Where to save the `best/` and `latest/` checkpoints |
| `--init-strategy` | `C` for Strategy C (warm-start embedding init) |
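When the same job has to be launched from a notebook or a small wrapper, it can help to keep the flags in a mapping and build the command from it. The sketch below only assembles the argument list for the invocation shown above. The helper name `build_train_command` is hypothetical and not part of the repository, and the flag values are simply the ones from the example.

```python
import shlex
import sys


def build_train_command(flags: dict[str, object], script: str = "train_fft.py") -> list[str]:
    """Turn a {flag-name: value} mapping into an argv list (hypothetical helper)."""
    argv = [sys.executable, script]
    for name, value in flags.items():
        argv += [f"--{name}", str(value)]
    return argv


# Flag values copied from the direct invocation above.
flags = {
    "lang": "spanish",
    "fleurs-code": "es_419",
    "whisper-lang": "spanish",
    "config-id": "B",
    "seed": 1337,
    "asr-ratio": 0.5,
    "max-text-lines": 500000,
    "text-mtl-path": "/path/to/text_pretraining/goldfish/spa_latn.txt",
    "output-dir": "runs/spanish-cfgB-s1337",
    "init-strategy": "C",
}

cmd = build_train_command(flags)
print(shlex.join(cmd[1:]))  # shell-quoted form, for logging or copy-paste
# To actually launch the job: subprocess.run(cmd, check=True)
```

Keeping the flags in one mapping makes it easy to change a single value, such as the seed, without retyping the whole command.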

What happens under the hood

`train_fft.py` reads the per-language template at `_pod_share/jobs/train_strategy_c_<lang>.py`, patches in the LR multipliers, seed, output directory, and other settings, writes the patched script to `run_state/fft_scripts/`, and then executes it. See Pipeline → Per-job script generation for the patching details.
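To make the read, patch, write, run flow concrete, here is a minimal sketch of one way such patching could work. It is an illustration only, not the actual `train_fft.py` logic: the function `patch_assignments` and the variable names `SEED`, `OUTPUT_DIR`, and `EMBED_LR_MULT` are assumptions, and the real templates may use different names and a different mechanism.

```python
import re


def patch_assignments(source: str, overrides: dict[str, object]) -> str:
    """Rewrite top-level `NAME = value` lines for each name in overrides.

    Hypothetical helper: raises if a name is not assigned exactly once,
    so a silently unpatched template cannot slip through.
    """
    for name, value in overrides.items():
        pattern = re.compile(rf"^{re.escape(name)}\s*=.*$", flags=re.MULTILINE)
        # A function replacement avoids backslash-escape surprises in values.
        source, count = pattern.subn(
            lambda m, n=name, v=value: f"{n} = {v!r}", source
        )
        if count != 1:
            raise ValueError(f"expected one assignment to {name}, found {count}")
    return source


# Stand-in for the template text; the real template lives at
# _pod_share/jobs/train_strategy_c_<lang>.py and its contents are not shown here.
template = "SEED = 0\nOUTPUT_DIR = 'runs/placeholder'\nEMBED_LR_MULT = 1.0\n"

patched = patch_assignments(
    template,
    {"SEED": 1337, "OUTPUT_DIR": "runs/spanish-cfgB-s1337"},
)
print(patched)
# The patched text would then be written under run_state/fft_scripts/
# and run as its own script.
```

Failing loudly when a name is missing or duplicated is a deliberate choice in this sketch: a run that trains with the template's default seed or output directory is harder to notice than a run that refuses to start.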

Expected runtime

- A100 80 GB: about 3–4 hours for cfg B (grad_accum 24) and about 2.5 hours for cfg A (grad_accum 16)
- H100 80 GB: roughly 1.5× faster than the A100
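The H100 figure can be turned into rough wall-clock estimates by dividing the A100 times by the speedup. The sketch below does that arithmetic, assuming "1.5× faster" means the same job takes 1/1.5 of the A100 time. The constant and function names are illustrative.

```python
# A100 wall-clock ranges in hours, taken from the figures above.
A100_HOURS = {"B": (3.0, 4.0), "A": (2.5, 2.5)}
H100_SPEEDUP = 1.5  # assumption: H100 time = A100 time / 1.5


def h100_estimate(config_id: str) -> tuple[float, float]:
    """Return the (low, high) estimated H100 hours for a config."""
    low, high = A100_HOURS[config_id]
    return (low / H100_SPEEDUP, high / H100_SPEEDUP)


for cfg in ("A", "B"):
    low, high = h100_estimate(cfg)
    print(f"cfg {cfg}: ~{low:.1f}-{high:.1f} h on H100")
```

Under that assumption, cfg B comes out at roughly 2.0 to 2.7 hours and cfg A at about 1.7 hours on an H100.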

Outputs

After training:

runs/spanish-cfgB-s1337/
├── best/
│   ├── checkpoint.pt         # ~6 GB FP32 state dict
│   └── training_config.json  # the resolved cfg + best metrics
└── latest/
    └── checkpoint.pt         # periodic backup (every 1000 steps)
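Before starting the test eval, it can be worth confirming that the run directory has the layout shown above. The following sketch checks only for the three files in the tree. The helper name `missing_outputs` is hypothetical.

```python
from pathlib import Path

# Relative paths taken from the directory tree above.
EXPECTED = (
    "best/checkpoint.pt",
    "best/training_config.json",
    "latest/checkpoint.pt",
)


def missing_outputs(run_dir: str) -> list[str]:
    """Return the expected files that are absent from run_dir."""
    root = Path(run_dir)
    return [rel for rel in EXPECTED if not (root / rel).is_file()]


missing = missing_outputs("runs/spanish-cfgB-s1337")
print("all outputs present" if not missing else f"missing: {missing}")
```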

Test eval

After training, evaluate on FLEURS-test + CV25-test:

python eval_strategy_c_test_combined.py \
  --lang spanish \
  --fleurs es_419 \
  --whisper-lang spanish \
  --cv-code es \
  --ckpt-path runs/spanish-cfgB-s1337/best/checkpoint.pt \
  --results-path results/test_spanish_cfgB_s1337.json

The output file `results/test_spanish_cfgB_s1337.json` contains `fleurs_test`, `cv25_test`, and `combined` blocks with raw and normalized WER/CER.
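A quick way to inspect the results is to load the JSON and print each of the three blocks. The sketch below relies only on the top-level block names mentioned above. The key names inside each block are not specified here, so it dumps them as-is rather than guessing. `load_blocks` is a hypothetical helper.

```python
import json
from pathlib import Path

BLOCKS = ("fleurs_test", "cv25_test", "combined")


def load_blocks(results_path) -> dict:
    """Load the results file and keep only the known top-level blocks."""
    data = json.loads(Path(results_path).read_text(encoding="utf-8"))
    return {name: data[name] for name in BLOCKS if name in data}


results = Path("results/test_spanish_cfgB_s1337.json")
if results.exists():
    for name, block in load_blocks(results).items():
        print(name)
        print(json.dumps(block, indent=2))
else:
    print(f"{results} not found; run the eval command first")
```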