Detect clicks. Emit phonetic text.
Raw whale audio enters CETI's WhAM pipeline. The transformer-based system detects individual clicks, groups them into codas, and annotates each coda with the features of the phonetic alphabet: rhythm, tempo, rubato, ornamentation, and vowel-like spectral features.
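To make the group-and-annotate step concrete, here is a minimal sketch in Python. It is not WhAM's method: it assumes click timestamps are already available, splits codas at a silence gap, and labels rhythm from the spread of inter-click intervals. The function names, the `max_gap` and `cv_threshold` values, and the reading of a label such as `R4.reg` as "click count plus regularity" are all illustrative assumptions.

```python
from statistics import mean, pstdev


def group_into_codas(click_times, max_gap=0.5):
    """Split sorted click times (seconds) into codas.

    A new coda starts whenever the silence since the previous click
    exceeds max_gap. The 0.5 s default is a placeholder, not a value
    taken from the source.
    """
    codas, current = [], []
    for t in click_times:
        if current and t - current[-1] > max_gap:
            codas.append(current)
            current = []
        current.append(t)
    if current:
        codas.append(current)
    return codas


def annotate_rhythm(coda, cv_threshold=0.2):
    """Return a hypothetical rhythm label such as 'R4.reg' or 'R3.irr'.

    The coda is called regular when the coefficient of variation of its
    inter-click intervals is at or below cv_threshold (an assumed cutoff).
    """
    intervals = [b - a for a, b in zip(coda, coda[1:])]
    if len(intervals) < 2:
        kind = "reg"  # too few intervals to measure irregularity
    else:
        cv = pstdev(intervals) / mean(intervals)
        kind = "reg" if cv <= cv_threshold else "irr"
    return f"R{len(coda)}.{kind}"


clicks = [0.0, 0.2, 0.4, 0.6, 2.0, 2.1, 2.5]
labels = [annotate_rhythm(c) for c in group_into_codas(clicks)]
print(labels)  # ['R4.reg', 'R3.irr']
```

A real pipeline would learn these boundaries and categories from data rather than from fixed thresholds; the sketch only shows the shape of the transformation from timestamps to symbols.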
Output: thousands of "whale sentences" in phonetic text form, already discrete, already symbolic, and ready for a tokenizer.
audio               segmentation         notation
────────            ─────────────        ────────
░▒▓▒░▓▒▒░▓▓   ──▶   [ · · · · ]    ──▶   R4.reg
▒░░▒▓▒░▒▒░    ──▶   [ · · · ]      ──▶   R3.irr
▓▒▒░▓▒░▓▒░    ──▶   [ · · · ]      ──▶   R3.reg
░▒▓▒░▓▒░░▒    ──▶   [·· ·· · ]     ──▶   R5.orn
                                           ↓
              tok(R4.reg) · tok(T.fast) · tok(O.heavy)
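The last row of the diagram, where each notation symbol becomes a token, can be sketched as a plain vocabulary lookup. This is an illustrative assumption rather than the project's tokenizer: the whitespace-separated sentence format, the `<unk>` fallback, and the names `build_vocab` and `encode` are all hypothetical.

```python
def build_vocab(sentences):
    """Assign an integer id to every distinct symbol, in first-seen order.

    Id 0 is reserved for '<unk>' so unseen symbols can still be encoded.
    """
    vocab = {"<unk>": 0}
    for sentence in sentences:
        for symbol in sentence.split():
            vocab.setdefault(symbol, len(vocab))
    return vocab


def encode(sentence, vocab):
    """Map a whitespace-separated phonetic sentence to a list of token ids."""
    return [vocab.get(symbol, vocab["<unk>"]) for symbol in sentence.split()]


corpus = ["R4.reg T.fast O.heavy", "R3.irr T.fast"]
vocab = build_vocab(corpus)
print(encode("R3.irr T.fast O.none", vocab))  # [4, 2, 0]
```

Because the phonetic text is already discrete, one symbol per token is enough for a sketch like this; whether a production system would instead apply subword merging over these symbols is not stated in the source.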