ARION proposes injecting tokenized cetacean phonetic data into frontier language models — using the same cross-lingual alignment that already lets models understand 100+ human languages.
Current AI models for animal communication, such as DolphinGemma and WhAM, have made remarkable progress in discovering patterns within species-specific vocalizations. They can predict the next sound, cluster behaviors, and even generate realistic calls.
But they remain siloed. Their representations never share an embedding space with the rich conceptual world encoded in human text. Without that shared foundation, they cannot describe in English what a whale coda means, nor map it to human-understandable ideas like cooperation, kinship, or navigation.
SILOED   audio ──▶ clusters ──▶ ∅ (dead end)

SHARED   phonetic text ┐
                       ├──▶ shared model ──▶ explanations
         human text    ┘                     in both domains
Project CETI has already cracked the first step: a phonetic alphabet for sperm whale codas. Each coda is encoded as a structured string capturing rhythm, tempo, rubato, ornamentation, and vowel-like qualities.
This is text. The same modality as English, Mandarin, Python code, or ancient Sumerian. A standard tokenizer can process it. A frontier model can train on it. No cross-modal adapters. No architectural changes.
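To make the "this is text" point concrete, here is a minimal sketch in Python. It assumes a made-up string format for a coda; the `Coda` class, its field values (`R3`, `T2`, `steady`, and so on), and the `<coda>` wrapper tags are hypothetical placeholders, not Project CETI's actual notation. The toy tokenizer is a plain regex split standing in for whatever subword tokenizer a real model would use. The only thing it is meant to show is that English sentences and coda strings can pass through one tokenizer and land in one vocabulary.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Coda:
    """One coda described by the five features named in the text.

    All field values and the output format are hypothetical
    placeholders, not Project CETI's real phonetic alphabet.
    """
    rhythm: str          # rhythm class label, e.g. "R3"
    tempo: str           # tempo class label, e.g. "T2"
    rubato: str          # rubato label, e.g. "steady"
    ornamentation: bool  # whether the coda is ornamented
    vowel: str           # vowel-like quality label, e.g. "a"

    def to_text(self) -> str:
        """Serialize the coda as an ordinary whitespace-separated string."""
        orn = "orn" if self.ornamentation else "plain"
        return (f"<coda> {self.rhythm} {self.tempo} {self.rubato} "
                f"{orn} {self.vowel} </coda>")


# Toy tokenizer: wrapper tags, then word-like runs, then single punctuation.
TOKEN_RE = re.compile(r"</?\w+>|\w+|[^\w\s]")


def tokenize(text: str) -> list[str]:
    """Split any text, English or coda string, with the same rule."""
    return TOKEN_RE.findall(text)


def build_vocab(corpus: list[str]) -> dict[str, int]:
    """Assign one integer id per distinct token, in first-seen order."""
    vocab: dict[str, int] = {}
    for line in corpus:
        for tok in tokenize(line):
            vocab.setdefault(tok, len(vocab))
    return vocab


def encode(text: str, vocab: dict[str, int]) -> list[int]:
    """Map text to token ids using the shared vocabulary."""
    return [vocab[tok] for tok in tokenize(text)]


coda = Coda(rhythm="R3", tempo="T2", rubato="steady",
            ornamentation=True, vowel="a")
english = "Two whales exchange codas near the surface."

# One corpus, one vocabulary, two "languages".
shared_vocab = build_vocab([english, coda.to_text()])
coda_ids = encode(coda.to_text(), shared_vocab)
english_ids = encode(english, shared_vocab)
```

Both `english_ids` and `coda_ids` index into the same `shared_vocab`, so a model trained on such a corpus would see both kinds of sequence through a single embedding table. Whether useful alignment emerges from that is the open question the proposal is about; the sketch only shows that no cross-modal adapter is needed to get the data in.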
Around 625 BCE, according to the legend recorded by Herodotus, the poet Arion sang on the deck of a ship. Dolphins gathered, drawn by his music. When he leapt into the sea, a dolphin carried him to safety.
Acoustic signal. Cross-species understanding. A shared modality bridging two worlds. The story we're trying to write with AI is 2,600 years old.