What if the next frontier model could begin to understand whales?

ARION proposes injecting tokenized cetacean phonetic data into frontier language models — using the same cross-lingual alignment that already lets models understand 100+ human languages.

Specialized models hit a wall

Current AI models for animal communication — DolphinGemma, WhAM, and others — have made remarkable progress discovering patterns within species-specific vocalizations. They can predict next sounds, cluster behaviors, and even generate realistic calls.

But they remain siloed. They never share an embedding space with the rich conceptual world encoded in human text. Without that shared foundation, they cannot describe what a whale coda means in English, nor map it to human-understandable ideas like cooperation, kinship, or navigation.

Phonetic text is the bridge

Project CETI has already taken the first step: a phonetic alphabet for sperm whale codas. Each coda is encoded as a structured string capturing rhythm, tempo, rubato, ornamentation, and vowel-like qualities.

This is text. The same modality as English, Mandarin, Python code, or ancient Sumerian. A standard tokenizer can process it. A frontier model can train on it. No cross-modal adapters. No architectural changes.

Example coda notation

R4.reg T.fast O.heavy RB.rise V.a

Rhythm · Tempo · Ornamentation · Rubato · Vowel quality
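
Because the notation is plain text, ordinary string handling is enough to read it. The sketch below splits the example string into its five features. The prefix-to-feature mapping is inferred from the legend above, the `parse_coda` helper and its key names are hypothetical, and the meaning of the digit in `R4` is not stated here, so the sketch simply keeps it as a separate value. It is an illustration under those assumptions, not Project CETI's actual format specification.

```python
# Assumed mapping from field prefix to feature name, inferred from the
# example legend (Rhythm, Tempo, Ornamentation, Rubato, Vowel quality).
FEATURES = {
    "R": "rhythm",
    "T": "tempo",
    "O": "ornamentation",
    "RB": "rubato",
    "V": "vowel",
}


def parse_coda(notation: str) -> dict[str, str]:
    """Split a space-separated coda string into {feature: value} pairs."""
    parsed: dict[str, str] = {}
    for field in notation.split():
        # Each field looks like "<prefix>.<value>", e.g. "T.fast".
        prefix, _, value = field.partition(".")
        # Separate trailing digits from the prefix: "R4" -> "R" and "4".
        letters = prefix.rstrip("0123456789")
        digits = prefix[len(letters):]
        if letters not in FEATURES or not value:
            raise ValueError(f"unrecognized field: {field!r}")
        name = FEATURES[letters]
        parsed[name] = value
        if digits:
            # The meaning of this number is an open assumption; keep it as-is.
            parsed[f"{name}_num"] = digits
    return parsed


example = parse_coda("R4.reg T.fast O.heavy RB.rise V.a")
print(example)
```

A real pipeline would likely feed the raw string to the model unchanged; a parser like this is mainly useful for validating or filtering data before training.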

Named for the poet saved by dolphins

According to legend, around 625 BCE the poet Arion sang on the deck of a ship. Dolphins gathered, drawn by his music. When he leapt into the sea, a dolphin carried him to safety.

Acoustic signal. Cross-species understanding. A shared modality bridging two worlds. The story we're trying to write with AI is 2,600 years old.

Built on published research from

  • Project CETI (MIT CSAIL, UC Berkeley)
  • Google DeepMind (DolphinGemma)
  • Wild Dolphin Project
  • Dominica Sperm Whale Project

Published in Nature Communications, Open Mind, and ACL.

The next frontier run is coming. Include the data.