Research registry
An open database of models, datasets, and tools for non-human language research.
The ARION Research Registry is under development.
When launched, it will be a browsable, searchable database where researchers can share and discover:
- Models: language models trained with non-human phonetic data. Each entry tracks architecture, size, human/animal dataset ratios, injection method, and generation lineage.
- Datasets: audio recordings, video, and phonetic transcriptions, raw or model-prepared, with full provenance: which generation of model prepared the data, and which generation it is intended to feed.
- Tools: annotation pipelines, tokenizers, phonetic alphabets, and evaluation frameworks.
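The registry schema has not been published, so the following is only a minimal sketch of how the three entry types described above could be modeled. All class and field names are assumptions made for illustration; they are not the registry's actual data model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ModelEntry:
    """Hypothetical record for a model; every field name is an assumption."""
    name: str
    architecture: str
    parameter_count: Optional[int] = None     # model size, if known
    human_data_ratio: Optional[float] = None  # share of human-language training data
    injection_method: Optional[str] = None    # how non-human phonetic data was introduced
    generation: Optional[int] = None          # position in the generation lineage
    parent: Optional[str] = None              # previous-generation model, if any


@dataclass
class DatasetEntry:
    """Hypothetical record for a dataset, including provenance fields."""
    name: str
    data_types: List[str] = field(default_factory=list)  # e.g. ["audio", "video"]
    model_prepared: bool = False                  # False means raw data
    prepared_by_generation: Optional[int] = None  # generation that prepared the data
    intended_for_generation: Optional[int] = None # generation the data is meant to feed


@dataclass
class ToolEntry:
    """Hypothetical record for a tool such as a tokenizer or annotation pipeline."""
    name: str
    kind: str


# Only details stated on this page are filled in; unknown fields stay None.
dolphin_gemma = ModelEntry(
    name="DolphinGemma",
    architecture="Gemma-based with SoundStream tokenization",
)
```

Keeping unknown fields optional lets an entry exist before every detail is known, which suits a registry that collects details through conversation.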
How submission will work
Instead of filling out dropdown menus and categorization forms, you'll chat with an AI research clerk. Describe your model, dataset, or tool in natural language. The clerk will ask clarifying questions, categorize your submission, and validate it for completeness — no forms, no friction.
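The completeness check the clerk performs could work roughly as follows. This is a sketch under stated assumptions: the required fields per category and the question wording are hypothetical, and the real clerk may work quite differently.

```python
# Hypothetical required fields per submission category (an assumption,
# not the registry's actual rules).
REQUIRED_FIELDS = {
    "model": ["name", "architecture", "size", "dataset_ratio",
              "injection_method", "generation"],
    "dataset": ["name", "data_type", "provenance"],
    "tool": ["name", "function"],
}


def missing_fields(category, submission):
    """Return the required fields that are absent or empty in a submission."""
    required = REQUIRED_FIELDS[category]
    return [f for f in required if submission.get(f) in (None, "")]


def clarifying_questions(category, submission):
    """Turn each missing field into a follow-up question for the submitter."""
    return [
        f"Could you tell me the {name.replace('_', ' ')} of your {category}?"
        for name in missing_fields(category, submission)
    ]


draft = {"name": "WhAM (Whale Acoustics Model)"}
questions = clarifying_questions("tool", draft)
```

A submission counts as complete once `missing_fields` returns an empty list; until then, each missing field yields one follow-up question.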
Preview — AI research clerk
Coming in v2 — powered by Cloudflare Workers
Seed entries
What the registry will look like
Dominica Sperm Whale Project Archive
Long-term acoustic dataset of sperm whale codas collected off the coast of Dominica. The foundational dataset for sperm whale communication research.
- Species: Sperm whale (Physeter macrocephalus)
- Source: Dominica Sperm Whale Project
- Data type: audio
- Location: Eastern Caribbean, Dominica
- Years: 2005–present
WhAM (Whale Acoustics Model)
Transformer-based pipeline that automatically detects, segments, and annotates sperm whale codas using the phonetic alphabet. Runs on public datasets.
- Source: Project CETI
- Architecture: Transformer-based
- Function: Automated coda detection, segmentation, and phonetic annotation
DolphinGemma
First generative model for dolphin vocalizations. Predicts and generates realistic whistles, clicks, and burst pulses. Trained on 40+ years of Atlantic spotted dolphin recordings.
- Species: Atlantic spotted dolphin
- Source: Google DeepMind + Georgia Tech + Wild Dolphin Project
- Architecture: Gemma-based with SoundStream tokenization
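To illustrate what "searchable" could mean for entries like these, here is a minimal sketch that stores the three seed entries as plain records and runs a case-insensitive keyword search over their fields. The record layout and the `search` function are assumptions for illustration; the field values are taken from the entries above.

```python
SEED_ENTRIES = [
    {
        "name": "Dominica Sperm Whale Project Archive",
        "species": "Sperm whale (Physeter macrocephalus)",
        "source": "Dominica Sperm Whale Project",
        "data_type": "audio",
        "location": "Eastern Caribbean, Dominica",
    },
    {
        "name": "WhAM (Whale Acoustics Model)",
        "source": "Project CETI",
        "architecture": "Transformer-based",
        "function": "Automated coda detection, segmentation, and phonetic annotation",
    },
    {
        "name": "DolphinGemma",
        "species": "Atlantic spotted dolphin",
        "source": "Google DeepMind + Georgia Tech + Wild Dolphin Project",
        "architecture": "Gemma-based with SoundStream tokenization",
    },
]


def search(entries, query):
    """Return entries in which any field value contains the query (case-insensitive)."""
    needle = query.lower()
    return [
        entry for entry in entries
        if any(needle in str(value).lower() for value in entry.values())
    ]


dolphin_hits = [entry["name"] for entry in search(SEED_ENTRIES, "dolphin")]
```

A production registry would more likely use a database index or a full-text search engine; the linear scan here only shows the matching behavior.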