AI is helping to decode animals’ speech. Will it also let us talk with them?

Deep in the rainforests of the Democratic Republic of the Congo, Mélissa Berthet found bonobos doing something thought to be uniquely human.

During the six months that Berthet observed the primates, they combined calls in several ways to make complex phrases [1]. In one example, bonobos (Pan paniscus) that were building nests together added a yelp, meaning ‘let’s do this’, to a grunt that says ‘look at me’. “It’s really a way to say: ‘Look at what I’m doing, and let’s do this all together’,” says Berthet, who studies primates and linguistics at the University of Rennes, France.

In another case, a peep that means ‘I would like to do this’ was followed by a whistle signalling ‘let’s stay together’. The bonobos combine the two calls in sensitive social contexts, says Berthet. “I think it’s to bring peace.”

The study, reported in April, is one of several examples from the past few years that highlight just how sophisticated vocal communication in non-human animals can be. In some species of primate, whale [2] and bird, researchers have identified features and patterns of vocalization that have long been considered defining characteristics of human language. These results challenge ideas about what makes human language special, and even how ‘language’ should be defined.

Perhaps unsurprisingly, many scientists turn to artificial intelligence (AI) tools to speed up the detection and interpretation of animal sounds, and to probe aspects of communication that human listeners might miss. “It’s doing something that just wasn’t possible through traditional means,” says David Robinson, an AI researcher at the Earth Species Project, a non-profit organization based in Berkeley, California, that is developing AI systems to decode communication across the animal kingdom.

As the research advances, there is increasing interest in using AI tools not only to listen in on animal speech, but also to potentially talk back.

Combining calls

Researchers studying animal communication ask some of the same types of question that linguists do. How are speech sounds physically produced (phonetics)? How are sounds combined to make meaningful units (morphology)? What rules determine how phrases and sentences are structured (syntax)?

Until about a decade ago, researchers thought that only humans used a feature known in linguistics as compositionality. This is the combining of meaningful words, calls or other noises into expressions that have a meaning derived from those of their parts.

But in 2016, a study of Japanese tits (Parus minor) changed how scientists thought about compositionality. The birds looked for predators when they heard an ‘alert’ call and approached a sound’s source after hearing a ‘recruitment’ call. When they heard the calls in that order, they performed both behaviours [3]. But they didn’t do so when the order was reversed, suggesting compositionality: the combination of calls had its own meaning.

A study in 2023 extended that work. By presenting chimpanzees (Pan troglodytes) with fake snakes in the wild, scientists showed that the primates similarly combine ‘alarm’ and ‘recruitment’ vocalizations into a message that prompts others to gather around the caller to respond to a threat [4].

However, humans remained the only species known to use compositionality in more than one way: for instance, by ordering words differently to change the meaning of a phrase, adding endings to words to modify meaning, and creating metaphors and idioms to produce figurative expressions.

Bonobos in the Democratic Republic of the Congo combine calls into phrases in several ways. Credit: Christian Ziegler/Nature Picture Library

But the study by Berthet and her colleagues softened that distinction between humans and other animals. They recorded 700 calls by 30 adult bonobos and found that the animals combined a finite number of calls in four ways [1]. The authors considered one of these, a yelp–grunt combination, to have ‘trivial’ compositionality, because the meaning of the individual calls had merely been combined. (For instance, ‘the red car’ describes an object that is both red and a car.) In the three other cases, one call modified the other, resulting in ‘non-trivial’ compositionality. (‘A terrible actor’ describes a person who is bad at acting, not someone who is terrible and an actor.)
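The difference between the two kinds of compositionality can be pictured with a small Python sketch. It is a toy model built on the article's own ‘red car’ and ‘terrible actor’ examples, not the analysis used in the bonobo study; every name and rating in it is invented for illustration.

```python
# Toy model (all names and data are hypothetical illustrations).
RED_THINGS = {"red car", "red apple"}
CARS = {"red car", "blue car"}


def intersective(meaning_a, meaning_b):
    """Trivial compositionality: the phrase covers whatever both parts cover."""
    return meaning_a & meaning_b


# For the non-trivial case, 'terrible' is judged relative to a role,
# so it needs the noun's meaning as input instead of being intersected with it.
ACTORS = {"Ann", "Bob"}
COOKS = {"Ann"}
RATING = {
    ("Ann", "actor"): "terrible",
    ("Bob", "actor"): "great",
    ("Ann", "cook"): "great",
}


def terrible(role, members):
    """Non-trivial compositionality: the modifier changes how the noun is read."""
    return {person for person in members if RATING.get((person, role)) == "terrible"}


red_cars = intersective(RED_THINGS, CARS)      # things that are red and cars
terrible_actors = terrible("actor", ACTORS)    # people who are bad at acting
terrible_cooks = terrible("cook", COOKS)       # Ann is not bad at cooking
```

In this sketch Ann counts as a terrible actor without being terrible in general, which is the sense in which one element modifies the other.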

Evolutionary biologist Cédric Girard-Buttoz at the Lyon Neuroscience Research Center, France, and his colleagues reported in May that chimpanzees also combine a finite number of calls in several ways [5]. For some vocalizations, the meaning of the combined phrase can’t be determined from the meaning of the individual calls, as is the case for some idioms in human languages. For example, a hoot, used when resting on the ground, followed by a pant, which signifies playing and affiliation, prompted the chimpanzees to climb a tree, make a nest and rest together, even though neither call is typically associated with tree climbing, says Girard-Buttoz. Generating meaning in several ways is a building block of language, he adds.

Whales, too, have some notable features of human language. Researchers at Project CETI, a non-profit organization in New York City, have been tracking and recording sperm whales (Physeter macrocephalus) off the coast of the Caribbean island of Dominica to compile a large data set of movements and sounds. By finding patterns that link whale sounds and behaviours, the scientists hope to translate ‘whale speak’.

CETI linguist Gašper Beguš has been training generative-AI models to produce sounds and sequences of sounds that mimic those made by sperm whales. Whereas humans create distinct sounds by sending air through vocal folds in the throat, which vibrate at different frequencies, these whales send air through a lip-like structure in their nasal passage, which vibrates and creates clicks. The clicks are grouped into units called codas.

Scientists attach sensors that can gather bioacoustic and other data to sperm whales using drones. Credit: Jaime Rojo

CETI scientists reported last year that sperm whales have their own ‘phonetic alphabet’, with codas varying in characteristics such as tempo and rhythm [6]. Beguš and his colleagues have since found that whale codas can differ in ways analogous to vowels and diphthongs in human language. Vowels in human speech differ on the basis of the tongue’s position and the shape of the lips, such as for the ‘ee’ in cheese versus the ‘o’ in hot. Diphthongs, or gliding vowels, are created by combining two vowels in a single syllable, such as in ‘pout’, resulting in a frequency change as the lips and tongue move.

Beguš’s team identified two codas with distinct sound patterns that the researchers called an a-vowel and an i-vowel. They also found that these vowels changed frequency in four ways: they can rise, they can fall, they can fall then rise or they can rise then fall [7]. The frequency changes could be indicative of diphthongs.
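As a rough illustration of the four frequency patterns, the Python sketch below labels a short series of frequency values by its overall shape. It is not the method used by Beguš’s team; the function name and the numbers are assumptions made up for the example.

```python
def contour_shape(freqs):
    """Label a sequence of frequency values by its direction changes.

    Hypothetical helper for illustration only; real recordings would
    need smoothing and noise handling that this sketch leaves out.
    """
    # Direction of each step (+1 up, -1 down), skipping flat steps.
    steps = [1 if b > a else -1 for a, b in zip(freqs, freqs[1:]) if b != a]
    # Collapse runs of the same direction into a single entry.
    directions = [s for i, s in enumerate(steps) if i == 0 or s != steps[i - 1]]
    names = {
        (1,): "rising",
        (-1,): "falling",
        (-1, 1): "falling-rising",
        (1, -1): "rising-falling",
    }
    return names.get(tuple(directions), "other")


# Made-up values, chosen only to show each of the four shapes.
examples = {
    "rising": [1.0, 2.0, 3.0],
    "falling": [3.0, 2.0, 1.0],
    "falling-rising": [3.0, 1.0, 2.0],
    "rising-falling": [1.0, 3.0, 2.0],
}
labels = {name: contour_shape(values) for name, values in examples.items()}
```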

What’s in a language

Whether the sophistication of animal communication is enough to qualify it as language depends on how a person defines the term and what they think about how animals think. There are two prevailing views, Beguš says. “One world view says that language and complex thought are intrinsically connected.” According to this view, complex thought came first and language is a way to externalize thoughts. If so, animals can’t have a language unless they are capable of complex thought.

The other view holds that language is just one kind of communication, like gestures or facial expressions, and complex thought isn’t required. In this case, animals could have a language with or without complex thought. Experiments that train animals to communicate with humans, such as those with the bonobo Kanzi, who died earlier this year, have hinted that animals might be capable of having a language. But that’s a different question from whether they use language on their own in the wild.

“The word is still out on whether we’ll find a full-on language,” says Robinson.

For one, some aspects of human language haven’t been found in other species yet. Three of the 16 features — displacement, productivity and duality — on a language checklist created by linguist Charles Hockett haven’t been identified in non-human animals.

Displacement is the ability to talk about abstract concepts, such as the past, the future or things that are distant. This feature hasn’t been seen convincingly in animal communication, although there is anecdotal evidence in some instances, such as dolphins calling the names of other dolphins that had disappeared years ago, and orangutans (Pongo spp.) telling others about a predator that was previously in an area, Berthet says.

Productivity is the ability to say things that have never been said or heard before, and be understood by another individual.

And duality describes meaningful messages made up of smaller meaningful units, which consist of even smaller, meaningless sounds. Although whales use clicks to create longer codas, scientists haven’t yet shown that clicks are meaningless and codas are meaningful.

Recursion is another feature that might be unique to human language. This is when sentences or phrases are embedded in each other to create deeper levels of meaning. By training crows (Corvus corone) to peck at open and closed brackets in the appropriate sequence on a touch screen, Diana Liao, who studies vocal communication and cognition at the University of Tübingen in Germany, and her colleagues found evidence that the animals are mentally capable of recursion [8]. “They do this even better than macaque monkeys and on par with human toddlers”, Liao says. However, it’s not clear whether crows use it in their communication.
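The bracket task can be pictured with a short Python sketch that checks whether a sequence of brackets is properly nested, with each pair sitting inside another. It illustrates the kind of embedded structure involved and is not the protocol of the crow study; the function name and the example sequences are assumptions.

```python
# Each opening bracket and the closing bracket that must match it.
PAIRS = {"(": ")", "[": "]", "{": "}"}


def is_well_nested(sequence):
    """Return True if every bracket closes in the reverse order it opened."""
    expected = []  # stack of closing brackets still owed
    for symbol in sequence:
        if symbol in PAIRS:
            expected.append(PAIRS[symbol])
        elif not expected or expected.pop() != symbol:
            # A closer arrived with nothing open, or it closes the wrong pair.
            return False
    return not expected  # anything left open means the sequence is incomplete


embedded = is_well_nested("([])")   # one pair embedded inside another
crossed = is_well_nested("([)]")    # pairs cross instead of nesting
```

The stack is what makes the check recursive in spirit: an inner pair has to be completed before the outer pair can be closed.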

It’s also unclear whether animals have grammatical rules that define how vocal communication is structured. And, although primates have been shown to mix and match calls to generate meaning, the number of meanings that they can produce is “really far from what humans can do”, says Girard-Buttoz.
