How Adults comprehend what Children are Saying

According to new research, adult listening abilities are critical to understanding children’s early linguistic efforts. When babies first start talking, their vocabulary is extremely limited. One of the first sounds they make is “da,” which could mean dad, a dog, a dot, or nothing at all.

How does an adult listener interpret such a limited verbal repertoire? A new study from MIT and Harvard University researchers discovered that adults’ understanding of conversational context and knowledge of common mispronunciations made by children are critical to understanding children’s early linguistic efforts.

The research team created computational models using thousands of hours of transcribed audio recordings of children and adults interacting, allowing them to begin to reverse engineer how adults interpret what small children are saying. Models based solely on the sounds children made in their speech performed relatively poorly in predicting what adults thought children said. The most successful models predicted based on large chunks of previous conversations that provided context for what the children were saying. When the models were retrained on large datasets of adults and children interacting, they performed even better.

According to the researchers, the findings suggest that adults are highly skilled at making these context-based interpretations, which may provide crucial feedback that helps babies acquire language.

An adult with lots of listening experience is bringing to bear extremely sophisticated mechanisms of language understanding, and that is clearly what underlies the ability to understand what young children say.
Roger Levy

“An adult with lots of listening experience is bringing to bear extremely sophisticated mechanisms of language understanding, and that is clearly what underlies the ability to understand what young children say,” says Roger Levy, a professor of brain and cognitive sciences at MIT. “At this point, we don’t have direct evidence that those mechanisms are directly facilitating the bootstrapping of language acquisition in young children, but I think it’s plausible to hypothesize that they are making the bootstrapping more effective and smoothing the path to successful language acquisition by children.”

Levy and Elika Bergelson, an associate professor of psychology at Harvard, are the senior authors of the study, which appears today in Nature Human Behavior. MIT postdoc Stephan Meylan is the lead author of the paper.

How adults understand what kids are saying

Adult listening skills are critical

While many studies have looked into how children learn to speak, the researchers in this project wanted to look at how adults interpret what children say.

“While people have looked historically at a number of features of the learner, and what is it about the child that allows them to learn things from the world, very little has been done to look at how they are understood and how that might influence the process of language acquisition,” Meylan said.

Previous research has shown that when adults speak to each other, they use their beliefs about how other people are likely to talk, and what they’re likely to talk about, to help them understand what their conversational partner is saying. This strategy, known as “noisy channel listening,” makes it easier for adults to handle the complex task of deciphering the acoustic sounds they’re hearing, especially in environments where voices are muffled or there is a lot of background noise, or when speakers have different accents.

In this study, the researchers explored whether adults can also apply this technique to parsing the often seemingly nonsensical utterances produced by children who are learning to talk.

“This problem of interpreting what we hear is even harder for child language than ordinary adult language understanding, which is actually not that easy either, even though we’re very good at it,” Levy says.

For this study, the researchers made use of datasets originally generated at Brown University in the early 2000s, which contain hundreds of hours of transcribed conversations between children ages 1 to 3 and their caregivers. The data include both phonetic transcriptions of the sounds produced by the children and the text of what the transcriber believed the child was trying to say.

Based on the phonetic transcription, the researchers used other datasets of child language (which included approximately 18 million spoken words) to train computational language models to predict what words the children were saying in the original dataset. They created many different models using neural networks, varying in the sophistication of their knowledge of conversational topics, grammar, and children’s mispronunciations. They also changed how much of the conversational context each model was allowed to analyze before predicting what the children said. Some models considered only one or two words spoken before the target word, whereas others were permitted to consider up to 20 previous utterances in the exchange.

The researchers found that using the acoustics of what the child said alone did not lead to models that were particularly accurate at predicting what adults thought children said. The models that did best used very rich representations of conversational topics, grammar, and beliefs about what words children are likely to say (ball, dog or baby, rather than mortgage, for example). And much like humans, the models’ predictions improved as they were allowed to consider larger chunks of previous exchanges for context.

A feedback system

The findings suggest that when listening to children, adults base their interpretation of what a child is saying on previous exchanges that they have had. For example, if a dog had been mentioned earlier in the conversation, “da” was more likely to be interpreted by an adult listener as “dog.”

This is an example of a strategy that humans often use in listening to other adults, which is to base their interpretation on “priors,” or expectations based on prior experience. The findings also suggest that when listening to children, adult listeners incorporate expectations of how children commonly mispronounce words, such as “weed” for “read.”

The researchers intend to investigate how adults’ listening skills and subsequent responses to children may aid children’s ability to learn language.

“Most people prefer to talk to others, and I think babies are no exception to this, especially if there are things that they might want, either in a tangible way, like milk or to be picked up, but also in an intangible way in terms of just the spotlight of social attention,” Bergelson said. “It’s a feedback system that might push the kid, with their burgeoning social skills and cognitive skills and everything else, to continue down this path of trying to interact and communicate.”

The researchers hope to study this interplay between child and adult by combining computational models of how children learn language with the new model of how adults respond to what children say.

“We now have this model of an adult listener that we can plug into models of child learners, and then those learners can leverage the feedback provided by the adult model,” says Meylan. “The next frontier is trying to understand how kids are taking the feedback that they get from these adults and build a model of what these children expect that an adult would understand.”