Technology

A New Study Reveals That AI-Generated Empathy Has Its Limitations

Conversational agents (CAs) like Alexa and Siri are supposed to answer questions, provide suggestions, and even demonstrate empathy. However, new research shows that when it comes to analyzing and exploring a user’s experience, they perform worse than humans do.

CAs are powered by large language models (LLMs), which consume vast amounts of human-created data and are consequently susceptible to the same biases as the humans from whom the information is derived.

Researchers at Cornell University, Olin College, and Stanford University investigated this notion by asking CAs to demonstrate empathy while chatting with or about 65 different human identities. The researchers discovered that CAs make value judgments about specific identities, such as gay and Muslim, and can even be supportive of identities associated with harmful ideologies, including Nazism.

“I think automated empathy could have tremendous impact and huge potential for positive things — for example, in education or the health care sector,” said lead author Andrea Cuadra, a Stanford postdoctoral researcher.

“It’s extremely unlikely that it (automated empathy) won’t happen,” she said, “so it’s important that as it’s happening, we have critical perspectives so that we can be more intentional about mitigating the potential harms.”

Cuadra will present “The Illusion of Empathy? Notes on Displays of Emotion in Human-Computer Interaction” at CHI ’24, the Association for Computing Machinery conference on Human Factors in Computing Systems, May 11-18 in Honolulu. Research co-authors at Cornell University included Nicola Dell, associate professor; Deborah Estrin, professor of computer science; and Malte Jung, associate professor of information science.

The researchers found that, in general, LLMs received high marks for emotional reactions but low marks for interpretations and explorations. In other words, LLMs can react to a query based on their training but cannot go deeper.

Dell, Estrin, and Jung said they were inspired to pursue this work while Cuadra was studying the use of earlier-generation CAs by older adults.

“She witnessed intriguing uses of the technology for transactional purposes such as frailty health assessments, as well as for open-ended reminiscence experiences,” Estrin said. “Along the way, she observed clear instances of the tension between compelling and disturbing ‘empathy.’”