Neuroscience

Without Requiring Brain Surgery, AI Researchers Decode Speech from Thoughts

Patients who are unable to speak because of neurodegenerative disorders or injuries to the brain or spinal cord have new hope for a noninvasive solution, thanks to the advanced pattern-recognition capabilities of deep learning.

Researchers from Meta AI have recently published a study demonstrating how AI-based deep learning can decode speech from noninvasive recordings of brain activity. This represents a significant advance over current approaches, which require invasive open-brain surgery to implant brain-computer interface (BCI) devices.

“Decoding language from brain activity is a long-awaited goal in both healthcare and neuroscience,” wrote the research team of Alexandre Défossez, Charlotte Caucheteux, Jérémy Rapin, Ori Kabeli, and Jean-Rémi King.

“Major milestones have recently been reached thanks to intracranial devices: subject-specific pipelines trained on invasive brain responses to basic language tasks now start to efficiently decode interpretable features (e.g. letters, words, spectrograms). However, scaling this approach to natural speech and non-invasive brain recordings remains a major challenge.”

To tackle this problem, the Meta AI researchers used convolutional neural networks (CNNs) to help decode brain activity from data collected without the need for open brain surgery.
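
To make the idea concrete, the sketch below shows roughly what such a convolutional decoder looks like in PyTorch: a stack of 1-D convolutions that maps a window of multi-sensor recordings to a sequence of latent vectors. The sensor count, layer sizes, and latent dimension are illustrative assumptions, not the architecture published in the study.

```python
import torch
import torch.nn as nn

class EEGMEGEncoder(nn.Module):
    """Toy 1-D convolutional encoder mapping a window of sensor
    recordings (channels x time) to a sequence of latent vectors.
    Channel counts and layer sizes here are illustrative only."""

    def __init__(self, n_sensors: int = 273, latent_dim: int = 768):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_sensors, 320, kernel_size=3, padding=1),
            nn.GELU(),
            nn.Conv1d(320, 320, kernel_size=3, padding=1),
            nn.GELU(),
            nn.Conv1d(320, latent_dim, kernel_size=3, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_sensors, time) -> (batch, latent_dim, time)
        return self.net(x)

# Example: a batch of 8 three-second MEG windows sampled at 120 Hz.
brain = torch.randn(8, 273, 360)
latents = EEGMEGEncoder()(brain)   # shape: (8, 768, 360)
```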

The deep learning algorithm was an open-source pretrained self-supervised model called wav2vec 2.0 that was developed in 2020 by the Facebook (now Meta) AI team of Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, and Michael Auli.
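
For readers who want to see what speech representations from wav2vec 2.0 look like in practice, the snippet below extracts them with the Hugging Face transformers library. The checkpoint name is chosen purely for illustration; it is not necessarily the exact pretrained variant used in the study.

```python
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

# Checkpoint chosen for illustration; the study's pretrained variant may differ.
name = "facebook/wav2vec2-base-960h"
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(name)
model = Wav2Vec2Model.from_pretrained(name).eval()

# One second of placeholder 16 kHz audio standing in for audiobook speech.
waveform = torch.randn(16000)
inputs = feature_extractor(waveform.numpy(), sampling_rate=16000,
                           return_tensors="pt")

with torch.no_grad():
    speech_latents = model(**inputs).last_hidden_state  # (1, frames, 768)
print(speech_latents.shape)
```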

The study gathered data from 169 healthy participants who passively listened to audiobooks and sentences in their native language of either English or Dutch while their brain activity was recorded noninvasively with either magnetoencephalography (MEG) or electroencephalography (EEG).

This high-dimensional data was then fed into an AI model that searched it for patterns. The objective was for the AI to infer, from the noninvasive recordings of the participants’ neural activity, what they were listening to.
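
One common way to train such a matcher is a CLIP-style contrastive objective, in which each brain window is pushed toward the representation of the speech it accompanied and away from the other segments in the batch. The sketch below is illustrative only; the temperature, pooling, and dimensions are assumptions rather than the study’s published training setup.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(brain_emb: torch.Tensor,
                     speech_emb: torch.Tensor,
                     temperature: float = 0.1) -> torch.Tensor:
    """CLIP-style loss: each brain window should score highest against
    the speech segment it was recorded for, relative to the rest of the
    batch. Shapes: (batch, dim) for both inputs."""
    brain_emb = F.normalize(brain_emb, dim=-1)
    speech_emb = F.normalize(speech_emb, dim=-1)
    logits = brain_emb @ speech_emb.T / temperature   # (batch, batch)
    targets = torch.arange(logits.size(0))
    return F.cross_entropy(logits, targets)

# Toy batch of 32 paired brain/speech embeddings.
loss = contrastive_loss(torch.randn(32, 768), torch.randn(32, 768))
print(loss.item())
```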

The system performed better with the MEG datasets than the EEG datasets, according to the researchers.

For the MEG datasets, the model achieved a top-10 accuracy of up to 72.5 percent, identifying the matching segment among more than 1,590 candidates from just three seconds of brain activity. On the EEG datasets, the method still outperformed the random baseline, but reached only 19.1 percent top-10 accuracy across more than 2,600 segments.
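
Top-10 accuracy here means that the segment the participant actually heard appears among the model’s ten highest-ranked candidates. Assuming a matrix of similarity scores between decoded brain windows and candidate speech segments, the metric can be computed as in the sketch below.

```python
import torch

def top_k_accuracy(similarities: torch.Tensor,
                   true_idx: torch.Tensor,
                   k: int = 10) -> float:
    """similarities: (n_windows, n_candidates) score matrix;
    true_idx: (n_windows,) index of the segment actually heard.
    Returns the fraction of windows whose true segment ranks in the top k."""
    topk = similarities.topk(k, dim=-1).indices            # (n_windows, k)
    hits = (topk == true_idx.unsqueeze(-1)).any(dim=-1)    # (n_windows,)
    return hits.float().mean().item()

# Toy numbers: 500 windows scored against 1,590 candidate segments.
scores = torch.randn(500, 1590)
labels = torch.randint(0, 1590, (500,))
print(top_k_accuracy(scores, labels, k=10))
```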

As for the societal impact, the Meta AI researchers caution, “Although these results hold great promise for the development of a safe and scalable system to help patients with communication deficits, the scientific community should remain vigilant that it will not be adapted to decode brain signals without the consent of the participants.”

This research represents a significant step forward in the development of BCI technology and has the potential to help people with communication disorders, such as speech impairments or conditions that affect movement, to communicate more easily and effectively.

The AI researchers further note that, in contrast to other biometric indicators such as facial characteristics, DNA, and fingerprints, brain activity measured with EEG and MEG cannot be collected without a participant’s knowledge.