Neuroengineers at Columbia University have created a system that translates thought into intelligible, recognizable speech. By monitoring someone’s brain activity, the technology can reconstruct the words a person hears with unprecedented clarity. This breakthrough, which harnesses the power of speech synthesizers and artificial intelligence, could lead to new ways for computers to communicate directly with the brain. It also lays the groundwork for helping people who cannot speak, such as those living with amyotrophic lateral sclerosis (ALS) or recovering from stroke, regain their ability to communicate with the outside world.

According to Nima Mesgarani, PhD, the paper’s senior author and a principal investigator at Columbia University’s Mortimer B. Zuckerman Mind Brain Behavior Institute, voices help connect people to friends, family and the world around them, which is why losing the power of one’s voice to injury or disease is so devastating. With today’s study, scientists have a potential way to restore that power. They have shown that, with the right technology, these people’s thoughts could be decoded and understood by any listener.

Decades of research have shown that when people speak, or even imagine speaking, telltale patterns of activity appear in their brains. Distinct (but recognizable) patterns of signals also emerge when they listen to someone speak or imagine listening. Experts trying to record and decode these patterns see a future in which thoughts need not remain hidden inside the brain but could instead be translated into verbal speech at will.

Moving away from earlier spectrogram-based approaches, Dr. Mesgarani and his team, including first author Hassan Akbari, turned to a vocoder, a computer algorithm that can synthesize speech after being trained on recordings of people talking. To teach the vocoder to interpret brain activity, Dr. Mesgarani teamed up with Ashesh Dinesh Mehta, MD, PhD, a neurosurgeon at Northwell Health Physician Partners Neuroscience Institute. Dr. Mehta treats epilepsy patients, some of whom must undergo regular surgeries. The researchers asked these patients to listen to speakers reciting digits from 0 to 9 while their brain signals were recorded; those signals could then be run through the vocoder. The sound produced by the vocoder in response to those signals was analyzed and cleaned up by neural networks, a type of artificial intelligence that mimics the structure of neurons in the biological brain.

The end result was a robotic-sounding voice reciting a sequence of numbers. To test the accuracy of the recording, Dr. Mesgarani and his team asked individuals to listen to it and report what they heard. They found that people could understand and repeat the sounds about 75% of the time, well above any previous attempt. The improvement in intelligibility was especially evident when comparing the new recordings to the earlier, spectrogram-based attempts. Dr. Mesgarani and his team plan to test more complicated words and sentences next, and they want to run the same tests on brain signals emitted when a person speaks or imagines speaking. Ultimately, they hope their system could be part of an implant, similar to those worn by some epilepsy patients, that translates the wearer’s thoughts directly into words.