Project Intends to Broaden Language Technologies

Language technology is a rapidly growing field that has the potential to greatly improve our ability to communicate and process information. There are many different areas of language technology, including natural language processing (NLP), computer-assisted translation (CAT), text-to-speech (TTS) and speech-to-text (STT) conversion, sentiment analysis, and many others.

Only a small percentage of the world’s 7,000 to 8,000 languages benefit from modern language technologies such as voice-to-text transcription, automatic captioning, instant translation, and voice recognition. Researchers at Carnegie Mellon University want to increase the number of languages supported by automatic speech recognition tools from around 200 to potentially 2,000.

“A lot of people around the world speak different languages, but language technology tools for all of them aren’t being developed,” said Xinjian Li, a Ph.D. student at the School of Computer Science’s Language Technologies Institute (LTI). “One of the goals of this research is to develop technology and a good language model for all people.”

Li is part of a research team working to reduce the data needed to build a speech recognition model for a language. The team, which also includes Shinji Watanabe, Florian Metze, David Mortensen, and Alan Black from LTI, presented its most recent work, “ASR2K: Speech Recognition for Around 2,000 Languages Without Audio,” at Interspeech 2022 in South Korea.

Most speech recognition models require two types of data: text and audio. Text data is available for thousands of languages; audio data is not. By focusing on linguistic elements that many languages share, the team hopes to eliminate the need for audio data in the language being recognized.
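In a conventional recognizer, audio is used to train the acoustic model, while text is used to train the language model, which scores how plausible a sequence of words is. The Python sketch below illustrates that the text side needs no audio at all. It assumes a hypothetical three-sentence corpus and a simple add-one-smoothed word bigram model; it is an illustration of the general idea, not the team's actual language model.

```python
from collections import Counter, defaultdict

def tokenize(sentence):
    """Lowercase, split on whitespace, and add start/end markers."""
    return ["<s>"] + sentence.lower().split() + ["</s>"]

def train_bigram_lm(sentences):
    """Count word bigrams from plain text; no audio is involved."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = tokenize(sentence)
        for prev, curr in zip(tokens, tokens[1:]):
            counts[prev][curr] += 1
    return counts

def vocabulary(sentences):
    """Every token that can follow another: all words plus the end marker."""
    return {w for s in sentences for w in s.lower().split()} | {"</s>"}

def bigram_prob(counts, prev, curr, vocab_size, alpha=1.0):
    """Add-alpha smoothed estimate of P(curr | prev)."""
    row = counts.get(prev, Counter())
    return (row[curr] + alpha) / (sum(row.values()) + alpha * vocab_size)

def sentence_score(counts, sentence, vocab_size):
    """Probability the model assigns to a whole sentence."""
    tokens = tokenize(sentence)
    score = 1.0
    for prev, curr in zip(tokens, tokens[1:]):
        score *= bigram_prob(counts, prev, curr, vocab_size)
    return score

# Hypothetical toy corpus; a real system would use far more text.
corpus = ["the dog ran", "the cat ran", "a dog sat"]
lm = train_bigram_lm(corpus)
V = len(vocabulary(corpus))

# A fluent word order scores higher than a scrambled one.
print(sentence_score(lm, "the dog ran", V) > sentence_score(lm, "ran dog the", V))  # True
```

In a full recognizer, scores like these would be combined with acoustic evidence to choose among candidate transcriptions. Add-one smoothing is used here only because it is the simplest way to avoid assigning zero probability to word pairs that never appeared in the text.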

Historically, speech recognition technologies have focused on a language’s phonemes: the distinct sounds that distinguish one word from another, like the “d” that differentiates “dog” from “log” and “cog.” Phonemes are specific to each language. But languages also have phones, which describe how a word physically sounds. Multiple phones can correspond to a single phoneme, so even though separate languages may have different phonemes, their underlying phones could be the same.
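Aspiration offers a concrete example of the phone/phoneme distinction. English speakers pronounce the “p” in “pin” with a puff of air, [pʰ], and the “p” in “spin” without it, [p], yet both count as the single phoneme /p/. Hindi treats the same two sounds as separate phonemes: फल (phal, “fruit”) and पल (pal, “moment”) differ only in aspiration. The Python sketch below uses small, hypothetical, deliberately incomplete inventories (the names `ENGLISH`, `HINDI`, and `phones_to_phonemes` are illustrative, not from the team's work) to show how one language-independent phone sequence maps onto each language's phonemes. It illustrates the concept only and is not the team's model.

```python
# Hypothetical, deliberately incomplete inventories mapping phones
# (physical sounds, written in IPA) to each language's phonemes.
ENGLISH = {
    "pʰ": "p",  # aspirated, as in "pin"
    "p": "p",   # unaspirated, as in "spin"; still the phoneme /p/
    "s": "s",
    "ɪ": "ɪ",
    "n": "n",
}
HINDI = {
    "pʰ": "pʰ",  # aspiration is contrastive in Hindi: a phoneme of its own
    "p": "p",
    "ə": "ə",
    "l": "l",
}

def phones_to_phonemes(phones, inventory):
    """Map a language-independent phone sequence onto one language's phonemes."""
    return [inventory[phone] for phone in phones]

def shared_phones(inventory_a, inventory_b):
    """Phones that both (toy) inventories contain."""
    return sorted(set(inventory_a) & set(inventory_b))

# English: "pin" and "spin" use different phones but the same phoneme /p/.
print(phones_to_phonemes(["pʰ", "ɪ", "n"], ENGLISH))      # ['p', 'ɪ', 'n']
print(phones_to_phonemes(["s", "p", "ɪ", "n"], ENGLISH))  # ['s', 'p', 'ɪ', 'n']

# Hindi: the same two phones keep "phal" (fruit) and "pal" (moment) apart.
print(phones_to_phonemes(["pʰ", "ə", "l"], HINDI))        # ['pʰ', 'ə', 'l']
print(phones_to_phonemes(["p", "ə", "l"], HINDI))         # ['p', 'ə', 'l']

# Both languages use these phones, even though they group them differently.
print(shared_phones(ENGLISH, HINDI))                      # ['p', 'pʰ']
```

Because the phones themselves are shared, a recognizer that outputs phones could in principle be reused across languages, with only the phone-to-phoneme mapping changing from one language to the next.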

The research is still in its early stages and has so far improved on existing language approximation tools by a modest 5%, but the team hopes it will inspire both their own future work and that of other researchers.

For Li, the work entails more than simply making language technologies available to everyone. It is about preserving culture. “Each language plays an important role in its culture. Each language has its own story, and if languages are not preserved, those stories may be lost,” Li said. “Creating this type of speech recognition system and tool is a step toward preserving those languages. This is the first research to target such a large number of languages, and we’re the first team aiming to expand language tools to this scope.”
