Learning another language has never been a simple proposition. It can take months of study to absorb the basics and years to become fluent. Of course, there's the added headache that learning a language doesn't help if a person encounters one of the world's other 7,000 or so languages.
However, thanks to digital technology, the accent is now shifting to automated machine translation.
The dream of producing devices and software that can translate from Japanese to Spanish, or from Farsi to Hindi, is taking shape. Fueled by advances in speech recognition, machine learning, and algorithms, along with cloud processing and more powerful computing devices, the quality of machine translations is improving.
For example, Google Translate and Bing Translate can convert spoken or typed sentences and read a reasonably accurate translation out loud. Google Translate on a smartphone can also convert text from a foreign language menu, sign, or document into a user's native language. At the same time, startups such as Waverly Labs have introduced earphones and earbuds that allow participants to converse in different languages but hear everything in their own language in near-real time.
From casual travel to international business and global conferences, machine language translation is changing the face of human interaction. Alex Waibel, a computer science professor at Carnegie Mellon University and the Karlsruhe Institute of Technology, says that while the technology is nothing new (he and others began exploring it in the mid-1980s), the last few years have fueled enormous advances in the field.
In the past, machine language translation systems mostly used rule-based methods to determine the most accurate combination of words. Yet it is impossible to program the millions (or even billions) of rules needed to address every translation.
The next advance was statistical machine translation. Although still imperfect, these systems led to broad adoption of machine translation and made the translation of spoken language possible.
A major breakthrough occurred in 2009 when Waibel and his team developed an iPhone app called Jibbigo, which translated voice and text for vocabularies of more than 40,000 words in near-real time on a smartphone. The app brought speech dialog translation to the masses. By 2015, recurrent neural networks began to reshape the field, producing performance gains of 30% or more.
Achieving a higher level of accuracy isn't about simply dumping words into machine learning systems and allowing them to parse through all the combinations. Accents, dialects, regional variations, synonyms, idioms, and colloquialisms require close scrutiny. Further complicating the task is the fact that new words and slang constantly appear, such as the term Brexit, and meanings of words sometimes change over time. Consequently, researchers often use specific datasets that focus on context and industry-specific terms. This helps a system distinguish between a river bank and a financial bank, or between a construction crane and a bird.
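To make the river bank versus financial bank distinction concrete, here is a minimal Python sketch of context-based disambiguation. It is an illustration only, not how any system mentioned in this article works: the `SENSE_CUES` table and the `guess_sense` function are hypothetical, the cue words are hand-picked, and a production system would learn such associations from large datasets rather than from a fixed list.

```python
# Hypothetical, hand-written cue words for each sense of an ambiguous word.
# A real translation system would learn these associations from data.
SENSE_CUES = {
    "bank": {
        "river bank": {"river", "water", "shore", "fishing", "mud"},
        "financial bank": {"money", "loan", "account", "deposit", "interest"},
    },
    "crane": {
        "construction crane": {"construction", "site", "lift", "steel", "building"},
        "bird": {"bird", "wings", "nest", "flew", "marsh"},
    },
}


def guess_sense(word, sentence):
    """Pick the sense whose cue words overlap most with the sentence.

    Returns None when no cue word appears, meaning the context gives
    this simple approach nothing to work with.
    """
    # Crude tokenization: lowercase, drop periods and commas, split on spaces.
    tokens = set(sentence.lower().replace(".", " ").replace(",", " ").split())
    scores = {sense: len(cues & tokens) for sense, cues in SENSE_CUES[word].items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    print(guess_sense("bank", "She opened an account at the bank to deposit money."))
    print(guess_sense("crane", "The crane flew over the marsh."))
```

The sketch also hints at why specialized datasets matter: a sentence with no recognizable context words leaves the system guessing, and a cue list built from one industry's vocabulary would miss another's.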
Today's machine language frameworks accommodate millions of words and, in many cases, produce increasingly high-quality translations, as measured by BLEU (bilingual evaluation understudy) scores, an automated metric that compares machine-translated text against human reference translations.
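For readers curious about what a BLEU score actually computes, the following Python sketch shows a simplified version of the idea: count how many word sequences (n-grams) in the machine output also appear in a human reference translation, then penalize output that is too short. This is a teaching sketch under stated assumptions, not the full metric. The function name `simple_bleu` is made up for illustration, it uses a single reference, whitespace tokenization, n-grams up to length 2 by default, and no smoothing, whereas standard BLEU is usually computed over a whole test set with n-grams up to length 4.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate, reference, max_n=2):
    """Simplified sentence-level BLEU against one reference translation."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        # Clip each candidate n-gram count by how often it occurs in the reference,
        # so repeating a correct word does not inflate the score.
        overlap = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if total == 0 or overlap == 0:
            return 0.0  # no smoothing in this sketch
        log_precisions.append(math.log(overlap / total))

    # Brevity penalty: candidates shorter than the reference are scored down.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the n-gram precisions, scaled by the brevity penalty.
    return bp * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    print(simple_bleu("the cat sat on the mat", "the cat is on the mat"))
```

A score of 1.0 means the output matches the reference exactly, and 0.0 means the two share no matching word sequences. Because the metric only measures word overlap, it cannot capture tone or intent.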
Still, perfection remains elusive.
"It's difficult to communicate speaking style, tone, and intent through machine translation. Direct text output, such as a translation that takes place at a social media site, doesn't convey the true feeling," says Ge Gao, an assistant professor at the College of Information Studies, University of Maryland College Park.
In labs around the world, research continues into how to build better machine translators and scale up the number of languages that machines can accommodate. Although tools such as Google Translate are suitable for casual use, business and diplomatic interpretation requires a level of precision, and a handling of stylistic, social, and diplomatic variations and subtleties, that today's technology lacks. "You have people who can interpret multiple languages appropriately, including communicating politeness, sarcasm, humor, insults, formality and anger," Waibel says.
Training systems to translate accurately across several thousand languages and hundreds of thousands of dialects becomes an almost impossible task. Speech and image recognition present additional challenges; Waibel describes the task as a "software nightmare." At present, Waverly Ambassador supports 20 languages in 42 dialects. The $149 wireless over-the-ear units stream translations in near-real time by tapping cloud processing. Andrew Ochoa, CEO of Waverly Labs, says 5G cellular technology and further enhancements in chips and machine learning will fuel better systems.
Gao believes it may one day be possible to reach 100% accuracy and to incorporate tone and emotion into machine language translation. Even then, the need for humans in translation likely will persist. Devices and machines are not suitable, or even available, for every environment and situation.
Waibel believes machine language translation technology won't replace the desire to learn languages; he says such systems have actually produced an uptick in interest in gaining language skills. "The more people have the technology, the more they venture into other languages without fear," he says. "They learn phrases, they seek out travel opportunities, and they venture into a culture."
Samuel Greengard is an author and journalist based in West Linn, OR, USA.