Relearning to Speak

Learning a new language requires understanding of differences in pronunciation.

George Bernard Shaw purportedly once said, "England and America are two countries separated by the same language." If we have trouble understanding someone who speaks the same language with an unfamiliar accent, imagine how difficult it can be to try to speak a foreign language. Computer-assisted language learning systems are useful in helping address the basic aspects of language such as speaking, reading, and writing, but often we get tripped up in the pronunciation.

Learning a new language tends to get trickier as we get older, because perception of the sounds in a second language is based "deeply in neural organization, from the auditory nerve to the brain," according to the book "The Path of Speech Technologies in Computer Assisted Language Learning: From Research Toward Practice," edited by V. Melissa Holland and F. Pete Fisher.

As a result, the authors argue, one of the challenges of learning the sounds of a second language is that "after a certain level of maturity, adult learners are so resistant to producing native-like sounds and have difficulty hearing and saying the distinctions that are natural to native speakers."

In addition to syntax and vocabulary, which are both necessary pieces for language understanding, it is important to consider prosody, or the intonation patterns of a language, says Sue Feldman, CEO of Synthexis, a cognitive computing consultancy. "Most people who do not speak tonal languages have trouble hearing the differences in intonation patterns that are an integral, important element in sounding like a native."

Intonation and emphasis are critical, agrees Jaime Carbonell, director of the Language Technologies Institute (LTI) at Carnegie Mellon University. "Put emphasis on the wrong syllable and I sound like some kind of hick," he says. Social conventions require the correct level of pronunciation in order to be taken seriously, Carbonell notes, "and in this global business world where people move from one place to another, learning vocabulary and grammar are not enough."

Teaching the pronunciation of a language is not just about pointing out where it deviates from what one is accustomed to hearing and saying, but also about showing how to overcome those deviations, Carbonell stresses. For example, someone can be shown how to say something by putting their tongue closer to the front of their teeth. However, it is hard logistically for teachers to provide individual feedback on pronunciation when a classroom has one teacher and 30 students, he says.

That is where automation comes in, with systems that teach pronunciation.

Carnegie Speech is one example. Its cloud-based NativeAccent software focuses on the speaking aspect of teaching English, and is used by colleges, adult education programs, and corporations with offshore employees who need to speak better English, says Angela Kennedy, president and CEO, and a graduate of CMU’s LTI.

"Oftentimes when people are learning the language, they’re not learning from native speakers," says Kennedy. "We have three billion people learning English, and their teachers either don’t have great pronunciation, or are timid and would prefer to teach the reading, writing, and vocabulary." People from Asian countries in particular tend to have the greatest difficulty mastering English pronunciation, she says, because phonetics is hard and they have the fewest native speakers as teachers.

The company patented a speech recognition technology called Pinpointing, and has since patented a more advanced version of it, says Kennedy. The technology listens to a student’s speech and identifies at a granular level where the errors are, she says. "For instance, the technology may know a particular student has difficulty making the ‘r’ sound in the middle of a word when it’s co-articulated with an ‘o,’" she notes.
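As a rough illustration of what locating an error at this level of detail might involve, the Python sketch below aligns the phonemes a learner was expected to say against the phonemes a recognizer heard. It is not Carnegie Speech's patented Pinpointing technology, whose internals are not described here; the function names, the ARPAbet-style phoneme labels, and the use of Python's standard difflib alignment are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def position_label(index, length):
    """Label a phoneme slot as 'beginning', 'mid', or 'final' (hypothetical scheme)."""
    if index <= 0:
        return "beginning"
    if index >= length - 1:
        return "final"
    return "mid"


def pinpoint_errors(expected, heard):
    """Align two phoneme lists and describe each place where they differ.

    `expected` and `heard` are lists of phoneme labels. Producing `heard`
    from audio is assumed to be done by a separate recognizer that is
    outside the scope of this sketch.
    """
    errors = []
    matcher = SequenceMatcher(a=expected, b=heard, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # this stretch was pronounced as expected
        errors.append({
            "kind": tag,  # 'replace', 'delete', or 'insert'
            "expected": expected[i1:i2],
            "heard": heard[j1:j2],
            "position": position_label(i1, len(expected)),
            # Neighbouring sound, to capture co-articulation context.
            "previous": expected[i1 - 1] if i1 > 0 else None,
        })
    return errors


# Illustrative input: "sorry" with the 'r' heard as an 'l'.
report = pinpoint_errors(["S", "AA", "R", "IY"], ["S", "AA", "L", "IY"])
```

In this example the report contains a single entry: an "R" replaced by an "L" in the middle of the word, directly after the vowel "AA". A real system would work from acoustic scores rather than exact label matches, so this alignment only shows the shape of the output.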

Users take a pre-assessment test whose passages allow them to be evaluated on grammar, speaking fluency, and word stress; for example, whether they know, and can express, how to distinguish the pronunciation of "dessert" from "desert."

Based on the pre-assessments, Carnegie Speech has assembled metrics on over 400 language skills, including the ability to make all the different sounds of English in all the different positions (beginning, mid, final) co-articulated with all the vowels, consonant clusters, r’s and l’s, and more. There are metrics on many different grammatical, fluency, and prosodic skills (such as pitch, duration, and intonation) as well, according to Kennedy.

Another part of its technology is the intelligent tutoring system, which uses data from each student’s pre-assessment to create a customized learning path based on what will help them learn the most skills the most quickly, she says. It also performs a "contrastive linguistic analysis between your language and English," Kennedy says, "so it knows what types of errors a person is likely to make when learning to speak English, and based on their native language, which remediation techniques will teach that skill best for students of that native language."

Carbonell says in order for any such system to be effective, it first has to diagnose what the person is able to say correctly, what they mispronounce, and whether those mistakes are systematic (meaning, for example, they are unable to pronounce the ‘r’ sound, or they say the word correctly but with the wrong emphasis). Then it needs to either retrieve a fixed lesson, or compose one that addresses the problem and informs the learner how to pronounce words, and track their progress. "It’s useful to have a progress bar," he says. "That’s psychologically encouraging."
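The diagnose, remediate, and track loop Carbonell describes can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any vendor's implementation: the 50% threshold for calling an error "systematic," the lesson catalog, and every function name are hypothetical.

```python
from collections import Counter

# Hypothetical catalog of fixed lessons, keyed by the sound they address.
LESSONS = {
    "R": "Tongue-placement drill for the 'r' sound",
    "TH": "Minimal pairs: 'think' vs. 'sink'",
}


def diagnose(attempts, threshold=0.5):
    """Return the sounds missed in at least `threshold` of the attempts.

    `attempts` is a list of (sound, was_correct) pairs. A sound missed
    this often is treated as a systematic error rather than a slip.
    """
    tries, misses = Counter(), Counter()
    for sound, was_correct in attempts:
        tries[sound] += 1
        if not was_correct:
            misses[sound] += 1
    return sorted(s for s in tries if misses[s] / tries[s] >= threshold)


def pick_lessons(systematic, catalog):
    """Retrieve a fixed lesson per sound, falling back to a generic drill."""
    return [catalog.get(s, f"Generic repetition drill for {s}") for s in systematic]


def progress(attempts):
    """Fraction of all attempts pronounced correctly so far."""
    if not attempts:
        return 0.0
    return sum(1 for _, ok in attempts if ok) / len(attempts)


def progress_bar(fraction, width=20):
    """Render progress as a fixed-width text bar."""
    filled = int(fraction * width)
    return "[" + "#" * filled + "." * (width - filled) + "]"


# Illustrative session: the learner misses 'r' twice in three tries.
session = [("R", False), ("R", False), ("R", True), ("L", True),
           ("TH", False), ("TH", True), ("TH", True), ("L", True)]
plan = pick_lessons(diagnose(session), LESSONS)
bar = progress_bar(progress(session))
```

Here only the 'r' sound crosses the threshold, so the plan holds the single 'r' lesson, while the one missed "th" is treated as a slip. Because the machine is patient, the same functions can simply be rerun after each new attempt to refresh the plan and the bar.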

Continued practice, of course, is also critical. "The machine is also infinitely patient, so a person can try 10 or 50 times to get it right," he says.

Feldman wrote her senior thesis comparing American and British English intonation patterns, to demonstrate that they play a major role in the emotional meaning of a conversation. "The words alone are not enough," she says. "It’s the music of the language that makes it possible to understand a speaker’s meaning."

Esther Shein is a freelance technology and business writer based in the Boston area.