Massachusetts Institute of Technology (MIT) researchers have developed a machine-learning system that can learn to distinguish spoken words, as well as lower-level phonetic units, such as syllables and phonemes.
The researchers say the technology could aid the development of speech-processing systems for languages that are not widely spoken and lack decades of linguistic research on their phonetic systems. It could also make speech-processing systems more portable, because information about lower-level phonetic units could help smooth over differences between speakers' pronunciations.
In addition, the MIT system acts directly on raw speech files, so it could prove much easier to extend to new sets of training data and new languages. Finally, the researchers say the system could offer some insight into human speech acquisition.
The key to the system's performance is a "noisy-channel" model of phonetic variability. The researchers modeled this phenomenon by borrowing an idea from communication theory, treating an audio signal as if it were a sequence of perfectly regular phonemes that had been sent through a noisy channel.
The goal of the machine-learning system is to learn the statistical correlations between the "received" sound and the associated phoneme.
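The noisy-channel idea can be illustrated with a toy sketch. This is not the researchers' method: it assumes a handful of hand-labeled (intended phoneme, received sound) pairs, whereas the MIT system works on raw speech files. The function names `fit_channel` and `decode`, the symbols "t" and "d", and the counts are all hypothetical. The sketch estimates how often each intended phoneme comes out of the channel as each sound, then picks the most probable phoneme for a received sound using Bayes' rule.

```python
from collections import Counter, defaultdict


def fit_channel(pairs):
    """Estimate P(phoneme) and P(received | phoneme) from labeled pairs.

    `pairs` is a list of (intended_phoneme, received_sound) tuples.
    Labeled pairs are an assumption made only for this illustration.
    """
    prior_counts = Counter(phoneme for phoneme, _ in pairs)
    channel_counts = defaultdict(Counter)
    for phoneme, received in pairs:
        channel_counts[phoneme][received] += 1

    total = sum(prior_counts.values())
    p_phoneme = {p: n / total for p, n in prior_counts.items()}
    p_received = {
        p: {r: n / prior_counts[p] for r, n in counts.items()}
        for p, counts in channel_counts.items()
    }
    return p_phoneme, p_received


def decode(received, p_phoneme, p_received):
    """Return the phoneme that maximizes P(received | phoneme) * P(phoneme)."""
    scores = {
        p: p_received[p].get(received, 0.0) * p_phoneme[p]
        for p in p_phoneme
    }
    return max(scores, key=scores.get)


# Hypothetical toy data: "t" is sometimes heard as "d", and vice versa.
pairs = [
    ("t", "t"), ("t", "t"), ("t", "d"),
    ("d", "d"), ("d", "d"), ("d", "d"), ("d", "t"),
]
p_phoneme, p_received = fit_channel(pairs)
print(decode("t", p_phoneme, p_received))
print(decode("d", p_phoneme, p_received))
```

With these counts, a received "t" is attributed to the phoneme "t" and a received "d" to "d", even though the channel sometimes swaps them. A real system would replace the discrete symbols with acoustic features and would have to infer the phoneme inventory itself.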
From MIT News
Abstracts Copyright © 2015 Information Inc., Bethesda, Maryland, USA