Researchers at the Massachusetts Institute of Technology (MIT) have released the first major database of fully annotated English sentences written by non-native English speakers, in the hope that it will inform the design of applications that better handle the spoken or written language of non-native speakers.
The database consists of more than 5,100 sentences taken from exam essays written by English as a second language (ESL) students, and each sentence contains at least one grammatical error. Annotators were trained for weeks to annotate both the corrected and uncorrected versions of each sentence, mapping the syntactic relationships between the words in both versions using the Universal Dependencies formalism.
MIT graduate student Yevgeni Berzak says most writers or speakers of English are non-native speakers, a fact that "is often overlooked when we study English scientifically or when we do natural language processing for English."
Berzak notes that machine learning-based natural language processing systems seek patterns in training data written only in standard English. He says systems trained on non-standard English could be more capable of handling non-native speakers' linguistic quirks.
Uppsala University professor Joakim Nivre says that with both corrected and uncorrected sentences annotated, grammatical error correction "could be cast as a machine-translation task, where the system learns to translate from ESL to English."
From MIT News
Abstracts Copyright © 2016 Information Inc., Bethesda, Maryland, USA