Researchers at the Massachusetts Institute of Technology (MIT) and the Qatar Computing Research Institute (QCRI) have developed a method that identifies individual nodes, or "neurons," in neural networks that capture specific linguistic features.
The researchers also designed a toolkit to analyze and manipulate how networks translate text for various purposes, like offsetting classification biases in the training data.
For example, the system can pinpoint the neurons a network uses to classify gendered words, past and present tenses, numbers at the beginning or middle of sentences, and plural and singular words. The researchers also demonstrated that some tasks require many neurons, while others require only a few.
The new technique combines the embeddings captured at the network's different layers into a single embedding for each word. As the network classifies a given word, the model learns a weight for every neuron that fired during the classification, indicating how much each neuron contributed to that decision.
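The idea of weighting neurons in a combined embedding can be sketched as a simple linear probe: concatenate the per-layer embeddings, train a classifier on them, and rank neurons by the magnitude of their learned weights. This is a minimal illustration under assumptions, not the researchers' actual toolkit; the function name and the logistic-regression choice are hypothetical.

```python
import numpy as np

def rank_neurons(embeddings_per_layer, labels, lr=0.1, epochs=200):
    """Concatenate per-layer word embeddings into one vector per word,
    train a logistic-regression probe on a binary linguistic property,
    and rank neurons (feature dimensions) by learned weight magnitude.

    embeddings_per_layer: list of (n_words, layer_dim) arrays
    labels: binary labels, e.g. 1 = plural, 0 = singular
    """
    X = np.concatenate(embeddings_per_layer, axis=1)  # (n_words, total_neurons)
    y = np.asarray(labels, dtype=float)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # sigmoid prediction
        grad_w = X.T @ (p - y) / len(y)          # gradient of log loss
        grad_b = float(np.mean(p - y))
        w -= lr * grad_w
        b -= lr * grad_b
    ranking = np.argsort(-np.abs(w))  # most influential neurons first
    return w, ranking
```

A property captured by a handful of neurons shows up as a few dominant weights, while a property spread across the network yields a flatter weight profile, matching the observation that some tasks need many neurons and others only a few.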
Said MIT’s Yonatan Belinkov, “This work is about gaining a more fine-grained understanding of neural networks and having better control of how these models behave.”
From MIT News
Abstracts Copyright © 2019 SmithBucklin, Washington, DC, USA