Researchers at McGill University have trained machine-learning software to identify hate speech by having it learn how members of hateful communities speak.
The system was trained on a data dump containing most of the Reddit posts made over a decade, focusing on primary targets that included African Americans, overweight people, and women. For each target, the researchers selected the most active support and abuse groups on Reddit to train their software, and also fed it comments from the Voat forum site and from individual websites dedicated to hate speech. The team found that its strategy yielded fewer false positives than keyword-based detectors.
"Comparing hateful and non-hateful communities to find the language that distinguishes them is a clever solution," says Cornell University's Thomas Davidson. However, Joanna Bryson at the University of Bath says the method will not catch every instance of offensive speech, although she does think it could help human moderators.
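The idea Davidson describes, comparing two kinds of communities to find the language that sets them apart, can be illustrated with a toy sketch. This is not the McGill team's system: the function name, the smoothed log-odds scoring, and the placeholder posts are all assumptions made here for illustration.

```python
import math
from collections import Counter


def distinguishing_terms(abuse_posts, support_posts, top_n=3):
    """Rank words by how much more likely they are in abuse posts than support posts.

    Hypothetical helper: uses add-one smoothed log-odds, a common way to
    compare word usage between two corpora. Not the researchers' method.
    """
    abuse = Counter(w for post in abuse_posts for w in post.lower().split())
    support = Counter(w for post in support_posts for w in post.lower().split())
    vocab = set(abuse) | set(support)

    # Add-one smoothing so words missing from one corpus do not produce log(0).
    abuse_total = sum(abuse.values()) + len(vocab)
    support_total = sum(support.values()) + len(vocab)

    scores = {
        w: math.log((abuse[w] + 1) / abuse_total)
        - math.log((support[w] + 1) / support_total)
        for w in vocab
    }
    # Highest scores are the words most characteristic of the abuse corpus.
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Made-up placeholder posts standing in for the two kinds of communities.
abuse_posts = ["they are disgusting", "disgusting and lazy people"]
support_posts = ["they are brave", "brave and kind people"]

top = distinguishing_terms(abuse_posts, support_posts, top_n=2)
print(top)
```

Words that both groups use, such as "people", score zero and drop out, which hints at why this kind of comparison might flag fewer innocent posts than a fixed keyword list.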
From New Scientist
Abstracts Copyright © 2017 Information Inc., Bethesda, Maryland, USA