A new machine learning-based search engine developed by researchers from Aalto University and the University of Jena in Germany could benefit scientists in the life and medical sciences.
CSI:FingerID is designed to identify metabolites from tandem mass spectrometry measurements, and its identification accuracy is more than 150 times higher than that of rival tools.
There are hundreds of thousands to millions of metabolites, and they all look very similar, notes Aalto professor Juho Rousu. For the project, the team used a tandem mass spectrometer to split molecules into fragments and measure the fragments' masses and relative abundances, which together make up the molecule's mass spectrum.
First, a fragmentation tree, which records the parent of each fragment, is computed from each spectrum in the training data. The researchers then train the machine-learning model on a large number of fragmentation trees paired with the molecular properties (or fingerprints) that correspond to each tree. Given the spectrum of a new molecule, the model predicts its likely fingerprint, which is then used to retrieve a set of best-matching molecules from a molecular structure database.
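To make the idea of a fragmentation tree concrete, here is a minimal sketch of the data structure only: each fragment (peak) stores its parent, so the mass lost along every edge can be read off. The class and function names and all masses are hypothetical, chosen so the edges correspond to losses of water (18.0106 Da) and carbon monoxide (27.9949 Da). Real tools compute the tree itself by searching over possible molecular formulas; this sketch assumes the tree is already given.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Fragment:
    """One peak in the spectrum, explained as a fragment of the molecule."""
    name: str
    mz: float              # measured mass-to-charge ratio (hypothetical values below)
    intensity: float       # relative abundance in the spectrum, 0..1
    parent: Optional[str]  # fragment it broke off from; None for the precursor ion


def neutral_losses(tree: Dict[str, Fragment]) -> Dict[Tuple[str, str], float]:
    """Mass lost along each parent -> child edge, rounded to 4 decimals."""
    losses = {}
    for frag in tree.values():
        if frag.parent is not None:
            losses[(frag.parent, frag.name)] = round(tree[frag.parent].mz - frag.mz, 4)
    return losses


def path_to_root(tree: Dict[str, Fragment], name: str) -> List[str]:
    """Follow parent links from a fragment back up to the precursor ion."""
    path = [name]
    while tree[path[-1]].parent is not None:
        path.append(tree[path[-1]].parent)
    return path


# Hypothetical tree: edges correspond to losses of H2O (18.0106 Da) and CO (27.9949 Da).
tree = {f.name: f for f in [
    Fragment("M", 200.0700, 0.35, None),
    Fragment("F1", 182.0594, 1.00, "M"),
    Fragment("F2", 154.0645, 0.60, "F1"),
    Fragment("F3", 172.0751, 0.20, "M"),
]}

print(neutral_losses(tree))
print(path_to_root(tree, "F2"))  # ['F2', 'F1', 'M']
```

Storing only a parent link per fragment keeps the tree compact; in a learning setting, features such as these edge losses are what a model could compare across spectra.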
"The molecular structures it predicts can be used in much the same way as search results from the Google search engine," Rousu notes.
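Returning results like a search engine amounts to ranking database candidates by how well their fingerprints agree with the predicted one. The sketch below assumes the model outputs, for each fingerprint bit, a probability that the bit is set, and scores each candidate with an independent-bit log-likelihood. This scoring rule, the function names, and the data are illustrative assumptions, not necessarily what CSI:FingerID itself uses.

```python
import math
from typing import Dict, List, Sequence


def log_likelihood(pred_probs: Sequence[float], bits: Sequence[int], eps: float = 1e-6) -> float:
    """Score a candidate's binary fingerprint, treating each predicted bit as independent."""
    total = 0.0
    for p, bit in zip(pred_probs, bits):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += math.log(p if bit else 1.0 - p)
    return total


def rank_candidates(pred_probs: Sequence[float], database: Dict[str, Sequence[int]]) -> List[str]:
    """Return candidate names ordered best match first, like a ranked list of search hits."""
    return sorted(database, key=lambda name: log_likelihood(pred_probs, database[name]), reverse=True)


# Hypothetical predicted probabilities that each of four fingerprint bits is set.
predicted = [0.9, 0.1, 0.8, 0.2]
# Hypothetical candidate molecules from a structure database, with their fingerprints.
candidates = {
    "cand_A": [1, 0, 1, 0],
    "cand_B": [1, 1, 0, 0],
    "cand_C": [0, 0, 1, 0],
}
print(rank_candidates(predicted, candidates))  # ['cand_A', 'cand_C', 'cand_B']
```

Working in log space keeps scores numerically stable when fingerprints have hundreds or thousands of bits, where multiplying raw probabilities would underflow.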
The team says a considerable increase in the volume of training data is needed to further improve the accuracy of molecule identification.
The search engine could find applications in anti-doping work, drug control by customs, and crime scene investigation.
From Aalto University
Abstracts Copyright © 2015 Information Inc., Bethesda, Maryland, USA