Researchers at the University of California, San Diego (UCSD) have harnessed machine-learning techniques to classify proteins found in healthy and unhealthy gut microbiomes.
Using samples from 30 healthy people and 30 people with inflammatory bowel disease, the researchers sequenced about 600 billion DNA bases from the hundreds of microbial species that live in the gut. The reconstructed DNA of the microbiome was then translated into hundreds of thousands of proteins, which were grouped into about 10,000 protein families.
Standard biostatistical methods identified the 100 protein families most significantly associated with health or disease status. These 100 protein families were then used to train a machine-learning classifier to label the remaining protein families as indicative of a healthy or diseased state.
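The two-step workflow described above (rank protein families statistically, then train on the top-ranked ones to label the rest) can be illustrated with a small sketch. The article does not name the statistical test or the classifier the researchers used, so everything here is an assumption made for illustration: Welch's t-statistic stands in for "standard biostatistics," a nearest-centroid rule stands in for the machine-learning step, and the abundance data are synthetic. The function names (`welch_t`, `classify_families`, `make_synthetic`) are hypothetical and do not come from the study.

```python
import random
import statistics


def welch_t(healthy, diseased):
    """Welch's t-statistic; positive means higher abundance in disease."""
    mean_h, mean_d = statistics.fmean(healthy), statistics.fmean(diseased)
    var_h, var_d = statistics.variance(healthy), statistics.variance(diseased)
    se = (var_h / len(healthy) + var_d / len(diseased)) ** 0.5
    return (mean_d - mean_h) / se if se > 0 else 0.0


def standardize(profile):
    """Rescale one family's per-sample abundances to zero mean, unit variance."""
    mean = statistics.fmean(profile)
    sd = statistics.pstdev(profile)
    return [(x - mean) / sd if sd > 0 else 0.0 for x in profile]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify_families(abundance, n_healthy, top_k):
    """abundance maps family name -> per-sample abundances, healthy samples first.

    Returns (labels for the top_k training families, predictions for the rest).
    """
    # Step 1: rank families by how strongly they separate the two groups.
    scores = {f: welch_t(p[:n_healthy], p[n_healthy:]) for f, p in abundance.items()}
    ranked = sorted(scores, key=lambda f: abs(scores[f]), reverse=True)
    train, rest = ranked[:top_k], ranked[top_k:]

    # Step 2: label the top-ranked families by the direction of their shift.
    labels = {f: "disease" if scores[f] > 0 else "healthy" for f in train}

    # Step 3: build one centroid per label from standardized training profiles.
    z = {f: standardize(p) for f, p in abundance.items()}
    centroids = {}
    for label in ("healthy", "disease"):
        members = [z[f] for f in train if labels[f] == label]
        if members:
            centroids[label] = [statistics.fmean(col) for col in zip(*members)]

    # Step 4: assign every remaining family to its nearest centroid.
    predictions = {
        f: min(centroids, key=lambda c: sq_dist(z[f], centroids[c])) for f in rest
    }
    return labels, predictions


def make_synthetic(n_families=300, n_healthy=30, n_diseased=30, seed=0):
    """Synthetic data: a third of families enriched in disease, a third
    depleted, a third unchanged. Not real microbiome measurements."""
    rng = random.Random(seed)
    abundance = {}
    for i in range(n_families):
        shift = (2.0, -2.0, 0.0)[i % 3]
        healthy = [rng.gauss(10.0, 1.0) for _ in range(n_healthy)]
        diseased = [rng.gauss(10.0 + shift, 1.0) for _ in range(n_diseased)]
        abundance[f"family_{i}"] = healthy + diseased
    return abundance


if __name__ == "__main__":
    data = make_synthetic()
    labels, predictions = classify_families(data, n_healthy=30, top_k=100)
    print(len(labels), "training families;", len(predictions), "classified")
```

A nearest-centroid rule is used here only because it needs no external libraries. Every remaining family is forced into one of the two classes, so families with no real shift still receive a label, and a practical pipeline would likely add a confidence threshold or a third "uninformative" outcome.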
The analysis software was run on the Gordon supercomputer at UCSD's San Diego Supercomputer Center (SDSC) and consumed 180,000 core-hours. Researchers plan to continue their study using SDSC's Comet supercomputer, which will enable them to expand from 10,000 protein families to 1 million.
"Scalable methods for quickly identifying such anomalies between health and disease states will be increasingly valuable for biological interpretation of sequence data," the researchers note.
From UC San Diego News Center
Abstracts Copyright © 2017 Information Inc., Bethesda, Maryland, USA