On a recent business trip, I reached into my reading bag for the July 2, 2010 issue of the journal Science. The issue contained a fascinating article by Gibson et al., “Creation of a Bacterial Cell Controlled by a Chemically Synthesized Genome,” summarizing the work by Craig Venter’s team to create an artificial life form. (I had read about the result in the popular press, but I was eager to read the original work.) The team created a synthetic DNA sequence, assembling it from digitized genome sequence data for Mycoplasma mycoides, and transplanted the modified sequence into Mycoplasma capricolum, where it drove reproduction of new cells with the modified genetic code.
The process is a fascinating example of the interplay of biology, sequencing technology, algorithms and high-performance computing. Yet as the authors of the Science paper note, it also highlights how little we still understand. No single cellular system has all of its genes understood in terms of their biological roles. Simply put, we do not yet fully understand the function of all the genetic material, nor do we understand the complex interdependencies of differential gene expression during an organism’s lifecycle.
Systems Biology Writ Large
This set me to thinking about systems biology and the multidisciplinary challenges inherent in constructing predictive computational models of biological systems, from cellular processes through organism lifecycle to evolution and population dynamics. These “grand challenges” encompass almost every aspect of modern computing, from numerical and symbolic methods through data management and analytics to extraordinarily high-performance computing platforms. The wide dynamic range of temporal and spatial scales in systems biology models, from picosecond first-principles molecular dynamics to the geological timescales of environmental shifts, is a full employment act for computational scientists.
In Silico Challenges
For many of us in high-performance computing, the temptation is to lapse into reverie about trans-exascale computing platforms, for predictive, multilevel biological models will be prodigious consumers of computational resources. Yet I would humbly suggest that this is the wrong dreamy meditation for systems biology. Instead, the algorithmic, software and educational challenges are paramount.
There are enormous numerical and algorithmic challenges in fusing disparate models, including ab initio quantum chemistry, molecular dynamics, electrostatic continuum models, finite element models (FEM), computational fluid dynamics (CFD) models and discrete automata models. The software engineering challenges of building and maintaining such multidisciplinary codes are also complex, particularly when one realizes that they are supported by generations of students, post-doctoral researchers, faculty members and software professionals.
Natural Philosophy Redux
In systems biology, the algorithms and the model couplings are subtle, the intellectual communities are diverse and often disjoint, and the model approximations are often domain specific. (Models are, after all, approximations of reality.) Perhaps most importantly, the educational structures and the social processes needed to inform, educate, integrate and fund biological researchers, engineers, computer scientists, and software developers to address such complex problems all challenge our existing approaches. Yet such integration harkens back to the origins of science as natural philosophy when all aspects of scientific inquiry were coupled.
In many ways, we are attempting to come full circle, from in vivo observations to in vitro experiments to holistic, in silico computational models. The challenges of systems biology (modeling, integration, computational resources and education) and its opportunities (fundamental understanding of biological processes and the development of more efficacious drugs and improved health care) together make it one of the great possibilities of the 21st century. These are exciting times.