
How to Harness Petaflop Performance

Photo: The Petascale Computing Facility, under construction on the University of Illinois campus, will house the Blue Waters infrastructure and staff.

Since Cray supercomputers first achieved gigaflops performance in the 1980s, the computational power of the fastest machines has grown steadily, reaching one teraflop in 1997 and one petaflop in 2008.

Several petascale computers are already up and running, such as Oak Ridge National Laboratory’s Jaguar system (1.06 petaflops on the LINPACK benchmark) and Los Alamos National Laboratory’s Roadrunner system (1.1 petaflops on LINPACK). Others are sure to follow, such as the Blue Waters system, scheduled to come online in 2011 with sustained petaflops performance and a peak performance that is not yet known.
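
For context, LINPACK rates a machine by how quickly it solves a large, dense system of linear equations. The toy sketch below is illustrative only; it uses Python and NumPy rather than the actual HPL benchmark code, and the problem size is arbitrary, but it shows the basic idea of turning a timed solve into a flops figure.

    # Toy LINPACK-style measurement (illustrative, not the real HPL benchmark):
    # time a dense solve of Ax = b and convert the elapsed time into a rate.
    import time
    import numpy as np

    n = 2000                                  # small, illustrative problem size
    A = np.random.rand(n, n)
    b = np.random.rand(n)

    start = time.perf_counter()
    x = np.linalg.solve(A, b)                 # LU factorization plus triangular solves
    elapsed = time.perf_counter() - start

    flops = (2.0 / 3.0) * n**3 + 2.0 * n**2   # standard LINPACK operation count
    print(f"~{flops / elapsed / 1e9:.1f} gigaflops for this one solve")

Real LINPACK runs use matrices sized to fill a machine’s memory and sustain the measurement across every node for hours.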

All of these supercomputers face daunting challenges in trying to harness a petaflop.

Most obvious is the need to coordinate hundreds of thousands of separate cores. Oak Ridge’s Jaguar contains 37,544 quad-core processors, for a total of more than 150,000 cores, while the Blue Waters project will coordinate over 1.6 million cores. Spreading the workload across all of those cores, and getting them to work together, is a significant challenge for software developers.
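
As a rough illustration of what spreading the workload means in practice, the minimal sketch below (hypothetical code, not drawn from any of these systems) uses MPI via Python’s mpi4py to divide a simple summation across however many processes a job is given and then combine the partial results.

    # Minimal MPI sketch: each process (rank) works on its own slice of a
    # global problem, then the partial results are combined with a reduction.
    # Requires an MPI runtime and mpi4py; run with, e.g.:
    #   mpirun -n 4 python sum_sketch.py
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()          # this process's index
    size = comm.Get_size()          # total number of processes

    N = 10_000_000                  # size of the global problem (arbitrary)

    # Divide the index range [0, N) as evenly as possible across ranks.
    chunk = N // size
    start = rank * chunk
    end = N if rank == size - 1 else start + chunk

    # Each rank computes a partial sum over its own slice only.
    partial = sum(float(i) for i in range(start, end))

    # Combine the partial results on rank 0.
    total = comm.reduce(partial, op=MPI.SUM, root=0)
    if rank == 0:
        print(f"sum over {size} ranks: {total}")

Scaling the same pattern to hundreds of thousands of cores is where the difficulty lies, as communication, load imbalance, and synchronization costs all grow with the machine.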

"We’ve gone from a period where we had a free lunch where we needed to do nothing to our software to gain the increases in speed that resulted from increasing chip frequencies to the point now where we need to significantly revise the software that we use for scientific and engineering applications," says Thomas Dunning, director of the National Center for Supercomputing Applications, Urbana, IL. "Unless we are willing to do a significant investment in the development of new software for these machines, we will not realize all of the potential that the petascale machines have to offer us."

Fighting Failure

Tomorrow’s supercomputers also face serious reliability issues. The Blue Waters system will have more than 40,000 disk drives backing up the data being processed, and the best estimates are that, on average, one of them will fail every day. Systems will therefore have to be developed that can detect hardware that is about to fail and replace it without users ever noticing or losing time.
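
A quick back-of-the-envelope calculation shows how roughly one failure per day falls out of a fleet that size; the per-drive failure rate used below is an assumed figure for illustration, not one quoted by the project.

    # Back-of-the-envelope reliability estimate (assumed failure rate, for
    # illustration only): 40,000 drives at roughly a 1% annualized failure
    # rate works out to about one failed drive per day somewhere in the system.
    drives = 40_000
    annual_failure_rate = 0.009            # assumed ~0.9% per drive per year
    failures_per_day = drives * annual_failure_rate / 365
    print(f"{failures_per_day:.2f} expected drive failures per day")   # ~0.99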

Energy use is also becoming an increasingly important issue for supercomputers, some of which consume as much energy as a small city.

"The trick is to put these things in places where you have lowered energy costs, or even better carbon neutral energy cost," says Tom DeFanti at the University of California, San Diego. "Once computers get to be so much more expensive because of the energy they’re using, then you move the computer to where the energy is and turn on the computer when the wind blows or the sun shines. Of course, right now we are still putting data centers in cities which is exactly wrong."

Of course, even though the wrapping is barely off the box for petascale computers, some are already dreaming of exascale machines, although, as DeFanti points out, an exaflop is just 100 million video game consoles hooked together.
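
The arithmetic behind that comparison is simple enough to check (the figures below are illustrative, not DeFanti’s own): an exaflop spread across 100 million devices works out to roughly 10 gigaflops apiece.

    # Sanity-check arithmetic for the exaflop comparison (illustrative figures).
    exaflop = 1e18                      # floating-point operations per second
    devices = 100_000_000
    print(exaflop / devices)            # 1e10 flops, i.e. about 10 gigaflops each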

Dunning is less hopeful. "We can’t get to the exascale system just by scaling up the technologies that we’re using on the petascale," he says. "We need to do some new rethinking of how it is that we do computing."

Graeme Stemp-Morlock is a freelance science writer based in Waterloo, Ontario.


 
