
Q&A: RISC and Reward

Having helped develop Reduced Instruction Set Computing and Redundant Arrays of Inexpensive Disks, David Patterson has set his sights on interdisciplinary research.
Figure. "We need new challenges to drive our technology," says David Patterson of the University of California at Berkeley.

Though his entry into computer science was somewhat accidental—enrolling in a computing course by chance after a college math class was canceled—the University of California at Berkeley’s David Patterson has left a deep mark on the field. The Reduced Instruction Set Computer, or RISC, project that he led at Berkeley inspired the Oracle SPARC architecture, as well as the ARM architecture (the “R” in ARM stands for RISC) that powers most mobile phones. RAID—redundant arrays of inexpensive disks—offered a powerful new way to prevent data loss. More recently, Patterson has turned his attention to interdisciplinary research, collaborating with bioinformaticians and clinicians to better understand cancer genomics.

Tell me about your childhood and what drew you to the field.

I was the first of my family to graduate from college, and I was a math major because I did well in math in high school. In my junior year at UCLA, a math class was canceled, so I took a computing course as a lark. It was love at first sight. There was no computer science major at the time, so I just informally took all the computing courses that I could.

At the time, I was working at my dad’s company in downtown Los Angeles—an industrial job—and I casually mentioned to the instructor of one CS course that I would rather be writing software than working in a factory. On his own, he talked around and found me a job on a research project. Once I was in that environment, it seemed cost-effective to get a master’s, since it was just another year. And then they just assumed I was going to get a Ph.D., so I did. Had that instructor, Jean-Loup Baer, not found me a job as an undergrad, I would not be on this path.

That led to a job at Berkeley, where you have been ever since. How did you get involved with RISC, which produced what was likely the first very-large-scale integration (VLSI) reduced instruction set microprocessor?

Before we started, I took a leave of absence at DEC, and it gave me a chance to reflect. DEC had a very successful minicomputer called the VAX—this was 1979—and at the time, people were designing microprocessors to imitate minicomputers. But VAX had an extremely complicated instruction set, and I thought it would be a terrible idea to put that into silicon, in part because there would be so many mistakes. When I got back to Berkeley, I wrote a paper arguing that if you do that, you need to make a chip that is easy to modify. The paper was rejected because, the reviewers argued, it does not make sense to have a microprocessor that you need to modify. Those two contradictory positions led to my involvement in the RISC effort.

Hence “reduced instruction set,” and your focus on including only instructions that were actually used.

We wanted to build a fast, lean microprocessor that performed well by tapping the rapidly improving capabilities of silicon due to Moore’s Law, and do the complicated stuff with software. There was a related effort at IBM led by John Cocke that predated ours, and John Hennessy’s work at Stanford came a little later.

Take me through the project’s evolution and key concepts like register windows, which divide the register file into a set of global registers and per-procedure windows of local registers.

Part of what I realized at DEC is that it is hard to get things done at universities, because universities do not have deadlines. The only deadlines at universities are when classes start and when classes end. So we pursued the project through a series of four courses. The first one was the architecture investigation, and two of the grad students in that class came up with the idea of register windows as a way to take advantage of the chip’s resources to get good performance. Then we learned VLSI design in the next course, and we implemented the design in subsequent courses.
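To make the idea concrete, here is a minimal, hypothetical sketch of how register windows avoid saving and restoring registers on procedure calls. The class name, register counts, and overlap below are illustrative choices, not the actual RISC-I register file layout.

```python
class RegisterWindows:
    """Toy model of register windows (illustrative sizes, not RISC-I's)."""

    def __init__(self, num_physical=64, num_globals=8, window=16, overlap=4):
        self.regs = [0] * num_physical   # one large physical register file
        self.num_globals = num_globals   # registers 0..7: visible to every procedure
        self.window = window             # registers visible to the current procedure
        self.overlap = overlap           # caller's "out" regs double as callee's "in" regs
        self.base = num_globals          # start of the current window

    def _physical(self, r):
        # Globals map directly; the rest map into the current window.
        return r if r < self.num_globals else self.base + (r - self.num_globals)

    def read(self, r):
        return self.regs[self._physical(r)]

    def write(self, r, value):
        self.regs[self._physical(r)] = value

    def call(self):
        # A call just slides the window; the overlapping registers pass arguments
        # without touching memory. (Real hardware spills to memory only when it
        # runs out of physical windows.)
        self.base += self.window - self.overlap

    def ret(self):
        self.base -= self.window - self.overlap


rw = RegisterWindows()
rw.write(20, 42)    # caller places an argument in one of its "out" registers
rw.call()
print(rw.read(8))   # callee sees it as an "in" register: prints 42
```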

After RISC, you moved on to the Redundant Arrays of Inexpensive Disks, or RAID—a powerful architecture whose name morphed from “inexpensive” to “independent” once it was commercialized.

When Randy Katz and I started working, that was the premise: to put a lot of inexpensive disks together in an array to get more performance at a cheaper per-byte cost. By adding redundancy, the cheap ones could actually be made more reliable than the expensive mainframe disks IBM was making.
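The redundancy he mentions can be as simple as a parity block computed across the array. Here is a minimal sketch in the spirit of block-level parity (as in RAID levels 4 and 5), not any particular product: XOR the data blocks to form parity, and any single failed disk can then be rebuilt from the survivors.

```python
from functools import reduce

def parity(blocks):
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Toy array: three "inexpensive disks" plus one parity disk.
disks = [b"GATTACA!", b"CCGGTTAA", b"ACGTACGT"]
p = parity(disks)

# If any single disk fails, XORing the survivors with the parity rebuilds it.
failed = 1
survivors = [d for i, d in enumerate(disks) if i != failed] + [p]
assert parity(survivors) == disks[failed]
```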


“Now that we have made so much progress, we should reach out to other fields because we need new challenges to drive our technology.”


After it became successful, and people wanted to sell it, the word “inexpensive” was a problem. If it is inexpensive, how can we charge $100,000 for it? So they asked Randy if they could call it “independent.” That is marketing. Fine. But when I give a retrospective talk and I use the word “inexpensive,” people think I got it wrong.

Let’s talk about your more recent work. You are currently involved in a big data lab called the AMPLab—which stands for “algorithms, machines, and people”—that’s taking on some very tough questions …

Someone I spoke with put it this way: in the first 50 years of computer science, we did not need to talk to people in other fields. Since software was so bad and hardware was so slow, it was obvious what to work on. But now that we have made so much progress, we should reach out to other fields because we need new challenges to drive our technology.

In the three years since AMPLab was founded, you have worked on a series of collaborative projects related to cancer genomics.

Cancer genomics is a realm where I think computer scientists can really help. My last biology course was in high school, but my strength is that I can bring together a multidisciplinary team. In 2011, I wrote an Op-Ed in The New York Times arguing just that. It did not receive a universally warm reaction, because the premise is that if computer scientists learn a little biology, we may be able to make contributions that are as important as those of people who dedicate their lives to biology and learn a little computer science.

Three years on, I am standing by my editorial. Our lab is now rolling, with three interesting results coming out at about the same time. That is happening because we are collaborating with much smarter domain experts, but also because our ability to build software, understand algorithms, and leverage cloud computing means we can create systems that can do things that weren’t possible before.

What are some of the challenges you have faced in your interdisciplinary work?

We’ve learned the hard way how to write a paper for a biology journal. They will not tell you how; they will just reject your paper. The first paper we submitted was not just rejected; we were told never to resubmit it.

One of our Microsoft colleagues who had successfully made the transition said, “You don’t understand—you have to write backwards.” Step one, you write your 12-page computer science paper, which covers all topics in great depth; that’s your appendix. Then you write the part that goes into the journal; it’s a six-page section that you can think of as an extended abstract with highlights and important results. But the meat is in the appendix.

What are your results?

First is the Scalable Nucleotide Alignment Program, or SNAP. The cost of genetic sequencing has gone down faster than Moore’s Law over the past decade, but it is still around $1,000. The first step is among the most time-consuming: in the wet lab, you break many copies of the DNA into small pieces, and then you have to piece them back together computationally to figure out the sequence. It’s like a jigsaw puzzle, and if you could align the pieces faster, you could cut the data processing cost. SNAP is by far the fastest sequence aligner, and it is as accurate as the others.
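SNAP itself uses a carefully engineered hash-based seed-and-extend strategy; the toy sketch below only illustrates the jigsaw-puzzle framing (the function names and parameters are illustrative, not SNAP's): index short substrings ("seeds") of a reference sequence, then use a read's seed to find candidate positions and verify them.

```python
from collections import defaultdict

def build_index(reference, k=4):
    """Map every length-k substring (seed) of the reference to its positions."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index

def align(read, reference, index, k=4, max_mismatches=1):
    """Seed with the read's first k bases, then verify each candidate position."""
    for pos in index.get(read[:k], []):
        candidate = reference[pos:pos + len(read)]
        if len(candidate) == len(read):
            mismatches = sum(a != b for a, b in zip(read, candidate))
            if mismatches <= max_mismatches:
                return pos, mismatches
    return None

reference = "ACGTACGTTTGACCATGGCA"
index = build_index(reference)
print(align("TTGACCAT", reference, index))   # -> (8, 0): the read maps at offset 8
```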

Our second result (ADAM) stores genetic information in a way that allows it to be processed in parallel 50 times faster using the cloud.
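ADAM builds on columnar storage and cluster frameworks; the schematic sketch below is not ADAM's real schema or API, but it shows why a columnar layout parallelizes well: each field lives in its own array, so a query reads only the columns it needs, and each partition of a column can be processed independently.

```python
# Row-oriented: one record per aligned read (field names are illustrative).
reads = [
    {"name": "r1", "contig": "chr1", "start": 100, "quality": 60},
    {"name": "r2", "contig": "chr1", "start": 104, "quality": 20},
    {"name": "r3", "contig": "chr2", "start": 7,   "quality": 55},
]

# Column-oriented: one array per field; a scan of "quality" never touches the rest.
columns = {field: [r[field] for r in reads] for field in reads[0]}

def high_quality_count(quality_chunk, threshold=30):
    return sum(q >= threshold for q in quality_chunk)

# Partition the column; in a cluster, each partition would go to a different worker.
qual = columns["quality"]
chunks = [qual[:2], qual[2:]]
partial = map(high_quality_count, chunks)   # map step: one task per partition
print(sum(partial))                         # reduce step: prints 2
```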

The third is a benchmarking project called SMaSH. Fields of computer science accelerate when they agree on benchmarks, because you can’t measure progress without them. Genetics doesn’t really have benchmarks the way computer science does, so we are proposing one.

The bottom line is that it’s both inspiring and exciting that people like us can help fight the war on cancer.


Figures

Figure. For more of David Patterson’s thoughts on research challenges, see “How To Build a Bad Research Center,” on page 33.

