A new effort at the Texas Advanced Computing Center aims to teach scientists and engineers about supercomputing. Its organizers argue that "Anyone looking to do relevant computational work today in the sciences and engineering must have these skills." They offer a certificate or portfolio in "Scientific Computation."
Greg Wilson has been going after this same goal using a different strategy. He suggests that before we can teach scientists and engineers about high-performance computing, we first have to teach them about computing. He leads an effort called "Software Carpentry" to figure out what to teach scientists and engineers about computing:
I've been trying to construct a better answer for the past 13 years; Software Carpentry (http://software-carpentry.org/blog/) is what I've arrived at. It's the 10% of software engineering (with a very small "e") that scientists and engineers need to know before they tackle GPUs, clusters, and other fashionable Everests. Like sanitation and vaccination, the basic skills it teaches are cheap and effective; unfortunately, the other characteristic they share is that they're not really what photo ops are made of. We've also found a lot of resistance based on survivor bias: all too often, senior scientists who have managed to get something to work on a supercomputer say, "Well, I didn't need version control or unit testing or any of that guff, so why should my students waste their time on it?" Most scientists rightly regard computing as a tax they have to pay in order to get results.
The evidence suggests that teaching everyone else about computer science is a bigger problem than teaching computer science majors. Chris Scaffidi, Mary Shaw, and Brad Myers have estimated that, by 2012, there will be about three million professional software developers in the U.S., but there will also be about 13 million end-user programmers: people who program as part of their work, but do not primarily develop software. This result suggests that for every student in your computer science classes, there are four more students who could use some help in learning computer science. Those scientists and engineers who will be programming one day are in those other four.
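The "four more students" figure follows from the two estimates quoted above. A minimal sketch of that arithmetic, using only the rounded numbers given in the text (the variable names are illustrative and do not come from the original study):

```python
# Rounded estimates quoted in the text (Scaffidi, Shaw, and Myers, for 2012).
professional_developers = 3_000_000
end_user_programmers = 13_000_000

# End-user programmers per professional developer.
ratio = end_user_programmers / professional_developers

print(f"End-user programmers per professional developer: {ratio:.1f}")
```

The ratio comes out slightly above four, which is where the "four more students" for every computer science student comes from.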
Maybe we should be teaching scientists and engineers about computer science more generally. But as Greg Wilson points out, they don't want much; they see computer science as a "tax." What's the core of computer science that even scientists and engineers ought to know? Alan Kay recently suggested a "Triple Whammy" (http://computinged.wordpress.com/2010/05/24/the-core-of-computer-science-alan-kays-triple-whammy/) defining the core of computer science:
- Matter can be made to remember, discriminate, decide, and do.
- Matter can remember descriptions and interpret and act on them.
- Matter can hold and interpret and act on descriptions that describe anything that matter can do.
That's a pretty powerful set. It goes way beyond Python vs. Java, or using Perl to check genome sequences with regular expressions vs. using MATLAB for analyzing data from ecological simulations. How do we frame the Triple Whammy in a way that fledgling scientists and engineers would find valuable and learnable?
The worrying trend I see is that many computer engineering graduates are interested in learning only a large set of programming languages, but dislike courses like algorithm design, not realizing that these languages are merely tools for implementing solutions. The end result is what you could call technicians but not engineers.
Greg Linden "Research in the Wild: Making Research Work in Industry"
How to do research in academia is well established. You get grants to fund your group, attract students, publish papers, and pump out Ph.D.s. Depending on whom you ask and how cynical they have become, the goals are some combination of impacting the field, educating students, and personal aggrandizement.
Research in industry is less established. How to organize it is not clear. The purpose is not even well understood. The business strategy behind forming a research group sometimes seems to be little more than a variant of the Underpants Gnomes' plan in South Park. Phase 1: Hire Ph.D.s. Phase 2: ? Phase 3: Profit!
Generally, researchers in industry are supposed to yield some combination of long-term innovation, improving the sophistication of technology and products beyond simple and obvious solutions, and helping to attract talented and enthusiastic developers.
To take one example in search, without researchers who know the latest work, it would be hard for a company to build the thousands of classifiers that ferret out subtleties of query intent, document meaning, and spamminess, all of which are needed for a high-quality search experience. Information retrieval is a field that benefits from a long history of past work, and researchers often are the ones who know the history and how to stand on giants' shoulders.
Even so, there are many in industry who consider researchers an expensive luxury that their company can ill afford. Part of this comes from the historically common organizational structure of having a separate and independent research lab, which sometimes looks to be a gilded ivory tower to those who feel they are locked outside.
The separate research lab is the traditional structure, but a problematic one, not only for the perception of the group by the rest of the company, but also because the researchers can be so far removed from the company's products as to have little ability to make an impact. Many companies appear to be trying other ways of integrating researchers into the company.
For example, Google is well known for integrating many of its researchers into product groups and shifting them among product groups, working side-by-side with different development teams. While on a particular project, a researcher might focus on the part of the problem that requires esoteric knowledge of particular algorithms, but they are exposed to and work on many problems in the product. When this group comes together, everyone shares knowledge, and then people move to another group, sharing their knowledge again. Moreover, these ephemeral teams get people to know people, yielding valuable peer networks. When a tough research problem later comes up and no one nearby knows how to solve it, finding the person in the company who can solve it becomes much easier.
Many other companies, including Microsoft, Facebook, and Twitter, maintain separate research organizations, but try to keep the researchers working very closely with the product teams. At these companies, the impetus for novel research often is a problem in the product, usually a problem that would not be obvious in academia because academics lack access to big data and scale.
What organizational structure works best in industry may depend on your goals. For immediate impact, having researchers integrated into product groups provides a lot of value; they are directly solving today's hard problems. But what about the problem that might hit in a year or two? And what about long-term breakthroughs, entirely new products, enabled by new technology no one has thought of yet?
My personal opinion leans mostly toward integrating researchers on projects, much like Google does, but also giving researchers 20% time (as all developers should get) and occasionally turning a 20% time project into a full project (again, as all developers should get, but the threshold for what is considered impactful might differ for a researcher, given the speculative gamble that is the nature of research). This strikes a balance between immediate impact, doing novel research, and taking advantage of a long-term opportunity when inspiration hits.
What do you think? How should researchers be organized in companies? Why?
Research as a process, a profession, and a mindset is quite different from product-making. Pushing the two too close together or expecting people to be good at both may not always be optimal. See my "Research as product" post on the FXPAL Blog.
©2011 ACM 0001-0782/11/0300 $10.00