BLOG@CACM

Scientists, Engineers, and Computer Science; Industry and Research Groups

The Communications Web site, http://cacm.acm.org, features more than a dozen bloggers in the BLOG@CACM community. In each issue of Communications, we'll publish selected posts or excerpts.

Follow us on Twitter at http://twitter.com/blogCACM

http://cacm.acm.org/blogs/blog-cacm

Mark Guzdial discusses what scientists and engineers should know about computer science, such as Alan Kay's "Triple Whammy." Greg Linden writes about industry's different approaches to research and how to organize researchers in a company.

Mark Guzdial "What do Scientists and Engineers Need to Know About Computer Science?"

http://cacm.acm.org/blogs/blog-cacm-96699

A new effort at the Texas Advanced Computing Center aims to teach scientists and engineers about supercomputing. Its organizers argue that "Anyone looking to do relevant computational work today in the sciences and engineering must have these skills." They offer a certificate or portfolio in "Scientific Computation."

Greg Wilson has been going after this same goal using a different strategy. He suggests that before we can teach scientists and engineers about high-performance computing, we first have to teach them about computing. He leads an effort called "Software Carpentry" to figure out what to teach scientists and engineers about computing:

I’ve been trying to construct a better answer for the past 13 years; Software Carpentry (http://software-carpentry.org/blog/) is what I’ve arrived at. It’s the 10% of software engineering (with a very small "e") that scientists and engineers need to know before they tackle GPUs, clusters, and other fashionable Everests. Like sanitation and vaccination, the basic skills it teaches are cheap and effective; unfortunately, the other characteristic they share is that they’re not really what photo ops are made of. We’ve also found a lot of resistance based on survivor bias: all too often, senior scientists who have managed to get something to work on a supercomputer say, "Well, I didn’t need version control or unit testing or any of that guff, so why should my students waste their time on it?" Most scientists rightly regard computing as a tax they have to pay in order to get results.

The evidence suggests that teaching everyone else about computer science is a bigger problem than teaching computer science majors. Chris Scaffidi, Mary Shaw, and Brad Myers have estimated that, by 2012, there will be about three million professional software developers in the U.S., but also about 13 million end-user programmers: people who program as part of their work but do not primarily develop software. This result suggests that for every student in your computer science classes, there are four more students who could use some help in learning computer science. The scientists and engineers who will be programming one day are among those other four.
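
The "four more students" figure follows from dividing the two estimates. Here is a minimal sketch of that arithmetic, using only the numbers quoted above. The function name is made up for illustration, and treating the workforce ratio as a classroom ratio is an assumption of the argument, not a measured result.

```python
# Workforce estimates quoted in the post (Scaffidi, Shaw, and Myers), U.S., by 2012.
professional_developers = 3_000_000
end_user_programmers = 13_000_000


def end_users_per_professional(end_users: int, professionals: int) -> float:
    """Return how many end-user programmers there are per professional developer."""
    return end_users / professionals


ratio = end_users_per_professional(end_user_programmers, professional_developers)
# Roughly four end-user programmers for every professional developer.
print(f"{ratio:.1f} end-user programmers per professional developer")
```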

Brian Dorn and I have a paper in the 2010 ACM International Computing Education Research workshop, "Learning on the Job: Characterizing the Programming Knowledge and Learning Strategies of Web Designers," based on Brian’s work studying graphic designers who program. Brian finds that these end-user programmers don’t know a lot about computer science, and that lack of knowledge hurts them. He finds that they mostly learn to program through Google. In his most recent work, he is finding that not knowing much about computer science makes them inefficient at searching. When they see "try-catch" in a piece of code that they’re trying to understand, they don’t know to look up "exception handling," and they can easily spend hours reading about Java exception handling when they are actually working in JavaScript.

Maybe we should be teaching scientists and engineers about computer science more generally. But as Greg Wilson points out, they don’t want much—they see computer science as a "tax." What’s the core of computer science that even scientists and engineers ought to know? Alan Kay recently suggested a "Triple Whammy" (http://computinged.wordpress.com/2010/05/24/the-core-of-computer-science-alan-kays-triple-whammy/) defining the core of computer science:

  1. Matter can be made to remember, discriminate, decide, and do.
  2. Matter can remember descriptions and interpret and act on them.
  3. Matter can hold and interpret and act on descriptions that describe anything that matter can do.

That’s a pretty powerful set. It goes way beyond Python vs. Java, or using Perl to check genome sequences with regular expressions vs. using MATLAB for analyzing data from ecological simulations. How do we frame the Triple Whammy in a way that fledgling scientists and engineers would find valuable and learnable?
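
One possible way to make the second point of the list concrete for a newcomer is to store a description as plain data, then have a program interpret it and act on it. The sketch below is an illustration only, not Alan Kay's formulation. The `Description` type, the `OPERATIONS` table, and the `interpret` function are all hypothetical names chosen for this example.

```python
from typing import Callable, Dict, List, Tuple

# A "description" is plain data: a list of (operation, argument) steps.
Description = List[Tuple[str, float]]

# The operations this toy interpreter knows how to carry out (hypothetical names).
OPERATIONS: Dict[str, Callable[[float, float], float]] = {
    "add": lambda value, arg: value + arg,
    "multiply": lambda value, arg: value * arg,
}


def interpret(description: Description, start: float = 0.0) -> float:
    """Act on a stored description, one step at a time."""
    value = start
    for operation, argument in description:
        if operation not in OPERATIONS:
            raise ValueError(f"unknown operation: {operation}")
        value = OPERATIONS[operation](value, argument)
    return value


# The description is remembered as data, then interpreted and acted on.
celsius_to_fahrenheit: Description = [("multiply", 9 / 5), ("add", 32)]
print(interpret(celsius_to_fahrenheit, start=100.0))
```

The same `interpret` function runs any description built from the known operations, which is the part a scientist might find useful: the procedure becomes data that can be stored, inspected, and changed without rewriting the program.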

Reader’s comment

The worrying trend I see is that many computer engineering graduates are interested in learning only a large set of programming languages, but dislike courses like algorithm design, not realizing that these languages are merely tools for implementing solutions. The end result is what you could call technicians but not engineers.
        —Farhan Ahmad

Greg Linden "Research in the Wild: Making Research Work in Industry"

http://cacm.acm.org/blogs/blog-cacm/97467

How to do research in academia is well established. You get grants to fund your group, attract students, publish papers, and pump out Ph.D.s. Depending on whom you ask and how cynical they have become, the goals are some combination of impacting the field, educating students, and personal aggrandizement.

Research in industry is less established. How to organize it is not clear. The purpose is not even well understood. The business strategy behind forming a research group sometimes seems to be little more than a variant of the Underpants Gnomes’ plan in South Park. Phase 1: Hire Ph.D.s. Phase 2: ? Phase 3: Profit!

Generally, researchers in industry are supposed to yield some combination of long-term innovation, improving the sophistication of technology and products beyond simple and obvious solutions, and helping to attract talented and enthusiastic developers.

To take one example in search, without researchers who know the latest work, it would be hard for a company to build the thousands of classifiers that ferret out subtleties of query intent, document meaning, and spamminess, all of which are needed for a high-quality search experience. Information retrieval is a field that benefits from a long history of past work, and researchers often are the ones who know the history and how to stand on giants’ shoulders.

Even so, there are many in industry who consider researchers an expensive luxury that their company can ill afford. Part of this comes from the historically common organizational structure of having a separate and independent research lab, which sometimes looks like a gilded ivory tower to those who feel they are locked outside.

The separate research lab is the traditional structure, but a problematic one, not only because of how the rest of the company perceives the group, but also because the researchers can be so far removed from the company’s products as to have little ability to make an impact. Many companies appear to be trying other ways of integrating researchers into the company.

For example, Google is well known for integrating many of its researchers into product groups and shifting them among product groups, working side-by-side with different development teams. While on a particular project, a researcher might focus on the part of the problem that requires esoteric knowledge of particular algorithms, but they are exposed to and work on many problems in the product. When this group comes together, everyone shares knowledge, and then people move to another group, sharing their knowledge again. Moreover, these ephemeral teams get people to know people, yielding valuable peer networks. When a tough research problem later comes up and no one nearby knows how to solve it, finding the person in the company who can solve it becomes much easier.

Many other companies, including Microsoft, Facebook, and Twitter, maintain separate research organizations, but try to keep the researchers working very closely with the product teams. At these companies, the impetus for novel research often is a problem in the product, usually a problem that would not be obvious in academia because academics lack access to big data and scale.

What organizational structure works best in industry may depend on your goals. For immediate impact, having researchers integrated into product groups provides a lot of value; they are directly solving today’s hard problems. But what about the problem that might hit in a year or two? And what about long-term breakthroughs, entirely new products, enabled by new technology no one has thought of yet?

My personal opinion leans mostly toward integrating researchers on projects, much like Google does, but also giving researchers 20% time (as all developers should get) and occasionally turning a 20% time project into a full project (again, as all developers should get, but the threshold for what is considered impactful might differ for a researcher, given the speculative gamble that is the nature of research). This strikes a balance between immediate impact, doing novel research, and taking advantage of a long-term opportunity when inspiration hits.

What do you think? How should researchers be organized in companies? Why?

Reader’s comment

Research as a process and a profession and as a mind set is quite different than product-making. Pushing the two too close together or expecting people to be good at both may not always be optimal. See my "Research as product" post on the FXPAL Blog.
        —Gene Golovchinsky
