Despite the fact that he does not see very well, Alexei Efros, recipient of the 2016 ACM Prize in Computing and a professor at the University of California at Berkeley, has spent most of his career trying to understand, model, and recreate the visual world. Drawing on the massive collection of images on the Internet, he has used machine learning algorithms to manipulate objects in photographs, translate black-and-white images into color, and identify architecturally revealing details about cities. Here, he talks about harnessing the power of visual complexity.
You were born in St. Petersburg (Russia), and were 14 when you came to the U.S. What drew you to computer science?
I was interested in computers from an early age. I remember reading a book about PDP-11 assembly language programming when I was 12 and dreaming about how one day, I might actually have a computer of my own to try this out in practice. Then, in high school, I did some research with a professor at the University of Utah. It sounds kind of brazen, but I went to the CS department and was like, "Bring me to your chairman." Tom Henderson was the chair at that time and, you know, he actually saw me. I told him that I wanted to do computer science and asked him for a problem. And he basically said, "Ok, weird Russian kid. I have a robot running around; do you want to help with that?" It was wonderful.
You did your undergraduate work at the University of Utah, as well.
Interestingly enough, I was actually considering whether I should go into computer science (CS) or theater. In fact, I applied to Carnegie Mellon University because it's one of the top departments in CS, but also one of the top universities for theater. Then I showed my father the tuition, and, well, we were immigrants. So I went to the University of Utah, where CS was much stronger than theater, and I think I got a very good education. But I'm still practicing my stagecraft twice a week in my classes.
I've seen your talks. You're a very engaging speaker.
There is this whole dichotomy between the geeks and the artsy people: either you are good with numbers, or with arts and humanities. I think it's misplaced. CS is hot right now. A lot of smart kids go into CS, and many look down at all of these humanities people with disdain. In my classes, I try to remind them that computer scientists are hot now, but physicists were hot in the Sixties, and chemists were hot in the Thirties, and they're not superhot now. Shakespeare is going to be around much longer than Python.
How did you get involved with computer vision, graphics, and machine learning?
Even in high school, my goal was to solve AI. But then I reasoned it out: AI is too hard, and you don't know when you're succeeding. With language, you kind of know when you're succeeding, but that's also very high-level. Meanwhile, almost all animals have vision. Vision seems like the most basic thing, so it's got to be easy, right?
Basically, I think I've just had one idea throughout my whole career, and I've been milking it since undergrad, and the idea is not even that profound. It's that we fetishize intellectual contributions: algorithms, data structures, and so on. And we often forget that a lot of the complexity in the world is actually due to the data. My favorite example is in computer graphics. We know how light behaves, and we can simulate everything we want. But the reason current animated movies don't look like the real thing is the data. There is a lot of entropy in the world and it's just too hard to capture. The algorithms are fine. It's the data that is missing.
Another example is the Shannon trick of synthesizing text. Imagine if you start typing an SMS on your phone but you keep using the predictive function. The algorithm is very basic: it's just "look for the last time something like this occurred and steal the next most probable letter." But you get really interesting results, because you have a lot of data.
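For readers who want to try the idea, here is a minimal sketch of character-level text synthesis in the spirit of the Shannon trick described above. It is an illustration under stated assumptions, not code from Efros's work: the function names `build_model` and `synthesize`, the `order` parameter (how many preceding characters count as "something like this"), and the choice to sample the next letter in proportion to how often it followed that context are all hypothetical choices made here.

```python
import random
from collections import Counter, defaultdict


def build_model(text, order=3):
    """Count, for every `order`-character context in `text`, which letters follow it.

    Hypothetical helper for illustration only.
    """
    model = defaultdict(Counter)
    for i in range(len(text) - order):
        context = text[i:i + order]
        model[context][text[i + order]] += 1
    return model


def synthesize(model, seed, order=3, length=80, rng=None):
    """Extend `seed` one letter at a time using the follower counts in `model`.

    Assumption: the next letter is sampled in proportion to how often it
    followed the current context, rather than always taking the single most
    frequent one. `order` must match the value passed to `build_model`.
    """
    rng = rng or random.Random(0)  # fixed seed so runs are repeatable
    out = seed
    for _ in range(length):
        followers = model.get(out[-order:])
        if not followers:
            break  # this context never occurred in the corpus; stop early
        letters, counts = zip(*followers.items())
        out += rng.choices(letters, weights=counts, k=1)[0]
    return out
```

Calling `build_model` on a large body of text and then `synthesize(model, seed, order=3)` with a three-character seed taken from that text produces output that tends to look more plausible as the corpus grows, which is the point of the passage: the procedure stays trivial and the data does the work.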
Thanks to the Internet, you've got access to a massive corpus of data. Didn't one of your early papers examine two million images from Flickr?
Exactly. Initially, we said, "We'll just download 20,000 images." The results weren't great. But my then-grad student, James Hays, was like, "Why don't we just keep downloading?" If you look at the big neural networks right now, it is really impressive what they can do. But I think people are forgetting that one of the reasons they're so powerful is that they are able to gobble up orders of magnitude more data than we could do with earlier methods. This is not very glamorous, because it suggests that humans are not so smart. It's really the data.
That reminds me of the old philosophical debate about experiential vs. a priori knowledge.
People like to rationalize. They like to get a nice beautiful theory of the world. But reality is often really noisy and complicated, and in a way, data allows you to use this complexity, to not have to throw it away. It's not the minimalist beauty, the clean lines. It's the beauty of a jumbled mess.
Your analyses of photographic data sets like faces and building facades have also revealed lots of visual trends that might not otherwise have been easy to notice.
That is a big beautiful promise and we're only scratching the surface. People are good at finding certain kinds of patterns. We can hold a small number of things in our minds and compare them. We are not able to find a tiny, tiny little pattern over thousands or millions of data points, or very subtle changes over a long range of time. Using computer vision and techniques, I'm hoping we can make new discoveries in ways people have not been able to do before. I would love to discover something that people haven't noticed yet.
What about your recent discovery, in an analysis of 150,000 American yearbook photos, that people's smiles broadened during each decade since 1900?
For the portraits, we were very happy to see the increase in smiling over time. We thought, wow, this is a really cool discovery. Of course, then we found some psychological literature that indicates people have already noticed this.
Your work has found applications in areas from entertainment to security. What other pie-in-the-sky applications or discoveries do you hope to see?
Frankly, my goal has always been to understand and model biological vision. Human vision is too hard, because it connects with everything else. We don't see things as they are; we see them tinted by language and culture and all the baggage. But if I'm able to build a model of a rabbit's vision or a rat's vision by the time I retire, I think that would be absolutely fantastic. Imagine having a model of this remarkable apparatus that almost all living creatures possess.
Now, because this is such a hard problem, you don't get wins very often. A lot of the time, it's a depressing slog. But once in a while, as a kind of by-product, some really neat things come up that you can use to create pretty pictures. And I think the world needs more pretty pictures.
©2017 ACM 0001-0782/17/09