Technical Perspective: Portraiture in the Age of Big Data

"I have never been aware before how many faces there are.
There are quantities of human beings, but there are many more faces, for each person has several."
Rainer Maria Rilke

How many faces does a person possess? That is, how much does a face vary in its appearance over the lifetime of a given individual? Aging, of course, produces the greatest changes in facial structure, as anyone who has ever tried to pick out an adult friend from his first-grade class photo can attest. This is why many official ID documents require their holders to update their photograph every 5–10 years. But even at shorter temporal scales (days or weeks) there could be significant variations due, for instance, to the changes in hairstyle, eyewear, facial hair, or makeup. Add to that the changing pose (full face, profile, 3/4 view) and the constant parade of momentary changes in facial expression: happy, amused, content, angry, pensive … there are literally hundreds of words for describing the nuances of the human face.

This, of course, poses a great problem for portraitists, for how can a single portrait, even the most masterful one, ever come close to capturing the full gestalt of a living face? Indeed, many great artists have been obsessively preoccupied with this very question. Rembrandt painted over 60 self-portraits over his lifetime, a monumental study of his own face. Da Vinci, a master of visual illusion, purposefully blurred the corners of Mona Lisa’s mouth and eyes, perhaps in an effort to transcend the immediacy of the moment and let the observer mentally "paint in" the missing details. The cubists argued that truly seizing the essence of a person requires forgoing the traditional single-perspective 2D pictorial space and instead capturing the subject from several viewpoints simultaneously, fusing them into a single image. Cinema, of course, has helped redefine portraiture as something beyond a single still image: the wonderful "film portraits" of the likes of Charlie Chaplin or Julie Andrews capture so much more than the still-life versions. Yet even cinema places strict limits on the temporal dimension, since filming a movie rarely takes longer than a few months, which is only a small fraction of a person’s life.

The following paper is, in some sense, part of this grand tradition—the perpetual quest to capture the perfect portrait. Its principal contribution is in adapting this age-old problem to our post-modern, big data world. The central argument is that there already exist thousands of photographs of any given individual, so there is no need to capture more. Rather, the challenge is in organizing and presenting the vast amount of visual data that is already there. But how does one display thousands of disparate portraits in a human-interpretable form? Show them all on a large grid, à la Warhol? Each will be too small to see. Play them one after another in a giant slideshow? The visual discontinuities will soon make the viewer nauseated.


The solution presented by these authors is wonderfully simple: first, they represent all photos as nodes in a vast graph, with edges connecting portraits that have a high degree of visual similarity (in pose and facial expression, among other attributes); then, they compute a smooth path through the graph and make it into a slideshow. The surprising outcome is that, typically, there are enough photographs available for a given individual that the resulting slideshow appears remarkably smooth, with each photo becoming but a frame in a continuous movie, making these "moving portraits" beautifully mesmerizing.
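
To make the graph-and-path idea concrete, here is a minimal sketch in Python. It is not the authors' implementation: it assumes each photo has already been reduced to a numeric feature vector (a hypothetical stand-in for pose and expression descriptors), links each photo to its k nearest neighbors, and uses Dijkstra's algorithm with squared distances as edge weights so that many small visual steps are preferred over a few large jumps. The function names and the choice of weighting are illustrative assumptions only.

```python
import heapq
import math


def build_similarity_graph(features, k=2):
    """Link each photo to its k most similar photos (hypothetical features)."""
    graph = {i: {} for i in range(len(features))}
    for i, fi in enumerate(features):
        nearest = sorted(
            (math.dist(fi, fj), j) for j, fj in enumerate(features) if j != i
        )
        for d, j in nearest[:k]:
            # Undirected edge; squaring the distance penalizes abrupt jumps.
            graph[i][j] = d * d
            graph[j][i] = d * d
    return graph


def smooth_path(graph, start, goal):
    """Dijkstra's algorithm: the photo sequence with the least total change."""
    best = {start: 0.0}
    prev = {}
    queue = [(0.0, start)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == goal:
            break
        if cost > best.get(node, math.inf):
            continue  # stale queue entry
        for nxt, weight in graph[node].items():
            new_cost = cost + weight
            if new_cost < best.get(nxt, math.inf):
                best[nxt] = new_cost
                prev[nxt] = node
                heapq.heappush(queue, (new_cost, nxt))
    if goal not in best:
        return None  # the two photos are not connected
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]


# Toy data: one feature per photo (say, head pose in degrees), in arbitrary order.
features = [(0.0,), (40.0,), (10.0,), (30.0,), (20.0,)]
graph = build_similarity_graph(features, k=2)
print(smooth_path(graph, 0, 1))  # [0, 2, 4, 3, 1]
```

In this toy example the path from the 0-degree photo to the 40-degree photo passes through the 10-, 20-, and 30-degree photos in order, which is the kind of gradual transition that makes a slideshow read as continuous motion.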

This type of graph representation betrays another intellectual lineage that goes back to Vannevar Bush and his article "As We May Think" (The Atlantic, 1945). Bush proposed the concept of the Memex (Memory Extender), a device that would organize information not by categories, but via direct associations between instances, using a vast graph. This idea has been influential in the development of hypertext, but the original Memex concept is actually much broader, encompassing data types beyond text (for example, photographs, sketches, video, audio), and describing paths through the instance graph (Bush called them "associative trails"). So, the following paper could be thought of as a type of Visual Memex specialized for faces.

In many ways this work signifies the coming of age of computer vision as a practical discipline. The work is one of the first instances in which a fairly complex computer vision system (itself utilizing several nontrivial components such as face detection and face alignment) has become a "button" in a mainstream software product (Google Picasa) used by millions of people. So, read the paper, try the software; I do not think you will be disappointed.
