When Geoffrey Hinton began graduate work on artificial intelligence at the University of Edinburgh in 1972, the idea that it could be achieved using neural networks that mimicked the human brain was in disrepute. Computer scientists Marvin Minsky and Seymour Papert had published Perceptrons, a 1969 book on an early attempt at building a neural net, and it left people in the field with the impression that such devices were nonsense.
"It didn't actually say that, but that's how the community interpreted the book," says Hinton, who, along with Yoshua Bengio and Yann LeCun, will receive the 2018 ACM A.M. Turing Award for their work that led deep neural networks to become an important component of today's computing. "People thought I was just completely crazy to be working on neural nets."
Even in the 1980s, when Bengio and LeCun entered graduate school, neural nets were not seen as promising. Many people thought that building a network with random connections across multiple layers, giving it some data, and letting it figure out how to reach the right answer was just asking too much. "People were very suspicious of the idea you could just learn from the data," says Hinton, a professor emeritus at the University of Toronto and now an engineering fellow at Google.
LeCun read Hinton's work including, he says, a paper written in coded language to get around the taboo about neural nets. "I learned about Geoff's existence, and realized this was the man I needed to meet," he says. LeCun did a postdoctoral fellowship in Hinton's lab, then moved to Bell Labs. He's now a professor at New York University (NYU) and director of AI research at Facebook.
Bengio also wound up at Bell Labs in the early 1990s, where he and LeCun worked together. "What really appealed to me was the notion that by studying neural nets, I was studying something that would be fairly general about intelligence, that would explain our intelligence and allow us to build intelligent machines," Bengio recalls. Today, he is a professor at the University of Montreal, scientific director of Mila (the Montreal Institute for Learning Algorithms), and an advisor to Microsoft.
Their work gained wide mainstream acceptance in 2012, after Hinton and two students used deep neural nets to win the ImageNet challenge, identifying objects in a set of photos at a rate far better than that of any of their competitors. Since then, the field has embraced the technology, which has also seen breakthroughs in speech recognition and natural language processing, and could help make self-driving vehicles more reliable.
LeCun says theories about why neural nets would not work (that the training algorithms would get stuck in the extreme values of mathematical functions known as local minima) fell to real-world experience. "In the end, what people were convinced by were not theorems; they were experimental results," he says. Even though there were local minima, those bad enough for an optimization algorithm to get stuck were relatively rare. It turned out that if the neural nets were just big enough for the problem they were trying to solve, they could get stuck, but if they were larger, they became more efficient at optimization. "You make those networks bigger and bigger and they work better and better," LeCun says.
Working both together and independently, the three made important contributions to neural networks. Among their several discoveries, Hinton helped to develop backpropagation, an algorithm that calculates error at the output of the network and propagates the results backward toward the input, allowing the machine to improve its accuracy. LeCun developed convolutional neural networks, which replicate feature detectors across space and are more efficient for image and speech recognition.
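Backpropagation can be sketched in a few lines. The following is a minimal illustration, not the authors' original formulation: a one-hidden-layer network trained on the toy XOR task, where all sizes, the learning rate, and the random seed are invented for the example.

```python
import numpy as np

# One-hidden-layer network trained by backpropagation on XOR.
# All names, sizes, and hyperparameters here are illustrative.
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(0, 1.0, (2, 8)), np.zeros(8)
W2, b2 = rng.normal(0, 1.0, (8, 1)), np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

losses = []
for step in range(5000):
    # Forward pass: activations flow from input to output.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    losses.append(float(((out - y) ** 2).sum()))

    # Backward pass: the error at the output propagates back toward
    # the input, giving each weight a gradient that reduces the error.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= h.T @ d_out
    b2 -= d_out.sum(axis=0)
    W1 -= X.T @ d_h
    b1 -= d_h.sum(axis=0)
```

Each iteration applies the chain rule once per layer; deeper networks simply repeat the backward step through more layers.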
Another development that helps the system learn more effectively involves randomly turning off some of the neurons about half of the time, introducing some noise into the network. Bengio says there is noise and randomness in the way living neurons spike, and something about that makes the system better at dealing with variations in input patterns, which is key to making the system useful. "You want to be good at doing the things you haven't yet seen, things that might be somewhat different from the training data," Hinton says.
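The technique described above is known as dropout. A minimal sketch of the common "inverted" variant follows; the function name, rate, and array shapes are illustrative assumptions, not taken from the article.

```python
import numpy as np

def dropout(activations, rate=0.5, training=True, rng=None):
    """Randomly zero a fraction `rate` of units during training.

    Scales the survivors by 1/(1 - rate) so the expected activation
    is unchanged, letting the same code run unmodified at test time.
    """
    if not training or rate == 0.0:
        return activations
    if rng is None:
        rng = np.random.default_rng()
    mask = rng.random(activations.shape) >= rate  # keep with prob 1 - rate
    return activations * mask / (1.0 - rate)

# During training, about half the units are zeroed on each pass.
h = np.ones((4, 10))
out = dropout(h, rate=0.5, training=True, rng=np.random.default_rng(0))
```

The injected randomness forces the network not to rely on any single neuron, which is one way of understanding why it copes better with variation in its inputs.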
Bengio came up with word embeddings, patterns of neuron activation that represent word symbols, thereby exponentially expanding the system's ability to express meanings and making it possible to process text and translate it from one language to another. Hinton explains that the embeddings make it easier for the system to reason by analogy, rather than by following a logical set of rules; he believes that is more like how the human brain works. The brain evolved to use patterns of neural activity to perform perception and movement, and that makes it more suited to reasoning by analogy rather than logic, he argues.
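Mechanically, a word embedding is a learned lookup table that replaces each word symbol with a dense vector. A minimal sketch, with a made-up three-word vocabulary and an arbitrary four-dimensional embedding size:

```python
import numpy as np

# Toy vocabulary and randomly initialized embedding table; in a real
# system these vectors would be learned during training.
vocab = {"the": 0, "cat": 1, "sat": 2}
rng = np.random.default_rng(0)
embeddings = rng.normal(0, 0.1, (len(vocab), 4))  # one 4-d vector per word

def embed(sentence):
    """Replace each word symbol with its vector from the table."""
    ids = [vocab[w] for w in sentence.split()]
    return embeddings[ids]  # shape: (number of words, 4)

vecs = embed("the cat sat")
print(vecs.shape)  # (3, 4)
```

Because similar words end up with nearby vectors after training, arithmetic on these vectors can capture analogies, which is the property Hinton points to above.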
In fact, artificial intelligence remains limited compared to human intelligence. "Machines are still very, very stupid," LeCun says. "The smartest AI systems today have less common sense than a house cat." Though they excel at recognizing patterns, neural networks have no knowledge of how the world works, and computer scientists have not yet figured out how to give it to them. Humans learn to generalize from a very small number of samples, while neural networks require vast sets of training data. In fact, Hinton says, it was the growth in available datasets, along with faster processors, that led to the "phase shift" from neural networks being a curiosity to a practical approach.
There are hundreds of useful tasks neural networks can accomplish just by using their current pattern recognition capabilities, Hinton says, from predicting earthquake aftershocks to offering better medical diagnoses on the basis of hundreds of thousands of examples. But to give machines a more general intelligence that could solve different types of problems or accomplish multiple tasks will require scientists to come up with new concepts about how learning works, Bengio says. "It might take a very long time before we reach human-level AI," he says.
Meanwhile, society has to have more discussion about how to use artificial intelligence appropriately. Hinton worries about how autonomous intelligent weapons systems might be misused, for instance. LeCun says that without adequate political and legal protections, governments could use the systems to track people and try to control their behavior, or corporations might rely on AI to make decisions but ignore bias in their algorithms.
To address some of these worries, Bengio took part in a group that last December issued the Montreal Declaration for a Responsible Development of Artificial Intelligence, which outlines principles that they say should be used in pushing the technology forward. "We're building stronger and stronger technology based on the premises of science, but the organization of society and its collective wisdom aren't keeping up fast enough. The solution may not be in some new theorem or some new algorithm," he says.
With such concerns in mind, Hinton says he will donate a portion of his share of the $1-million Turing Award prize money to the humanities at the University of Toronto. "If we have science without the humanities to help guide the political process, then we're all in trouble," he says. LeCun says he will likely make a donation to NYU, and Bengio says he's considering some environmental causes.
Based on their experiences as academic heretics who turned out to be right, they advise young computer scientists to stick to their convictions. "If someone tells you your intuitions are wrong, there are two possibilities," Hinton says. "One is you have bad intuitions, in which case it doesn't matter what you do, and the other is you have good intuitions, in which case you should follow them."
©2019 ACM 0001-0782/19/06