The computational demand made by artificial intelligence (AI) has soared since the introduction of deep learning more than 15 years ago. Successive experiments have demonstrated the larger the deep neural network (DNN), the more it can do. In turn, developers have seized on the availability of multiprocessor hardware to build models now incorporating billions of trainable parameters.
The growth in DNN capacity now outpaces Moore's Law, at a time when relying on silicon scaling for cost reductions is less assured than it used to be. According to data from chipmaker AMD, cost per wafer for successive nodes has increased at a faster pace in recent generations, offsetting the savings made from being able to pack transistors together more densely (see Figure 1). "We are not getting a free lunch from Moore's Law anymore," says Yakun Sophia Shao, assistant professor in the Electrical Engineering and Computer Sciences department of the University of California, Berkeley.
Though cloud servers can support huge DNN models, the rapid growth in size causes a problem for edge computers and embedded devices. Smart speakers and similar products have demonstrated inferencing can be offloaded to cloud servers and still seem responsive, but consumers have become increasingly concerned over having the contents of their conversations transferred across the Internet to operators' databases. For self-driving vehicles and other robots, the round-trip delay incurred by moving raw data makes real-time control practically impossible.