Planet-scale applications are driving the exponential growth of the Cloud, and datacenter specialization is the key enabler of this trend. GPU- and FPGA-based clouds have already been deployed to accelerate compute-intensive workloads. ASIC-based clouds are a natural evolution as cloud services expand across the planet. ASIC Clouds are purpose-built datacenters comprised of large arrays of ASIC accelerators that optimize the total cost of ownership (TCO) of large, high-volume scale-out computations. On the surface, ASIC Clouds may seem improbable due to high NREs and ASIC inflexibility, but large-scale ASIC Clouds have already been deployed for the Bitcoin cryptocurrency system. This paper distills lessons from these Bitcoin ASIC Clouds and applies them to other large scale workloads such as YouTube-style video-transcoding and Deep Learning, showing superior TCO versus CPU and GPU. It derives Pareto-optimal ASIC Cloud servers based on accelerator properties, by jointly optimizing ASIC architecture, DRAM, motherboard, power delivery, cooling, and operating voltage. Finally, the authors examine the impact of ASIC NRE and when it makes sense to build an ASIC Cloud.
In the last decade, two parallel trends in the computational landscape have emerged. The first is the bifurcation of computation into two sectors: cloud and mobile. The second is the rise of dark silicon15, 3, 4, 2 and dark silicon aware design techniques13, 14, 10, 16, 11 such as specialization and near-threshold computation. Specialized hardware has existed in mobile computing for a while due to extreme power constraints; however, recently there has been an increase in the amount of specialized hardware showing up in cloud datacenters. Examples include Baidu's GPU-based cloud for distributed neural network acceleration, Microsoft's FPGA-based cloud for Bing Search,9 and by JP Morgan Chase for hedgefund portfolio evaluation.12
At the level of a single node, we know that ASICs can offer order-of-magnitude improvements in energy-efficiency and cost-performance over CPU, GPU, and FPGA.