The computer architecture community is at an interesting crossroads. Moore's Law is slowing down, undermining the long-held assumption that computing gets cheaper and faster over time, an assumption that underpins a significant fraction of the economic growth of the past few decades. At the same time, demand for computing continues to grow at a phenomenal rate, driven by deeper analysis over growing volumes of data, diverse new workloads in the cloud, smarter edge devices, and new security constraints. Is the situation dire, or is this the beginning of a new phase in the evolution of system architecture?
Two recent trends provide hope that it is the latter! The first trend, at the microarchitecture level, is specialization, or domain-specific hardware/software codesign. Compared to a general-purpose processor, a specialized architecture such as an ASIC (application-specific integrated circuit) tailors the design to a specific application or workload class; Google's TPU series of ASICs is a good example. Such specialization yields significant gains in area and power efficiency. The trade-off, of course, is that we lose the volume advantages of a general-purpose system, both in software ecosystem support (and ease of development) and in amortizing the cost of building a custom chip, notably the non-recurring engineering (NRE) cost. The second trend, at the system level, is warehouse-scale computing, or more broadly cloud computing, a model that treats the entire "datacenter as a computer." This model amortizes costs across larger ensembles, and it also offers ubiquitous access, simpler system management, and better encapsulation of hardware under higher-level software interfaces and abstractions. Initially popularized by large Internet services such as search, email, and social networks, cloud computing is now increasingly being adopted by traditional enterprises as well.
What happens when we combine these two trends? Can we build purpose-built, warehouse-scale datacenters customized for (not just composed of) large-scale arrays of ASIC accelerators, or, to use a term coined in the following paper, ASIC clouds?
Interestingly, a proof point already exists, and from a very surprising source: bitcoin mining. Consider recent designs from companies such as Bitmain or Bitfury that use ASICs with tens to hundreds of cores custom-designed to run Bitcoin's hashing algorithm. Hundreds of these chips are assembled into custom boards, and hundreds of these boards are assembled into custom racks or containers in highly specialized datacenters. Bitfury goes one step further, using immersion cooling to submerge its servers. While such bitcoin-mining ASIC clouds have demonstrably achieved massive scale-out, are they likely to gain broader mainstream acceptance? How effective are these designs compared to traditional CPUs and GPUs and traditional datacenter designs? How do we reason about the architectural trade-offs of pervasive specialization, from the ASIC to the server to the datacenter?
The following paper addresses these issues and more. The authors distill the lessons from bitcoin-mining systems into a broader architectural framework for ASIC clouds. Specifically, they propose a hierarchical design. At the bottom, specialized compute cores are replicated within each ASIC and connected by a custom on-chip network, and the ASIC operating voltage can be tuned to trade energy efficiency against total cost of ownership. Multiple ASICs are assembled into a specialized server with custom cooling and power delivery and with DRAM and I/O subsystems tailored to the workload. Servers are in turn assembled into racks and datacenters, again with computation-specific customization of thermals and power delivery. Using this architecture, the study examines ASIC clouds for four applications that span a diverse range of properties: two flavors of cryptocurrency mining (Bitcoin and Litecoin), deep learning, and video transcoding. The results show the promise of ASIC clouds: efficiency improvements of two to three orders of magnitude over traditional CPU- or GPU-based approaches.
Perhaps an even more exciting contribution of the paper is a methodology that federates different modeling approaches to derive Pareto-optimal ASIC cloud configurations. Starting from data extracted from place-and-route optimizations at the circuit level and from computational fluid dynamics models at the system level, the approach performs an exhaustive search for the best design across a number of parameters: the area per ASIC, the number of ASICs and their operating voltage, the number of DRAM chips per ASIC, and the choice of case design, power delivery, and cooling subsystems. The paper also distills this large body of data into a simple "two-for-two" rule for when an ASIC cloud is appropriate: if the yearly cost of running the computation on an existing cloud exceeds the NRE by at least 2x, and the ASIC cloud improves TCO (total cost of ownership) per op/s by at least 2x, then building the ASIC cloud is likely to save money.
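To make the "exhaustive search, then keep the Pareto-optimal points" idea concrete, here is a minimal Python sketch. It is not the authors' tool. The `Config` fields, the candidate values, and the closed-form `evaluate` model (throughput limited either by silicon area times voltage or by DRAM bandwidth, dynamic power growing roughly as area times voltage cubed, fixed per-server overheads) are all hypothetical stand-ins for the place-and-route and fluid-dynamics data the paper uses, and case, power-delivery, and cooling choices are omitted. The two objectives, dollars per op/s and watts per op/s, are likewise illustrative assumptions.

```python
from itertools import product
from typing import List, NamedTuple, Sequence, Tuple

Metrics = Tuple[float, float]  # (dollars per op/s, watts per op/s); lower is better


class Config(NamedTuple):
    asic_area_mm2: float
    asics_per_server: int
    voltage: float
    dram_per_asic: int


# Hypothetical candidate values for each design knob.
AREAS_MM2 = (50.0, 100.0, 200.0)
ASICS_PER_SERVER = (4, 8, 16)
VOLTAGES = (0.6, 0.8, 1.0)
DRAMS_PER_ASIC = (1, 2, 4)


def evaluate(cfg: Config) -> Metrics:
    """Toy server model (all constants hypothetical)."""
    # Compute-bound rate: proportional to area and clock, with clock ~ voltage.
    compute_ops = cfg.asic_area_mm2 * cfg.voltage * 1e9
    # Memory-bound rate: each DRAM chip sustains a fixed op rate.
    memory_ops = cfg.dram_per_asic * 40e9
    ops = cfg.asics_per_server * min(compute_ops, memory_ops)
    # Dynamic power ~ C * V^2 * f with f ~ V, i.e. ~ area * V^3; plus DRAM and fixed overhead.
    watts = cfg.asics_per_server * (0.4 * cfg.asic_area_mm2 * cfg.voltage ** 3
                                    + 1.5 * cfg.dram_per_asic) + 60.0
    # Silicon cost by area, DRAM parts, plus a fixed chassis and PSU cost.
    dollars = cfg.asics_per_server * (0.2 * cfg.asic_area_mm2
                                      + 4.0 * cfg.dram_per_asic) + 400.0
    return dollars / ops, watts / ops


def dominates(a: Metrics, b: Metrics) -> bool:
    """True if a is no worse than b on every metric and strictly better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def evaluate_all() -> List[Tuple[Config, Metrics]]:
    """Exhaustively enumerate and score every combination of knob values."""
    space = product(AREAS_MM2, ASICS_PER_SERVER, VOLTAGES, DRAMS_PER_ASIC)
    return [(cfg, evaluate(cfg)) for cfg in (Config(*p) for p in space)]


def pareto_front(points: Sequence[Tuple[Config, Metrics]]) -> List[Tuple[Config, Metrics]]:
    """Keep the configurations that no other configuration dominates."""
    return [(cfg, m) for cfg, m in points
            if not any(dominates(other, m) for _, other in points)]


if __name__ == "__main__":
    for cfg, (usd, w) in sorted(pareto_front(evaluate_all()), key=lambda t: t[1]):
        print(f"{cfg}  $/(op/s)={usd:.3e}  W/(op/s)={w:.3e}")
```

Exhaustive enumeration is practical here because the toy space has only 81 points; a configuration survives only if no other configuration is at least as good on both metrics and strictly better on one. In this model, for example, a large die paired with a single DRAM chip is memory-bound and is dominated by a smaller, cheaper die with the same throughput.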
This is but the start of an interesting direction of exploration for the broader community. Given the nascent and fast-evolving nature of current ASIC solutions, how do we enable ASIC clouds to adapt rapidly to changing accelerator designs and to diversity across classes of accelerators? Can the holistic design of ASIC clouds enable additional optimizations, for example, reducing the design time and NRE of future specialized designs? These and other open questions highlight that we are entering an era of significant change, one that is not "business as usual." The architecture and methodology in this paper provide a foundation, and a baseline, for exploring new ideas at the confluence of two of the most exciting trends the community is rallying around.
To view the accompanying paper, visit doi.acm.org/10.1145/3399734
The Digital Library is published by the Association for Computing Machinery. Copyright © 2020 ACM, Inc.