For decades we have used the RAM (random-access memory)2 and PRAM (parallel RAM) models5 along with asymptotic analysis to measure the complexity of algorithms. The RAM and PRAM models treat all operations, from an integer add to a global memory load as unit cost. This approximation was appropriate during the early days of computing when the costs of arithmetic and communication were somewhat comparable. Over time, however advancing semiconductor technology has caused the cost of arithmetic and logic to shrink by orders of magnitude while the cost of communication has reduced at a much slower rate. As a result, today fetching two 32-bit words from main memory expends 1.3nJ of energy while performing a 32-bit add operation (which requires two 32-bit words as input) takes only 20fJ of energy 64,000x less. A model of computation like RAM or PRAM that treats these two operations as equivalent does a poor job of estimating the cost of a computation, and hence does a poor job of comparing alternative algorithms.
Communication Is the Dominant Cost in Computing. Whether we consider cost to be energy or time, communication dominates modern computation. A 32-bit add operation takes only 20fJ and 150ps. Moving the two 32-bit words to feed this operation 1mm takes 1.9pJ and 400ps. Moving the 64 bits 40mm from corner to corner on a 400mm2 chip takes 77pJ and 16ns. Going off chip takes 320pJ with a delay of 6ns per meter.