
Technical Perspective: Technology Scaling Redirects Main Memories


As predicted by Intel’s Gordon Moore in 1965, based on his observation of the scaling of several generations of silicon technology at the time, the number of transistors that can be integrated on one die continues to double approximately every two years. Amazingly, Moore’s Law has prevailed for 45 years and is expected to continue for several more generations. Transistor feature size and die integration capacity projections from the International Technology Roadmap for Semiconductors (ITRS) are shown in the accompanying table.

These faster and more abundant transistors have been exploited by computer engineers to build processors that double in performance about every two years. Until the beginning of this decade, that was done through faster clock speeds and clever architectural enhancements. Many of these enhancements were directed at tackling the “memory wall,” which still plagues us today. Early in this decade, we ran into the “power wall,” which dramatically slowed the increase in clock speeds. Since then, we have still seen performance double every two years, but now through more cores (running at only modestly faster clock rates) on one die, since technology scaling provides all of those additional transistors.

Another key component on the motherboard affected by technology scaling is the main memory, traditionally built out of dynamic random access memory (DRAM) parts. DRAMs have been doubling in capacity every two to three years, while their access latency has improved only about 7% per year. Processor speeds thus still leave main memories in the dust, with processors having to wait 100 or more cycles to get information back from main memory; hence the focus by architects on cache memory systems that tackle this “memory wall.” And multicore parts put even more pressure on the DRAM, demanding more capacity, lower latency, and higher bandwidth.
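The arithmetic behind that divergence is worth making explicit. A quick sketch, using only the growth rates quoted above (processor performance doubling every two years, DRAM latency improving about 7% per year), shows how quickly the gap compounds:

```python
# Sketch of the "memory wall": how fast the processor/DRAM gap grows,
# assuming the two growth rates quoted in the text.

def gap_after(years, cpu_doubling_period=2.0, dram_improvement=0.07):
    """Ratio of processor speedup to DRAM latency improvement after `years`."""
    cpu_speedup = 2.0 ** (years / cpu_doubling_period)
    dram_speedup = (1.0 + dram_improvement) ** years
    return cpu_speedup / dram_speedup

for years in (2, 10, 20):
    print(f"after {years:2d} years the gap has grown by {gap_after(years):6.1f}x")
```

Over a decade the gap grows by more than an order of magnitude, which is why a cache miss can cost 100 or more processor cycles.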

As pointed out in the following paper by Lee, Ipek, Mutlu, and Burger, DRAM memory scaling is in jeopardy, primarily due to reliability issues. The storage mechanism in DRAMs, charge storage and maintenance in a capacitor, requires inherently unscalable charge placement and control. Flash memories, which have the advantage of being nonvolatile, have their own scaling limitations. Thus, the search for new main memory technologies has begun.

The authors make a case for phase change memories (PCMs), which are nonvolatile and can scale below 40nm. PCMs store state by forcing a phase change in their storage element (for example, a chalcogenide) into a high-resistance state (storing a “0”) or a low-resistance state (storing a “1”). Fortunately, the required programming current scales down linearly with feature size. However, PCMs do not come without disadvantages: read and, especially, write latencies several times slower than DRAM’s, write energies several times larger than DRAM’s, and, like Flash, a limited lifetime directly related to the number of writes to a memory location.
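The cost of that read/write asymmetry can be seen with a toy average-latency model. The latencies and write fraction below are illustrative assumptions for the sake of the sketch, not figures from the paper:

```python
# Toy model: average access latency of a memory with asymmetric
# read/write costs. All numbers here are illustrative assumptions.

def avg_latency(read_ns, write_ns, write_fraction):
    """Mean latency given the fraction of accesses that are writes."""
    return (1.0 - write_fraction) * read_ns + write_fraction * write_ns

dram = avg_latency(50, 50, 0.3)      # symmetric, DRAM-like timing
pcm  = avg_latency(100, 500, 0.3)    # slower reads, much slower writes
print(f"PCM/DRAM average latency ratio: {pcm / dram:.1f}x")
```

Even a modest write fraction lets the slow writes dominate the average, which is why the architectural techniques described next aim squarely at hiding and reducing writes.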

This paper is a wonderful illustration of the way computer architects can work around the limitations of a technology with clever architectural enhancements, turning lemons into lemonade. Using an area-neutral reorganization of the memory buffers, the authors reduce application execution time from 1.6X to 1.2X and memory array energy from 2.2X to 1.0X, both relative to a DRAM-based system. They use multiple, narrower memory buffers, which reduce the number of expensive (in both area and power) sense amplifiers, and they focus on application performance rather than the performance of an individual memory cell.

The authors also describe their investigation of the trade-offs between buffer row width and the number of rows. To tackle PCM’s lifetime limitation, they propose partial writes, which reduce the number of writes to the PCM by tracking dirty data from the L1 caches down to the memory banks. With this approach, they improve PCM lifetimes from hundreds of hours to nearly 10 years, assuming present-day endurance of 1E+08 to 1E+12 writes per bit for a 32nm PCM cell.
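A back-of-envelope calculation makes the lifetime stakes concrete. Assuming ideal wear leveling, lifetime is simply total tolerable writes divided by write rate; the capacity and bandwidth below are illustrative assumptions, while the 1E+08 endurance matches the low end quoted above:

```python
# Back-of-envelope PCM lifetime under ideal wear leveling.
# Capacity and write bandwidth are illustrative assumptions.

def lifetime_years(capacity_bytes, endurance_writes_per_cell, write_bytes_per_s):
    """Total writes the array can absorb, divided by the write rate."""
    total_writes = capacity_bytes * endurance_writes_per_cell
    return total_writes / write_bytes_per_s / (365 * 24 * 3600)

# For example: an 8GB part with 1E+08 writes-per-cell endurance,
# sustaining 1GB/s of write traffic.
print(f"{lifetime_years(8 * 2**30, 1e8, 2**30):.1f} years")
```

The lifetime scales inversely with write traffic, so halving the writes reaching the PCM, as partial writes aim to do, doubles it; conversely, without wear leveling, a single hot cell fails far sooner than this ideal figure suggests.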

The paper concludes with some suggestions as to how a nonvolatile main memory would change the computing landscape: instantaneous system boot/hibernate, cheaper checkpointing, and stronger safety guarantees for file systems. Now, if only someone could figure out a way to dramatically improve memory-to-processor bandwidth.


Tables

Table. Projections for transistor size and die integration capacity.

