The separation between stored programs and data-memory used by current-day digital computers has reigned virtually unchecked since its invention by the mathematician and physicist John von Neumann. By allowing different programs to be loaded into program memory, a single computer can calculate, run video games, or simulate everything from solar systems to the human brain. Likewise, by loading different datasets into separate memory partitions, the same program can calculate tour distances or munitions trajectories, play Pac-Man or Call of Duty, simulate Mars' orbit or the brain's neural networks.
By 2020, however, the reign of the von Neumann architecture will begin fading away after 75 years of dominance, to be supplanted by learning computers with heterogeneous architectures by its 100th birthday. IBM, Intel, Hewlett Packard, Microsoft, Google, Nvidia, Cray, Qualcomm and other computer architecture designers have already designed non-von Neumann architectures in order to overcome the bottleneck blocking further growth of classical computing.
"It is true that we are rapidly moving from a programming era to a learning era, and non-von-Neumann architectures are seen as a key enabler for optimizing machine learning—particularly inferencing—and reducing the energy requirements for many computational classes," said Dan Kara, research director at ABI Research. "The road to developing non-von-Neumann hardware platforms has been winding, to say the least. For example, at one time Qualcomm was working with its partner Brain Corporation to develop a radically new class of low-power non-von-Neumann hardware [since abandoned], incorporating the software of Qualcomm's Neural Processing Engine."
The Neural Processing Engine's original aim was to sidestep not one, but two von Neumann bottlenecks, both caused by the shared use of a central processing unit (CPU) on each tick of the system clock. The first bottleneck lies between program memory and the CPU: every instruction must be loaded into the CPU before it can execute. The second lies between data memory and the CPU: depending on the instruction, there will typically be several load-and-store operations on the separately partitioned data memory containing the parameters, or operands, on which the instruction operates (for instance, the two numbers to be added together). After the CPU adds the numbers, a final data-memory operation stores the result (here, the sum of the two operands) back into data memory. This overly complex procedure for merely adding 2+2 then repeats, loading the next instruction and its operands into the CPU through the program- and data-memory bottlenecks.
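The traffic described above can be made concrete with a toy model. The following is a minimal sketch, not any real instruction set: the `ToyMachine` class, its three opcodes, and the address names are all hypothetical, chosen only to count how many trips through memory a single addition costs.

```python
from dataclasses import dataclass


@dataclass
class ToyMachine:
    """Hypothetical accumulator machine with separate program and data memory."""
    program: list                 # list of (opcode, address) pairs
    data: dict                    # data memory: address -> value
    acc: int = 0                  # the CPU's single register
    instruction_fetches: int = 0  # trips through the program-memory path
    data_accesses: int = 0        # trips through the data-memory path

    def run(self):
        for opcode, address in self.program:
            # Every instruction must first be fetched into the CPU.
            self.instruction_fetches += 1
            if opcode == "LOAD":
                self.acc = self.data[address]
            elif opcode == "ADD":
                self.acc += self.data[address]
            elif opcode == "STORE":
                self.data[address] = self.acc
            else:
                raise ValueError(f"unknown opcode: {opcode}")
            # Each of the three opcodes touches data memory exactly once.
            self.data_accesses += 1


# Adding 2 + 2: load one operand, add the other, store the sum.
machine = ToyMachine(
    program=[("LOAD", "a"), ("ADD", "b"), ("STORE", "sum")],
    data={"a": 2, "b": 2},
)
machine.run()
print(machine.data["sum"], machine.instruction_fetches, machine.data_accesses)
```

In this toy model, one addition costs three instruction fetches and three data-memory accesses, six memory transfers in all, while the arithmetic itself is a single step.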
The overall speed of any von Neumann computer is thus limited by these bottlenecks between program memory, data memory, and the CPU. Von Neumann computers attempt to mitigate this constant memory thrashing with a hierarchy of caches (level one atop level two atop level three, and so on), each holding the most recently used items. A non-von-Neumann architecture, on the other hand, attempts to eliminate these redundant load-and-store operations altogether by performing numerical and logical operations on the memory elements while they remain in place. This removes the limits that hobble conventional computers when faced with daunting non-deterministic polynomial-time hard (NP-hard) problems. A computer outfitted with non-von-Neumann accelerators can offload such tough problems to the co-processor, where they can be solved quickly, while its CPU continues to crunch the numbers for which it is suited.
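The idea of a cache that keeps the most recently used items can be sketched in a few lines. This is an illustrative single-level model under simplifying assumptions, not a description of any real processor: the `LRUCache` class, its two-entry capacity, and the address trace are all hypothetical, and a least-recently-used eviction policy is assumed.

```python
from collections import OrderedDict


class LRUCache:
    """Hypothetical single-level cache with least-recently-used eviction."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # address -> value, oldest entry first
        self.hits = 0
        self.misses = 0

    def read(self, address, backing_store):
        if address in self.lines:
            # Hit: served from the cache, no trip to slow memory.
            self.hits += 1
            self.lines.move_to_end(address)  # mark as most recently used
            return self.lines[address]
        # Miss: fetch from the slower backing memory and keep a copy.
        self.misses += 1
        value = backing_store[address]
        self.lines[address] = value
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used
        return value


memory = {address: address * 10 for address in range(8)}  # stand-in for slow memory
cache = LRUCache(capacity=2)
trace = [0, 1, 0, 2, 1]
values = [cache.read(address, memory) for address in trace]
print(values, cache.hits, cache.misses)
```

With this trace only the repeated read of address 0 is a hit; the other four reads still go out to slow memory. A cache reduces the traffic through the bottleneck but does not remove it.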
"Although von Neumann architectures are too limited for some types of operations and applications, they are very effective for many others," Kara said. "For routine datacenter operations, CPUs will play an important role for the foreseeable future, but they will now be supported with a growing number of specialized non-von-Neumann platforms, as well as graphics processing units (GPUs), custom application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), and more. Computational heterogeneity is the overarching trend."
The brain operates in this heterogeneous way: it solves both types of problems, but assigns each to the part of the brain whose architecture suits it best. The brain's common electrical "bit" is a voltage spike sent from its neurons (its memory controllers) through its synapses (its memories). In a single step, the brain can both perform a calculation and store the result without thrashing, in specialized ways appropriate to seeing, smelling, touching, tasting, thinking, planning, and more. As a result, the brain solves problems unsuitable for the fastest von Neumann supercomputer in a fraction of the time (milliseconds to seconds, rather than minutes to hours), using a fraction of the energy (about 20 watts, rather than kilowatts) and a system clock measured in tenths of a second rather than billionths of a second.
"Perhaps the biggest advantage of the existing and emergent non-von-Neumann chipsets is their energy efficiency: thousands of times less energy compared to traditional processors. This level of energy efficiency for hyper-scale server deployments dramatically reduces costs and carbon footprint. Edge devices [sensors and other Internet of Things devices] demand that energy use be minimized," said Kara. "Currently, non-von-Neumann processor technologies are primarily used as coprocessor accelerators for machine learning and deep learning; however, this narrow use will expand over time as new platforms are developed and released."
Most non-von-Neumann architectures are modeled on the human brain. IBM was first to market in 2016, when Lawrence Livermore National Laboratory adopted the TrueNorth architecture, which models the brain right down to the digital spikes it sends between neurons via its synaptic connections. The leader of IBM's TrueNorth e-brain chips project, IBM Fellow (and 2009 ACM Gordon Bell Prize recipient) Dharmendra Modha, says the company has demonstrated how its ultra-low-power TrueNorth Neurosynaptic System can be integrated with ultra-low-power retinal cameras and other sensors.
"Over the past six years, IBM has expanded the number of neurons per system from 256 to more than 64 million, an 800% annual increase," said Modha, who is chief scientist for brain-inspired computing at IBM Research—Almaden. "IBM has developed an end-to-end ecosystem making it easy to use TrueNorth." The company has put TrueNorth hardware in the hands of nearly 200 researchers at over 40 research institutions on five continents, says Modha, and they "have demonstrated a number of applications."
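The growth rate in the quote can be checked with a quick compound-growth calculation. This is a rough sketch that assumes exactly 256 and 64 million neurons as the endpoints (the quote says "more than 64 million") and six full years of compounding; the variable names are illustrative only.

```python
start_neurons = 256          # neurons per system at the start (from the quote)
end_neurons = 64_000_000     # lower bound at the end (quote: "more than 64 million")
years = 6                    # assumed number of compounding periods

# Compound annual growth: end = start * factor ** years
annual_factor = (end_neurons / start_neurons) ** (1 / years)
annual_increase_pct = (annual_factor - 1) * 100

print(f"growth factor per year: {annual_factor:.2f}x")
print(f"percentage increase per year: {annual_increase_pct:.0f}%")
```

Under those assumptions the system size grew a little under eightfold each year, which is the sense in which the quoted 800% figure holds; read strictly as a percentage increase, the same numbers give closer to 700% per year.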
Intel says it has created learning chips similar to TrueNorth that are not yet commercialized, but which it hopes to bring to market later this year. The company says the sophisticated chips will be based on the brain, drawing on its acquisition of artificial intelligence software company Nervana.
"The future of AI will use humanlike sensors together with brain-like AI to make all sorts of ordinary devices smart," said Michael Mayberry, a corporate vice president and managing director of Intel Labs. "First responders will use human-like image-recognition to analyze streetlight camera images to quickly find missing or abducted people and smart stoplights will automatically adjust their timing to sync with the flow of traffic, thus reducing gridlock and optimizing the delays caused by starts and stops."
While most other players are concentrating on emulating the neural networks of the human brain, IBM recently introduced an interesting alternative non-von-Neumann architecture. A paper by researchers at IBM Research in Zurich, published in Nature in October, describes an architecture that uses a memory controller like a neuron, with no CPU, to operate on synapse-like data sets in memory. It avoids shuffling data around altogether by harnessing the crystallization dynamics of phase-change memories.
The biggest question for all these architectures is whether programmers' non-von-Neumann brains will be up to the task of programming non-von-Neumann computers. One reason it is so easy to program conventional computers today is that the von Neumann architecture is so much simpler than what happens inside our own heads. If we don't understand our own brains very well, will programming computers modeled on them become the new "human" bottleneck?
Observes Kara, "As non-von-Neumann architectures proliferate, either as core systems or coprocessor accelerators, a programming bottleneck could develop. Software engineers, as well as the development languages and toolsets they have worked with for decades (as well as education, training, and software development methodologies), assume a von Neumann runtime architecture. Specialized non-von-Neumann chipsets require specialized software design kits, frameworks, libraries, training and more."
R. Colin Johnson is a Kyoto Prize Fellow who has worked as a technology journalist for two decades.