The same attributes that give deep learning its ability to tell images apart are helping attackers break into the cryptoprocessors built into integrated circuits that were meant to improve their security. The same technology may provide the tools that will let chip designers find effective countermeasures, but it faces an uphill struggle.
Side-channel attacks have been a concern for decades, as they have been used in the hacking of smartcard-based payment systems and pay-TV decoders, as well as in espionage. Yet the rise of Internet of Things (IoT) and edge systems and their use in large-scale, commercially sensitive applications makes such attacks a growing worry for chipmakers. The innate connectivity of IoT devices means success in obtaining private encryption keys from them may open up network access on cloud-based systems that rely on their data.
Although there are side-channel attacks that can be deployed remotely by measuring the timing of responses from software running on a server, many of the most pernicious attacks rely on physical proximity and can be performed using low-cost electrical instruments (see "Secure-System Designers Strive to Stem Data Leaks," Communications, April 2015). The switching of logic gates creates changes in the electromagnetic fields around them that can be detected by probes placed close to the chip's surface by an attacker. Another side channel stems from rapid changes in energy consumption that can be seen by attaching probes to the device's power-supply connections.
Though it is not the only approach, a common technique is to profile the target using known keys and plaintext, collecting thousands of traces of the device's power draw or emissions while it encrypts. Careful analysis of the traces will often show correlations between power or emissions spikes and the value of the byte of the encryption key being processed during that interval.
Alric Althoff, principal engineer at secure-hardware tools supplier Tortuga Logic, says, "The majority of the attacks involve very straightforward statistics."
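The kind of straightforward statistics involved can be sketched on simulated data. The example below is a simplified correlation attack, not a full profiling workflow, and everything in it is an assumption made for illustration: the leakage model (Hamming weight of a substitution-table output plus Gaussian noise), the stand-in substitution table (a shuffled permutation, not the AES S-box), and the names `simulate_leakage` and `recover_key_byte`.

```python
import random

def hamming_weight(value):
    """Number of set bits in a byte."""
    return bin(value).count("1")

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(1)

# Stand-in nonlinear substitution table (hypothetical, NOT the AES S-box).
SBOX = list(range(256))
rng.shuffle(SBOX)

def simulate_leakage(plaintexts, key_byte, noise_std):
    """One leaky sample per encryption: Hamming weight plus Gaussian noise."""
    return [hamming_weight(SBOX[p ^ key_byte]) + rng.gauss(0.0, noise_std)
            for p in plaintexts]

def recover_key_byte(plaintexts, samples):
    """Return the key guess whose predicted leakage correlates best."""
    scores = []
    for guess in range(256):
        predicted = [hamming_weight(SBOX[p ^ guess]) for p in plaintexts]
        scores.append(pearson(predicted, samples))
    return max(range(256), key=lambda g: scores[g])

SECRET = 0x3C
plaintexts = [rng.randrange(256) for _ in range(1000)]
samples = simulate_leakage(plaintexts, SECRET, noise_std=2.0)
recovered = recover_key_byte(plaintexts, samples)
print(hex(recovered))
```

Real traces contain thousands of samples per encryption, most of them noise, so a practical attack would repeat this scoring at every sample position or first select the points of interest.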
Often, the signals that show a dependency on data are restricted to a few samples within the traces, which may cover thousands of samples that are little more than noise. One approach the design community has tried is to use formal models of computation to predict how much data-dependency is present in the switching of logic gates, and then to use that information to keep correlated samples from appearing in the traces.
"Design based on the theoretical models does mitigate the vast majority of the leakage. The problem is that, if you are free to collect millions of traces and analyze them, correlations are going to start popping out even if they are very small," Althoff says. A second issue is that the layout of transistors on the silicon die often leads to larger emissions than predicted. "Those are not modeled using formal techniques," he adds, though improvements in design tools may lead to simulators that can perform accurate-enough assessment without having to go to actual silicon before testing.
The elliptic-curve code known as Curve25519 benefited from algorithmic modeling so it could more easily resist side-channel analysis. However, in 2017, a team from the University of Pennsylvania found smartphone circuitry that clearly leaked the positions of zeroes in intermediate calculations that pointed directly to subkey bytes.
Many countermeasures attempt to hide the samples that correlate well with key data. One common approach is masking, which combines the actual key bytes with randomly chosen dummy values just before the part of the encryption sequence that adversaries target most often: typically the point where the key is first used to disguise the plaintext data. Dummy operations make it more difficult for hackers to align traces and find the correlations embedded in them.
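The masking idea can be illustrated with a minimal sketch of first-order Boolean (XOR) masking, one common way of combining key bytes with random values. The function names here are hypothetical, and the sketch covers only the linear key-addition step; nonlinear steps such as table lookups need considerably more elaborate masked implementations.

```python
import secrets

def mask_byte(value):
    """Split a byte into (masked share, mask) using a fresh random mask."""
    mask = secrets.randbelow(256)
    return value ^ mask, mask

def masked_add_key(masked_key, key_mask, data_byte):
    """XOR data into the masked key share.

    XOR is linear, so the mask passes through unchanged and the true
    intermediate (key ^ data) never appears as a single value.
    """
    return masked_key ^ data_byte, key_mask

def unmask_byte(masked_value, mask):
    """Recombine the two shares to recover the true value."""
    return masked_value ^ mask

key_byte, data_byte = 0x3C, 0xA5
masked_key, key_mask = mask_byte(key_byte)
masked_out, out_mask = masked_add_key(masked_key, key_mask, data_byte)
result = unmask_byte(masked_out, out_mask)
print(hex(result))  # equals key_byte ^ data_byte for any random mask
```

Because the mask changes on every run, the value the hardware actually handles at this step is statistically independent of the key byte, which is what weakens a simple correlation between one sample and the key.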
At the International Solid-State Circuits Conference (ISSCC) in February, researchers from Purdue University and the Georgia Institute of Technology showed how the bulk of electromagnetic emissions come from the layers of metal interconnect closest to the top of an integrated circuit. By disconnecting a cryptoprocessor from direct access to the three highest metal layers, the team cut usable interference significantly in one experiment.
A second project led by the Purdue team, presented at the Custom Integrated Circuits Conference (CICC) a month later, isolated the cryptoprocessor behind an on-chip power regulator that prevents an external observer from seeing the small changes in current that reveal circuit behavior. At the same conference, Intel Labs' director of circuit technology research Vivek De described a number of methods that use on-chip power converters to inject noise into the power signals that an attacker would typically probe at the PCB level. He claimed the techniques could cut the signal-to-noise ratio by five orders of magnitude.
Althoff says, "The correlation with the signal-to-noise is very well understood. For a specific reduction in correlation, you want a certain amount of noise, and you can compute that."
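One textbook form of that relationship, for a first-order correlation attack, is given below. The symbols are assumptions introduced here rather than taken from the article: $L = S + N$ is the measured leakage, $S$ its data-dependent part, $N$ independent noise, and $H$ the attacker's hypothesis about the leakage.

```latex
\rho(H, L) \;=\; \frac{\rho(H, S)}{\sqrt{1 + \dfrac{1}{\mathrm{SNR}}}},
\qquad
\mathrm{SNR} \;=\; \frac{\operatorname{Var}(S)}{\operatorname{Var}(N)},
\qquad
n \;\propto\; \frac{1}{\rho(H, L)^{2}} \quad (\text{for small } \rho)
```

When $\mathrm{SNR} \ll 1$, the correlation scales roughly as $\rho(H, S)\sqrt{\mathrm{SNR}}$, so the number of traces $n$ needed grows roughly as $1/\mathrm{SNR}$. Under this simple model, cutting the signal-to-noise ratio by five orders of magnitude would raise the trace count for such an attack by about the same factor. The model describes classical correlation analysis only and does not by itself say how a deep-learning attack would fare.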
The growing question is how much noise you need to inject to maintain secrecy. Attackers are moving from conventional statistical tools to deep learning because it readily combines data from disparate positions across traces, rather than focusing attention on a small number of what they hope are telltale samples. In doing so, deep learning reverses the effects of countermeasures. The convolutional filters often used in deep neural networks (DNNs) to detect features no matter where they are in images appear to be effective at filtering out the noise introduced by masks and dummy operations.
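Why convolutional filters cope with misaligned traces can be seen in a toy sketch. It slides one hand-picked filter across a trace and keeps only the strongest response, so the result does not depend on where the leaky pattern sits. The filter values, trace length, and function names are assumptions for illustration; a trained network learns its filters from data, and this sketch addresses only the time shifts caused by dummy operations, not masking.

```python
def slide_filter(trace, kernel):
    """Valid-mode sliding dot product, as computed by a 1-D convolutional layer."""
    width = len(kernel)
    return [sum(trace[i + j] * kernel[j] for j in range(width))
            for i in range(len(trace) - width + 1)]

def pooled_response(trace, kernel):
    """Global max pooling: keep the strongest filter response anywhere."""
    return max(slide_filter(trace, kernel))

def make_trace(length, leak, offset):
    """Flat trace with a leak pattern placed at the given sample offset."""
    trace = [0.0] * length
    trace[offset:offset + len(leak)] = leak
    return trace

LEAK = [1.0, 3.0, 1.0]     # hypothetical leakage shape
KERNEL = [1.0, 3.0, 1.0]   # filter matched to that shape

aligned = make_trace(64, LEAK, offset=10)
delayed = make_trace(64, LEAK, offset=37)  # shifted, as if by dummy operations
quiet = [0.0] * 64

print(pooled_response(aligned, KERNEL))  # 11.0
print(pooled_response(delayed, KERNEL))  # 11.0, same despite the shift
print(pooled_response(quiet, KERNEL))    # 0.0
```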
However, deep learning is far from a reliable tool. Althoff points out that subtle artifacts in trace captures, such as a constant offset in the underlying amplitude, do not cause problems for statistical models, but can easily throw off a machine-learning pipeline. Research has shown that if training is not carefully controlled, deep-learning models are highly prone to overfitting, which significantly reduces their ability to correctly predict keys when tested on traces they have not seen before.
Guilherme Perin, senior security analyst at Riscure, says another technique that tends to show better performance overall is ensemble learning, which uses subtly different neural-network models, each fed with complementary subsets of the traces obtained during profiling.
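One plausible way to combine such an ensemble is to sum the log-probabilities each model assigns to every key candidate and rank the totals. The article does not say which combination rule is used, so this rule is an assumption, as are the function names and the made-up probabilities over four candidates (a real key byte has 256).

```python
import math

def combine_models(per_model_probs):
    """Sum log-probabilities per key candidate across all models."""
    n_candidates = len(per_model_probs[0])
    scores = [0.0] * n_candidates
    for probs in per_model_probs:
        for candidate, p in enumerate(probs):
            # Clamp to avoid log(0) if a model assigns zero probability.
            scores[candidate] += math.log(max(p, 1e-12))
    return scores

def rank_candidates(scores):
    """Candidate indices ordered from most to least likely."""
    return sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)

# Hypothetical outputs of three models for four key candidates.
model_outputs = [
    [0.40, 0.35, 0.15, 0.10],  # this model alone would pick candidate 0
    [0.20, 0.45, 0.25, 0.10],
    [0.15, 0.40, 0.35, 0.10],
]

ranking = rank_candidates(combine_models(model_outputs))
print(ranking)  # [1, 2, 0, 3]
```

The first model on its own favors the wrong candidate, but the combined score settles on the candidate the models agree on most consistently, which is the intuition behind using an ensemble when individual models are unreliable.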
Because deep learning can be highly unreliable when used in attacks, its use poses bigger problems for those building defenses who try to apply it as a form of penetration testing.
"If you attack something successfully, you've proven that it's vulnerable. If you're not successful, you've proven nothing," Althoff notes. "The best performing attacks in nature are unpublished. Attackers doing nefarious things are not sharing their approaches. An attacker in the business of attacking is going to run many at once and will have a whole set of scripts set up to help perform them."
There is some hope that machine learning will provide an answer as to whether designs are vulnerable, and this is one reason why Riscure is pursuing it. However, rather than training models just to find keys, the Dutch consultancy's approach focuses on looking inside the model that training creates. Perin says one approach that has demonstrated some success is to focus on the neurons that have the strongest influence on correctly predicted key bytes and trace back to the combinations of samples that activate them. Although sifting through this data is a manually intensive process today, it may provide the basis for automated systems that identify which operations in a long sequence are leaking the most information.
Althoff sees potential in analyzing the models created by techniques such as deep learning. "There is a trend in explainable learning, to really understand why the models are making the decisions they are making. We want to look at the weights that correspond to a certain region and why it weights them more heavily."
What makes better analysis of information leakage increasingly important is the cost and energy overhead of many of the countermeasures. The Purdue team claims their on-chip regulator made it impossible for a DNN to distinguish data operations from noise on the power rails. Unfortunately, it has a power overhead of 50% when the cryptoprocessor is running. It also adds to cost through increased silicon area.
"Many mitigation techniques come with major overheads in power, performance, and die area that are impractical for IoT devices," De says.
De says one option is to invoke strong countermeasures only if the device detects behavior that indicates side-channel analysis is under way. One method is direct: he points to an idea proposed five years ago by a team from Tohoku and Kobe universities in Japan, who placed inductors on the surface of the IC at strategic points to pick up the distortion in the electric field caused by a nearby measurement probe. Another technique might be to monitor how the cryptoprocessor is being used and whether that usage points to a large number of operations being profiled, although this is vulnerable to false positives.
The game of cat and mouse will continue until researchers develop better tools to determine how much information circuitry leaks and what the limits of detection are.
Das, D., Danial, J., Golder, A., Ghosh, S., Raychowdhury, A., and Sen, S.
Deep Learning Side-Channel Attack Resilient AES-256 using Current Domain Signature Attenuation in 65nm CMOS, Proceedings of the 2020 IEEE Custom Integrated Circuits Conference (CICC)
Secure-System Designers Strive to Stem Data Leaks, Communications, April 2015, 18–20, http://bit.ly/38MvW28
Masure, L., Dumas, C., and Prouff, E.
A Comprehensive Study of Deep Learning for Side-Channel Analysis, IACR Cryptology ePrint Archive (2019), https://eprint.iacr.org/2019/439
Perin, G., Ege, B., and Chmielewski, L.
Neural Network Model Assessment for Side-Channel Analysis, IACR Cryptology ePrint Archive (2019), https://eprint.iacr.org/2019/722
Homma, N., Hayashi, Y., Miura, N., Fujimoto, D., Tanaka, D., Nagata, M., and Aoki, T.
EM Attack is Non-Invasive? Design Methodology and Validity Verification of EM Attack Sensor, Proceedings of the 2014 Conference on Cryptographic Hardware and Embedded Systems (CHES 2014), Lecture Notes in Computer Science, vol. 8731.
©2020 ACM 0001-0782/20/10