The guiding principle of this week's study group, led by Carolyn, is the
quest for adversarial confidence in the key guesses output by
side-channel analysis. A common feature of differential side-channel analysis (SCA) is the divide-and-conquer strategy, in which
small parts (subkeys) of the global (full) key are targeted separately,
and concatenated to form a key guess. This strategy is possible since
cryptographic algorithms (like block ciphers) usually, at some stage,
process intermediate values that depend on just a few key bits, so that
the space of possible values is small enough to search exhaustively. As an example,
consider an S-box in the first round of AES as the adversary's target.
The S-box applies a highly nonlinear transformation to the XOR of a
plaintext byte and a key byte. In this scenario the attacker knows the
plaintext, so can compute the S-box output for each of the 256 possible
values of the subkey. If we assume the attacker has some knowledge of
the functional form of the side-channel leakage (e.g. that it’s
proportional to the Hamming weight of the processed data), then
he can use this to predict the leakage under each of the 256 hypotheses.
Applying this technique to N collected traces (corresponding to different
plaintexts), the attacker can compute, at each point in time, the
correlation between each of the 256 leakage prediction vectors and the
vector of N trace measurements at that point. The largest correlation
should then indicate both the true value of the subkey and the time
point at which the S-box output is processed by the device. The attacker
repeats this process for each key byte, and concatenates his 'best
guesses' for each byte to form his best guess for the global key, which
he then tests, for example by checking it against a known plaintext/ciphertext pair.
This check will fail if even one subkey guess is wrong, and the attacker
won't know where the problem lies. Thus the attacker seeks some extra
information from the 'divide' stage, for example a ranking (or even a
probability distribution) on the subkeys, and some measure of confidence
associated with the guessed subkeys to allow him to target the weakest
parts of the key in a post-SCA brute-force search.
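To make this divide-and-conquer step concrete, here is a minimal sketch in Python/NumPy of a correlation attack on a single key byte, using measurements taken at a single, already identified time point. All names and parameter values are illustrative assumptions, and a randomly drawn bijection stands in for the AES S-box so that the snippet is self-contained; the attack mechanics are identical.

```python
import numpy as np

# Hamming-weight lookup table: the assumed leakage model
HW = np.array([bin(b).count("1") for b in range(256)])

def cpa_rank_subkey(plaintexts, leakages, sbox):
    """Rank all 256 subkey hypotheses by absolute correlation with the traces."""
    scores = np.empty(256)
    for k in range(256):
        # Predicted leakage of the S-box output under hypothesis k
        preds = HW[sbox[plaintexts ^ k]]
        scores[k] = abs(np.corrcoef(preds, leakages)[0, 1])
    return np.argsort(scores)[::-1], scores  # best hypothesis first

# Demo on simulated single-time-point measurements; a random bijection stands
# in for the AES S-box.
rng = np.random.default_rng(0)
SBOX = rng.permutation(256).astype(np.uint8)
true_k, sigma, N = 0x3A, 2.0, 200
pts = rng.integers(0, 256, N, dtype=np.uint8)
traces = HW[SBOX[pts ^ true_k]] + rng.normal(0.0, sigma, N)  # HW leakage + Gaussian noise

ranking, scores = cpa_rank_subkey(pts, traces, SBOX)
print("best guess:", hex(ranking[0]), " true subkey:", hex(true_k))
```

In a real attack the correlation would be computed at every time point of every trace, and the ranking returned here is exactly the kind of extra information from the 'divide' stage mentioned above.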
The focal point of the study group was 'Success through confidence: Evaluating the effectiveness of a side-channel attack' [paper,slides] by Thillard et al., presented at CHES this year. In a nutshell, the paper suggests an approach to computing a confidence measure for each subkey guess (so the attacker knows which bytes need brute-forcing), and shows how picking robust selection strategies increases confidence. For the proof of concept, the authors consider an idealised attack scenario in which the attacker knows the true leakage function φ, so can perfectly predict the data-dependent part of the power consumption, and the remaining variance is Gaussian noise. Assume also that there are N traces {l_i}_{i=1..N} corresponding to N plaintext inputs, and that it is possible to compute the target function under each subkey hypothesis k and to map these values to columns of power consumption predictions {φ_{k,i}}_{i=1..N}.
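As a toy instance of this idealised setting, the sketch below simulates the leakages l_i and builds the matrix of predictions φ_{k,i}. It uses a 4-bit target (the PRESENT S-box) purely so that the multivariate Gaussian computations in the later sketches stay tractable; all names and values are illustrative.

```python
import numpy as np

# 4-bit PRESENT S-box: a small target keeps the multivariate Gaussian
# computations in the next sketches tractable (16 hypotheses instead of 256).
SBOX4 = np.array([0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
                  0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2])
HW4 = np.array([bin(v).count("1") for v in range(16)])  # the known leakage function phi

def prediction_matrix(pts):
    """N x 16 matrix whose column k holds phi_{k,i} = HW(S(p_i XOR k))."""
    return np.array([HW4[SBOX4[pts ^ k]] for k in range(16)]).T

def simulate_leakages(pts, true_k, sigma, rng):
    """Idealised traces l_i: data-dependent part phi_{k*,i} plus Gaussian noise."""
    return HW4[SBOX4[pts ^ true_k]] + rng.normal(0.0, sigma, len(pts))

rng = np.random.default_rng(1)
pts = rng.integers(0, 16, 100)            # N = 100 known plaintext nibbles
phi = prediction_matrix(pts)              # the columns {phi_{k,i}}
l = simulate_leakages(pts, true_k=0x6, sigma=1.0, rng=rng)
```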
At SAC 2008, Rivain [paper] showed how to compute precise success rates in this setting, beginning with the observation that rankings in the correlation vector are equivalent to rankings in the alternative distinguishing vector (1/N) Σ_{i=1..N} φ_{k,i}·l_i. This vector is more convenient to work with because its probability distribution can be nicely characterised as a multivariate Gaussian. It is hence straightforward to calculate the success rate, by computing (directly from the multivariate Gaussian cumulative distribution function) the joint probability that the distinguisher score associated with the correct key hypothesis is greater than the scores associated with each of the incorrect hypotheses. The authors also refer to the work of Fei et al. [paper], presented at CHES 2012, which showed that the distribution of distinguishing vectors can be expressed entirely as a function of the algorithmic properties of the target function (scaled by a factor depending inversely on the noise level). These properties were termed 'confusion coefficients', and are useful because they link side-channel vulnerability to the algorithmic properties of a target function.
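Under these assumptions the success rate can be written down directly. The sketch below, a simplified illustration of the idea rather than a reimplementation of Rivain's computation, derives the mean and covariance of the comparison vector d_{k*} - d_k (where d_k = (1/N) Σ_i φ_{k,i} l_i) for the 4-bit toy target above, and evaluates the joint probability that every component is positive via SciPy's multivariate normal CDF.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Toy 4-bit target, as in the previous sketch.
SBOX4 = np.array([0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
                  0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2])
HW4 = np.array([bin(v).count("1") for v in range(16)])

def exact_success_rate(pts, true_k, sigma):
    """P(correct hypothesis scores highest) for the distinguisher
    d_k = (1/N) * sum_i phi_{k,i} * l_i, with l_i = phi_{k*,i} + N(0, sigma^2) noise."""
    N = len(pts)
    phi = np.array([HW4[SBOX4[pts ^ k]] for k in range(16)], dtype=float)  # 16 x N
    diff = phi[true_k] - np.delete(phi, true_k, axis=0)                    # 15 x N
    # The comparison vector Delta_k = d_{k*} - d_k is multivariate Gaussian with:
    mu = diff @ phi[true_k] / N                  # mean of each component
    cov = (sigma**2 / N**2) * (diff @ diff.T)    # covariance between components
    cov += 1e-9 * np.eye(len(mu))                # tiny ridge for numerical stability
    # Success <=> every Delta_k > 0, i.e. the CDF of -Delta evaluated at the origin.
    return multivariate_normal(mean=-mu, cov=cov).cdf(np.zeros(len(mu)))

rng = np.random.default_rng(2)
pts = rng.integers(0, 16, 50)                    # 50 known plaintext nibbles
print("predicted success rate:", exact_success_rate(pts, true_k=0x6, sigma=1.0))
```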
Thillard et al. extend the notion of confusion coefficients to the setting of correlation differential power analysis (DPA), in which multiple bits are targeted by combining them in a power model leakage function (e.g. the Hamming weight or Hamming distance of intermediate values). Their proposal necessarily depends on the form of the leakage and on the strong assumption that the attacker knows this form. This means that they don't quite achieve the goal of Fei et al.'s original notion of confusion coefficients, which was to capture the impact of algebraic properties on DPA independently of the physical implementation. Having defined correlation confusion coefficients, they combine the notion with Rivain's work on exact success rates; that is, they express the success rate as a function of the correlation confusion coefficients. They also wish to investigate the stability of a key guess as grounds for confidence: intuitively, if a subkey is ranked top after n trace measurements and also after n+1, n+2, n+3 traces and so on, it is more likely to be correct than if the ranking fluctuates. Helpfully, the authors consider the evolution of the attack, recording a matrix of outcomes at selected intervals as the number of traces increases, and they allow for strategies which need not output a subkey guess at all (e.g. in the case that no stable 'winner' emerges). They advocate selection strategies which trade better-quality guesses against more missing values (implying more brute-force searching after the side-channel phase of the attack). The confidence metric they propose for a selection strategy is the probability that it returns the correct subkey, divided by the probability that it returns any guess at all. It can be computed by concatenating the distinguishing vectors obtained at the different checkpoints (the resulting vector is again multivariate Gaussian). Taking more of the attack evolution into account increases the confidence of a subkey guess, but also the frequency with which the attack fails to return a subkey. The authors also consider different types of selection rules, and how to deal with the inevitable false positives.
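To illustrate the flavour of such a selection rule and its confidence metric, the sketch below (again on the 4-bit toy target, with illustrative parameter values, and estimating the probabilities by Monte Carlo simulation rather than by the paper's exact multivariate Gaussian computation) only outputs a subkey guess if the same candidate is ranked first at every checkpoint, and reports the confidence as P(guess correct) / P(any guess returned).

```python
import numpy as np

# Toy 4-bit target, as above.
SBOX4 = np.array([0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
                  0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2])
HW4 = np.array([bin(v).count("1") for v in range(16)])

def stable_winner(pts, l, checkpoints):
    """Selection rule: output the top-ranked hypothesis only if it is the same at
    every checkpoint (number of traces); otherwise output nothing."""
    winners = []
    for n in checkpoints:
        scores = [np.dot(HW4[SBOX4[pts[:n] ^ k]], l[:n]) / n for k in range(16)]
        winners.append(int(np.argmax(scores)))
    return winners[-1] if len(set(winners)) == 1 else None

def estimate_confidence(true_k, sigma, checkpoints, trials, rng):
    """Monte Carlo estimate of (confidence, fraction of runs returning a guess),
    where confidence = P(guess correct) / P(any guess returned)."""
    returned = correct = 0
    for _ in range(trials):
        pts = rng.integers(0, 16, checkpoints[-1])
        l = HW4[SBOX4[pts ^ true_k]] + rng.normal(0.0, sigma, len(pts))
        guess = stable_winner(pts, l, checkpoints)
        if guess is not None:
            returned += 1
            correct += (guess == true_k)
    return (correct / returned if returned else float("nan")), returned / trials

rng = np.random.default_rng(3)
conf, rate = estimate_confidence(0x6, sigma=2.0, checkpoints=[20, 40, 60, 80],
                                 trials=2000, rng=rng)
print(f"confidence: {conf:.3f}, fraction of runs returning a guess: {rate:.3f}")
```

Lengthening the list of checkpoints makes the rule stricter: the estimated confidence goes up, while the fraction of runs that return a guess goes down, mirroring the trade-off described above.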
Another recent paper, by Veyrat-Charvillon et al. at SAC 2012 [paper], provides a method of assigning probabilities to subkeys and enumerating over the most likely global keys until the correct key is found or the attack reaches the computational limits of the attacker. This is clearly related in its aims, but it seems that some work remains to bridge the gap between the two papers. This continued work on SCA takes the purpose of device evaluation into account by trying to represent the true capabilities of realistic adversaries; where this is not possible, it is still useful to start from some idealised scenario and work outwards from there.