Today's study group was given by Valentina on the 2014 CHES paper titled "Simple Power Analysis on the AES Key Expansion Revisited" by Christophe Clavier, Damien Marion and Antoine Wurcker from the Universite de Limoges in France.
To briefly recap, "simple power analysis" (SPA) is the rather misleading moniker for the class of methods that analyse side-channel information contained within a small amount of side-channel traces captured during the encryption of a single pair of plaintext and key material. The misleading nature of the title is that these methods are anything but simple to perform---the amount of exploitable information leakage contained within a single trace corresponding to the encryption and decryption of a fixed input pair is far, far smaller than that which can be achieved by capturing a large amount of traces for a variety of input pairs to be exploited using differential power analysis (DPA).
In the side-channel community there's a growing shift in perception towards viewing side-channel analysis as an auxiliary phase in an enhanced global key search, rather than a stand-alone ‘win-or-lose’ attack. DPA attacks use the additional information available in the larger set of traces and aim to reduce the final enumeration effort to as small an amount as possible. SPA attacks instead face the challenge of having to exploit all the available information in the trace and perform a potentially demanding enumeration phase.
The work of Clavier et. al. explores attacks on the AES key scheduling algorithm. It is sufficient for the purposes of this blog to know that the AES key schedule takes (assuming AES-128) one 16-byte secret key and expands it to 11 16-byte round keys, the first of which is the same as the input secret key. The authors explore two masked implementations of the key schedule algorithm---masking aims to combine random values (masks) with sensitive intermediate values, to break any relationships between the intermediate values and observed side-channel information. The first is termed "11 byte entropy boolean masking" and generates 11 individual random bytes, each of which masks all of the bytes with each round key (the 16 bytes of a single round key are masked using the same random mask). The second is termed "16-byte entropy boolean masking", and essentially is orthogonal to the 11-byte scheme: each byte within a round key is masked with a different random byte, but each round key is masked by the same 16 random bytes. In an ideal case, all of the 11 x 16 = 176 sensitive bytes would be masked by a new random byte each time---the authors claim that their 11- and 16-byte schemes are relevant in scenarios where a device does not have enough memory or available entropy to store or generate this many random values.
SPA attacks on the key-schedule attempt to reduce the size of the set of possible secret keys as much as possible. In this work, the authors assume that they know the Hamming-weight (a common form of side-channel leakage is that the Hamming weight of an intermediate value is proportional to side-channel information) of the key byte XORed with the mask. This is a strongly simplifying assumption---in practice, an attacker would have to profile the key schedule on a second device to begin to work towards an approximation for the side-channel leakage.
The primary contribution of the paper is an adapted "guess and backtrack"-style algorithm for reducing the set of possible keys by exploiting relationships between key bytes within the key schedule, with full knowledge of the leakage corresponding to the targeted intermediate variable of the key byte XORed with its mask. The additional enumeration effort imposed by the presence of the masks is shown to be manageble, finding that with up to 30 trace measurements their attack succeeds in drastically reducing the remaining search space in a relatively short amount of time. The attacks are further analysed in the presence of a shuffling countermeasure (the order of execution of the combination of the key bytes with the masks is randomised within each 4-byte column), and the authors discover that they can adapt the algorithm to explore all permutations of the ordering, taking on average several hours to succeed. With an assumption of being able to introduce faults in specific parts of the algorithm, this time can be reduced to the order of minutes.
The techniques presented for exploiting the information leakage are intricate and clever. The work motivates the following question for this area of research: to discover methods for relaxing the assumption on the attacker needing full knowledge of the information leakage occurring.