This is the latest in a series of blog posts to address the list of '52 Things Every PhD Student Should Know To Do Cryptography':
a set of questions compiled to give PhD candidates a sense of what they
should know by the end of their first year. This week we consider what can be done to mitigate the thread of side-channels against AES implementations...
Sidechannel defences: Why?
For a modern cryptographic scheme to be taken seriously, we generally require some form of security justification. In the case of AES, we believe it acts like a random permutation if the adversary doesn't know the key. However, if the adversary has side-channel information, it is possible that this is no longer the case. So, what can we do about it? Ideally we would create an implementation that is completely impervious to side-channel attacks. However, this effectively means the implementation must sit in total isolation, with absolutely no output streams - which makes it rather pointless!
Perhaps we can make sure that whatever we do, it doesn't matter if the AES implementation leaks information through side channels? This leads to the field of leakage resilient cryptography, which is a very strong security requirement indeed. As such, schemes secure in these conditions (of which there are very few) tend to be drastically less efficient than those which avoid (/ignore) the problem. Since trade-offs must always be made in design, in practice we tend to use schemes that assume AES doesn't leak any information, and combine them with implementations that contain defences against some of the simpler side-channel attacks. The hope is that this pushes the attack cost beyond the value of the information secured and so (even though they could) no adversary will bother attacking the system because it is no longer viable.
Some Basic Defences
So, with that in mind, let us consider a couple of basic defences against some of the less sophisticated side-channel attacks. As pointed out in the question, these techniques may be easily bypassed, so think of this post as explaining the general concept, rather than providing any sensible suggestions!
Attack: Timing Attack
Weakness:
Some implementations run in different times depending on their inputs.
As such, by observing how long the system takes to respond one learns
something about the key/inputs.
Defence: Constant time implementations. As the title says, the best defence against a timing attack is to make sure the implementation takes constant time, and most implementations will be constant-time nowadays. This might not be too hard in hardware, but in software it can be very difficult indeed, because the microcode (the internal programming of the processor) is generally a trade secret and open to change.
Attack: Power Analysis (DPA,SPA)
Weakness: The power usage of some implementations is correlated with the key material, often due to the hamming distance when storing values. For more information, read the blog from two weeks ago.
Defence (1): Masking Rather than using the AES Sbox directly, one applies a mask to the input value, and looks this up in a masked Sbox (which is effectively the Sbox with its values reordered to accommodate the mask). Then, even though the attacker may be able to detect correlations between certain internal variables, these are all masked, and do not [directly] correspond to the key material as they had previously. More complex masking schemes will be more complex to instantiate, but lead to better resistance to attack.
Defence (2): Shuffling To conduct a power analysis attack, the attacker uses the fact they know the internal structure of the AES scheme. If we shuffle the order of the S-boxes in our implementation (by some secret permutation), the adversary will not know how their readings correspond to the internal key materials. A variation on this is the deliberate use of non-determinism, allowing the processor to reorder certain collections of instructions on it's own.
Attack: Cache-flooding
Weakness: Implementations using lookup-tables (eg for the SBox) will be more or less efficient if the appropriate cells are already in the processors cache. By pushing most of the lookup table out of the cache, an attacker can observe whether the appropriate cells are called, leaking information. Can also be observed as a timing attack or by power analysis if the cost of loading the cache can be observed.
Defence: Dont use lookup tables on secret data! The simplest defence in this list - if you don't want to leak information about which lookup entries have been used, don't use lookup tables. In the case of AES this is reasonable, because the AES Sbox can actually be computed as a simple function on the input byte. This would be much less practical with (for example) the DES Sbox, which does not have this structure.
A blog for the cryptography group of the University of Bristol. To enable discussion on cryptography and other matters related to our research.
Thursday, July 30, 2015
Friday, July 24, 2015
52 Things: Number 42: Look at your C code for Montgomery multiplication above; can you determine where it could leak side channel information?
This is the latest in a series of blog posts to address the list of '52 Things Every PhD Student Should Know' to
do Cryptography: a set of questions compiled to give PhD candidates a
sense of what they should know by the end of their first year. In this
post (spoiler alert!) we enumerate various flavours of side-channel
attacks.
A
few months ago (back in March) you may remember I posted for the 23rd
of the 52 things in this series (can be found here). The article was entitled “Write a
C program to implement Montgomery arithmetic” and contained parts
of an implementation to do this. In this article we are going to look
at this implementation and see how it could leak information to give
us a practical idea of what leakage may look like.
Before
progressing however I would like to remind you of my last post which
examined the difference between SPA and DPA attacks. You will
remember from there that SPA attacks use a single or very few traces and
work by spotting trends in the pattern (such as timing or instruction
sequences) while DPA attacks use many traces and aim to derive
intermediate vales of an algorithm and thus the secret information by using hypotheses of the secret data and a correct leakage model.
Before looking at the Montgomery multiplication algorithm then it is
worth stating from the outset that if hypotheses of secret data and
corresponding leakage models can be derived for the algorithm, DPA style attacks can be used to derive
intermediate values which will mean that the algorithm will leak data
being processed. If this data is therefore secret, we can
already say that this algorithm is going to leak through using DPA
style attacks.
As
mentioned in my last post however, we know that DPA style attacks
are significantly harder than SPA as they require the ability to form
hypotheses of secret data, more traces and their success will depend
significantly on the noise produced by the device. The question then
is how our implementation will leak using SPA attacks where things
such as timing or sequences of instructions being executed can be
exploited. To understand this lets look at each of the four stages I
presented for the algorithm last time separately. Things we are
interested in are things such as conditional statements or loops as
this may be exploited for timing attacks.
1. The GCD Operation
This
section uses the binary extended gcd operation to find the integers
r-1
and m’
such
that rr-1
= 1 + mm’.
If we assume therefore that our algorithm for the extended gcd
operation runs in constant time then we can assume that this
operation is safe.
2. Transform the Multipliers
This
stage computes abar
= ar mod m
and
bbar = br mod m and
therefore as this is a simple operation, it is unlikely to leak
provided the operations and instructions required (such as the multiplier) run in
constant time.
3. Montgomery Multiplication
This
is the main section of the algorithm. The first sub-stage to of the
function is to calculate t
= abar*bbar
and in the same way as the previous two stages we can assume no
leakage.
The
second stage however is the computation of u
= (t + (tm’ mod r)m)/r and
it is here that we encounter two
conditional statements the first i:
if
(ulo < tlo) uhi = uhi + 1;
which
allows for a carry as we have a 128 bit implementation on a 64 bit
architecture (see article for more information). Observing the time of execution or a power trace will
therefore give insight into whether or not this conditional was
executed and therefore whether or not these ulo was higher than tlo.
Again
we have a second conditional statement which tests if u >= m.
if
(ov > 0 || ulo >= m)
ulo
= ulo - m;
This
will also reveal whether either of these conditions is true and thus
leak information about the values
being worked on.
4. The Inverse Transformation
Here
we
compute ur-1
mod m which is the product of
a and b modulo m.
In
the same way as stages 1 and 2 did not leak we can also say that this
stage will not leak.
Friday, July 17, 2015
52 Things: Number 41: Are all side-channels related to power analysis?
This is the latest in a series of blog posts to address the list of '52 Things Every PhD Student Should Know' to do Cryptography: a set of questions compiled to give PhD candidates a sense of what they should know by the end of their first year. In this post (spoiler alert!) we enumerate various flavours of side-channel attacks.
Right, shall we keep this one simple? Side-Channel Attacks (SCA) utilise information acquired in the context of physical implementations of cryptographic algorithms, as opposed to classical cryptanalytic attacks which target theoretical weaknesses. Power analysis is perhaps the most popular flavour of SCA, but it is certainly not the only one.
It would be beyond the scope of this blog to provide a comprehensive list of all side-channels and attack methodologies. Instead, here are some of the most common SCA targets, together with references to simple but clever attacks:
- Power consumption. The instantaneous power consumption of a device can leak information about the processed value, e.g. its Hamming weight. I recommend Mangard's attack on the AES key schedule as a starting point for those interested in Simple Power Analysis (SPA).
- Execution time. Such attacks exploit data-dependent differences in running time of algorithms. A famous target is the square-and-multiply algorithm used in modular exponentiation, for example Kocher's attack. Fun fact: even a constant-time implementation is vulnerable to power attacks.
- Electromagnetic radiation. Apparently, it's rather tricky to get the measurements right for this one, but once that's done -- the attack methodology is similar to Power Attacks. Here's the most cited paper dealing with EMR.
- Other. There is no limit to what can constitute a target for SCA, and that's in part why they are so interesting. Here's some more ideas:
- an acoustic attack on RSA,
- an attack that uses visible light emitted from computer LED (light-emitting diodes),
- a smudge attack on smart phone touch screens,
- an attack exploiting error messages, also known as padding oracle attack
Writing the above, I realise how it could be unsettling to know or to find out that there are so many loopholes. SCA is very much a cat-and-mouse game, and researchers usually recommend ways to avoid the signaled vulnerabilities.
Monday, July 13, 2015
Parallel Key Enumeration
Now available on eprint is: "How to Enumerate Your Keys Accurately and Efficiently After a Side Channel Attack".
This paper by Bristol Crypto shows how the key ranking and key enumeration problem can be framed as an optimisation problem, the knapsack problem, to allow extremely quick and efficient solving. The paper details a novel graph algorithm which enables the ranking of keys accurately and nearly instantly, and provides best in class performance results for key enumeration. The algorithm can easily be fully parallelised, with excellent scaling results as more processors are used.
For full details, see the associated blog article on the Silent website, here. The parallel key ranking algorithm has been developed into a public API, which is available on the tools page.
This paper by Bristol Crypto shows how the key ranking and key enumeration problem can be framed as an optimisation problem, the knapsack problem, to allow extremely quick and efficient solving. The paper details a novel graph algorithm which enables the ranking of keys accurately and nearly instantly, and provides best in class performance results for key enumeration. The algorithm can easily be fully parallelised, with excellent scaling results as more processors are used.
For full details, see the associated blog article on the Silent website, here. The parallel key ranking algorithm has been developed into a public API, which is available on the tools page.
Friday, July 10, 2015
52 Things: Number 40: What is normally considered the difference between SPA and DPA?
This is the latest in a series of blog posts to address the list of '52 Things Every PhD Student Should Know To Do Cryptography':
a set of questions compiled to give PhD candidates a sense of what they
should know by the end of their first year. We continue with our
side-channel track and discuss the differences between a side-channel
attack and a fault attack.
Power or Electromagnetic (EM) analysis
attacks divide into two types of attacks, Simple Power Analysis (SPA)
or Differential Power Analysis (DPA). Both of these types of attack
work using either electromagnetic or power traces of a device but
vary fundamentally in the number of power of traces they require and
how these traces are analysed. Before examining the differences
between these attacks, it is worth looking at what a power/EM trace
is.
Power traces
The power of CMOS circuits can either be static or dynamic. Static power consumption is the power consumed when the circuit is static (ie. no switching is taking place) and is typically small. Dynamic power consumption is the power consumed by the switching that occurs in the circuit between 0 and 1 or 1 and 0. Dynamic power consumption is typically the greatest contributor to power consumption in a circuit and as it depends on the data being processed by the circuit.
Dynamic power consumption comes from two factors. The first is the capacitance charging current and the second is the short-circuit current. Each CMOS cell has a load capacitance connected to the output of the cell. This load capacitance includes the wires that connect the cells to subsequent cells and also the input capacitances of the cells.
A CMOS cell draws current from the power rail $V_{dd}$ to charge these capacitances which in turn leads to power consumption according to $P = \alpha f C_l V_{dd}^2$, where $\alpha$ is the average number of $0 \rightarrow 1$ transitions which occur every clock cycle. \cite{dpabook}. This charging happens when there is a switch from $0 \rightarrow 1$ at the output. When there is a switch from $1 \rightarrow 0$, the current is drawn from $C_L$ to $gnd$ via the NMOS and not $V_{dd}$.
The second part of the contribution to power consumption is the short circuit current. This occurs during a switch when it is moving from $0 \rightarrow 1$ and $1 \rightarrow 0$ and occurs when both the pmos and nmos transistors are conduction at the same time - though this happens very briefly. This power consumption is in accordance with $P_{sc} = \alpha f V_{dd} I_{peak} t_{sc}$, where $I_{peak}$ is the current peak during switching and $t_{sc}$ is the time for which the short circuit exists. [1].
Understanding these two points of dynamic power consumption we can see that all switching $1 \rightarrow 0$ and $0 \rightarrow 1$ will consume power through the short circuit current, however switching from $0 \rightarrow 1$ will consume more power due to the charging of the load capacitance. If we are able to measure the power consumption (or EM field as current of varying strength will produce an EM field of equal variance allowing the measurement of the EM field to give a measurement of power consumption) of a device accurately we can therefore determine the number of switches which will let us look inside the device in two ways. First by allowing us to determine a particular operation (a multiplier for instance may require more switches than an x-or gate for instance) and secondly, and more crucially, the data being operated on by the operation, as this may affect the switching.
SPA and DPA Attacks
The main difference between SPA
attacks and DPA attacks is the number of traces required. SPA attacks
typically use one or very few traces whereas DPA attacks use many.
They also vary in the way they exploit the dynamic power consumption
of the device with SPA attacks identifying sequences of operations, however they can also exploit data dependency as in the case of
templating attacks for instance. This is illustrated by the well
known SPA attack on the square and multiply algorithm for binary
expansion in RSA. Here, if the binary value in the exponent is 0, the value
is squared and if it is a 1 then the value is squared and multiplied.
Viewing this on a single trace it is possible to see the shape of a
square operation and the shape of a square and multiply operation and
thus read of each bit of the key as a 0 or a 1. The beauty of this
attack is that only a single trace is required to make this
observation, making it an SPA attack.
DPA attacks on the other hand
exploit only the data dependency element of the power consumption by
using multiple traces and statistical techniques. They focus on the
data dependency of the power consumption and work by creating
hypotheses of how much switching (and therefore change in power
consumption) there will be for given data. These hypotheses are known
as leakage models and are usually hamming weight or hamming distance.
If this leakage model is correct, the power traces should reveal
information being processed according to it, although in reality this
is always combined with noise which distorts the data/power
relationship. In DPA attacks, secret data values being operated on
can be determined by estimating them and seeing if the representation
of them according to a leakage model correlates to a number of
different power traces. A DPA attack therefore requires a number of
traces – the number can vary between as few as 50 to thousands depending on the level of noise and accuracy of the
measurements.
[1] Mangard, Stefan, Elisabeth Oswald, and Thomas Popp. Power analysis attacks: Revealing the secrets of smart cards. Vol. 31. Springer Science & Business Media, 2008.
Thursday, July 9, 2015
52 Things: Number 39: What is the difference between a side-channel attack and a fault attack?
This is the latest in a series of blog posts to address the list of '52 Things Every PhD Student Should Know To Do Cryptography': a set of questions compiled to give PhD candidates a sense of what they should know by the end of their first year. We continue with our side-channel track and discuss the differences between a side-channel attack and a fault attack.
Side-channel attacks (SCA) are a class of attacks in which an adversary attempts to deduce information about a targets computation by observing side-channel leakage (e.g. timing, power consumption, electromagnetic emanations, acoustic noise etc.).
Fault attacks (FA) are a class of attacks in which an adversary attempts to exploit the result of a faulty computation. The fault may be the result of a program/design bug (e.g. Intel's famous FDIV bug), or a fault induced directly by the adversary (e.g. power glitch, clock glitch, temperature variation, ion-beam injection etc.). 'The Sorcerer’s Apprentice Guide to Fault Attacks' gives a good overview for a few of the more hands-on techniques for fault injection.
To sum up, the primary difference between the two is that SCAs exploit computation leakage and FAs exploit the result of erroneous computation. It's not a very insightful (or interesting) answer so perhaps a better question is to 'Discuss some similar approaches to side-channel and fault attacks'.
I'll discuss examples for three sub-classes for both SCAs and FAs: non-invasive, semi-invasive and invasive. Just as a side note, there may be some disagreement on the exact classification of non-invasive, semi-invasive and invasive attacks. These are my opinions and if you disagree, I invite you to take it up with my publisher.
Non-Invasive:
An attack is classed as non-invasive if the adversary has no physical contact with the target. I give solitary examples below but there are more out there.
SCA: Timing attacks are arguable both the most applicable and the least invasive SCA vector. This can be partly attributed to the fact that timing attacks can be carried out remotely but also because they are easily introduced. Consider a large library such as OpenSSL that supports numerous platforms and a whole suite of crypto applications. Ensuring that all sensitive computation is programmed to be constant time for each individual platform quickly becomes cumbersome. Add to this that you will be fighting the compilers efforts to optimise your code makes it an uphill battle.
FA: Non-invasive fault attacks are somewhat less common as it requires the faulty behaviour to be triggered based on the input. One could possibly build an attack using the aforementioned Intel FDIV bug which will return an incorrect division result for one in every 9 billion random inputs. I can't seem to find any reference to a concrete attack so we'll take it on faith that it may be possible.
Semi-Invasive:
An attack is classed as semi-invasive if the adversary has limited physical contact with the target. I give solitary examples below but there are more out there.
SCA: In a power analysis attack the adversary monitors the power consumption of a the target device during the computation of a secret datum. The theory behind the attack is that the dynamic power consumption will be (in some way) related to the data being processed. If the adversary can approximate this relationship then they may also be able to make deductions about the data being processed. This is arguably semi-invasive as the adversary would require a power tap to the target device which is not always available and so the adversary would need to modify the target.
FA: Clock and power glitch attacks aim to induce faulty behaviour in a target device by altering the target environment. Consider the clock input to a chip, this clock governs the speed at which the target device operates. The maximum clock rate of a device is bounded by the 'critical path' of the target which is the longest amount of time required for a combinatorial circuit to reach a stable state. Violation of this bound will trigger race conditions in the device and hence result in undefined behaviour. This makes glitch attacks somewhat unpredictable and sometimes hard to reproduce but incredibly effective!
Invasive:
An attack is classed as invasive if the adversary has unlimited resources and access the target.
SCA: Probing SCAs use a direct tap on the data bus of a target device allowing an adversary to (almost directly) read off any information that goes across the bus. The target must be fully decapsulated and carefully examined to precisely target the secret data.
FA: Probing FAs allow an adversary to completely change the behaviour of the target device. Once again, the target must be fully decapsulated and mapped out to accurately influence the behaviour but once this is done, adversary has the ability to re-wire the target and consequently alter its operation.
Both SCA and FA have been demonstrated on real-world devices to devastating effect. There are, of course, several countermeasures devised to mitigate for these attacks but these will come up later in our '52 things' series.
Monday, July 6, 2015
Call for participation: Summer School on Secure and Trustworthy Computing
As part of the PRACTICE project our group organizes together with TU Darmstadt a Summer School on Secure and Trustworthy Computing in Bucharest, Romania (23-27 September). The school will cover a broad range of topics from hardware and cryptographic underpinnings to practical deployment in cloud scenarios.
The school will benefit from a large number of international experts and Bucharest is a fun (and cheap) city to visit.
More information on how to apply, deadlines, etc. can be found on the school webpage:
The school will benefit from a large number of international experts and Bucharest is a fun (and cheap) city to visit.
More information on how to apply, deadlines, etc. can be found on the school webpage: