Bristol Cryptography Blog: October 2011

Monday, October 31, 2011

InfoSecHiComNet 2011

Last week I was attending a conference we helped organise in India entitled InfoSecHiComNet 2011. The conference received 122 submission, of which 14 went in to the conference program. These were complemented by invited talks by Jorn-Marc Schmidt, Ingrid Verbauwhede, Benedikt Gierlich, Saibal Pal, Palash Sarkar, and Sanjay Burman.

We expect that this conference will be repeated under the less general theme of "Cryptography and Cryptographic Engineering", which we hope will help encourage cryptography research in India. This conference is expected to complement CHES, since the number of submissions generally received by CHES demonstrates the popularity of research into topics related to cryptographic engineering. The intention is also to hold conferences that will not be in direct conflict with Indocrypt.

Tuesday, October 25, 2011

Study Group: Physical security and FPGAs - Part I

In this study group Marcin and Philip started the discussion on Physical security of FPGAs which will be followed by a second part in two weeks. They based the study group mostly on two papers by Saar Drimer, "Volatile FPGA design security - a survey" [pdf], "Security for volatile FPGAs (Technical Report)" [pdf] and on the paper "Protecting multiple cores in a single FPGA design" [pdf] by Drimer, Güneysu, Kuhn and Paar.

As we have both theoretic and practical cryptographers in our group, Marcin and Phil started by describing the basic technology of a Field-Programmable Gate Array: Unlike ASICs (Application-Specific Integrated Circuit), the function of a FPGA is not determined when it is manufactured. Instead, FPGAs have to be configured; based on their configuration mechanism, they are either

volatile, i.e. the configuration can be changed, or
non-volatile where the configuration is "burnt" into the device, e.g. using anti-fuses.

The key manufacturers of FPGAs are Xilinx, Altera and Actel which was bought by Microsemi.

The study group focused on volatile FPGAs, which essentially consist of an array of SRAM lookup tables (LUTs) which map a fixed number of input bits (e.g. 4) to a smaller number of ouput bits (e.g. 1) and configurable interconnect. The LUTs basically emulate the truth tables of the logic gates used in ASICs while the configurable interconnect assures that basically all LUTs can be "wired together" as required. Being SRAM, the LUTs have to be reconfigured at boot time which provides the volatility of the FPGA. A comparison of FPGAs to CPUs and ASICs results (roughly) in the following table:

	CPU	FPGA	ASIC
Speed	low	medium	high
Power consumption	high	medium	low
Implementation cost	low	low	high (tapeout, masks)
Cost per unit	medium	high	low
Updateability	good	doable	none
Implementation	C, Java,...	HDL	HDL

Of course FPGAs come with a couple of security related issues including such as

leakage (side channel attacks)
semi-invasive attacks (fault injection)
invasive attacks
trustworthy (remote) configurations (hw trojans, reliability)
protection of Intellectual Property in configuration bitstreams (IP cores).

A FPGA is configured using a "bitstream" which consists of the data to be written to the LUTs and of the configuration bits for the interconnect. Since updateability is one of the key features of a FPGA, one would of course want to do it remotely and allow only trustworthy, i.e. authenticated bitstreams to be loaded into a FPGA. At the same time, the bitstreams contain valuable IP and whoever can read the bitstream can either clone the device (by loading the bitstream into another FPGA) or reverse engineer the IP from the bitstream. Furthermore, it is common that a System Designer doesn't implement all the IP in a bitstream himself - instead he licenses IP cores from several IP Core vendors. (Commonly, an embedded system on a FPGA consists of a microprocessor core, some interface cores such as UART, USB or Ethernet, some accelerator cores e.g. for graphics, crypto, signal processing and a bus core to connect all of the other cores. These may all be supplied by different IP core vendors.) So the IP core vendors have to ensure that their cores are only used in devices for which someone bought a license from them.

The first protocol presented by Marcin and Phil is a simple remote update protocol for which the FPGA needs to have

a fixed update logic UL besides the configurable logic which can compute a keyed MAC,
a unique and public fingerprint F,
a version number V of its current bitstream (to guarantee freshness)
and of course a key which will be used as input for the keyed MAC.

Then, the FPGA has to identify itself and its current bitstream version to the update server so that both sides know F and V. Additionally, both sides need to generate nonces in order to avoid replay attacks; for the FPGA it is sufficient to use a counter while the update server should create a truly random nonce. The bitstream is then sent to the FPGA together with its MAC value which serves both as authentication and error protection mechanism. It is important for the remote update protocol to provide a backup solution for the case that something goes wrong during the backup - otherwise the FPGA would be unusable. One option is to have a default bitstream stored in some non-volatile ROM without using too much (expensive) memory. Note also that the remote update protocol does not need to decrypt the bitstream - commonly the encrypted bitstream is stored in the non-volatile memory as is and gets only decrypted when it is loaded into the configurable logic at boot time using a key and the decryption routine embedded into the boot logic.

The second protocol presented by Marcin and Phil addresses the bit stream decryption and IP core protection problem. Basically, if the bitstream is completely owned by the System Designer (SD), the SD could just encrypt it using a symmetric key he has loaded into the FPGA. But IP core vendors want to protect their IP even though it will always be used by SDs as part of a larger system. So a more complex system is needed and Diffie-Hellman style key-exchange can provide it as the presented 4 stage protocol with three parties (FV - FPGA Vendor, CV - Core Vendor, SD) shows. The needed prerequisits of the FPGA are

a unique key K_FPGA,
unique identifier FID (both stored in non-volatile memory inside the FPGA),
some hardwired additional registers to store a bit stream encryption key for each CV and
a hardwired decryption module which has access to all of the above keys.

Obviously, none of the keys should be accessible to the outside world. The 4 stages are:

Setup: FV generates a secret x and computes a public g^x and embeds the secret into the personalisation bitstream PB and encrypts PB using K_FPGA and sends [FID, g^x, ENC_{K_FPGA}(PB)] to SD.
Licensing: For each CV and each FPGA device, SD sends (FID, g^x) to the CV who choses a secret y, computes a public g^y and uses a key derivation function KDF (specified by the FV) to compute a key K_CV=KDF(y,g^x,FID) which he uses to encrypt his IP core and sends [ENC_{K_CV}(core), g^y] to the SD.
Personalisation: Having assembled his own bitstream containing all the encrypted cores of the CVs, the SD loads first the encrypted PB onto the FPGA which the FPGA is able to decrypt using K_FPGA. The PB contains an implementation of KDF and of the vendor secret x. Furthermore, the SD provides all the g^y to the FPGA which in turn uses the KDF to compute all K_CV=KDF(x,g^y,FID) which it stores in the registers of the decryption logic. (The computation works thanks to Diffie-Hellman.) Now the Personalisation logic (within the configurable part of the FPGA) isn't needed anymore as the (static) decryption logic is fully prepared to decrypt the bitstreams.
Configuration: The SD sends all encrypted bitstreams to the FPGA which decrypts them using the respective K_CV's which were stored during personalisation in the decryption registers.

Note that the SD learns neither the secret x nor any of the secret y's and therefore can not compute any K_CV to decrypt any of the IP cores he licensed. Additionally, the K_CV are tied to the individual FPGA chip thanks to FID so cloning of a device is not possible.

Tuesday, October 18, 2011

Study group: RSA-based standard model signatures

Nigel and Ming-Feng presented the results of 4 papers on provably secure digital signatures in the standard model. Ming-Feng started with Cramer and Shoup's seminal paper from 2000 [ps], and then discussed reductions of the signature length by Fischlin (PKC 2003) [pdf]. Nigel then mentioned the first (stateful) scheme based on the RSA assumption by Hohenberger and Waters [pdf], and dedicated a more thorough presentation to their follow-up work at CRYPTO 2009 [pdf], which is a stateless scheme.

Cramer-Shoup signature scheme was the first to have a standard-model proof of security against adaptive chosen-message attack. It is stateless (which is now the standard for digital signatures, meaning the signer need not keep any state information) and based on the strong RSA assumption. The RSA assumption is, given an RSA modulus n (which is the product of two large primes) and r, z ∈ Z_n^*, it is hard to find y ∈ Z_n^* s.t. y^r≡z (mod n). For the strong RSA assumption, the adversary has more flexibility: given n and z, she has to find both r>1 and y ∈ Z_n^* s.t. y^r≡z (mod n).

Note that while the latter assumption is stronger than the former, the best known attack against both of them is to factor the modulus n.

A signature scheme SIG=(KG,Sign,Ver) consists of algorithms for key generation, signing and verification, which can be sketched as follows (pk is the public key and sk is the signing key):

1^k → KG → (sk,pk)

(sk,m) → Sign → σ

(pk,m,σ) → Ver → 1/0

A scheme is secure against chosen-message attacks if an adversary getting the public key and a signing oracle that returns signatures on messages of its choice cannot output a message it has never queried and a valid signature on it.

Note that the "text-book" RSA signature (where a signature is simply m^d mod n) scheme is not secure under chosen-message attack, since signing is homomorphic. "Hashed RSA" is secure in the random-oracle model.

Cramer and Shoup proposed the first scheme that is provably secure against chosen-message attacks in the standard model. The scheme has 2 parameters l and l' with l+1<l'. Typical assignments are l=160, l'=512.

Key generation

generate an RSA modulus n=pq where p=2p'+1 and q=2q'+1 for p' and q' large primes (such p and q are called strong primes)
generate quadratic residues h,x ∈ QR_n
generate an (l+1)-bit prime e' and
define pk=(n,h,x,e') and sk=(p,q)

Signing

for a hash function H with l-bit outputs set x=H(m)
choose (l+1)-bit number e
choose y ∈ QR_n
choose x' where (y')^e = x' h^H(m)
compute y where y^e=xh^H(x')
define σ = (e,y,y')

Verification

check whether e is a (l+1)-bit integer and e≠ e'
compute x'=(y')^e' h^-H(m)
check x= y^e h^-H(x')

The proof distinguishes different kinds of forgers, depending on whether their forgery reuses the e' of a signature it obtained from a signing query or it reuses the x' or both. While forgeries that reuse parts can be reduced to the standard RSA assumptions, for forgeries with a new e we need to resort to the strong RSA assumption.

In his PKC '03 paper, Fischlin gives a variant of Cramer-Shoup signatures (also provably secure under the strong RSA assumption) which has shorter signatures: instead of |σ| = 2|n|+l+1, we now have σ=|n|+2l+1, but the public keys are longer.

While the CS signature scheme was state of the art, it requires the strong RSA assumption. The reason being that every signature requires its own value e, therefore in the security reduction, in order to simulate the signing oracle, we allow the simulator (who does not know the factorisation of n) to somehow "cheat" by creating new e's.

It thus seems the that we cannot get around this stronger assumption for RSA-based signatures. However, at EUROCRYPT 2009, Hohenberger and Waters presented a scheme that was proven secure under the RSA assumption. The caveat being that it was a stateful scheme, meaning that the signer had to store a counter, which it had to increase every time it issues a signature; else all bets are off. Their signatures were short and looked like standard hash&sign signature à la H(M)^d mod n.

At CRYPTO the same year, they then extended their ideas to define a stateless signature scheme, based on RSA. They start with defining a scheme that only satisfies security against weak chosen-message attacks, where the adversary has to give the challenger all messages it would like to have signed before it gets to see the public key.

Then they transform their scheme into a fully secure scheme by using a known trick: Chameleon hash functions. These are functions that have input a message and some randomness: H(m,r) → y. Now, given a trapdoor τ one can for any y and m efficiently compute r s.t. H(m,r)=y.

The good news is that there exists a chameleon hash function based on the RSA assumption. Given a weekly (not daily) secure scheme and a Chameleon hash function, one can generically construct a fully secure scheme as follows.

Add to the public and private key the description of the hash function. To sign pick a random r and sign x=H(m,r): σ'=S(sk,x). The signature is then defined as σ=(σ',r). To verify it, just recompute the hash and verify σ' on it.

Now given this transformation, all that needs to be done is construct a weakly secure scheme, which is done as follows (remember, a simulator's goal is given N,e and y to compute y^1/y mod n):

Key generation

pick an RSA modulus N, pick h ← Z_n^*
pick a pseudo-random function F_k:{0,1}^* → {0,1}^l
pick a random c ← {0,1}^l
define H_k,c(z)=F_k(i||z) ⊕ c, where i is smallest number s.t. H_k,c(z) is an odd prime
output pk=(N,h,c,k), sk=(c,k,p,q)

(Whereas in the EUROCRYPT paper, the state i is responsible for choosing the primes, now the picked prime depends on the input, not on the index of the signature.)

Signing

let m⁽ⁱ⁾ be defined as the first i bits of m
define e_i=H_k,c(m⁽ⁱ⁾) (thus, given a message we generate n prime numbers)
define σ = h^{∏ e_i^-1}

Verification

compute e_i=H_k,c(m⁽ⁱ⁾)
check σ^{∏ e_i} = h

The proof idea is the following. First, after the adversary outputs m₁, ..., m_n, we fix pk. Since the output m* is a forgery, it must be different from all m_i, so there has to be a j s.t. (m^*)^(j) is different from all m_i^(j).

Suppose we guess this position, then we know what the (m*)^(j) is, and can then fix c s.t. H on (m^*)^(j) returns e, the target from the RSA challenge. To simulate signatures on the queried messages m_i, we raise y to the power of all primes e_i,j = H_k,c(m_i^(j)) and set it as h. Now we can take e_i,j-th roots of h. (This is similar to how Boneh and Boyen transform a challenge for the strong Diffie-Hellman assumption into signatures in their scheme.)

Sunday, October 16, 2011

WESS 2011

It is not an easy task to organize a workshop on Embedded Systems Security just a little more than a week after CHES 2011. And it didn't help, that the Workshop Chair couldn't attend his workshop due to visa issues. So I didn't expect too much from WESS 2011 but was pleasantly surprised by it. For me, the highlights were:

Steve Checkoway and Karl Koscher demonstrated in their keynote talk how to hack a car. If you've been to CHES 2010 that might sound familiar to you. But this time, they had a second part which could've been titled "How to hack a car remotely". Scary!
In the surprise key note Nikil Dutt from the University of California (Irvine) presented an approach towards virtualization of scratch-pad memories in a multi-core embedded system, supporting secure execution environments for sensitive applications.
Dominik Merli gave an interesting talk on "Semi-invasive EM Attack on FPGA RO PUFs and Countermeasures". For their semi-invasive attack the authors removed the packaging to improve the measured EM signal and succeeded in executing a not so simple SPA (Simple Power Analysis) attack on the ring oscillator PUFs.
Ralf Huuck gave an interesting talk on "Model Checking Dataflow for Malicious Input", presenting his Goanna tool (commercially available, a limited version is available for free download) to detect bugs in software projects. Prior to the talk, I did know a few things about software bugs, their origins and debugging but I did not know much about the techniques used by bug detection tools. This made the talk that explained some of the basic approaches interesting to me.

It was apparent throughout most ESWeek events, that security wasn't a concern for the majority of the ES community. (Almost all concerned people were at CHES instead.) So having WESS as part of the ESWeek to fill the gap was helpful but having a stronger showing of security topics at ESWeek in the future would be very desirable. Security is an issue for all embedded systems as the talk by Steve Checkoway and Karl Koscher demonstrated. Next year's ESWeek will be in Tampere, Finland.

P.S.: This post is a bit delayed due to my busy traveling schedule.

Thursday, October 13, 2011

Embedded Systems Week day 3

CODES+ISSS (one of the three ESWeek main conferences) offered a very interesting session on emerging new techniques for non-volatile memories. The talks of this session gave a nice survey of recent advances in the quest to replace current DRAM as well as SRAM caches with new, denser (and bigger) memories based on technologies such as Phase Change Memory (PCMs) and Spin-Transfer Torque RAM (STT-RAM). For some reason unbeknownst to me, the announced fourth talk on Memristors was replaced by a talk on the future of Flash memory and 3D stacking of flash memory.

Phase Change Memory cells consist of a GST (Ge2Sb2Te5) chalcogenide block which can be either in its amorphous phase (0, high resistance) or in its crystalline phase (1, lower resistance). The GST is placed between Vdd (heater) and GND. In his talk Youtao Zhang showed that the first PCM based products have been announced and highlighted some of the advantages and challenges for PCM to replace/enhance current DRAMs. One of the issues is quite obvious: Writing a bit, i.e. changing the state of one GST block requires a lot of energy and time. Another issue is less obvious but maybe even more problematic: While DRAM basically works forever, PCM will fail after roughly 100 million writes with process variations adding a relatively large amount of uncertainty. The third issue is related to security: usage of non-volatile memory as DRAM replacement makes cold-boot attacks a lot easier. On the plus side, the non-volatility of PCM would enable instant-on functionality and PCM is competitive with RAM for reading speed, reading energy and idle power leakage and thus out-performing NAND-Flash. Youtao Zhang then explained some of the techniques used to mitigate the draw backs, which fall into one of the following categories:

Reducing the number of writes
Wear-levelling
Salvaging

My impression was that PCM may be a good technology to replace NAND-Flash in solid state disks but that it will be difficult to replace DRAM entirely with PCM. Splitting the RAM into a DRAM and a PCM part, as outlined by Youtao Zhang, may be a more realistic solution.

Spin-Transfer Torque Memory consists of a Magnetic Junction Tunnel which (in its simplest version) looks like a sandwich with a thick "reference layer" on the bottom that is magnetized in one direction, a thin barrier layer inbetween and a thick "free layer" that stores the bit on the top. The free layer can be either magnetized in the same direction as the reference layer (0, low resistance) or in the opposite direction (1, high resistance) . The bit line is connected to the free layer, the source line is connected via a transistor to the reference layer and the word line is connected to the transistor gate. The advantages of STT-RAM are that it offers a higher density than SRAM, has a lower leakage power and is far more robust against errors so that no ECC is needed. Unfortunately, the time and energy needed to write a bit are still too high (by a factor of approx. 5 or 6 respectively). Yuan Xie and Yiran Chen explained in their talks possible hardware and cache architectures in which STT-RAM can replace or, if used in a combination, enhance traditional SRAM.

P.S.: This post is a bit delayed due to my busy traveling schedule.

Wednesday, October 12, 2011

Embedded Systems Week day 2

Today I would like to highlight the talk of S. Wildermann on "Symbolic Design Space Exploration for Multi-Mode Reconfigurable Systems" (F. Reimann, D. Ziener and J. Teich are co-authors.) In their paper the authors address the question how to choose optimal profiles and layouts for FPGAs (Field Programmable Gate Arrays) that are partially reconfigurable during run-time. A growing number of embedded systems has to support different operation modes with different task and load profiles depending on their current requirement. FPGAs that can be partially reconfigured during run-time are a very interesting, emerging technology and usually the basic idea is to have a processor core and I/O interfaces statically configured into the FPGA and to load special accelerators into the reconfigurable area as needed. Thus, the required chip size is limited (which reduces the production costs) and unused accelerators do not add to the power consumption. However, each reconfiguration consumes time and power requiring a careful trade-off. Furthermore, a multitude of other problems are still open, starting from architectural efficiency (see e.g. our paper at CHES 2011, "An Exploration of Mechanisms for Dynamic Cryptographic Instruction Set Extension") over the question addressed in this talk - how to satisfy the constraints and requirements while avoiding more reconfiguration than needed - to security questions such as the verification and authentication of configuration streams and isolation of security relevant modules.

The common hardware platform made this talk interesting to me - I'm not going to pretend to be an expert for automated Design Space Exploration and the associated optimization algorithms but let me try to explain one interesting point made in the talk: The authors found that existing DSE approaches cannot trivially be adopted to the scenario of reconfigurable devices as the reconfiguration cost adds a new dimension to the problem: What used to be a static problem (production cost) has now run-time cost added to it. They use a SAT solver to efficiently explore design spaces in lower dimensions so that the SAT solver can focus on what it does best: solving scalar and linear equations. Additionally they use an evolutionary algorithm to choose optimal values for the missing dimensions. (This combined use of an evolutionary algorithm and a SAT solver has been introduced as "SAT decoding" by Lukasiewycz et al. ) Wildermann et al. develop a model for reconfigurable design spaces and use the SAT decoding on two case studies to efficiently find optimized solutions that fullfil the constraints. As follow up work, I'd like to see how security constraints can be added to the new model of reconfigurable design spaces.

Tuesday, October 11, 2011

Study group: Distinguishers in DPA attacks

Today Carolyn and Jake gave the first talk in our new series of study groups, entitled "Distinguishers in DPA attacks", focusing on the evolution of research into the optimal choice of distinguisher and discussing current issues and challenges in this area.

In differential power analysis (DPA) attacks, an adversary records power traces corresponding to a known set of plaintexts from a device with a secret key. The power consumed by the device at a particular point in time is partially dependent on the data being processed at a particular point in time, so with careful selection of a point in time the power consumption can be found to be be partially dependent on the result of some intermediary function used in the cryptographic algorithm on the device, for example, a combination of a byte of the secret key and the output of an 8-bit AES S-Box. The adversary can then enumerate the result of this intermediary function for all possible key values. Assuming the adversary knows a function that can suitably approximate how the device consumes power for a particular piece of data, he or she can then model the power consumed by the device for each of the intermediate values and for each possible key. Hence, the 'hypothetical' consumed power for the intermediate values created using a correct key guess will be most similar to the real recorded power consumption.

The comparison tool used to compare the hypothetical and real trace values is called a distinguisher. As an aside - in practice a direct comparison isn't always optimal or practicable; distinguishers can essentially be classified into two types: 'comparison' distinguishers that do directly compare real and hypothetical power values, and 'partitioning' distinguishers that use the hypothetical power values to partition the real ones; the idea being that a meaningful partition will be made when the key is guessed correctly and then the distinguisher tries to quantify this (see here for a nice explanation). In essence, using either approach, a distinguisher is a statistical test and should be analysed as such - the distinguisher statistic can be computed exactly when we know all of the variables in the system, but as we don't usually have this information in practice, the statistic has to be estimated using the available sample data - the power traces. The result of having to do this is that a distinguisher may have theoretic features or advantages that will not necessarily translate into practical results because of the quality of the estimator for the distinguisher statistic. The estimator ideally needs to be both consistent, and efficient - inefficient estimation procedures can add a potentially unforeseen extra layer of data complexity to an attack.

A recurring theme in DPA research has been the aim of constructing distinguishers that are more efficient (require fewer traces to recover the key) and more generic (require fewer assumptions about the target device; in particular how it leaks power information). The requirement for efficiency is obvious; fewer traces translates to a device being weaker in practice (although this is obviously not a complete metric - what about the cost of profiling in a template attack? etc). Genericity is a slightly more subtle requirement; specialist distinguishers might be more powerful but may also require more assumptions to be made about the device. An interesting recent result in this area (paper here) has found that when using nanoscale technologies, inter-device variability becomes so high that a leakage for a particular device model will vary from one instantiation of the device and another. In this situation, even full Bayesian template attacks (theoretically the most powerful type of DPA adversary) will fail. This drives the requirement for some generic distinguisher that can perform equally well irrespective of the leakage function of the device and the assumptions an adversary can make about it. Perhaps the ideal distinguisher therefore is one that is both generic and efficient.

Carolyn and Jake described how research has found that the dual goal of efficiency and genericity is currently infeasible; instead what we see is a trade-off between the two. The more generic distinguishers have a higher data complexity and cost in efficiency over the less generic distinguishers. The parametric or moment-based statistics typically require stronger assumptions about the leakage model but require fewer traces, but the jump to including all of the distribution through use of nonparametric/generic statistics seems to add an unavoidable cost in data complexity. Four particular distinguishers were discussed: correlation, distance-of-means, mutual information analysis and linear regression.

The correlation distinguisher utilises Pearson's correlation-coefficient to quantify the linear dependencies between the real and hypothesised consumption values. When the two distributions are jointly Normally distributed, the correlation coefficient completely characterises the distribution, and although this performance degrades for a different non-Normal type of distribution, the correlation can remain as a useful measure of linear dependence. This is useful as long as the hypothesised consumption under the correct key is linearly related to the real consumption (and vice versa under an incorrect key) - i.e. when the power model used by the adversary is accurate up to proportionality. The estimator for Pearson's correlation coefficient is the sample correlation coefficient. This is a consistent estimator as long as the moments themselves are consistent, which is usually the case and under the law of large numbers, once a large enough sample is obtained the sample average will give the theoretical expectation. These factors combined equate to correlation being an efficient distinguisher (partly because of it's simplicity) but also being reliant on having an accurate power model and hence losing genericity. It is worth noting that up to a point, correlation can still be a successful distinguisher (with a cost in efficiency) in 'generic' scenarios if there is at least some linear dependencies between the real and hypothesised values that can be exploited.

The original DPA distinguisher, as introduced by Kocher, Jaffe and Jun in their 1998 paper, is the "distance of means" test. This is a 'partitioning'-type distinguisher, unlike correlation; the hypothesised values are used to partition the real traces into sets. In the distance of means test, the first bit of the leakage values is analysed. If it equals 0, then the corresponding real traces for the plaintexts that produce hypothetical leakage values of 0 are put into a set, and similarly for 1. The intuition is that when the key is hypothesised correctly, the partitioning of the real consumption values will be done correctly, and the two distributions of the partitioned values will be different (have different means), and vice versa for an incorrect hypothesis because each partition will contain a random sampling of the global distribution. The downside of the distance-of-means distinguisher is that the signal leaked by one bit of power consumption information is small compared to the amount of noise created by the overlapping distributions contributed by the remaining, ignored bits, and so the efficiency here is low, but again at with a benefit in genericity; there are few assumptions required.

A theoretically more powerful type of generic partition distinguisher is mutual information analysis (MIA), proposed in 2008 at CHES. MIA partitions the real traces using all of the discrete values in the hypothesised leakage space, so again a meaningful partition is created only when the key is guessed correctly. Rather than computing means, MIA computes the reduction in the trace entropy after knowing the hypothesised values under a particular key. Another way of describing MIA is that it computes the total amount of shared information between two random variables. In a partitioning attack the converse applies: for a correct key hypothesis, the conditional (or partitioned) distribution will have the smallest entropy, leading to the highest mutual information. The major benefit of MIA is that it can be made to be entirely generic; the intermediate function targeted acts as an inherent power model and so the adversary does not need to model the device leakage function at all. Another way of saying this is that MIA only requires a meaningful grouping of the leakages because mutual information itself is maximised by the raw intermediate values, and this is provided inherently by non-injective S-Boxes. The non-injectivity is a fundamental requirement - as mutual information is invariant to permutation, injective intermediate functions such as the AES S-Box lead to constant mutual information values over all key hypotheses. Hence the full genericity of MIA is restricted to non-injective targets such as the DES S-Box. Again we see the trade-off between efficiency and genericity with MIA - the data overhead required to estimate entropies is significant. To perform an MIA attack, first the underlying leakage distribution needs to be estimated, and then this needs to be substituted into estimators for marginal/conditional entropies. Additionally, all known estimators are biased; no ideal estimator exists, so in practice there are multiple problematic factors in play when performing MIA: a data overhead, sensitivity to estimator choice/configuration and unpredictability of performance based on the underlying data. So whilst MIA is generic up to a point, this genericity comes at a considerable cost in efficiency, so much so that often even with a poor power model correlation may be the more efficient distinguisher.

There has been research into distinguishers that are conceptually similar to MIA, such as the Kolmogorov-Smirnov test and the Cramer-von-Mises test, but they suffer from similar problems. One current, promising avenue of research is into distinguishers based on linear regression techniques. This concept was originally described as a template-style attack but recent work has adapted the approach to essentially allow for 'on-the-fly' profiling. The concept is to fit a linear regression model of the leakage values against a configuration of selected dependent variables, and then compute a goodness-of-fit measure to distinguish. This approach is not truly generic and comes with a computational cost, but is very flexible.

Perhaps the most important message to take from the talk and discussion is that currently there is no such thing as a general optimum distinguisher; each choice comes with some trade-off between genericity and efficiency. This means that the best distinguisher choice will only become obvious once all the other variables in the attack scenario are known: the device leakage, the attacker's model, the targeted function, and the noise. Whether it is possible to construct a distinguisher able to meet both these requirements is an ongoing challenge. Additionally, the question of whether it is possible to construct a generic distinguisher suitable for deployment against injective target functions is an interesting open problem.

Monday, October 10, 2011

Embedded Systems Week day 1

ESWeek consists of three conferences (CASES, EMSOFT and CODES+ISSS) which run in parallel followed by two days with multiple workshops including the Workshop on Embedded Systems Security. Each of the conference days starts with a common keynote followed by the separate conference sessions. From today's keynote I'd like to highlight on small example which demonstrates some problems of VLSI development quite well: the Hamlet circuit:

ToBe := ToBe or (not ToBe); --unclocked

(i.e. http://9gag.com/gag/248779) In a purely logic world, this will stabilise at ToBe = 1 no matter what the initial value of ToBe is. But in the real world, wire delays exist and - for certain wire delays - the circuit will not stabilise . This is a nice example of the problems faced by place & route tools as well as by verification and certification methods.

One of the most interesting talks I've heard today was the first talk of the CODES+ISSS Session 1B on "An Energy-Efficient Patchable Accelerator for Post-Silicon Engineering Changes" by H. Yoshida and M. Fujita. The authors propose an ASIC architecture for HW accelerators that is to some degree patchable to handle errors in the chip design. A HW accelerator (e.g. for a block cipher) is a specialized chip consisting of one or more functional units (e.g. one round of the block cipher), registers, an interconnect network and a finite state machine that controls what happens in the chip. Thus, the chip can perform the function (e.g. encryptions and decryptions) with a very high throughput and a relatively low energy consumption. But if an error in the chip design is detected after the chip has been manufactured, the manufactured chips are useless and a very costly respin of the production masks is required to produce error free chips. Programmable chips such as FPGAs, microprocessors or CPUs on the other hand incur a significantly lower throughput and a relatively high energy consumption.

The patchable architecture proposed by the authors creates a new category of chips situated between fixed function ASICs and programmable chips to offer a throughput almost as high as that of a pure ASIC at an only moderately increased energy consumption. They modify only the finite state machine of the ASIC so that, still at the manufacturer but after production, a set of additional rules can be added to the finite state machine. The example shown in the talk fixes an erroneous transition in the finite state machine itself and, depending on the functional units available and the type of error, errors in a functional unit should be fixable as well although this comes with a higher reduction of the throughput. From a security point of view, the added flexibility makes it harder to verify and certify a chip e.g. for highest security levels but it is still a lot easier than verification and certification of programmable chips.

I do expect that there will be further proposals of possible architectures filling up the gap between fixed function ASICs and programmable chips; if the market trends (respins becoming more often with each technology step while the costs of a respin is growing exponentially) that were shown in the talks continue, the economic value of patchable architectures is bound to grow accordingly.

Tuesday, October 4, 2011

Mifare DESFire MF3ICD40: Death by Side-Channels

Definitively the best and the most practical talk at CHES 2011 was given by David Oswald. The work has been descried in the paper titled "Breaking Mifare DESFire MF3ICD40: Power Analysis and Templates in the Real World" by David Oswald and Christof Paar both from Ruhr-University Bochum and concerns practical attacks on contactless smardcards.

So what has really happened? Authors performed non-invasive side-channel attacks on aforementioned types of smartcards, which means that they were able to reveal full 112-bits key from a card. This allows to modify, read out and clone content of an attacked smartcard very easily.

So why is it so important? Firstly, this is another (after side-channel attack on KeeLoq) practical attack on a device widely deployed on all over the world and also (according to the authors) the first available in an open literature template attack performed on real a world device. Mifare DESFire MF3ICD40 is a 3DES alternative for Mifare Classic (reverse-engineered and then broken) which is widely used in mobile payments and in access controls, e.g., by Czech railways (in-karta) or in San Francisco (Clippercards). Secondly, to perform a non-invasive power attack and a template attack on contactless smartcard, David and Christof used low cost equipment which cost is estimated to be $3000 and computational time of a few hours. That means that a key can be reveled by an attacker having a side-channel knowledge and practical skills.

The presented paper shows once again that side-channel attacks are not only theoretical, performed in labs on well-known designs, but might be applied to black-box devices and so might have a big impact on real world security.

Goodbye Mifare DESFire MF3ICD40, welcome Mifare DESFire EV1 ...

Monday, October 3, 2011

Abstract cryptography

World, Hello! ...hmmm, that did not come out right. Oh well...it's clear what I ment.

I was at research meeting at Dagstuhl this past week and this seems like a good opportunity to cave in to outside pressure and to the revelation that our blog is actively read to start blogging about something (that I find) interesting and that is somehow related to cryptography.

There have been many very very interesting talks (see Program). I particularly liked Eike's construction of a lossy trapdoor permutation from LWE (a much simpler and direct approach than in the original work by Peikert and Waters) and Yuval Ishai's talk on generalizing the Yao circut construction from boolean circuits to arithmetic ones. However, the talk that I found most intriguing was Ueli Maurer's talk on "Constructive cryptography" and this post is about this subject.

There were many ideas compressed in the only 30 minutes talk, and there was a hail of questions at the end. I'll sketch two ideas that stood out for me.

The first idea is a shift in the way we define security (and even functionality) of cryptographic primitives: instead of thinking of the security of systems in isolation, define security by explicitly including in the picture the task for which the primitive is intended. For example, a secure encryption scheme would be whatever allows one to implement a private channel. An authenticated encryption scheme would be whatever allows for the construction of an authenticated channel. Ueli termed the approach constructive cryptography. It was unclear to me how the idea would work for security properties that look very directly at the properties of the implementation and for the case of more complex primitives (e.g. direct anonymous attestation protocols). More generally, it is unclear how far can this idea go but this is something I'm very interesting in understanding.

An idea that I found even more exciting is what underlies "Abstract cryptography", a program Ueli is in the process of developing. Let me try to explain through an example what is the problem that this framework tries to solve.

We have by now several frameworks that allow for some form of general composability results. These include the reactive simulatability framework of Pfitzman and Waidner,the framework of universal composability (and its many variants) by Canetti, the task PIOA framework of Canetti, Cheung, Lynch, Kaynar, and Pereira, the one based on inexhaustible Turing machines by Kusters etc, etc, etc. All of these framworks come with a composability theorem that allows for some form of modular desing of cryptosystems.

I conjecture that a thorough reader raises his hands in despair when confronted with these papers (raise your hand if you did). Reading usually starts with tons of unplesant technical details that define (as precisely as the authors care to) the different components of the frameworks, execution, and adversarial models, specific notions of efficiency etc. Then, when reaching the composition theorem one cannot help but have a sense of deja vu. Despite the different technicalities, all of these frameworks share the same nice and simple definitional idea of security and in turn this leads to similar proofs of the composition theorem.

This suggests the existence of a domain which abstracts away the nasty technical details irrelevant for the validity of the composition theorem. Proving the theorem in this domain would still be simple (if not simpler) and, importantly, existing composition theorems would follow. It should be sufficient to observe that existing frameworks are concrete instantiations of the abstract framework. It is this abstract framework that Ueli advocates developing and which he calls abstract cryptography. In category theory terms, what is desired is an initial object in the category of cryptographic frameworks with composition theorems.

I find this research direction fascinating and useful. Unencumbered by useless details the core ideas behind theorems and proofs can be more easily grasped and explained. Similarities shared by apparently different problems can be more easily evidenced and their solutions reused. For example, Ueli sketched how to cast the impossibility of UC commitments and ZK proofs, in a unified way, as the impossibility of solving a system of equations. It would be great if all UC-like results could be explained equally simple.

I hope that abstract cryptography will mature soon: using simulation frameworks may turn out to be way simpler than it is nowadays and would save cryptographers a significant amount of sleepless nights (A first paper on abstract cryptography had appeared in Innovations in Computer Science'11 and can be downloaded here).

Saturday, October 1, 2011

CHES '11 - DDSLL S-Box analysis

One particular talk that caught my eye during side-channel session of CHES was 'Information Theoretic and Security Analysis of a 65-Nanometer DDSLL AES S-Box' by Mathieu Renauld, Dina Kamel, Francois-Xavier Standaert and Denis Flandre. An implementation of the S-Box in the DDSLL dual-rail precharge logic style was compared with a static CMOS implementation. The analysis was done using the evaluation tools outlined at NIAT (co-located with CHES) by F.-X Standaert (first proposed at Eurocrypt in 2009). Using the perceived information, the amount of physical information leakage can be quantified independently of an adversary, and using a security analysis the relative effectiveness of distinguishers can be evaluated.

Conclusions made from a comparison of perceived information leakage under template and linear stochastic attacks were that DDL can provide a useful improvement in security over CMOS, but that the leakage reduction from DDL alone was not sufficient to provide complete security. The security analysis of the success rate of a template attack against increasing sample size (so the worst-case scenario) reveals that the DDSLL S-Box provides an increase in security of approximately one order of magnitude, and also that for both implementations there are time samples that are sufficiently vulnerable as to be exploitable with non-profiled attacks.

An interesting discussion points arising from the work are that given the quantity of information leakage reduction, practically secure implementations will need to incorporate additional countermeasures such as masking; how does DDL perform for varying technology nodes (the example here is 65nm); whether other DDL styles have similar linearity properties as the example here.

CHES '11 - Invited talk 1

The first invited talk at this year's CHES was given by Ernie Brickell of Intel. The talk was on 'Technologies to Improve Platform Security' and in particular Intel's plans and progress towards achieving this goal. Ernie defined a grouping of four particular 'pillars' of security and outlined for each work done to date to build each pillar: identity protection and fraud deliverance, detection and prevention of malware, securing data and assets, recovery and enhanced patching.

One particularly interesting innovation with respect to the identity protection and fraud deliverance pillar was a hardware token, isolated from any other CPU code, that allows for the generation of one-time passwords; effectively having a traditional physical token for two-factor authentication inside a PC. In addition to improvements in both hardware and software implementations of crypto, the 'securing data and assets' pillar outlined a new digital source of random number generation.

There was also an interesting discussion on ways of stopping escalation of privilege attacks; for instance stopping agents executing user-mode code in ring 0 - this was one of the attack vectors used by Stuxnet.