Thursday, October 31, 2013

How Effective is your Side-Channel Attack?

The guiding principle of this week's study group, led by Carolyn, is the quest for adversarial confidence in the key guesses output by side-channel analysis. A common feature of differential side-channel analysis (SCA) is the divide-and-conquer strategy, in which small parts (subkeys) of the global (full) key are targeted separately and concatenated to form a key guess. This strategy is possible because cryptographic algorithms (like block ciphers) usually, at some stage, process intermediate values that depend on just a few key bits, so that the space of possible values is of searchable size. As an example, consider an S-box in the first round of AES as the adversary's target. The S-box applies a highly nonlinear transformation to the XOR of a plaintext byte and a key byte. In this scenario the attacker knows the plaintext, so he can compute the S-box output for each of the 256 possible values of the subkey. If we assume the attacker has some knowledge of the functional form of the side-channel leakage (e.g. that it's proportional to the Hamming weight of the processed data), then he can use this to predict the leakage under each of the 256 hypotheses. Applying this technique to N collected traces (relating to different plaintexts), the attacker can compute, at each point in time, the correlation between each of the 256 leakage prediction vectors and the vector of N trace measurements. The largest correlation should then indicate the true value of the subkey and the time point at which the S-box output is processed by the device. The attacker repeats this process for each key byte, and concatenates his 'best guesses' to form his best guess for the global key, which he then tests, for example by attempting to decrypt a known ciphertext. This test will fail even if just one subkey guess is wrong, and the attacker won't know where the problem lies.
Thus the attacker seeks some extra information from the 'divide' stage, for example a ranking (or even a probability distribution) on the subkeys, and some measure of confidence associated with the guessed subkeys to allow him to target the weakest parts of the key in a post-SCA brute-force search.
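The 'divide' stage above is easy to simulate. Here is a minimal sketch (entirely my own toy example: a 4-bit S-box, PRESENT's, stands in for the 8-bit AES S-box, so the subkey space is 16 rather than 256, and the "traces" are simulated Hamming-weight leakage plus Gaussian noise):

```python
import numpy as np

# Toy simulation of the correlation attack: leakage = HW(S(p XOR k)) + noise.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]   # PRESENT S-box (illustrative)
hw = lambda v: bin(v).count("1")

rng = np.random.default_rng(0)
true_key, N = 0x9, 500
plaintexts = rng.integers(0, 16, size=N)
traces = np.array([hw(SBOX[p ^ true_key]) for p in plaintexts]) \
         + rng.normal(0.0, 1.0, size=N)

# Correlate the measured leakage with the prediction under each hypothesis.
scores = [abs(np.corrcoef([hw(SBOX[p ^ k]) for p in plaintexts], traces)[0, 1])
          for k in range(16)]
best = int(np.argmax(scores))  # the attacker's 'best guess' for this subkey
```

With this much signal relative to the noise, the correct subkey wins comfortably; the interesting questions arise when it doesn't.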

The focal point of the study group was 'Success through confidence: Evaluating the effectiveness of a side-channel attack' [paper,slides] by Thillard et al., presented at CHES this year. In a nutshell, the paper suggests an approach to computing a confidence measure for each subkey guess (so the attacker knows which bytes need brute-forcing), and shows how picking robust selection strategies increases confidence. For the proof of concept, the authors consider an idealised attack scenario in which the attacker knows the true leakage function φ, so he can perfectly predict the data-dependent part of the power consumption, and the remaining variance is Gaussian noise. Assume also that there are N traces {l_i}_{i=1..N} corresponding to N plaintext inputs, and that it is possible to compute the target function under each subkey hypothesis k and to map these to vectors of power consumption predictions {φ_{k,i}}_{i=1..N}.

At SAC 2008, Rivain [paper] showed how to compute precise success rates in this setting, beginning with the observation that rankings in the correlation vector are equivalent to rankings in the alternative distinguishing vector (1/N) Σ_{i=1..N} φ_{k,i}·l_i. This vector is more convenient to work with because its probability distribution can be nicely characterised as a multivariate Gaussian. It is hence straightforward to calculate the success rate, by computing (directly from the multivariate Gaussian cumulative distribution function) the joint probability that the distinguisher score associated with the correct key hypothesis is greater than the scores associated with each of the incorrect hypotheses. The authors also refer to the work of Fei et al. [paper], presented at CHES 2012, which showed that the distribution of distinguishing vectors can be expressed entirely as a function of the algorithmic properties of the target function (scaled by a factor depending inversely on the noise size). These properties were dubbed 'confusion coefficients', and are useful because they link side-channel vulnerability to algorithmic properties of the target function.
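Rivain computes the success rate exactly from the multivariate Gaussian CDF; a quick way to see the quantity being analysed is simply to sample the distinguishing vector. The sketch below (my own toy setting again, with a 4-bit S-box; predictions are centred per hypothesis so that the inner-product ranking matches the correlation ranking) estimates the first-order success rate by Monte Carlo:

```python
import numpy as np

# Empirical success rate of the distinguishing vector
# d_k = (1/N) * sum_i phi_{k,i} * l_i over repeated simulated attacks.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]   # toy 4-bit S-box
hw = lambda v: bin(v).count("1")
rng = np.random.default_rng(1)
true_key, N, sigma, trials = 0x9, 30, 2.0, 400

wins = 0
for _ in range(trials):
    p = rng.integers(0, 16, size=N)
    phi = np.array([[hw(SBOX[x ^ k]) for x in p] for k in range(16)], dtype=float)
    l = phi[true_key] + rng.normal(0.0, sigma, size=N)   # noisy leakage
    phi -= phi.mean(axis=1, keepdims=True)               # centre predictions
    d = phi @ l / N                                      # distinguishing vector
    wins += int(np.argmax(d) == true_key)

success_rate = wins / trials
```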

Thillard et al. extend the notion of confusion coefficients to the setting of correlation differential power analysis (DPA), in which multiple bits are targeted by combining them in a power model (e.g. the Hamming weight or Hamming distance of intermediate values). Their proposal necessarily depends on the form of the leakage, and on the strong assumption that the attacker knows this form. This means that they don't quite achieve the goal of Fei et al.'s original notion of confusion coefficients, which was to capture the impact of algebraic properties on DPA independently of the physical implementation. Having defined correlation confusion coefficients, they combine the notion with Rivain's work on exact success rates -- that is, they express the success rate as a function of the correlation confusion coefficients. They then investigate the stability of a key guess as a source of confidence: intuitively, if a subkey is ranked top after n trace measurements and also after n+1, n+2, n+3 traces and so on, it is more likely to be correct than if the ranking fluctuates. Helpfully, the authors consider the evolution of the attack, giving a matrix of outcomes at selected intervals as the number of traces increases, and they allow for strategies which need not output a subkey guess at all (e.g. in the case that no stable 'winner' emerges). They advocate selection strategies which trade off better-quality guesses against more missing values (implying more brute-force searching after the side-channel phase of the attack). The confidence metric they propose for a selection strategy is the probability that it returns the correct subkey, divided by the probability that it returns any guess at all. It can be computed by concatenating the distinguishing vectors at different points in time (the resulting vector again follows a multivariate normal distribution).
Taking more of the attack evolution into account increases the confidence of a subkey guess but also the frequency with which the attack fails to return a subkey. The authors also consider different types of selection rules, and how to deal with the inevitable false positives.
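To make the trade-off concrete, here is a toy Monte-Carlo illustration (my own simulation, not the paper's exact CDF computation) of one such stability rule: output a subkey guess only if the same candidate leads at every checkpoint, and abstain otherwise. The confidence of the rule is then the fraction of its outputs that are correct:

```python
import numpy as np

# Stability-based selection rule over a growing number of traces.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]   # toy 4-bit S-box
hw = lambda v: bin(v).count("1")
rng = np.random.default_rng(2)
true_key, noise, checkpoints = 0x9, 2.0, [50, 100, 150, 200]

def stable_guess():
    p = rng.integers(0, 16, size=checkpoints[-1])
    l = np.array([hw(SBOX[x ^ true_key]) for x in p]) \
        + rng.normal(0.0, noise, size=len(p))
    preds = np.array([[hw(SBOX[x ^ k]) for x in p] for k in range(16)])
    leaders = set()
    for n in checkpoints:   # who leads after n, 2n, ... traces?
        scores = [abs(np.corrcoef(preds[k][:n], l[:n])[0, 1]) for k in range(16)]
        leaders.add(int(np.argmax(scores)))
    return leaders.pop() if len(leaders) == 1 else None  # abstain if unstable

outputs = [stable_guess() for _ in range(200)]
answered = [g for g in outputs if g is not None]
confidence = sum(g == true_key for g in answered) / max(1, len(answered))
```

Demanding stability at more checkpoints raises the confidence of the guesses that do get made, at the cost of abstaining more often, which is exactly the trade-off the paper formalises.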

Another recent paper, by Veyrat-Charvillon et al. at SAC 2012 [paper], provides a method of assigning probabilities to subkeys and enumerating the most likely global keys until the correct key is found or the attack reaches the computational limits of the attacker. This is clearly related in its aims, but it seems there is work remaining to bridge the gap between the two papers. This continued work on SCA takes the purpose of device evaluation seriously by trying to represent the true capabilities of realistic adversaries; where this is not possible, it is still useful to start from an idealised scenario and work outwards from there.

Saturday, October 26, 2013

The (B)ASICs of Bitcoin

"This planet has -- or rather had -- a problem, which was this: most of the people living on it were unhappy for pretty much of the time. Many solutions were suggested for this problem, but most of these were largely concerned with the movement of small green pieces of paper, which was odd because on the whole it wasn't the small green pieces of paper that were unhappy." (Douglas Adams, The Hitchhiker's Guide to the Galaxy (1979))
30-odd years later, the planet and unhappiness endure but the small green pieces of paper are teetering on the edge of archaism as much of life -- economy and trade included -- plays out increasingly in the 'digital realm'. Bitcoin -- the topic of our latest study group, led by Jake -- is a popular recent solution to the problem of making payments without recourse to anything so vulgar as physical matter; the upshot being, in place of notes and coins transferred from hand to hand one now exchanges solutions to hard mathematical problems electronically across a network.

The primary focus of the session was the technical workings of the system, but we were inevitably drawn into a variety of intriguing digressions on the 'value' of bitcoins (and the nature of value generally), the dependence of the system on majority consensus, and the difficulty of genuinely achieving the idealised goal of decentralisation with the introduction of powerful (and energy efficient) custom-built hardware miners. In what follows, I will give my best attempt at a technical overview, and then touch on some of the ideas and questions which came up in our discussions.

Bitcoin is a "peer-to-peer electronic cash system" proposed by the (presumed pseudonymous) developer Satoshi Nakamoto in 2008 [PDF] and released into the wild in early 2009. It aims to provide a network-wide-agreed payment mechanism which doesn't rely on any central authority or (as long as there's an honest majority) on mutual trust between the nodes in the network. 

Jake began his explanation of the Bitcoin system by describing a transaction: A bitcoin owner possesses one or several 'addresses', each associated with a bitcoin balance and a public/private key-pair (P_k,S_k). To make a payment of (say) 8 BTC from an address with a balance of 10 BTC the owner broadcasts two transactions: one with a value of 8 BTC to the payee, and one with a value of 2 BTC back to himself. A transaction is a hash of the previous transaction, the value to be transferred, and the public key of the recipient, signed under the private key of the sender. It is recommended (but not mandated) that the sender generates a new key pair and address for the portion of the value that he 'keeps', in order to promote a degree of anonymity in the system (more about that later). 
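The chaining can be sketched in a few lines. This is a heavy simplification (real Bitcoin transactions are built from a script language and signed with ECDSA over secp256k1; here a keyed hash stands in for the signature, and all the names are placeholders), intended only to show what each transaction commits to:

```python
import hashlib
import json

def sign(data: bytes, private_key: bytes) -> str:
    # Placeholder for a real ECDSA signature -- NOT cryptographically meaningful.
    return hashlib.sha256(private_key + data).hexdigest()

def make_tx(prev_txid: str, value_btc: float, recipient_pubkey: str,
            sender_privkey: bytes) -> dict:
    # A transaction commits to the previous transaction, the value
    # transferred, and the recipient's public key, signed by the sender.
    body = json.dumps({"prev": prev_txid, "value": value_btc,
                       "to": recipient_pubkey}, sort_keys=True).encode()
    return {"body": body.decode(),
            "sig": sign(body, sender_privkey),
            "txid": hashlib.sha256(body).hexdigest()}

# Spending from a 10 BTC balance: 8 BTC to the payee, 2 BTC back to a
# freshly generated change address.
pay  = make_tx("id_of_previous_tx", 8, "payee_pubkey", b"sender_sk")
rest = make_tx("id_of_previous_tx", 2, "fresh_change_pubkey", b"sender_sk")
```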

New transactions are gathered into blocks by miner nodes, at which point it becomes a race to find a difficult 'proof-of-work' for the block (see below) and broadcast it to the rest of the network. The network nodes check the proof-of-work and the validity of the transactions in the block; if satisfied, a node drops the block it has been working on, adds the accepted block to its chain, and commences work on the next block. Because each new block contains the hash of the previous block, nodes are able to detect when they have missed a block and request it, keeping their version of the chain complete. Likewise, if the chain forks (i.e., if different miners broadcast competing versions of the next block at around the same time), the fastest-growing branch is the one which is preserved (that is, majority consensus decides), and any nodes working on the discarded branch are able to self-correct according to the information in future blocks. (For this reason, verification checks must go back further than one step, and payees should ideally wait for 'confirmation' -- which happens once a block is sufficiently buried under new blocks -- before spending received bitcoins.)

The 'proof-of-work' in this case is to find a nonce which, when hashed (twice with SHA-256) along with the rest of the block -- all new transactions, a reward transaction to the miner, a time-stamp, and some meta-data -- produces a value with a certain number of leading zero bits. The number required is specified according to the current difficulty level and evolves according to increasing computer power and varying network participation, the intention being to control the rate of block creation. Over time, new blocks become harder to mine, and the reward decreases (it began at 50 BTC, is currently at 25 BTC, and will be halved every 4 years), until the number of bitcoins reaches 21 million (estimated to happen around 2140) and thereafter remains constant.
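The nonce search itself fits in a few lines. The toy miner below captures the spirit of the scheme (a real block header also carries the Merkle root of the transactions, a timestamp and a compact difficulty target, and the real difficulty is vastly higher than the 16 bits used here, which keeps the search to around 65,000 hashes):

```python
import hashlib

def mine(header: bytes, bits: int) -> int:
    # Find a nonce such that SHA-256(SHA-256(header || nonce)) has at
    # least `bits` leading zero bits, i.e. falls below the target.
    target = 1 << (256 - bits)
    nonce = 0
    while True:
        digest = hashlib.sha256(hashlib.sha256(
            header + nonce.to_bytes(8, "little")).digest()).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce
        nonce += 1

nonce = mine(b"toy block header", 16)
```

Note the asymmetry that makes this a useful proof-of-work: finding the nonce takes many hash evaluations, but any node can verify it with a single double-hash.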

Incentives to mine arise from the reward paid to the node which finds the proof-of-work, and an optional transaction fee which right now most nodes are willing to waive but which will play a more important role as the reward, and mining rate, decrease. Many miners cooperate in pools, which run from central servers and pay out mined bitcoins to all individuals in proportion to the computing power contributed. Incentives to obey protocol arise from the fact that only the longest chain (i.e. the one agreed on by the majority) will survive, so it is a waste of time trying to hash 'rubbish' or introduce invalid transactions: only with a colluding majority could valid transactions be rejected or invalid ones accepted. 

We talked a little about the 'success' of Bitcoin as a currency. Ultimately, money only has value to the extent that it can be exchanged for 'stuff' -- "Surely use alone // Makes money not a contemptible stone" (George Herbert) -- and this extent is entirely dependent on the confidence of a community in money as a store of value, and their subsequent willingness to accept it in exchange for said 'stuff'. Unsettlingly circular, no? The variety and quantity of goods and services available for BTC is still very limited, and prices tend to be fairly independent of the dramatically fluctuating Bitcoin-to-dollar/sterling/etc exchange rates. This reality, coupled with the deflationary bias of Bitcoin's value (which national currencies work hard to avoid), provides a strong incentive to hoard rather than to spend bitcoins, so that Bitcoin functions more like an investment than a currency -- potentially problematic given its lack of intrinsic value (if its value depends entirely on a community's willingness to accept it as payment, what becomes of it if nobody trades in it?).

And then there is the question of anonymity -- a motivating factor in the development of many electronic cash systems, Bitcoin included (the idea being that physical coins cannot be traced to previous owners, and one might naturally wish to emulate this functionality in an online setting). Anonymity in the Bitcoin system is achieved to a superficial degree, because the 'addresses' with which bitcoin balances are associated need not be publicly tied to an identity; however, the ability to link transactions to individual pseudonyms, coupled with the opportunity to exploit public information linked from outside sources, means that there is substantial scope for de-anonymisation (a problem which is by no means unique to Bitcoin).

We also talked about the rise of Bitcoin ASIC miners, which are quickly concentrating network influence into the hands of a small number of operators with exceptional computing capabilities. A participant's ability to mine essentially depends on their computing power relative to other participants. With the introduction of ASICs to the network, mining difficulty rapidly increased (to keep pace with mining speed) to the point where it is no longer cost-effective for 'normal' users (with only CPU or even GPU capabilities) to mine -- the energy costs outweigh the reward. This is problematic because it undermines the Bitcoin 'philosophy' of decentralised power, creating an elite core of uber-nodes with the potential to over-rule the 'majority' because they hold the larger part of the computing power in the network. Various 'solutions' to the ASIC problem have been touted -- for example, switching to SHA-3 for future blocks. Of course, that would put the current ASICs out of useful action, but it would only be a matter of time before new SHA-3 ASICs were developed. Likewise, a dynamic system in which the mandated inputs to the hash were periodically changed would only encourage the development of hardware miners able to operate with variable inputs.

Perhaps Bitcoin can never live up to the ideological hype surrounding it. But I think many of us came away from the session reinforced in our impression of it as an elegant piece of cryptography and a fascinating high-impact experiment with much to teach us about the challenges and possible solutions associated with the increasing dominance of the Internet in many aspects of personal and societal life, including the economy. (Of course, it goes without saying that the above opinions and ponderings are only my interpretation of a conversation which was not in any case intended to be authoritative or conclusive on what is a highly complex and potentially divisive topic -- I hope they will be taken in that light!)

Wednesday, October 23, 2013

Indistinguishability Obfuscation

Last week's study group was given by Joop. The theme was Indistinguishability Obfuscation and the talk centered around the recent results in [PDF].

Informally, obfuscating a program is the process of concealing what the program is doing while preserving its functionality. Among other things this serves to protect against code tampering and reverse engineering. Barak et al. [PDF] described a number of interesting cryptographic applications of such a paradigm, including constructing homomorphic encryption and removing random oracles [PDF].

In a bit more detail, an obfuscator O is an efficient (probabilistic) compiler that takes a program (or a circuit) P and produces an unintelligible version O(P) of P with the same functionality as P. This is formalized by requiring that O(P) behaves like a black box: whatever can be efficiently computed from O(P) can also be efficiently computed given only oracle access to the original program P.
Barak et al. [PDF] showed that the above formalism of obfuscation is problematic and, in fact, unachievable in general. They exhibited a family F of functions with a property Π : F → {0,1} such that, for some f ∈ F, Π(f) can be efficiently computed given any program that computes f, yet cannot be computed given only oracle access to f any better than by random guessing. They therefore suggested the following weaker definition, termed indistinguishability obfuscation, to replace the formalism above:
An indistinguishability obfuscator iO for a class of circuits C ensures that, given any two equivalent circuits C1 ∈ C and C2 ∈ C, the two distributions iO(C1) and iO(C2) are indistinguishable.

The covered paper provides an indistinguishability obfuscator construction for NC1 (i.e. all polynomial-size, log-depth circuits) using Multilinear Jigsaw Puzzles. It exploits Barrington's theorem [PDF], which shows that any circuit C in NC1 can be transformed into an oblivious branching program given by 2k square matrices M_1^0, …, M_k^0 and M_1^1, …, M_k^1. Evaluating C on an input x of length l then reduces to a matrix product: C(x) = 0 iff ∏_{i=1..k} M_i^{x_{f(i)}} = I, where f(i) indicates which input bit is inspected at step i.
This observation forms the basis of transforming any NC1 circuit into a branching program.
More precisely, Barrington showed how to transform any depth-d circuit {0,1}^m → {0,1} into a width-5 permutation branching program of length n ≤ 4^d. The program is defined by tuples (inp(i), A_{i,0}, A_{i,1}) for 1 ≤ i ≤ n, where inp: [n] → [m] describes which bit of the input is examined at step i, and the matrices A_{i,b} are 5x5 permutation matrices. Computing the program on an input x corresponds to computing the product P = ∏_{i=1..n} A_{i,x_{inp(i)}} and returning 1 if P = I (i.e. the identity matrix) and 0 otherwise.
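Evaluating such a program is mechanical. Below is a minimal hand-made example (my own toy, not the output of Barrington's compiler): a length-2 width-5 program whose matrices cancel exactly when x0 == x1, so it computes XNOR:

```python
import numpy as np

def cycle5():
    # The 5-cycle (0 1 2 3 4) as a 5x5 permutation matrix.
    M = np.zeros((5, 5), dtype=int)
    for j in range(5):
        M[(j + 1) % 5, j] = 1
    return M

I5, C = np.eye(5, dtype=int), cycle5()

inp = [0, 1]                              # inp(i): which input bit step i reads
A = [(I5, C),                             # (A_{1,0}, A_{1,1})
     (I5, np.linalg.matrix_power(C, 4))]  # C^4 = C^{-1}: cancels C iff x0 == x1

def evaluate(x):
    # P = prod_i A_{i, x_{inp(i)}}; output 1 iff P is the identity.
    P = np.eye(5, dtype=int)
    for i, j in enumerate(inp):
        P = P @ A[i][x[j]]
    return int(np.array_equal(P, np.eye(5, dtype=int)))
```

Barrington's construction builds such tuples for an arbitrary NC1 circuit using commutators of 5-cycles, at the cost of the 4^d length blow-up.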

Now, suppose that two parties Alice with input x and Bob with input y want to evaluate a circuit C∈NC1 on their inputs, i.e. jointly compute C(x,y). 
The construction is based on an earlier scheme by Kilian [PDF], which works as follows: Alice computes a randomized version of the branching program for the circuit by choosing random invertible matrices R_1, …, R_{n-1} (with R_0 = R_n = I) and computing their inverses. She then sets Ã_{i,b} = R_{i-1} A_{i,b} R_i^{-1} for i = 1, …, n and b ∈ {0,1}, so that the randomization cancels telescopically in the full product. Alice now sends Bob the matrices corresponding to her input x. The two parties also run an oblivious transfer protocol so that Bob obtains the matrices corresponding to his own input y. By multiplying the matrices in the right order, Bob can now compute C(x,y).

The OT protocol ensures that Alice does not learn Bob's input. On the other hand, the randomization step ensures that Bob also does not learn Alice's input. 
Here Alice is regarded as the obfuscator, whereas Bob is the evaluator.
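The telescoping cancellation is easy to check numerically. A quick sanity-check sketch (using the convention Ã_i = R_{i-1} A_i R_i^{-1} with R_0 = R_n = I, and generic random matrices rather than permutation matrices):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
# The original program matrices (one per step, ignoring the b-choice here).
A = [rng.integers(0, 5, size=(5, 5)).astype(float) for _ in range(n)]
# Random invertible blinding matrices, with identity at both ends.
R = [np.eye(5)] + [rng.normal(size=(5, 5)) for _ in range(n - 1)] + [np.eye(5)]

# Randomized matrices: A~_i = R_{i-1} A_i R_i^{-1} (0-indexed below).
A_tilde = [R[i] @ A[i] @ np.linalg.inv(R[i + 1]) for i in range(n)]

original   = np.linalg.multi_dot(A)
randomized = np.linalg.multi_dot(A_tilde)  # equal up to floating-point error
```

Each individual Ã_i looks like a random matrix on its own, which is what hides Alice's program from Bob, yet the full product is unchanged.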
The authors classify possible attacks on Kilian's scheme into 3 categories as follows:
  • Partial Evaluation Attacks: The adversary computes and compares partial products corresponding to the same input x but different inputs y and y', thus learning some information about x.
  • Mixed Input Attacks:  The adversary performs the matrix product correctly but does not respect the input function.
  •  Non-Multilinear Attacks: The attacker either computes non-multilinear functions over the matrices or does not conform to the algebraic structure of the matrices.

To extend this to poly-sized circuits, the authors combine the above obfuscator for NC1 with a fully homomorphic encryption (FHE) scheme. The obfuscator generates two pairs (SK1,PK1) and (SK2,PK2) of secret/public keys for the FHE scheme. The circuit C is then encrypted under both public keys. Both public keys, both ciphertexts and an obfuscated NC1 circuit for decrypting using SK1 are published. To evaluate the program on an input x, the evaluator runs the Eval algorithm of the FHE scheme on x and both ciphertexts to get ciphertexts e1 and e2 encrypting C(x) under the respective public keys. The evaluator keeps track of all the intermediate bits of the evaluation, which act as a proof. He then feeds this proof, along with x, e1 and e2, to the obfuscated NC1 circuit. If the ciphertexts e1 and e2 are consistent, the circuit decrypts e1 using SK1 to reveal C(x).

Since the proof ensures that e1 and e2 are always consistent, one can switch to an obfuscated circuit that decrypts e2 using SK2 instead without changing the functionality, and the required indistinguishability follows.

Sunday, October 20, 2013

Secure End-to-End Communication with Optimal Throughput and Resilient Against Malicious Adversary

On the third day of DISC'13, the second session was on "Crypto, Trust and Influence". The session started with my talk on the paper titled "Asynchronous Multiparty Computation with Linear Communication Complexity". The next talk was by Paul Bunn on his work "Secure End-to-End Communication with Optimal Throughput and Resilient Against Malicious Adversary" with Rafail Ostrovsky. This post briefly covers that paper.

In the paper, the authors investigate the feasibility of routing or end-to-end communication in a network in which neither the nodes nor the links are reliable. Namely, for the network links, they do not assume any form of stability: the topology of the network is dynamic (links may spontaneously fail or come back to life at any time), transmission time across each link may vary from link to link as well as across the same link from one transmission to the next (i.e. asynchronous edges), and there is no guarantee enough links are available (even over time) for communication to even be possible. Unreliability of network nodes means that they may actively and maliciously deviate from protocol specifications, attempting to disrupt communication as much as possible. In particular, a malicious adversary may corrupt an arbitrary subset of nodes, taking complete control over them and coordinate attacks to interfere with communication between the uncorrupt nodes. Finally, the nodes are corrupted by polynomially bounded adversaries.

It is not hard to see that very few guarantees can be achieved by any protocol forced to operate in networks with so few assumptions; indeed, in many situations no robust protocol may exist at all. Therefore, instead of measuring the efficacy of a given protocol in terms of its absolute performance, the authors employ competitive analysis to evaluate protocols: the throughput of a given protocol under the network conditions actually encountered is compared to that of an ideal protocol (one that has perfect information regarding the schedule of active/inactive links and corrupt nodes, and makes perfect routing decisions based on this information). The combination of this strong notion of unreliability (of both edges and nodes) with competitive analysis provides a meaningful way to evaluate routing protocols in networks that behave unreliably in unknown ways. Competitive analysis is an established tool for evaluating protocols in unreliable networks and distributed algorithms with shared-memory computation.

They present a protocol that routes effectively in the above network setting, utilizing standard cryptographic tools to guarantee correctness (i.e. the Receiver gets all of the Sender's messages, in order and without modification) with a low memory burden per node, and use competitive analysis to show that it achieves optimal throughput. Specifically, they prove the following theorem.

Theorem. Assuming a Public-Key Infrastructure and the existence of a group homomorphic encryption scheme, there exists a routing protocol that achieves correctness and an optimal competitive ratio in a distributed asynchronous network with bounded memory Θ(n^2) per node and dynamic topology (and no connectivity assumptions), even if an arbitrary subset of malicious nodes deliberately disobeys the protocol specification in order to disrupt communication as much as possible.

The memory usage of their protocol matches the memory requirements of protocols that do not provide security against corrupt nodes. Overall, both the talk and the work appear pretty interesting.