Tuesday, December 9, 2014

Large-scale Computation and Exploitation of RC4 Biases

The excellent first invited talk at Asiacrypt 2014 in Kaohsiung, Taiwan was given by Kenny Paterson (@kennyog) from Royal Holloway, entitled "Big Bias Hunting in Amazonia: Large-scale Computation and Exploitation of RC4 Biases". Kenny outlined the progress of himself and his co-authors in analysing the single and double byte biases in the RC4 keystream in detail, and using the resulting information to construct attacks on cryptographic protocols such as TLS and WPA/TKIP.

Keystream bias and plaintext recovery

The fundamental flaw with RC4 is that (at least) the first 256 bytes of the keystream are not uniformly distributed: the probability of each byte taking each of the 256 possible values 0x00,..,0xFF being 1/256, the actual probabilities exhibit small differences (biases) away from this number. I find that the visual representation of these biases gives the best indication of the nature of the problem---check out the graphs here.

These biases are problematic in situations where a fixed plaintext (think cookie, or password) is repeatedly encrypted under new RC4 keys. In the ideal world, the distribution of the resulting ciphertexts for these plaintext should be uniform, but because the keystream is biased, it isn't. This allows an adversary who sees enough of these ciphertexts to use a fairly simple Bayesian analysis to learn about portions of the plaintext. Protocols sitting in this repeated plaintext context include TLS and WPA/TKIP.

The talk went into detail on how these biases can be exploited to attack these two protocols, but for the purposes of this blog I'm going to stick to discussing how these biases were estimated. For more on the actual attacks, have a look at the RHUL homepage, or a couple of blog posts from Matt Green here and here, or our own blog post here.

Generating the biases

The final section (and motivation for the title) of Kenny's talk outlined how the keystream distribution information was estimated. Estimating the distribution is functionally simple: simply generate lots and lots of random 128-bit RC4 keys, record the first 256 bytes of the keystream generated using each key, and sum up the resulting values of each byte (or pair of bytes in the case of double-byte biases) to estimate the probability of each byte value occurring.

The key here is the "lots and lots" part of the above paragraph. The biases are small---the uniform probability would be 0.00390625, and from a rough glance at the estimated distributions a typical deviation might be as small as 0.00001. As a consequence to eliminate the variance in the estimation process and to estimate the distributions with sufficient precision to allow a plaintext recovery attack, a large number of random keys are needed.

How many? In the case of estimating single-byte biases under the WPA/TKIP protocol, the number of keystreams required was 2^48, and for the double-byte biases 2^46 keystreams. Running this kind of computation using standard desktop-level resources is infeasible (unless you have a lot of time on your hands), and so the authors turned to the computational resources available through cloud computing services.

The authors used Amazon's EC2 compute cluster to deploy 128 nodes, each containing Intel Xeon E5-2680 v2 processors clocked at 2.8 GHz, giving a grand total of 4096 Ivy Bridge cores, totalling 8192 virtual cores in the presence of hyperthreading. Kenny gave some interesting anecdotes about issues with managing this amount of hardware, including having to ask Amazon to manually disable controls stopping users from spinning up more than 10 nodes at a time, and maxing out the capacity of the US West data centre.

As well as the computation required to estimate the distribution, storing the distribution is also a challenge---as always, data I/O is lurking in the background ready to pounce just as you think the difficult part is over. The single-byte distribution requires 32 GiB of storage (which I'd assume to be double-precision floating point values), and the double-byte requires a massive 8 TiB. If you're using a remote compute service, then transferring that kind of data onto local storage for further analysis requires significant effort.

Finally, the cost of the computational effort is interesting---clearly, renting compute cycles is cheaper than purchasing the necessary hardware yourself. The total cost was 43K USD; given the total compute time of 63 virtual core years, we can arrive at a cost of around 7 cents per virtual core hour.

The take-home message

For me, a few points stood out as particularly noteworthy. The maxim "attacks always get better" is definitely applicable here: from the initial observation that plaintext recovery might be possible, we've arrived at the point at which widely-used secure protocols can be considered to be broken when used in tandem with RC4. The results from the team at RHUL and elsewhere also demonstrate how you should keep pushing further with investigations into vulnerabilities, and that is certainly worth investing the time and resources necessary to fully experiment with the nature of the vulnerability.

Secondly, we again return to the problems that arise at the point of intersection between applied cryptography and real-world deployments. Kenny talked about the "inertia" of implementations of protocols: from the high of 50%+ usage of RC4 in TLS, we're still seeing a usage of 33%. Having the agility to rapidly patch/update all the necessary hardware, firmware and software to fully eliminate the usage of a broken ciphersuite is clearly very difficult. Moti Yung made a good point about the relative seriousness of these cryptographic flaws---whilst a plaintext recovery attack requiring 2^20-30 TLS sessions is clearly bad and necessitates a fix, the kind of software flaw vulnerabilities we've recently observed in almost all of the most popular TLS/networking stacks are far more impactful.

No comments:

Post a Comment