Thursday, October 8, 2015

Partitioned caches redux: Intel CAT to thwart side-channel attacks?

Caches: a Computer Science 101 explanation

The so-called memory wall has long been a thorn in the side of computer architecture. The issue is simple: modern processors can execute instructions at a high rate (magnified by superscalar, SMT, multi-core and whatever else), but are effectively limited by the (relatively) low performance of the memories they are attached to. The sticking plaster over this issue, which has worked fairly well so far at least, is the memory hierarchy. Again, put simply: in the absence of an ideal, infinitely large and infinitely fast memory, we're stuck with less ideal alternatives. So as a compromise, we try to get the best out of each technology we do have: this results in the memory hierarchy, where we have, for example, some small, fast registers at one end and a larger, slower memory at the other. As long as we retain the working set towards the faster end of the hierarchy (which the principle of locality suggests we can), performance approaches the ideal.

Somewhere in the hierarchy between registers and memory, we typically place a cache (or several levels of them). A given cache has less capacity than main memory, but can be faster as a consequence; using control logic that transparently capitalises on locality of reference, most accesses are (in theory) satisfied by the faster cache rather than the slower memory. Or, if you just care about software, caches pull off a kind of magic trick: the program executes faster, and does so "for free" in the sense that the cache is transparent, so the program needn't do anything special. Or at least it does on average: you can write some interesting programs that exercise the worst-case behaviour, and these reveal the real, alarmingly large performance gap that caches paper over; a quick sketch of one follows.
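
To make that concrete, here's a minimal C sketch (mine, for illustration only: the buffer size, the stride, and the assumption that 64 MiB exceeds the LLC are all made up). It performs the same number of loads over the same buffer twice: once sequentially, where spatial locality means most accesses hit in the cache, and once at a page-sized stride, where almost every access misses. On a typical machine the second pass is several times slower, despite doing identical work.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>

    #define N      (64UL * 1024 * 1024)  /* 64 MiB: assumed bigger than the LLC */
    #define STRIDE 4096                  /* a page at a time: poor spatial locality */

    static double elapsed(struct timespec a, struct timespec b) {
        return (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9;
    }

    int main(void) {
        unsigned char *buf = malloc(N);
        struct timespec t0, t1;
        volatile unsigned long sum = 0;  /* volatile stops the loops being optimised away */

        memset(buf, 1, N);               /* touch every page up front */

        /* Sequential pass: consecutive accesses fall in the same cache line. */
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (size_t i = 0; i < N; i++) sum += buf[i];
        clock_gettime(CLOCK_MONOTONIC, &t1);
        printf("sequential: %.3fs\n", elapsed(t0, t1));

        /* Strided pass: same number of loads, but each lands on a fresh line/page. */
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (size_t off = 0; off < STRIDE; off++)
            for (size_t i = off; i < N; i += STRIDE)
                sum += buf[i];
        clock_gettime(CLOCK_MONOTONIC, &t1);
        printf("strided:    %.3fs\n", elapsed(t0, t1));

        free(buf);
        return 0;
    }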

Partitioned caches, CAT and side-channel attacks

After years of increasing cache sizes, and tweaking their organisation and control strategies, Intel have now included something called Cache Allocation Technology (CAT) in their Xeon processors: there's a great write-up here, although the original white-paper doesn't really do justice to the vast range of literature in this area. The idea is that the cache can be partitioned, or divided into protected regions, with the goal of improving performance. For a server delivering a service with variable demand, for example, it makes sense to allow forms of variable resource allocation to match. Given it is transparent to your programs, by design, you might not think of the cache as a resource of this type. But it is, of course: processes share and so compete for space in (any level of) the cache, so by allowing protected allocation of said space via partitioning, one can eliminate that contention. For example, the OS might decide to allocate a large(r) region to some high-priority process and a small(er) region to another process deemed lower-priority; the former might execute faster as a result, since more of its working set stays resident. A sketch of how this is programmed follows.
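
As far as I can tell from Intel's documentation, the mechanism boils down to a handful of MSRs: IA32_L3_QOS_MASK_n (at 0xC90 + n) holds a contiguous capacity bitmask of cache ways for class of service (CLOS) n, and IA32_PQR_ASSOC (0xC8F) binds a logical processor to a class. The following C sketch, assuming Linux with the msr module loaded and root access, gives the flavour; the way counts and CPU numbers are invented for illustration, and real code should first enumerate support via CPUID leaf 0x10.

    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <unistd.h>

    #define IA32_PQR_ASSOC 0xC8F
    #define IA32_L3_MASK_0 0xC90   /* CLOS 0 mask; CLOS n lives at 0xC90 + n */

    /* Write an MSR on a given logical CPU via the /dev/cpu/N/msr interface. */
    static int wrmsr(int cpu, uint32_t msr, uint64_t val) {
        char path[64];
        snprintf(path, sizeof path, "/dev/cpu/%d/msr", cpu);
        int fd = open(path, O_WRONLY);
        if (fd < 0) return -1;
        int ok = pwrite(fd, &val, 8, msr) == 8;
        close(fd);
        return ok ? 0 : -1;
    }

    int main(void) {
        /* Illustrative split of a (hypothetical) 20-way LLC: give CLOS 1 the
         * low 4 ways and CLOS 0 the remaining 16. Masks must be contiguous. */
        wrmsr(0, IA32_L3_MASK_0 + 0, 0xFFFF0);   /* CLOS 0: ways 4..19 */
        wrmsr(0, IA32_L3_MASK_0 + 1, 0x0000F);   /* CLOS 1: ways 0..3  */

        /* Bind logical CPU 2 to CLOS 1; the CLOS field sits in bits 63:32. */
        wrmsr(2, IA32_PQR_ASSOC, (uint64_t)1 << 32);
        return 0;
    }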

This is interesting from a security perspective, because when any resource is shared between two processes, there is potential for leakage of information from one process to another. For the specific case of caches, there's again a huge amount of literature from the side-channel community, but the (fairly) recent work of Eisenbarth, Sunar et al. goes a long way toward summing up one strand of the wider problem; the basic primitive such attacks rest on is sketched below. The good news is that, with respect to the Last Level Cache (LLC), which is problematic in the sense it's a target for cross-VM type attacks in the context of cloud computing, CAT might offer a countermeasure. Depressingly (for me), this is a sort of bitter-sounding "I told you so" stemming from admittedly quite speculative work I did 10 years ago. It's also depressing that that was 10 years ago, but that's another story.
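
For the uninitiated, the primitive behind FLUSH+RELOAD-style LLC attacks is simply a timed load: flush a cache line you share with the victim, wait, then time a reload; a fast reload means the victim touched the line in the interim. A bare-bones C sketch for x86 follows (the 100-cycle threshold is machine-dependent and invented here, and a real attack would target a genuinely shared page, e.g. a shared library or deduplicated memory, rather than a local array):

    #include <stdint.h>
    #include <stdio.h>
    #include <x86intrin.h>

    /* Time a single load; real attacks add fences for stricter ordering. */
    static uint64_t time_access(volatile uint8_t *p) {
        unsigned aux;
        uint64_t t0 = __rdtscp(&aux);
        (void)*p;                       /* the access being timed */
        uint64_t t1 = __rdtscp(&aux);
        return t1 - t0;
    }

    static uint8_t probe[4096];         /* stands in for a page shared with a victim */

    int main(void) {
        volatile uint8_t *line = &probe[0];
        _mm_clflush((const void *)line);   /* evict the line from all cache levels */
        /* ... victim runs here; if it reads the line, it becomes cached ... */
        uint64_t dt = time_access(line);
        printf("%s (%lu cycles)\n",
               dt < 100 ? "probably cached" : "probably not cached",
               (unsigned long)dt);
        return 0;
    }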

Knee-jerk conclusions

I said "might offer" carefully though, because although Intel may have bought into the idea of exposing control of shared resources, this is down to a focus on non-security metrics such as performance. That focus seems to undermine the value of CAT for security: for example, the technical documentation says that "power management may override CAT settings". It might be hard to mount a PoC, but it at least seems feasible that an attack process could force the security-unaware power management system to abandon CAT, then mount an LLC-based attack as normal.

I've only really started to look at the technical detail, so it's too soon to jump to strong conclusions, but, to me, work like Sanctum and CHERI offers more considered (albeit more invasive) approaches to isolated, compartmentalised instruction execution; in contrast, and at first glance at least, CAT seems more like another bolt-on, and so a missed opportunity. Given the effort involved in deploying a technology like CAT at all, it seems a shame security is again somewhere down the metric pecking order: Intel could probably have done us all a favour by helping to solve the underlying security problem (and, to be fair, there may well be ways of using CAT to do so), but instead opted to target (more saleable) performance yet again.
