CCS 2015 kicked off in Denver today, with three parallel sessions in play throughout the conference. Today I'm going to write a little bit about the first talk in the session on censorship and resistance, "Seeing through Network Protocol Obfuscation" by Liang Wang, Kevin P. Dyer, Aditya Akella, Thomas Ristenpart and Thomas Shrimpton, with the talk delivered by Liang.
The authors framed the adversary in the context of their work as a nation state armed with deep-packet inspection (DPI) capabilities, where the adversary wishes to block network access to certain sites or resources, and uses DPI to identify and nullify attempts to circumvent these blocks.
Typically, anti-censorship attempts follow fairly typical strategies; if a particular circumvention protocol is blocked (Tor being the canonical example), then an anti-censorship mechanism will typically try to either randomise bytes sent over the wire with the goal of looking like 'nothing', to 'look' like an unblocked protocol (e.g make Tor look like HTTP) or to tunnel the blocked protocol over an unblocked protocol.
The adversary here is probabilistic; it must guess whether a particular connection is attempting circumvention or not. As a consequence, it's important to understand both the false positive (blocking a genuine connection) and false negative (not identifying a circumvention attempt) rates to be able to judge most accurately the effectiveness of the strategy followed by the adversary.
A valid and I think very important criticism of a proportion of research into this area of classification of network trace information is that often the data set used to evaluate success rates of adversarial strategies isn't representative enough of the real-world environment; the laboratory data sets may be too small, only sampled from a small subset of the 'population', and a classifier may be heavily over-fitted to the training data and thus performance results will not translate to a different data set (see, for example, Mike Perry's blog post and an associated paper). In this regard, this work appears to do a reasonable job; one data set gathered contained packet level information from a variety of locations and over a variety of connection types from a campus networks, and researcher-generated obfuscated Tor connections generated from a reasonably wide range of environments.
The attack strategies used by the adversary are often very straightforward. The authors explored a wide range of circumvention strategies and consequent detection mechanisms that are better explained in the research paper rather than here, but as a motivating example, to detect usage of the format-transforming encryption (FTE) technique, which attempts to make Tor traffic mimic standard HTTP traffic, the adversary can achieve fairly low false positive rates (< 4 percent) by simply reassembling HTTP streams and checking the header information for correctness.
Across the board the authors find that they can construct attacks with false positive rates as low as 0.03 percent. The precise level at which the FPR becomes unacceptable for an organisation inspecting extremely large amounts of traffic (such as the Great Firewall of China) isn't clear, but it seems that anti-circumvention technologies face a tricky task in remaining useful over time. There's also an interesting comparison here with data exfiltration techniques---some anti-censorship mechanisms are viable as data exfiltration tools for extracting secure data unnoticed from a network.
Particularly when machine-learning techniques are involved, the future may yield a scenario in which the obfuscation tool has to adapt faster than the detection mechanism used to 'survive' (and vice versa).