Friday, March 28, 2014

Something is leaking in the state of Denmark

A recent study group brought about a micro-preview of this year's Eurocrypt, as we visited one of the 'Best Paper' nominated submissions, Unifying leakage models: from probing attacks to noisy leakage [by A. Duc, S. Dziembowski and S. Faust]. Speaking of Eurocrypt, a paper by Susan (who is now part of our Bristol Crypto group) and her co-authors is also on this year's programme.

Side-channel attacks target implementation-specific characteristics of algorithms deployed on physical devices, such as power consumption, running time and so on. True, this information is "observed" with specific pieces of hardware (e.g., an oscilloscope), but since such equipment is relatively cheap, it is plausible to assume that even a not-so-experienced attacker can gain access to sensitive information. A significant number of papers describe scenarios in which, and methods by which, such an attacker succeeds (at least in theory), even when the side-channel information is incomplete or erroneous. This is a rather optimistic perspective; perhaps a more realistic one is the cat-and-mouse game, where deployers constantly develop countermeasures and attackers/evaluators strive to come up with ingenious ways around them, both tasks being equally hard. And let's not forget that the attackers may not have it easy to begin with: alas, although in theory there may be no difference between theory and practice, in practice there always is [fact]. In particular, the two limitations just mentioned underpin the most common paradigms in leakage modelling: there is only so much information actually available (bounded leakage), and physical measurements will always reflect some degree of intrinsic noise (noisy leakage).
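To make the "noisy leakage" idea concrete, here is a minimal sketch (my own illustration, not the paper's formal model) of the classic noisy Hamming-weight model often used in the side-channel literature: each intermediate value leaks its Hamming weight plus Gaussian noise, so the attacker's view is informative but never exact.

```python
import random

def hamming_weight(x: int) -> int:
    """Number of set bits in x."""
    return bin(x).count("1")

def noisy_hw_leakage(value: int, sigma: float = 1.0) -> float:
    """Toy 'noisy leakage': the Hamming weight of an intermediate value plus
    Gaussian noise of standard deviation sigma (an illustrative assumption)."""
    return hamming_weight(value) + random.gauss(0.0, sigma)

# The same secret byte gives a different trace on every 'measurement'.
secret = 0x3A
print([round(noisy_hw_leakage(secret), 2) for _ in range(5)])
```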

The paper we visited aims at unifying leakage models, and of course does so by introducing a new leakage model. Moreover, it extends the previous work of Prouff and Rivain presented at last year's Eurocrypt [paper]. That paper analysed the security of a block-cipher implementation protected with additive masking, but had some shortcomings: 1. some components were assumed to be leak-free; 2. security was only guaranteed against random (i.e., not chosen) messages; 3. it had limited applicability.

Duc et al. make their argument based on simulation. Clearly, the first issue is then how to model noise. Essentially, noise is modelled as a randomised function over a finite set of values, applied independently to each elementary operation and bounded by some parameter δ (a δ-noisy function): observing the noisy leakage may shift the adversary's view of the underlying value by at most δ in statistical distance from the "ideal", input-independent case.

Three adversary models are studied: the noisy model, the random probing model and the t-threshold-probing model. In the third, the adversary can learn the values of t intermediate values processed during the computation. A random probing adversary recovers each intermediate value with probability ε and obtains a special symbol ⊥ with probability 1−ε. Quite intuitively, it is possible to move from this model to the t-threshold-probing one, and the intuition is accompanied by a formal proof and examples.
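A rough sketch of the two probing-style adversaries just described (purely illustrative; the function names and parameters are mine): a random ε-probing leak returns each intermediate value with probability ε and ⊥ otherwise, while a t-threshold-probing adversary simply reads at most t wires of its choice.

```python
import random

BOT = None  # stands in for the special symbol ⊥

def random_probing_leak(intermediates, eps):
    """Each intermediate value is revealed with probability eps,
    otherwise the adversary sees only ⊥."""
    return [v if random.random() < eps else BOT for v in intermediates]

def threshold_probing_leak(intermediates, t, positions):
    """A t-threshold-probing adversary learns the values at (at most) t
    wire positions of its choice."""
    assert len(positions) <= t
    return {i: intermediates[i] for i in positions}

wires = [0x13, 0x5F, 0x22, 0x9C]          # toy intermediate values
print(random_probing_leak(wires, eps=0.25))
print(threshold_probing_leak(wires, t=2, positions=[0, 3]))
```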

On the other hand, in order to address the noise-resilience of cryptographic computations, a model for arithmetic circuits over a finite field is presented. A (stateful arithmetic) circuit Γ over a field F is an oriented graph. The nodes are called gates and belong to a pre-determined set (e.g., input/output gates, multiplication/addition, random/constant, memory and so on). A black-box circuit adversary A is a machine that adaptively interacts with a circuit Γ via its input and output interface. Given two stateful circuits Γ and Γ′ over some field F, Γ′ is said to be a (δ, ξ)-noise-resilient implementation of Γ with respect to some (encryption) function f if two conditions hold: first, the output of Γ on some chosen key k and input is distributed identically to the output of Γ′ on the encoded key f(k) and the same input; and second, for every δ-noisy adversary A there exists a black-box circuit adversary S such that the statistical distance between the output of A after interacting with Γ′ (holding f(k)) and the output of S after interacting with Γ (holding k) is bounded by ξ. Finally, security in the probing model and resilience to leakage are addressed, and a comparison to the previous work is given.
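Restating that rather dense definition a little more compactly (the notation is mine, following the prose above):

```latex
% (\delta,\xi)-noise resilience, restating the prose definition (notation mine)
\Gamma' \text{ is a } (\delta,\xi)\text{-noise-resilient implementation of } \Gamma
\text{ w.r.t. } f \iff
\begin{cases}
\text{(correctness)} & \forall k:\ \mathrm{out}\!\left(\Gamma'(f(k))\right)
  \equiv \mathrm{out}\!\left(\Gamma(k)\right),\\[4pt]
\text{(simulatability)} & \forall\, \delta\text{-noisy } \mathcal{A}\ \exists\, \mathcal{S}:\
\Delta\!\left(\mathrm{out}\!\left(\mathcal{A} \rightleftarrows \Gamma'(f(k))\right),\,
              \mathrm{out}\!\left(\mathcal{S} \rightleftarrows \Gamma(k)\right)\right) \le \xi.
\end{cases}
```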



Thursday, March 27, 2014

"Nobody" has written this document...



In today's study group, Panos discussed a paper that focuses on forensic stylometry, a form of authorship attribution. The paper proposes the Classify-Verify method and can be found at www.stolerman.net. There are two basic categories in authorship attribution (AAtr): the closed-world problem and the open-world problem. The former assumes that the author is in the suspect set; in open-world problems the author might not be in the set. For this reason, the authors merge CLASSIFICATION and VERIFICATION. They utilise an existing distance-based authorship VERIFICATION method, but also add per-feature standard-deviation normalisation and per-author threshold normalisation to the scheme.

Stylometry is used to analyse anonymous written communications, with the goal of de-anonymising them. Traditional methods need a suspect set to do this reliably; the paper removes the strong assumption that the author is inside that set. Current state-of-the-art methods rely largely on AI techniques and can identify individuals in sets of 50 people with an accuracy of around 90%, but the algorithms proposed so far come with certain restrictions. In authorship attribution, given a document D of unknown authorship and a set of authors A = {A1, A2, … , An}, we have to determine the author Ai of D. In authorship verification, given D and a single author A, we must decide whether D was written by A. The paper suggests merging the two worlds: given a document D of unknown authorship and documents written by a set of known authors, determine the author Ai ∈ A of D, or flag that the author of D is not in the author set.

The authors use two different corpora for their experiments: the EBG corpus (45 authors, at least 6,500 words per author) plus its adversarial documents, and the ICWSM 2009 Spinn3r blog corpus (around 44M blog posts). For classification they use a closed-world SVM classifier provided by Weka (SMO SVM with complexity parameter C = 1), and they choose only one type of feature set from the Writeprints feature set, which was used to quantify the EBG corpus (over 90% accuracy for 50 authors). The evaluation was done using 10-fold cross-validation, measured over the k most common character 1- to 5-grams for 50 < k < 1000 in steps of 50. Finally, they chose the 500 most common character bigrams (which they call <500,2>-chars) as their feature set. FEATURE EXTRACTION is done using the JStylo and JGAAP authorship attribution APIs.
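As a concrete (toy) illustration of a <500,2>-chars-style feature set, here is a rough sketch, not the authors' JStylo code, of picking the k most common character bigrams from a corpus and turning each document into a relative-frequency vector:

```python
from collections import Counter

def char_ngrams(text, n=2):
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def top_k_bigrams(corpus, k=500):
    """Pick the k most common character bigrams across the whole corpus."""
    counts = Counter()
    for doc in corpus:
        counts.update(char_ngrams(doc, 2))
    return [bg for bg, _ in counts.most_common(k)]

def feature_vector(doc, vocab):
    """Relative frequency of each vocabulary bigram in the document."""
    counts = Counter(char_ngrams(doc, 2))
    total = sum(counts.values()) or 1
    return [counts[bg] / total for bg in vocab]

corpus = ["the cat sat on the mat", "to be or not to be"]
vocab = top_k_bigrams(corpus, k=10)   # 500 in the paper, 10 for this toy example
print(vocab)
print(feature_vector(corpus[0], vocab))
```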

For verification they use classifier-induced verifiers, which require a closed-world classifier and use its class output for verification. Another family is that of standalone verifiers, which rely on a model built from the author's training data alone, independent of other classifiers and authors. Classifier-induced verification works on top of such a classifier: higher confidence in an author may indicate that the author is in the suspect set, while lower confidence may indicate that he is not. So, after classification, verification amounts to setting an acceptance threshold t, measuring the classifier's confidence (expressed as probabilities) and accepting the classification if it is above t. The paper evaluates three such verification methods: P1, P1-P2-Diff and Gap-Conf.
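A sketch of how two of those classifier-induced measures might be computed (my reading of the paper; the exact Gap-Conf computation is more involved, so only P1 and P1-P2-Diff are shown): take the classifier's per-author probability estimates and accept only if the chosen measure clears the threshold t.

```python
def p1(probs):
    """P1: confidence assigned to the top-ranked author."""
    return max(probs.values())

def p1_p2_diff(probs):
    """P1-P2-Diff: gap between the two highest-ranked authors."""
    top_two = sorted(probs.values(), reverse=True)[:2]
    return top_two[0] - (top_two[1] if len(top_two) > 1 else 0.0)

def verify(probs, t, measure=p1_p2_diff):
    """Accept the classifier's answer only if the confidence measure exceeds t."""
    return measure(probs) >= t

probs = {"A1": 0.55, "A2": 0.30, "A3": 0.15}   # toy classifier output
print(verify(probs, t=0.2))                     # True: gap of 0.25 clears t
```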

For standalone verification, the Classify-Verify model is evaluated using either distractorless verification or the proposed Sigma verification. The basic verifier (V: distractorless) uses a distance combined with a threshold: set an acceptance threshold t, model the document D and the author A as feature vectors, measure the distance between them, and decide that D was written by A if the distance is below t.
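A minimal sketch of such a distractorless verifier (the use of cosine distance here is my assumption for illustration; the paper's choice of distance may differ):

```python
import math

def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0

def distractorless_verify(doc_vec, author_vec, t):
    """Accept 'D is written by A' iff the distance between the document's
    feature vector and the author's model vector is below the threshold t."""
    return cosine_distance(doc_vec, author_vec) < t

print(distractorless_verify([0.3, 0.2, 0.5], [0.28, 0.22, 0.50], t=0.1))
```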

The proposed Sigma verification scheme applies two adjustments to this method: a) Vσ, per-feature SD normalisation, which accounts for the variance of the author's writing style (using the standard deviation of each feature), and b) Vα, per-author threshold normalisation. The evaluation of these approaches indicates that no single verification method is always preferable, and the selection of a verifier should rely on empirical testing: sometimes V outperforms Vσ or Vα, and sometimes the opposite happens.
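A rough sketch of the Vσ idea (again my own illustration, not the paper's exact formula): per-feature differences are divided by the standard deviation of that feature over the author's training documents, so features the author uses erratically count for less.

```python
import statistics

def sd_normalised_distance(doc_vec, author_docs):
    """V_sigma-style distance (illustrative): compare the document against the
    author's mean vector, down-weighting features with high per-author variance."""
    means = [statistics.mean(col) for col in zip(*author_docs)]
    sds = [statistics.pstdev(col) or 1e-6 for col in zip(*author_docs)]
    return sum(abs(d - m) / s for d, m, s in zip(doc_vec, means, sds)) / len(doc_vec)

author_docs = [[0.30, 0.20, 0.50], [0.32, 0.18, 0.50], [0.29, 0.21, 0.50]]
print(sd_normalised_distance([0.31, 0.19, 0.50], author_docs))
```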

The proposed C-V algorithm is an abstaining classifier, i.e. a classifier that refrains from classification in certain cases in order to reduce misclassifications. Essentially, the authors extend the closed-world authorship problem to the open world by adding another class, 'UNKNOWN'. Closed-world classification is applied to D and A = {A1, A2, … , An}, and the output is given to the verifier, which determines whether to accept Ai or to reject it (⊥); C-V is thus a classifier over the extended suspect set A∪{⊥} (a rough sketch of the overall flow is given after the list below). The threshold of the C-V verifier can be determined as follows:
  • Manually: set by the user (to make the classifier stricter or more relaxed); 
  • p-Induced threshold: set empirically over the training set, in an automated process, to the value that maximises the target measurement, e.g. the F1-score; 
  • in-set/not-in-set robust: if p is not known, we examine various values of p and t; there will be a point where the curves intersect (here p is the probability that the author of the document is in the set).
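Putting the pieces together, here is a rough sketch of the Classify-Verify flow (function names are mine; the real pipeline uses the SMO SVM and the verifiers discussed above):

```python
UNKNOWN = "⊥"

def classify_verify(doc_vec, classifier, verifier, t):
    """Classify-Verify: run the closed-world classifier, then let the verifier
    either confirm its answer or abstain with the 'unknown' class ⊥."""
    probs = classifier(doc_vec)               # e.g. per-author SVM probabilities
    best_author = max(probs, key=probs.get)
    return best_author if verifier(probs, t) else UNKNOWN

# Toy usage with a stub classifier and a P1-P2-Diff style verifier
# (assumes at least two suspects in the probability dictionary).
classifier = lambda x: {"A1": 0.40, "A2": 0.38, "A3": 0.22}
p1_p2_gap = lambda probs, t: (lambda s: s[0] - s[1])(sorted(probs.values(), reverse=True)) >= t
print(classify_verify([0.3, 0.2, 0.5], classifier, p1_p2_gap, t=0.2))   # prints ⊥
```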

The authors use two different settings for evaluation: when the authors of the documents are in the set of suspects (in-set) and when they are not (not-in-set). One assumption they make is that if D is written by A but classified as B, and the verifier replaces B with ⊥, the result counts as true. For the CLASSIFICATION phase they train n (n-1)-class classifiers using the SMO SVM discussed previously. For the VERIFICATION phase they evaluate several methods: a standalone method for each corpus and all the classifier-induced methods (Gap-Conf, P1, P1-P2-Diff). They use the F1-score because it provides a balanced measurement of precision and recall. For the threshold they use two automatic methods: if p is known, they use the p-induced threshold that maximises the F1-score on the training set (p = 0.5); if p is unknown, they use the robust threshold p-F1R. As a baseline they compare F1-scores with 10-fold cross-validation results of closed-world classification using the SMO SVM with the <500,2>-chars feature set. Finally, for the adversarial-settings evaluation, they train their models on the non-adversarial documents of EBG and test them on the imitation documents, to see how well C-V thwarts attacks.

Their results can be summarised as follows. For both the EBG and the blog corpora, the 0.5-F1 results significantly outperform the 0.5-Base results with any of the underlying verification methods, at a confidence level of p-val < 0.01; in general, the underlying verifiers thwarted a large proportion of misclassifications, leading to an overall increase in F1-score. Among all verification methods, P1-P2-Diff turns out to be the most preferable verifier, since it consistently outperforms the other methods across almost all values of p for both corpora, which suggests it is robust to domain variation. In the adversarial setting, they show that the closed-world classifier is highly vulnerable to these types of attacks, but C-V manages to thwart the majority of them. In addition, the results suggest that classifier-induced verifiers consistently outperform the standalone ones. Overall, the method is able to replace wrong assertions with more honest and useful statements of "unknown".

Replacing Random Oracles using Indistinguishability Obfuscation

This week's study group looked at the powerful tool and burgeoning tongue-twister that is indistinguishability obfuscation, with specific focus on the forthcoming Eurocrypt '14 paper titled 'Replacing a Random Oracle: Full Domain Hash From Indistinguishability Obfuscation' by Hohenberger, Sahai and Waters (HSW). Since the work of Garg et al. last year there has been a flurry of work on obfuscation (25 papers on ePrint in 2013 with 'obfuscation' in the title and two whole sessions at TCC 2014 devoted to it), and naturally this will be a well-attended topic at all the major conferences this year.

The idea of full-domain hash (FDH) signatures dates back to 1993 and Bellare and Rogaway's seminal 'Random Oracles are Practical' paper. The idea allows construction of a signature scheme from any trapdoor permutation (TDP) by simply signing the hash of a message using the trapdoor: σ = g_sk^{-1}(H(m)). Verification simply applies the permutation in the forward direction under the public key and checks the result against H(m). The hash function maps the arbitrary message space to the domain (and co-domain) of the permutation, and when it is modelled as a random oracle this scheme is secure for any TDP. The proof, a reduction to the adversary's success against a TDP challenger, was influential and straightforward: for all but one of the (unique) queries to the RO, the reduction algorithm chooses a random value t from the domain and answers with g_pk(t) (so that it knows the preimage), and at one query point m* it programs the output of the RO to be the challenge value z* given by the TDP challenger, so that a forgery on m* yields the preimage of z*.
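For concreteness, here is a toy sketch of RSA-FDH along the lines described above. The parameters are textbook-sized and wildly insecure, and reducing SHA-256 modulo N only gestures at "full domain"; it is purely illustrative.

```python
import hashlib

# Toy RSA trapdoor permutation: g_pk(x) = x^e mod N, g_sk^{-1}(y) = y^d mod N.
N, e, d = 3233, 17, 2753          # 3233 = 61 * 53; textbook example, insecure

def full_domain_hash(message: bytes) -> int:
    """'Full-domain' hash: map an arbitrary message into Z_N.
    (A real FDH must cover the whole domain; this reduction is illustrative.)"""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N

def sign(message: bytes) -> int:
    return pow(full_domain_hash(message), d, N)            # sigma = g_sk^{-1}(H(m))

def verify(message: bytes, sigma: int) -> bool:
    return pow(sigma, e, N) == full_domain_hash(message)   # g_pk(sigma) == H(m)

sigma = sign(b"hello")
print(verify(b"hello", sigma), verify(b"goodbye", sigma))  # expect: True False
```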

Since this landmark work, a number of 'FDH-like' schemes have appeared, such as the seminal Boneh-Franklin IBE scheme and Boneh-Lynn-Shacham signatures, both of which use the 'random oracle programming' idea in their proofs. The idea of the HSW paper is to provide a replacement hash function for the random oracle without modifying the underlying signature construction. To do this, the authors utilise an indistinguishability obfuscator and constrained PRFs. More information on indistinguishability obfuscation can be found in Essam's blog post on Joop's study group last October, and many of the recent papers on obfuscation provide a concise overview. A major challenge in the field is dealing with black-box impossibility results; this paper manages to sidestep those issues because its hash function constructions are inherently non-black-box. The idea of a constrained PRF is that it is only defined on a subset of the usual input space; HSW focus on puncturable PRFs, meaning PRFs that can be evaluated on all bit strings of a certain length except for a polynomial-size set of inputs.
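To give a flavour of what a puncturable PRF looks like, here is a toy GGM-style construction (my own sketch, using HMAC-SHA256 as the length-doubling PRG; nothing here is taken from the HSW paper). The punctured key consists of the sibling seeds along the path to the punctured input, which allows evaluation everywhere except at that one point.

```python
import hmac, hashlib

def prg(seed: bytes, bit: int) -> bytes:
    """Length-doubling PRG, split into a 'left' and 'right' half via HMAC."""
    return hmac.new(seed, bytes([bit]), hashlib.sha256).digest()

def prf_eval(key: bytes, x: str) -> bytes:
    """GGM PRF: walk down the binary tree following the bits of x."""
    s = key
    for b in x:
        s = prg(s, int(b))
    return s

def puncture(key: bytes, x_star: str) -> dict:
    """Punctured key: the sibling seed at every level of the path to x_star."""
    s, siblings = key, {}
    for i, b in enumerate(x_star):
        siblings[x_star[:i] + str(1 - int(b))] = prg(s, 1 - int(b))
        s = prg(s, int(b))
    return siblings

def prf_eval_punctured(pkey: dict, x: str) -> bytes:
    """Evaluate from the punctured key: works for any x except x_star."""
    for prefix, seed in pkey.items():
        if x.startswith(prefix):
            s = seed
            for b in x[len(prefix):]:
                s = prg(s, int(b))
            return s
    raise ValueError("cannot evaluate at the punctured point")

K = b"\x00" * 32
pK = puncture(K, "0110")
assert prf_eval_punctured(pK, "0111") == prf_eval(K, "0111")
try:
    prf_eval_punctured(pK, "0110")
except ValueError as err:
    print("as expected:", err)
```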

The main idea of the proof for selectively secure FDH signatures is that the hash function is an obfuscation of g_pk(F(K, m)), where g_pk is the public direction of the TDP, F is a puncturable PRF and K is the key for this PRF. Since the scenario is selective security, the proof involves hopping from this hash function to an obfuscation of a 'punctured' version of it; since the input/output behaviour is the same, this hop is handled by the security of indistinguishability obfuscation. In essence, the punctured version fixes a point (programmed to be the challenge in the signature security game) such that the only way an adversary can win is by defeating the PRF itself. This result is particularly neat because it essentially follows the same strategy as the original RO proof. In the adaptive case things get slightly more complicated: as in past proofs, the hash function needs to operate in two modes, a 'normal' mode where the hash function parameters are chosen at random, and a 'partitioning' mode where the parameters are chosen in such a way that inversion is possible for a large fraction of points. However, obfuscation overcomes this difficulty too, because the description of the hash function itself can be obfuscated. The paper demonstrates the power of indistinguishability obfuscation, and the methods used may well be useful in circumventing or expanding on the black-box impossibility results in the literature.

Thursday, March 13, 2014

You can have the result, but you're not getting my data!

A recent study group focused on the ZQL query language that was published at USENIX 2013 by Microsoft Research (http://research.microsoft.com/pubs/184348/zql.pdf). ZQL was created to address a simple problem: given a set of private client data, how can an external server learn the result of a function on that data without ever (directly) learning anything about the data itself? We focus on the setting of a single client that knows all the data and does not want to give it to the server. Although it may sound trivial, devising a practical method to achieve this whilst providing the desired security guarantees is far from it. ZQL, which supports a subset of SQL functions, achieves it through the use of zero-knowledge protocols, producing code that performs data certification, computation and verification. The final compiled code aims to provide the following security properties:
  • Correctness. For any given source inputs, the sequential composition of the cryptographic queries for the data sources, the user, and the verifier yields the same result as the source query.
  • Integrity. An adversary given the capabilities of the user cannot get the verifier to accept any other result (except with a negligible probability).
  • Privacy. An adversary given the capabilities of the verifier, able to choose any two collections of inputs such that the source query yields the same result, and given the result of the user's cryptographic query, cannot tell which of the two inputs was used.

One running use case in the paper is smart metering. A household may be billed at a different rate according to usage at different times of the day (e.g. Economy 7 meters, but more fine-grained). The privacy concern is that a user may not want the energy company to know the exact usage for each second of the day (which could allow the supplier to learn information about the user's habits, such as when they have a cup of tea, leave for work in the morning or watch TV), but the supplier is entitled to know what the user's bill should be. ZQL allows the meter to perform the bill calculation (as requested by the supplier), certify the data and prove that the resulting bill is correct, without revealing any of the data used to compute it.
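As a plain, non-cryptographic illustration of the kind of query involved, the billing computation the meter would prove correct looks roughly like the sketch below (rates and readings are made-up example data). ZQL's contribution is compiling such a query into code that additionally certifies the readings and produces a zero-knowledge proof that the total is right, none of which is shown here.

```python
# Toy time-of-use billing: the function the meter evaluates over its private readings.
RATES = {"day": 0.20, "night": 0.07}             # price per kWh per tariff band

def bill(readings):
    """readings: list of (tariff_band, kwh) pairs kept private by the household."""
    return sum(RATES[band] * kwh for band, kwh in readings)

private_readings = [("day", 1.2), ("night", 3.4), ("day", 0.8)]
print(round(bill(private_readings), 2))          # only this total is revealed
```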

The authors designed ZQL with enough abstraction that using it does not require a thorough understanding of the underlying crypto (making it appealing and practical for industrial deployment). Each time a query is constructed, the ZQL compiler synthesises all the required zero-knowledge protocols (currently backed by RSA and elliptic-curve primitives). The package is tied up nicely with cost metrics for evaluating and verifying queries on the data. On the other hand, ZQL is still in its infancy and is limited to relatively simple SQL-like functions in order to remain practical and preserve the security guarantees.

I would say this is a nicely presented paper. My understanding of SQL is sparse at best, but that being said, the authors do well to present the intuition of what they are trying to achieve and what they have attained thus far. People are becoming increasingly aware of the data being given away about their personal habits, and this paper does well to demonstrate how various crypto and computer science primitives can be pieced together into a nice solution to this challenging problem.

Tuesday, March 4, 2014

Beware – Your user agent could be a double agent!



Even though there were many interesting talks at the RSA 2014 conference in San Francisco, including a few discussing the recent NSA-RSA privacy drama, I chose to blog about a short talk by Mike Shema from Qualys centering around security and privacy on the web.

In his talk, Mike highlighted the difference between having a secure browser and having secure data. Every day we use web browsers (known as user agents, since they act on behalf of the user) to access the web. Many websites rely on collecting users' data to generate revenue, and thus as users browse, those websites interact with their browsers behind the scenes, not always in the way the users would want.

Many vendors offer big money prizes to help identify and mitigate security vulnerabilities in their browsers. Although there have been many recent initiatives towards making browsers more secure, including the advent of HTML5, we still have not reached sensible defaults for privacy. Vendors have conflicting perspectives on not tracking users on the web; the implementation of the recently proposed "Do Not Track" HTTP header is one obvious example. Internet Explorer proposed that the default value of the field should be set to enabled, so that tracking is prevented by default unless the user decides to opt in; Google Chrome took a different view, and advertisers also felt such a feature would impact their business.

Security features are constantly being added to browsers to make them more secure. As well as self-updating, such features include process separation and sandboxing: informally, the former runs different tabs/web pages as separate processes, whereas the latter limits access to the user's resources on a per-application basis. As an example, recent versions of Google Chrome embed Flash in its own sandboxed process.

There are many ways one could define metrics for evaluating browser and data security, or for assessing whether one browser is more secure than another or one data set more private than another. On the one hand, there is malware (short for malicious software) which attacks browsers directly. On the other hand, we have attacks like resource framing, clickjacking and cross-site request forgery (CSRF). CSRF (sometimes abbreviated XSRF) allows an attacker to force the victim's browser to send HTTP requests to a target website, thereby making use of any capabilities the victim has with that website, e.g. exploiting an existing authenticated session. Recently, Mozilla and Safari turned on TLS 1.1/TLS 1.2 and use the recommended cipher suites by default, promoting the network encryption that is vital for data security.
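As a reminder of the standard countermeasure against CSRF (my own illustration, not something from the talk): a server can bind a random token to the user's session and reject state-changing requests that do not echo it back. A minimal sketch:

```python
import secrets

sessions = {}   # session_id -> csrf_token (toy in-memory store)

def new_session(session_id: str) -> str:
    """Issue a fresh anti-CSRF token to be embedded in every form we serve."""
    token = secrets.token_hex(16)
    sessions[session_id] = token
    return token

def handle_state_change(session_id: str, submitted_token: str) -> bool:
    """Reject the request unless the token echoed back matches the session's.
    A cross-site attacker can make the browser send the cookie, but cannot
    read or guess the token, so the forged request is refused."""
    expected = sessions.get(session_id)
    return expected is not None and secrets.compare_digest(expected, submitted_token)

tok = new_session("alice")
print(handle_state_change("alice", tok))        # True: legitimate form post
print(handle_state_change("alice", "guess"))    # False: forged cross-site request
```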

Many web pages these days are built from multiple origins, with content coming from different sources; advertising, in particular, is inherently cross-origin. It was not until about ten years ago that the risks of mixing content from multiple origins in a web page came to the fore.
Even though the same-origin policy used by browsers ensures that a resource cannot read a response from another origin, it does not provide proper isolation of cookies and resources across origins, and thus does not rule out CSRF: more precisely, the same-origin policy does not stop a resource from one origin from making a request to a resource from a different origin.

Ensuring that websites use SSL (Secure Sockets Layer), as emphasised by the Electronic Frontier Foundation (EFF) in its "HTTPS Everywhere" initiative, prevents many security vulnerabilities such as mixed content and information leakage. However, there have been examples of many browsers (especially on mobile devices) which used HTTPS but were still vulnerable because they skipped important steps such as certificate validation.

In order to enhance users' data privacy, many potential solutions could be implemented. One important step is to impose penalties on servers that do not honour the Do Not Track requests made by users. Another possible countermeasure is data pollution, where, for instance, users swap cookies such as Google PREF IDs or DoubleClick cookies in order to achieve a degree of anonymity.

In summary, there are a number of measures that could help preserve users' privacy on the web: firstly, denying third-party cookies by default, as Safari does; secondly, using identity management and separate cookie jars to ensure that a user's interaction with one website is independent of their interaction with another.

Saturday, March 1, 2014

CloudFog: Stephen Colbert's solution to all of your secure data requirements

"If you're not kraeusening your data, you might as well be giving it away". -- Stephen Colbert (RSA Conference 2014, closing keynote)

The grand denouement of the 2014 RSA Conference was the product launch for Stephen Colbert's new secure data service 'CloudFog'. It's not really my field so I struggled once he got going on the technical details (something about an advanced, polyhedral something-or-other). But he's obviously made substantial breakthroughs in practical fully homomorphic encryption -- exciting times for the crypto community.

Of course, Colbert wasn't just there to drum up business for his side project (a practice which I thought was generally frowned upon in keynote speakers).[1] Presumably the conference organisers invited him in his capacity as "perspicacious social commentator". For me, surrounded by security and crypto chat every day -- like pretty much every other delegate there, I expect -- it was interesting to get an "outsider's" take on what the big topics are and how they are perceived by the wider world.

So, yes, we had "The Cloud". It doesn't take a security expert to spot the steady growth and increasing reach of that nebulous beast. But I did think it was rather insightful on Colbert's part to apparently pick up (perhaps inadvertently) on the particular challenge of computing on encrypted data, not just storing it.

He also touched on Bitcoin a couple of times: from the perspective of its instability -- joking that RSA had paid him in MtGox vouchers -- as well as from the perspective of the arbitrary nature of 'value'. "I don't really understand it. I don't really understand gold. It sounds like a fun game that got out of control… Just like money generally." (Or words to that effect -- apologies for inaccuracy/degradation of wit).

Of course, he couldn't really not talk about PRNG backdoors, given the RSA 'scandal' (it surprises me that no-one seems to have yet coined the oh-so-tempting moniker 'backdoorgate'), and the subsequent pressure on him to boycott the conference. If I understood his reasoning rightly (and I might well have got the wrong end of the stick) his argument seemed to be that escaping the reach of the NSA was a pretty tall order for any corporation and that he'd rather they were getting paid for the obligatory relationship than not. As for his personal reasons for keeping his engagement to speak, he toed the classic self-reproachful line with a few wisecracks about financial incentives -- "my conscience was clear as long as the cheque was", but also seemed quite keen to highlight media insistence as another form of encroachment on freedom.

And that brings us round to the really hot topic of government surveillance in general, and the Snowden revelations. He was quite restrained, I thought (much more so than in his legendary performance at the 2006 White House Correspondent's Dinner). But still, he got his jibes in. To paraphrase: "These programs are designed to root out terrorists. It shouldn't bother you if you're not hiding anything. And since nothing can be hidden from the NSA, nothing is bothering you." … "Who here supports Edward Snowden? Keep your hands up so the cameras can get your faces…" … "Mind you, the revelations haven't affected recruitment. The NSA are still getting a lot of résumés -- some of which were even sent to them."

However, he seemed careful to neither over-simplify, nor turn the NSA into an easy 'bad guy' out against the 'innocent public'. Voters elected the leaders that voted on the Patriot Act, and continued to support them in the wake of it. He didn't labour the point, but I believe "passive acquiescence" were the words he used. "After all, give someone unlimited power and no supervision and the results are always fantastic…" (or words to that effect). He was also critical of Snowden, on the basis that his stated motivation was to make Americans aware of the extent of domestic surveillance, but in practice he has been no less willing to leak information on foreign intelligence -- which in Colbert's view has jeopardised appropriate national protections. He was measured and cautious in his comments on Snowden, but did make the interesting suggestion (in a moment of seriousness during the Q&A) that if you believe something is the right thing to do, and that that something is against the law, then part of doing the right thing involves facing the legal (and other) consequences of those actions -- his implication being that maybe Snowden should return willingly to face trial.

Colbert may not be an expert on cybersecurity, nor the legal and political issues surrounding it, but he does make a living from scrutinising society and highlighting absurdities in human behaviour -- in my opinion, very insightfully and wittily. His talk was lively and impressively well-researched, and got the issues out on the table in a way that most of the experts and officials speaking throughout the week were necessarily restrained from doing. For me, it was the perfect finale to a fascinating and somewhat surreal conference experience.


[1] I feel obliged to clarify that "yes, I do know he was only joking", lest I invite a deluge of mockery or correction.