Thursday, February 27, 2014

To be fair, go geometric

There has been many topics in TCC this year. Some  'deobfuscated' their complex meaning in nice presentations, others proudly 'leaked' some FHE on their way, inferences from coding theory dared to appear, and not surprisingly, fans of zero knowledge were feed appropiately . But for me, with no doubt, the best of all of them, with three dedicated sessions, no less (and more that ZK), were those talks on secure computation.

In this post I commit myself to one of them, which I think it was very well presented, and most importantly, it came with an ingenous idea. It was given by Gilad Asharov from Bar-Illan, and dealed with an important aspect of secure function evaluation, which is that of fairness. Informally, this property says that if one particular party is going to learn the output, then everyone should learn it. When a majority of the parties are honest, there exist  well known fair protocols, but the situation is different when a majority of the parties is behaving dishonestly, e.g., 2PC: Cleve showed that unbiased coin tossing cannot be achieved with fairness, and hence nor the XOR function can. Of course, in practice this is not really an issue, because the fact that you don't get to know the intended outcome, doesn't mean that you are unable to certify the  validity of whatever comes out from the protocol, just abort if you are not convinced. Nevertheless, the question of what functions can, and what functions cannot be computed fairly, still stands. It  seems that almost any functionality cannot be realized fairly. You reason like this: to implement it you are likely to design a protocol which is round-based, but by this mere fact you are introducing asymmetry among the knowledge that the parties have, because is hard to imagine how you can make all parties gain simultaneously the same information within each round.

This belief was changed in 2008, when it was shown that there exist some functions, (other than the easy ones, i.e., constant functions or those with  input(output) from(to) only one party), can be computed with fairness. But wait a moment, didn't we just try to justify that fairness is hard? How do they do it then? Well, let's stick to what Cleve told us: you cannot do fair XOR with dishonest majority, ok, but, can you achieve fairness in something which has embedded a XOR gate? Turns out you can, and the key to see this lies in observing that depending on the structure of the function we can make the simulation go through: it is possible if the simulator can choose the input sent to the trusted party. In particular they gave a concrete function which has fairness. Too few though.

The wit of this paper is that it identifies which functions are suitable for this simulation strategy. In more detail, the authors show that, provided oblivious transfer exists, any boolean function, taking two inputs, on domains of size l,m, respectively, is fair if its truth table defines a full-dimensional convex hull in R^m. This result also extends to the non binary case. Aditionally, they also show that boolean functions that are  sampled at random from those whose two inputs come from different domain have their own full-dimensional convex hull with overwhelming probability, and therefore, are fairly computable. There are many of such functions, including those for the task of set-membership, or secure evaluation of a private (boolean) function.

TCC 2014: How to Not Know What You're Doing

After a Wired article prompted a reply by our favorite crypto blogger, it is fair to say that obfuscation is a hot topic at the moment. So it comes at no surprise that TCC started off with not one but two sessions on obfuscation on Monday morning.

Informally, obfuscation means transforming a piece of code such that the transformed code still does the same thing but does not reveal anything about the original code. The formal definition of virtual black-box (VBB) obfuscation uses circuits or Turing machines as code and bit predicates for the security part. And here come the bad news: Barak et al. showed that this is impossible. It follows that a closed-source binary or a piece of "obfuscated" Javascript code can be analyzed to extract for example a private key embedded into it (given enough resources of course).

Is that the end of the story? Not quite. Barak et al. proposed the notion of indistinguishability obfuscation (iO), which postulates that one cannot distinguish between the obfuscated versions of two circuits computing the same function. This is weaker than VBB obfuscation. In more bad news, it is not known how to achieve iO under standard assumptions.

Unsurprisingly, all the papers in the first TCC session were concerned with circumventing the central impossibility result in one way or another. The first paper entitled "Virtual Black-Box Obfuscation for All Circuits via Generic Graded Encoding" did so by using graded encoding schemes, an idealized model of multilinear maps where the adversary is only allowed to execute public operations and test for zero. The authors use so-called matrix branching programs to construct an obfuscation scheme, which achieves iO and VBB, the latter under a further complexity assumption. The speaker, Zvika Brakerski, stated the opinion that the result might just expose the weakness of the generic model.

The second paper in the session, "Obfuscation for Evasive Functions", went down a different path. The authors restrict the functions computed to so-called evasive functions. These are a family of functions such that random one among them outputs zero with overwhelming probability. An example given in the talk is a patch for a piece of software that crashes on certain inputs. Obfuscation for evasive functions allows to create a patch that rejects those inputs without disclosing on which inputs the software crashes. The authors also work with the virtual gray box model. This differs from the virtual black box model in that it allows for an unbounded simulator. The authors prove various results around the two definitions such as average-case VGB obfuscation for evasive functions implying average-case VGB obfuscation for all function families or average-case VGB obfuscation being equivalent to average-case VBB obfuscation for evasive functions.

Most interesting for MPC folks like me was of course the talk by Shai Halevi in the second session. He presented the paper "Two-round secure MPC from Indistinguishability Obfuscation". The title basically tells it all. They construct two-round MPC from any MPC protocol by obfuscating the parties' computations, which then can be executed by anyone. Because the constructions uses NIZKs, the resulting protocol provides active security. The most intricate part of the construction is the move from security under VBB obfuscation to iO obfuscation. This is achieved by hard-coding the MPC messages into the obfuscation.

Wednesday, February 26, 2014

RSA 2014: A Post-Snowden Platform for Dialogue

The RSA Conference is an unusual beast: you are more likely to overhear conversations about market capitalization and promoting synergy than MPC or indistinguishability notions, and session titles like "Securing Boomers, Gen Xers and Gen Yers: OMG We Are So Different!" are the norm. The whole week is undoubtedly an invaluable learning/networking experience for so many of the delegates, as the convergence of so many security professionals and academics is unparalleled. The industry expo part of the conference is overwhelming in many respects, and it's quite baffling that so many companies assume that delegates who paid upwards of $2000 to get in really want free pens/stress balls/keyrings. Maybe I'm just bitter that I didn't win a remote-controlled helicopter.

The cryptographers' panel was hosted by Paul Kocher, featured Adi Shamir, Ron Rivest, Whit Diffie and Brian LaMacchia (MSR) and was mainly focused on the NSA/Snowden files. It was a fairly fiery affair: Adi Shamir inferred that the US government acts like an APT; and when asked about Bitcoin, Rivest called it "an interesting research project". The panel keenly concluded that encryption itself works, and that it is important to ensure that the thousands of industry professionals work with people who really know what they are doing when it comes to combining the encryption with other layers. While this is not news, the increasing influence of Real-World Crypto and other such meetings suggest that we are heading in the right direction.

Bruce Schneier stepped up the rhetoric as part of the 'Industry Experts' track, and his talk was tantalisingly titled 'NSA Surveillance: What We Know, and What to Do about it'. Schneier started off with an overview of some of his favourite project codenames (EgotisticalGiraffe the clear winner) contained in the documents leaked by Edward Snowden in 2013, and discussed the immediate and future implications of the revelations. While there were few revelations in the talk, the emphasis was very much on how to encourage the average internet user to think that invasion of privacy and mass collection of data is in fact a bad thing. It is known and often discussed that most users will happily and knowingly access an insecure site in order to get what they want, and it's important to educate on a grand scale to change this mindset. The conference also features talks from the other side of the fence, and it'll be interesting to see what kind of reaction FBI director James Comey receives for his Keynote.

The invited talk of the Cryptography track was given by Antoine Joux, and unsurprisingly it concerned the development of index calculus methods and the discrete logarithm problem. Whilst sparsely attended, the talk gave an excellent overview of the problem at hand and the currently known tools used to improve index calculus algorithms. Recent developments in this field by Joux and others have naturally gained considerable attention in the community and on this blog.

I'll be commenting on aspects of the conference throughout the week on Twitter, and many of the Keynote talks will become available in due course.

Friday, February 14, 2014

On Transaction Malleability in Bitcoin and other systems

Last week the MtGox bitcoin exchange, a web-site originally built to exchange Magic: The Gathering cards online, went offline. The "attack" on bitcoin this time is that people are exploiting a problem known as "transaction malleability". Following the MtGox attack the same problem allowed for denial-of-service attacks on the remaining bitcoin exchanges and further stretched the time needed to confirm a bitcoin transaction. The malleability issue in bitcoin has been known for well over two years now [1]. At the time it seems to have been dismissed as ugly, but not worth bothering about. Now, the issue is back to bite us.

I'm a theoretical cryptographer who spends his time shifting letters around in mathematical formulae that describe abstractions of cryptographic protocols such as encryption or digital signatures. Whether or not this has any relation to how cryptography is used in the real world is a matter of debate [2]. One thing I know for sure though is that in my formulae, when the issue of malleability pops up you can't just dismiss it as not worth bothering about - in many cases, allow things to be malleable and the maths simply doesn't work anymore.

We've known about this for way more than two years. David Chaum and others first suggested cryptographic "digital cash" back in the 1990s [3]. It was based on digital signatures; double-spending and malleability were already known hazards back then for which mitigations were developed. Sadly, Chaum's e-cash never really got off the ground although it's still a staple of cryptography courses around the world. (e-cash differed from bitcoin in that e-cash didn't derive value from cryptographic operations; banks would issue e-coins backed by other instruments of value.)

If malleability was a known and solved problem so long ago, why did Satoshi miss it in the design of bitcoin? I can't answer that question. I can however show another two examples of systems in which malleability is known to exist but had been dismissed by the designers. Satoshi is not alone in ignoring these issues in the design of a cryptographic system.

Direct Anonymous Attestation

The trusted computing group [4] maintains a standard for a protocol known as Direct Anonymous Attestation or DAA for short. This protocol is built into cryptographic coprocessors known as trusted platform modules. Proponents of DAA say that these modules have been available in all high-end laptops for years and that over 500 million such modules have been shipped. Opponents might counter that that's all well and good, but could someone name a single application that actually uses DAA? I couldn't. (The trusted module itself is used in practice to support Bitlocker hard disk encryption under "ultimate" versions of Windows, but that's not DAA.)

DAA described in the original paper [5] and every presentation I've seen on it since as a form of signature. Signatures, the theoretician says, are not supposed to be malleable without good reason. Yet a couple of years ago I discovered that the standardised and shipped protocol actually contains a case in which signatures are malleable [6].

A DAA signature signs a pair of objects, a message and a so-called basename. DAA signatures come in two forms, ones in which the basename is empty and ones in which it's a kind of identifier for the recipient. The idea is that signatures by the same signer on the same basename are 'linked' whereas others are not: if I sign two different messages for the basename 'amazon' then anyone can tell that these two signatures were made by the same person, but if I give you one signature for 'amazon' and one for 'ebay' then no-one can tell if it's two signatures from the same user or from two different ones. And signatures with an empty basename are always supposed to be unlinkable to any other signatures.

The problem is that in the shipped protocol, you can take any signature under any basename and claim that it's a signature for the empty basename: it will still look valid. This way, you can produce signatures that look like they're on the empty basename and so shouldn't reveal they're from the same user, but interpreting them as using a non-empty basename they might very well link. And you can claim someone signed a message under the empty basename when in fact they didn't. The problem isn't at all hard to fix --- this would require a very minor change to the protocol.

If you haven't heard of this vulnerability before, it's probably because no-one actually uses DAA. But let's suppose that suddenly it becomes the next big thing: I'd almost be willing to bet money that the mallebility problem will come back with a vengeance and bite us in a few years' time.

Cryptographic voting

Cryptographic voting is perhaps even more popular than cryptographic currencies: Estonia, Norway and Israel have all used it in some form in national-level political elections. Even the US has deployed it in some local elections such as for the mayoral election in Tacoma Park, Florida.

Here's how it works. Every voter encrypts her vote and posts the encrypted ciphertext on a "bulletin board", which has the same role as the block chain in bitcoin: a public collection of all "transactions" (ballots).

The real magic is that you can tally such a board in a way that you get the result and only the result --- so not, for example, how I voted --- and you can be sure that the talliers haven't been cheating. Anyone can download and audit the board and unless the claimed result matches exactly what's in the ballots, it won't verify. No more urns of ballots for the opposition "accidentally" falling off a truck on the way to the counting centre.

The problem with malleability here is that I could potentially just copy someone else's ballot and call it my own to vote the same way as them. Or even copy and twiddle their ballot to vote for the exact opposite of their choice, thus effictively erasing their ballot. Not only does this create new potential forms of protest votes beyond the usual spoilt ballots but two researchers called Véronique Cortier and Ben Smyth have actually shown that ballot-copying could be used to gain information on how individuals voted in a case study on French legislative elections, were they to use cryptographic voting and not worry about malleable ballots [7].

One protocol that Cortier and Smyth studied is called Helios and has been used among other things to elect the president of the Catholic University of Leuven and the board of the International Association for Cryptologic Research. And guess what, it has malleable ballots. In my own work [8], together with Cortier and Smyth, my supervisors and the Helios authors, I discovered further flaws: a "bad ballot" that erases one candidate's tally under certain conditions and a way for the voting authorities to set up a "trapdoor" election in which they can alter the election result without anyone being able to notice. (This will be fixed in the next version of Helios.)

Malleability is bad. Stop it.

The issues with malleability in bitcoin remind me a lot of other common security problems on the web. Travel back in time a bit: ok, so my site breaks if anyone puts '; DROP TABLE users; -- in the username field. But who on the internet would be that evil [9]? And besides, I've got some client-side javascript that validates the fields before you can even click the submit button. Such problems were discovered, initially dismissed as not relevant in practice, started to get exploited widely and nowadays if you get this wrong, you will get hacked sooner rather than later, usually a day or two after your site becomes popular. It's happened to bitcoin exchanges not too long ago. If you think SQL injection is not going to happen to your site these days, you shouldn't be let anywhere near a database.

Transaction malleability isn't quite as well-known a problem as SQL injection or XSS but if, in the wake of the fallout from the NSA revelations and the popularity of bitcoin, sophisticated cryptographic protocols start to become more common then I wouldn't be surprised if signature malleability makes its way onto the OWASP top 10 some day. Whether you're building a currency, a cryptographic coprocessor with attestation features or a voting scheme, malleability is a hazard. And if you do the maths and malleability breaks your security analysis, "that's annoying but I don't think that's too much of a problem in practice" just doesn't work for me. Eliminate it.

Wednesday, February 5, 2014

One Day, Two Birthdays

Today, Feb 5th, is the 70th anniversary of the first operation of Colossus. Colossus was the worlds first programmable digital electronic computer. It was built in the Second World War at Bletchley Park so as to enable the efficient breaking of the German Lorenz cipher traffic. The machine was built by the Post Office engineer Tommy Flowers, and utilized the amazing cryptanalytic breakthrough of Bill Tutte. A number of Colossi were built in the war but most were destroyed on the German surrender, with only two being passed to GCHQ for future work. The secret of the existence of the Colossus only came out in the 1970s. Since that time a replica of the Colossus has been built at The National Museum of Computing; with most of the work being done by a Tony Sale. You can now see the Colossus at the museum. 

Today was a special day, and a special event was held at the museum. A large number of guests turned up, including some of the original operators to see the entire breaking of Lorenz operation in action. First a message was received at a replica Y Station. Then the received data was converted to paper tape and passed to some WRENs for entering into Colossus. Colossus then found the wheel settings for the Lorenz cipher. Finally, the message was decrypted using the Tunny machine.

But there was also another birthday today. Whilst Tommy Flowers died in 1998, today was the second birthday of his great grandson (also called Tommy Flowers). And the event ended with the crowd singing young Tommy happy birthday.

Monday, February 3, 2014

Honey Encryption

As a mostly asymmetric cryptographer, my non-hummus highlight of this year's Bar-Ilan Winter School on Cryptography came when Tom Ristenpart gave us "A Contemporary View of Symmetric Encryption". Avoiding both S-boxes and bit flipping, this talk included a sneak peak at his upcoming Eurocrypt paper with Ari Juels on "Honey Encryption".

Honey encryption addresses the issue of being able to detect the key a ciphertext was encrypted under through trial decryptions, by recognising valid data. For example, an encryption of ASCII text decrypted under the wrong key is unlikely to look like ASCII text, so if your decrypted ciphertext is readable then odds-on you've found the right key. While this is intractable for a large key space, the scenario here is password-based encryption where users choose their own (typically awful) passwords, so trial decryption becomes a feasible attack. In a honey encryption scheme decryption under the wrong password returns plausible looking data, frustrating this attack since all recovered plaintexts appear equally likely.

The approach taken to realise honey encryption is through a pair of algorithms called a distribution-transforming encoder (DTE). A DTE consists of a (usually) randomised encoding algorithm that maps a message to a point in a set, and a deterministic decoding algorithm that reverses the encoding. In constructing a honey encryption scheme messages are first encoded through the DTE, then encrypted under a standard symmetric encryption (SE) scheme using a key derived from the supplied password. Decryption involves applying the SE scheme's decryption algorithm, followed by the DTE decoding algorithm. This straightforward construction means any semantic security guarantees offered by the SE scheme are retained by the honey encryption scheme, should users pick their passwords wisely.

So with all the responsibility now pushed into the DTE, what are the requirements on it? To provide security it should have the property (roughly) that encoding a message sampled from the message distribution (e.g. equally likely strings of ASCII text) results in a uniformly distributed value in the SE scheme's message space, and applying the decoding algorithm to a uniformly chosen value from the SE scheme's message space should result in a message distributed according to the honey encryption scheme's message distribution.

This turns out to be really difficult, and in fact dealing with encryptions of ASCII text is beyond the scope of current constructions. The DTEs given are for encrypting credit card numbers, and prime numbers, with the motivation for the latter being storing RSA secret keys away from the public modulus.

If you can't wait until Eurocrypt to find out more, want to try your hand at constructing a DTE for a new message distribution, or just enjoy wacky combinatorial proofs, you can check out the preprint on Ristenpart's webpage [].