Thursday, April 30, 2015

Eurocrypt 2015: When life gives you Fully Homomorphic Encryption...

On Wednesday morning, Kristin Lauter gave an invited talk about the Practical Applications of Homomorphic Encryption. She started by talking about a recent competition on Secure Genome Analysis sponsored by the (American) National Institutes of Health at the iDASH Privacy & Security Workshop, which received a substantial amount of media coverage.

Kristin reasoned that this excitement was caused in part by the fact that with the rise of relatively inexpensive genome sequencing, privacy is becoming a fundamental problem. A person's genome can give information on genetic diseases that the person has, as well as partial information on the genetic make-up of close family members. Leaking this information can lead to genetic discrimination, for example from health insurance providers. Still, this kind of data can be put to good use as well, for purposes such as personalized medicine, biomedical research and ancestry or paternity tests; we would therefore like to give researchers access to this data without breaking the privacy of the data donors.

Therefore, the aim of the competition was to devise ways of performing analysis on genomic data in a secure manner, which was implemented with two different tracks. The first track considered the scenario where one party owns all the data, like a hospital that wants to protect the privacy of its patients, which is an ideal scenario for Homomorphic Encryption. The second track considered the scenario where there are multiple data owners that mutually distrust one another, which is an ideal scenario for Multi-Party Computation. There were teams from both industry and academia. The data used was partly real data from donors and partly simulated.

Kristin described a few different tasks: counting the frequency of minor alleles, and computing the Hamming and edit distances for the alignment of genomes. The frequency count can be performed by purely additive homomorphic solutions, but schemes based on RLWE that allow for more homomorphism were still competitive. At the other end of the spectrum is the edit distance, which requires a relatively deep circuit. In practice, however, an approximation is often good enough, and it allows for a shallower circuit, so that even existing schemes can provide a practical solution. Each task had a different winner, indicating that there is currently no single best solution for all applications.
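
To make the purely additive case concrete, here is a minimal sketch of such a frequency count using Paillier encryption (my own toy example, not the competition code; the primes are far too small to be secure and are chosen only so the numbers fit on a page). Each donor submits an encrypted 0/1 indicator of carrying the minor allele, and the server tallies them without ever decrypting an individual record:

# Toy additively homomorphic tally of minor-allele indicators (Paillier).
# Parameters are tiny and insecure; for illustration only.
import math, random

p, q = 1789, 1907            # toy primes; real deployments need ~1024-bit primes
n, n2 = p * q, (p * q) ** 2
lam = math.lcm(p - 1, q - 1) # Carmichael function of n
g = n + 1                    # standard choice of generator

def enc(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def dec(c):
    L = lambda x: (x - 1) // n
    return (L(pow(c, lam, n2)) * pow(L(pow(g, lam, n2)), -1, n)) % n

indicators = [1, 0, 1, 1, 0, 0, 1]      # one encrypted bit per donor
cts = [enc(b) for b in indicators]

acc = 1
for c in cts:                           # multiplying ciphertexts adds plaintexts
    acc = (acc * c) % n2

assert dec(acc) == sum(indicators)
print("minor allele count:", dec(acc))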

This competition spurred fundamental progress in the field of secure genome analysis through the interaction between computer scientists, mathematicians, bio-informaticians and policy-makers. Kristin therefore also highlighted the importance of communicating the concept and possibilities of Homomorphic Encryption to the general public, in order to find more good applications. One major point of the invited talk was that there are applications out there right now that can be implemented using relatively shallow circuits, such as the aforementioned secure genome analysis, but also the evaluation of private models in the cloud and machine learning. These applications can be solved by the schemes we have today, even with their inefficiency drawbacks.

That is not to say there are no more challenges to be solved. Storage costs are a problem as Fully Homomorphic Encryption schemes suffer from substantial ciphertext expansion. The security is not always as well understood as we would like. The efficiency does not scale very well to deep circuits and large amounts of data. These are but some of the challenges that Kristin outlined at the end of her talk. So let us work together and make some lemonade!

Co-chairing Eurocrypt two years running, reflections of a program chair (Part 1)

How do you get selected as program co-chair in the first place? Well, having been asked for ‘two’ major IACR events (CHES 2008, and then the pair Eurocrypt 2014 and 2015), it seems to me that the process is that somebody drops your name in a board or steering committee meeting … you had better choose your ‘friends’ carefully! Once asked there is no way out, as such opportunities do not come often, and so you had better go with the flow!


I was often asked over the last couple of years how the ‘co-chairing’ was for me. Now, because CHES embraced the tradition of having two chairs from the beginning (one selected from academia and one from industry), I have only ever been a co-chair and know nothing else. Hence, my standard answer was ‘fine’ (remember also my prior remark about friends) ...

Silly jokes aside: I cannot imagine doing such a role alone. Being part of a small team suits me. I was also blessed to work with three exceptional colleagues: Pankaj Rohatgi during CHES 2008, Phong Nguyen during Eurocrypt 2014 and Marc Fischlin during Eurocrypt 2015. I had a great rapport with all three, which contributed to being able to enjoy the madness! Over the three times that I co-chaired, I experienced different ways to distribute the workload: dividing the set of papers into two subsets, dividing up by topic, and simply time-slicing, where one takes the lead on the various tasks when the other one can’t. All these strategies worked.


Another question I was often asked was how much time it takes. Well, ‘a lot’ is my answer. I am not a fan of exact time keeping (no matter how many work sheets my employer asks me to fill out ... I have a ‘template’ anyway), because it costs me time that I don’t have! But if you want to get a feel for the input required, then think about looking after about 200 ‘researcher entities’. They send you a paper and then they want feedback, which you get based on other people’s opinions. This should not be a time-consuming task once all PC members are appointed, as the number of interactions requiring manual involvement is limited to a few (thanks to Shai’s effort with our Web Review System). If only that were the case. My inbox lists just under 1000 emails for Eurocrypt. Removing automated notifications for submissions, results and final versions, this leaves around 400 emails. Because I can’t afford a personal assistant, I have to read/think/act/respond myself, which in some cases takes a few minutes and in many cases considerably longer during the ‘act’ and ‘respond’ phases. Luckily I have held a personal Fellowship since 2011 and so could spend plenty of time …


Yet another area of interest was how we (i.e. Marc and I) managed to convince the PC to go with our suggested tweaks to the review process (we introduced a two-stage process and changed the submission requirements around paper length). My answer was ‘with difficulty’. There were at first a number of people who expressed their concerns in a very vocal manner, which is fair enough. After quite some email exchange, it became clearer that not everybody who was more ‘quiet’ was against our proposal. After some people offered us their support more openly, we were lucky enough that everybody ‘gave it a go’.


It’s useful, I find, to experiment, but only if afterwards one also asks for feedback and reflects on it. So after all notifications had been sent out in 2015, and people had some time to ‘cool down’, I sent out a Doodle poll asking people for their views. Here are the results (121 people responded, I list the number of votes each statement received):


I like the fact that there were two rounds: 92 
I hated the fact that there were two rounds: 12
I think it was useful to only elicit rebuttals for the second round papers: 63
I think that everyone should have been allowed to write a rebuttal: 38
I think the increase in page limit was useful: 86
I think we should stick to 12 pages as limit for always and ever: 12
I want to have a page limit of 20 pages LNCS in the future: 38
I want to have a page limit of 25 pages LNCS in the future: 19
I want to have a page limit of 30 pages LNCS in the future: 31


I also received anonymous comments, which I will supply to the IACR board in my report. They mainly touch on the perceived lack of impact of the rebuttals that authors wrote on the final decision. I think this is a valid point. As all second round authors were given the chance to write a rebuttal, most authors did take that opportunity. But for many papers the reviewers hadn’t directly requested clarifications from the authors, and so these papers probably didn’t benefit much from that opportunity. Some papers, however, did get questions to address very directly during the rebuttals, and there the rebuttal had a useful role to play (even though it was limited to a single iteration, i.e. we refrained from allowing lengthy conversations between reviewers and authors, which we would inevitably have had to moderate to preserve anonymity).


If I were to do some chairing again (which I will not be doing any time soon), then I would want to discuss the question of only giving selected papers the opportunity to respond (based on reviewer recommendations). But one thing I took away from the review process is that the two stages, where the first is short and the second focuses on only a smaller set of papers, make a lot of sense.


This is the end of part 1 … 

... come back to the blog in a few days' time to find out what it was like to make decisions about accepting/rejecting papers, to hear about our attempts to create parallel sessions, and to read up on what the attendees thought about the experiment!

Eurocrypt 2015 Rump Session

It was on a sunny evening that the rump session commenced. The call for presentations was one of the most convoluted ever conceived in the history of the IACR. The inevitable effect was a fair amount of confusion during the submission process, leading to a brilliant program.

Elisabeth Oswald (with Marc Fischlin loitering in the background) gave an insightful view into early perceptions of the new submission format and the use of parallel sessions. They had conducted a questionnaire among Eurocrypt authors and program committee members. One of the questions related to the submission page limit: should it be 20, 25, or 30? The responses were split between short and long, with very little support for the middle. (One day later, during the membership meeting, Cachin mentioned that the Board had settled on 30 pages LNCS style.) Yvo Desmedt followed up with a plea to publish corrections to proofs etc. more prominently.

Christina Brzuska (with Marc Fischlin lurking in the background) presented the best future conference award to Jens Groth, who will be chairing the 15th IMA Conference on Cryptography and Coding. For most other events, announcement slides were made available during the break. Of note was TCC 2016, which will take place twice next year! To unclutter the cryptacademic calendar, the TCC community has decided to migrate their conference from February/March to November. The migration will be in steps, so next year TCC 2016-A will be in January in Tel Aviv (the submission deadline will be 13th July, with notification on October 2nd, roughly a week before the Eurocrypt 2016 submission deadline). TCC 2016-B will take place later in the year (November or December). Another event worth noting, taking place next week already (at TU Eindhoven, Netherlands), is Security in Times of Surveillance.

The final talk of the rump session was by Ivan Damgård (with Marc Fischlin substituted by little fish swimming about). He had conceived a new cryptosystem secure in the presence of powerful adversaries by cleverly inverting the paradigm used by quantum cryptography. There, if you try to eavesdrop on the message, you will destroy the message. The dual approach advocated by Ivan is that if you try to eavesdrop on the message, the message will destroy you! I am always amazed how long it sometimes takes to find ideas that are so obvious in hindsight. This is definitely one of them. Ivan had found a way to implement his revolutionary idea by means of a recently discovered fish. There were still some minor niggles that would prevent imminent deployment (for instance, tampering with the ciphertext could lead to destruction of the legitimate receiver), but it is a very promising research direction nonetheless. Ivan won the Best Rump Award for his presentation, fully deserved in my opinion!

Wednesday, April 29, 2015

Eurocrypt 2015: Bootstrapping for HElib

Fully homomorphic encryption (FHE) allows evaluation of arbitrary functions on encrypted data. In an FHE scheme, the ciphertexts are "noisy". This noise is necessary for security reasons, but it can interfere with decryption if it grows too large, and it increases with each homomorphic operation. This means that only a limited number of homomorphic operations can be performed, in which case we have what is called a Somewhat Homomorphic Encryption (SHE) scheme. In this context, bootstrapping is used as a noise management technique to turn an SHE scheme into an FHE one. The idea is to homomorphically decrypt the ciphertext before the noise grows too large, in order to reduce the said noise so that more homomorphic operations can be performed. It is implemented by augmenting the public key with an encryption of the secret key under the public key, $Enc_{pk}(sk)$, and then running decryption homomorphically, which results in a "fresh" ciphertext. Note that for this to work, circular security must be assumed.

There are three generations of homomorphic schemes, the first being the scheme proposed by Gentry. Then there are the packing schemes (which are asymptotically faster) and the small-noise ones. So far as we know, slow noise growth is incompatible with the ciphertext packing technique. This paper implements the ciphertext packing scheme described in [BGV12].

Quick overview of the BGV scheme

The parameters of the scheme are integers $m$, $p$ and $q$. The plaintext space is the ring $R=\mathbb{Z}[x]/ (\Phi_{m}, p)$, where $\Phi_{m}$ is the $m^{th}$ cyclotomic polynomial. Ciphertexts and secret keys are vectors over $R$, and we require $q$ to be much larger than $p$. A ciphertext $c$ encrypts a plaintext $m$ with respect to a secret key $sk$ if it holds that $m=[[ \langle c, sk \rangle ]_{q}]_{p}$, where $[\cdot]_{p}$ denotes modular reduction into an interval centred around zero (rather than $\{0, 1, \ldots, p-1\}$) and $\langle \cdot, \cdot \rangle$ stands for the inner product. The homomorphic operations are done over $R$. The scheme is also parameterised by a sequence of gradually decreasing moduli $q_{i}$; we perform a modulus switch in order to reduce noise levels, and bootstrapping is performed upon reaching the last modulus.
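
To see why the nested reduction recovers the plaintext, here is a toy numerical check of the relation $m=[[ \langle c, sk \rangle ]_{q}]_{p}$ (scalars instead of ring elements, and all parameters invented for illustration):

# Toy check of m = [[<c,sk>]_q]_p with a fabricated inner product.
def centered_mod(x, n):
    # reduce x modulo n into an interval centred around zero
    r = x % n
    return r - n if r > n // 2 else r

p, q = 2, 1000003      # small plaintext modulus, much larger ciphertext modulus
m, e, k = 1, 37, 12345 # plaintext, small noise, arbitrary multiple of q

inner = m + p * e + q * k      # what <c,sk> looks like over the integers
assert centered_mod(centered_mod(inner, q), p) == m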

Work presented

Bootstrapping for HElib, which was presented today by Shai Halevi, follows the method of [GHS12] and generalises it. We take $q=2^{e}+1$ for some $e$ that is not too small. Recryption is done by storing $Enc_{pk}(sk)$ with respect to a larger plaintext space, where $p'=2^{e+1}$. We then compute $u=[ \langle sk, c \rangle]_{2^{e}+1}$ and extract the plaintext as the xor of the least and most significant bits of $u$. Computing $u$ is very easy and fast; it is the extraction of the most and least significant bits that forms the most expensive part of the recryption procedure. It is done in three steps: first a linear transformation, which outputs ciphertexts that have the coefficients of $u$ in the plaintext slots; then a bit extraction operation; and finally an inverse linear transformation, which recovers a ciphertext encrypting the initial plaintext.

The linear transformations are achieved by imposing a hypercube structure on the ring $R_{p}=\mathbb{Z}[x]/(\Phi_{m}, p)$, with one dimension per factor of $m$, where we recall that $p$ can be any prime power. The inverse linear transformation is implemented as a sequence of multi-point polynomial evaluation operations, one per dimension of the hypercube. It can happen that some of these linear transformations are linear over the base field but not over the extension; to avoid this, various constraints are applied to the parameters.

Results

This paper generalises [GHS12] and [AP13] by extending their methods from the prime $p=2$ to any prime power $p^{r}$. The implementation presented supports recryption of packed ciphertexts, i.e. recryption of ciphertexts encrypting vectors of plaintexts. The procedure also has low enough depth so as to allow for a significant processing in between recryptions while remaining efficient. The ePrint version can be found here.


References:
[GHS12]: http://eprint.iacr.org/2011/566.pdf
[AP13]: http://www.cc.gatech.edu/~cpeikert/pubs/simpleboot.pdf
[BGV12]: http://eprint.iacr.org/2011/277.pdf

Tuesday, April 28, 2015

Eurocrypt 2015: Nobody Should Know What You Bought Last Summer

It has emerged that Bitcoin does not really live up to the promise of anonymous transactions, because transactions can be linked publicly. While there are mixers that offer to break this link (but sometimes simply steal your money), it would be desirable to have unlinkability built into a cryptocurrency. The challenge here is the following: if the transaction graph is public, as it is in Bitcoin, it is easy to find the predecessor of a new transaction to check its validity. For real anonymity, one has to be able to prove that one knows the secret key of one of the existing transactions without revealing which one.

Jens Groth presented such a scheme in his talk. More technically, the scheme allows one to prove that one out of a list of Pedersen commitments hides zero. Recall that Pedersen commitments are homomorphic with respect to a group operation, both in the secret and in the randomness. This makes them particularly suited to so-called Sigma protocols, which consist of three rounds and can be made non-interactive in the random oracle model by hashing the first message to obtain the challenge. Furthermore, they offer special soundness, that is, one can extract the secret from a number of correct answers to different challenges, and special honest-verifier zero-knowledge, which means that the protocol can be simulated without knowing the secret if the challenge is chosen independently of the first message.
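
To see the homomorphism concretely, here is a toy sketch (my own parameters, far too small for real use) of Pedersen commitments in a prime-order subgroup, where multiplying two commitments yields a commitment to the sum of the messages under the sum of the randomness:

# Pedersen commitments in a toy order-1019 subgroup of Z_2039^*.
# Parameters are tiny and insecure; for illustration only.
import random

P, Q = 2039, 1019      # safe prime P = 2Q + 1
g, h = 4, 9            # two squares, i.e. elements of the order-Q subgroup

def commit(m, r):
    return (pow(g, m, P) * pow(h, r, P)) % P

m1, r1 = 123, random.randrange(Q)
m2, r2 = 456, random.randrange(Q)

# homomorphic in both the message and the randomness
assert (commit(m1, r1) * commit(m2, r2)) % P == commit((m1 + m2) % Q, (r1 + r2) % Q)

# "hides zero" simply means the g-component is trivial: C = h^r
assert commit(0, r1) == pow(h, r1, P)

(In a real instantiation, $h$ must be generated so that nobody knows its discrete logarithm with respect to $g$; here it is fixed only for illustration.)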

In summary, the authors propose a Sigma protocol to prove that one out of $N$ Pedersen commitments hides zero. It provides computational $(\log N + 1)$-special soundness and logarithmic communication. The core idea is to view the $N$ commitments as the leaves of a binary tree and then let the prover commit to the bits of the path to the commitment he knows. This allows one to obliviously compute a commitment to the secret known to the prover from the list of commitments. The communication is logarithmic in the size of the list because the list itself is already known to the verifier. On the downside, the verification complexity is still linear in the list size, which seems to be inherent to the problem.

The paper can be found on ePrint.

Eurocrypt 2015: The Sum Can Be Weaker Than Each Part

Tuesday's Real track began with a session on hash functions. The second of the two talks was about the SPHINCS signature scheme, a paper Ryan recently blogged about [A]. The first was presented by Gaetan Leurent about work coauthored with Lei Wang [1], and is the subject of this blog post. It continues the study of how you can combine two hash functions to make something that protects against a weakness in one of the hash functions...
The Problem
There are countless schemes that assume access to a secure hash function, and they may fail catastrophically if this assumption is not met (a secure hash function should be collision and preimage resistant [B]). However, the hash functions we actually use are constantly being attacked, and as these attacks get better the security provided decreases. So designers often choose to combine the outputs of two different hash functions $H_1,H_2$, hoping that to break the combination the attacker will have to break both of them. The two most intuitive methods for doing this are concatenation and xor: $C(m)=H_1(m)||H_2(m)$ and $X(m)=H_1(m)\oplus H_2(m)$. Generally (as will be assumed for the rest of this post) both $H_1,H_2$ are "narrow pipe" (their internal state is the same size as their output) MD [C] hash functions (e.g. MD5, SHA1, SHA2). However, it is important to keep in mind that this is not the only way of building a hash function.
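
For concreteness, the two combiners look like this in code (a direct transcription using two 256-bit standard-library hashes; note SHA3-256 is a sponge rather than an MD construction, so it is only standing in here as a convenient second function):

# The concatenation and xor combiners over two 256-bit hash functions.
import hashlib

def H1(m): return hashlib.sha256(m).digest()
def H2(m): return hashlib.sha3_256(m).digest()

def C(m):
    # concatenation combiner: 512-bit output
    return H1(m) + H2(m)

def X(m):
    # xor combiner: 256-bit output
    return bytes(a ^ b for a, b in zip(H1(m), H2(m)))

msg = b"combine me"
print(C(msg).hex())
print(X(msg).hex())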

Past work
In his multicollisions paper[2], Joux showed that $C$ was no stronger than an ideal hash function (despite having double the output size). Broadly speaking, this proved that the concatenation combiner was, in some sense, no worse than using either function separately. Hoch & Shamir [3] extended this result, showing that even if the hash functions were in some sense "partially broken", reasonable security could be attained.
For the xor combiner, the Hoch & Shamir result implies preimage security up to $2^{n/2}$, but no attack had been found close to this bound.

New Contribution
Leurent & Wang provide a preimage attack of complexity roughly $2^{5n/6}$. They do this by creating a collection of messages $M_{A,B}$ such that $H_1(M_{A,B})=A$ and $H_2(M_{A,B})=B$, using a novel technique of "switches" and "interchanges". Having done this, they choose $A,B$ and a message block $m'$ such that $X(M_{A,B}||m')=h_1(A,m') \oplus h_2(B,m')$ is the target digest (where $h_i(x,y)$ is the internal compression routine of $H_i$, initialised with state $x$ and processing message $y$). This choice of $A,B$ can be made using a birthday-like attack, leading to the sub-exhaustive complexity.

An interesting consequence of this is that the xor of two secure hash functions (each of which would have preimage security up to $2^n$) is less secure than either one individually. In the paper they also demonstrate that if the hash functions are not completely secure, the attack can be improved to roughly $2^{n/2}$, demonstrating that Hoch & Shamir's results are in fact tight. So, when the functions are secure, the xor combiner reduces the security provided, but once they begin to break down it is likely it will start to help, keeping security at the $2^{n/2}$ level.

PS: A bit more detail on "switching"
Let me try to briefly explain how the interchange structure is created. Consider the internal states whilst calculating $H_1(M)$. These form an obvious chain, and we can think of this as our base line $L_1[0]$. Applying an arbitrary change to the first block of $M$, we get a second chain that is in some sense parallel to the original one, but offset by this first change (think of this as line $L_1[1]$). Doing the same for $H_2(M)$, we get another collection of lines $L_2$.

Now, using multicollisions, the authors demonstrate that it is possible to find a sequence of message blocks that, when appended to the inputs of both hash functions, increases your line number in one of the collections without changing it in the other. So, for example, we find a multicollision that lets us transition from $(L_1[0],L_2[0])$ to $(L_1[1],L_2[0])$. Having done this for enough pairs, it becomes possible to get from the initial states to any one of the $t^2$ possible pairs of lines (where $t$ is the number of lines in each collection) using roughly $t^2$ such multicollision gadgets. This is how we create the messages $M_{A,B}$ efficiently enough that this step does not dominate the attack.

Errata:
A previous version incorrectly stated that the xor combination of two hash functions is robust under collision resistance. The combiner is robust for combining two PRFs[4], but this is a very different result.

Monday, April 27, 2015

Eurocrypt 2015: Backdoored PRNGs

Since truly random bits are often hard to come by, many cryptographic protocols use a small amount of true randomness to seed a pseudorandom number generator (PRNG) whose outputs ought to be indistinguishable from truly random strings. If one can predict the future outputs of such generators then one can often subvert the larger protocol built on top. So security of PRNGs is crucial in the design of secure protocols.

The trouble is, it may be that a PRNG that appears secure to a typical user actually contains a backdoor rendering it totally insecure; some information may exist, unknown to most users, the knowledge of which enables one to recover the internal state of the PRNG and hence predict its future (or maybe past) outputs. This is a highly topical subject given the recent accusations that the NSA purposely introduced a backdoor in the NIST Dual EC PRNG: it is claimed that the NSA knew some underlying information about the parameters suggested by NIST that is infeasible for ordinary users to determine, and armed with this knowledge, one can use a single output of the PRNG to predict all future outputs. In particular, it would be possible to decrypt TLS sessions using this trapdoor information.

This morning Chaya Ganesh presented a formal model for backdoored PRNGs, in joint work with Yevgeniy Dodis, Alexander Golovnev, Ari Juels, and Thomas Ristenpart. They say that in a backdoored PRNG, the parameter generation algorithm outputs a set of public parameters called the public key and a trapdoor called the secret key. The generation algorithm takes a public key and the current state of the generator and outputs some bits that should be indistinguishable from random to any party with just the public key. A party with the secret key can easily distinguish PRNG outputs from random strings and possibly predict future outputs.

The authors prove that a backdoored PRNG whose outputs are indistinguishable from random without the secret key is equivalent to a public key encryption scheme whose ciphertexts are indistinguishable from random. They also consider possible countermeasures against backdoors, called immunisations, and give a negative result in the so-called public immuniser security model where the immunisation function is decided upon before the PRNG is designed: they show that a saboteur can always design a PRNG that will bypass the immuniser, no matter what it is. On the other hand, in the semi-private immuniser model, where the immuniser's randomness is known to the party using the backdoor but unknown to the PRNG designer, the authors give a positive result. They use the immuniser's randomness as a salt to a hash function and prove that the outputs of this scheme are indistinguishable from random bits, even for an attacker in possession of the trapdoor secret key and the salt from the immuniser.
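
A rough sketch of what the semi-private immunisation looks like in code (function names and sizes are my own, not the paper's):

# Salted-hash immuniser: the possibly-backdoored PRNG is a black box, and
# each of its outputs is hashed together with a salt chosen after the PRNG
# was designed. For illustration only.
import hashlib, os

def immunise(salt, prng_output):
    return hashlib.sha256(salt + prng_output).digest()

salt = os.urandom(32)               # the immuniser's randomness
raw = os.urandom(32)                # stand-in for a (backdoored) PRNG output
output = immunise(salt, raw)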

This was an interesting talk, combining elegant formalism with a hugely important and topical issue in security. The full paper is online here.

EuroCrypt 2015 invited talk by Tal Rabin: a privacy research roadmap


For some time now, privacy issues have been on the agenda of various agents, yet solutions remain fragmented, varied, and lacking the impact expected of them in practice. The talk exposed this state of affairs and proposed several points that could better structure privacy research, its relation to technology, and its funding.

Benefits and doubts
A basic truth was recalled: computation on personal data is great for all sorts of areas, like healthcare, transportation, finance, internet of things, national security, etc. Obviously, the counterpart is a proliferation of personal data in a dense fog of devices, clouds, networks and databases of all kinds. Indeed, each application field may come with its own databases and devices, which seems reasonable at first, but we never know how these will be combined, by whom, and where. Privacy, whatever it means, can easily get lost in this mixture. Three big questions can then guide our research into future privacy frameworks:
  • Do they say what they do, and do they do what they say?
  • Are they usable?
  • How do they relate to policy requirements (e.g. of governments)?
Ingredients for science (both in industry and academia)
The first question is one of rigour. A privacy framework has to start by formulating the problems it tries to solve, in a way that is precise, clear and generic. Once we have a solution, we need arguments for why it solves the problem. What kinds of arguments are acceptable as proof? Do we need to convince humans or, better, machines? A cloud or an app store may be able to check whether a virtual machine or app comes attached with a convincing proof of privacy, or whether the assumptions of the proof are no longer valid.

Most current solutions are rather ad hoc. Not only do they fail to define the privacy problem, but their scope is also rather narrow, making them difficult to apply in other areas. The talk argued for generic solutions; they need to be designed at a higher level of abstraction, considering their context, their instantiation in applications (which can in turn open new problems and refine the design), various attacker models, etc.

Don't forget the salt
What does it mean for a system's privacy features to be usable? Does it mean the user doesn't have to bother about privacy? Or that he has precise control over his data? Or that he passes data custody to a third party? It is certain that usability should come into play not only when implementing a protocol, but even earlier, at design time or at specification time. Usability is really about the way in which we collect data, so it is very much related to privacy protocols, which arrange that data for computation; an interplay between the two would certainly be fruitful.

Out in the fields
Multiple disciplines should be brought together in order to link the technology with society in general, and with its privacy policies in particular. For example, research can make clear what is possible, or not, in practice, so that policy requirements can take that into account. Other examples: formulate a trade-off between utility and privacy (needing some economics and law?), and design an add-on to implement it; linguistics to identify private data in a text; social science to see how people care about privacy; etc.

Examples: indirection, composition, transparency
The last part of the talk contained some examples, first of privacy issues in health care, transportation and the internet of things; I guess these are well known by now. Then we had an example of the indirect problems to which privacy is subject: suppose one has a family member who did a search about a genetic disease; putting a government database, a Google database and the discipline of biology together, one could then violate one's privacy.

A cryptographic concept was mentioned as a positive example: privacy preservation under composition with other protocols. Yet even compositionality breaks down if there is no framework to sustain it; how can P1 | P2 preserve privacy when P2 simply leaks some private data? Even worse, the public output of P1 and the leaked private data of P2 may leak more information than P1 or P2 alone.

Transparency about the way in which data is handled was another interesting example: it might improve privacy, but industry won't go too far with it, so perhaps we need some new techniques to output evidence of appropriate handling of data without disclosing the actual algorithms.

The government or the industry?
That was one of the questions debated at the end, and I guess we agreed both would play a role.

Eurocrypt 2015: Universal Signature Aggregators

Suppose you have a collection of signatures on different messages under different keys, and another party wants to verify that you have all of them. You could send all the signatures, and the other party could obtain the relevant public keys and then verify them. However, if you have a lot of signatures this is a lot of data to transfer. Aggregate signatures allow you to combine all the signatures into a single one whose size is independent of the number of signatures combined.

Most aggregate signature schemes make use of algebraic properties of the underlying scheme's signatures. This means that everyone generating the signatures to be aggregated has to use the appropriate scheme. Venkata Koppula's talk this morning asked whether we can aggregate signatures from different schemes. He went on to answer the question in the positive, calling such a tool a universal aggregator.

Syntactically, a universal signature aggregator consists, roughly, of three algorithms: a setup that outputs public parameters; an aggregation algorithm that takes the parameters and a list of tuples (verification key, message, signature), possibly from different schemes, and outputs a single short aggregate signature; and an aggregate-verification algorithm that checks the aggregate against the full list of keys and messages.
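
A minimal interface sketch in code, under the same reading of the syntax (all names are mine, not the paper's):

# Interface sketch for a universal signature aggregator. Entries may come
# from different schemes, identified here by their verify functions.
from dataclasses import dataclass
from typing import Callable, List

VerifyFn = Callable[[bytes, bytes, bytes], bool]   # (vk, msg, sig) -> bool

@dataclass
class Entry:
    verify: VerifyFn      # the originating scheme's verification algorithm
    vk: bytes
    msg: bytes
    sig: bytes

class UniversalAggregator:
    def setup(self) -> bytes:
        # output public parameters (in the paper, these include an
        # obfuscated program)
        raise NotImplementedError
    def aggregate(self, params: bytes, entries: List[Entry]) -> bytes:
        # verify each entry with its own scheme, then combine the
        # signatures into one short aggregate
        raise NotImplementedError
    def verify(self, params: bytes, statements: List[tuple], agg: bytes) -> bool:
        # check one aggregate against the whole list of (scheme, vk, msg)
        raise NotImplementedError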

Security says that given the public parameters and a public key for a signature scheme, an adversary with access to a signing oracle cannot compute a valid aggregate signature involving a signature under the given public key on a message he did not query to the signing oracle.

The idea behind the constructions is to first verify the signatures, then transform them into an aggregation-friendly form, then aggregate them. The problem now is the transformation step. At this point in the talk I expected to see some NIZK action, but the tool of choice was instead everyone's favourite hammer, indistinguishability obfuscation.

The construction presented this morning had the aggregation-friendly form of a signature be the evaluation of a one-way function on the signature tuple, with the OWF outputting points in $\mathbb{Z}_N$ for an RSA modulus $N$. Rather than handing out the key to the OWF in the parameters, the setup phase generates an obfuscation of a program that evaluates the OWF. This means (selective) security can be shown by replacing this obfuscated program with an obfuscation of a version of the program that refuses to act on the selected challenge signature tuple.

There are additional constructions in the paper, including one with adaptive security, but they all follow the same general outline. The paper (joint work with Susan Hohenberger and Brent Waters) can be found on eprint.

Sunday, April 26, 2015

52 Things: Number 29: What is the UF-CMA security definition for digital signatures?

This is the latest in a series of blog posts to address the list of '52 Things Every PhD Student Should Know To Do Cryptography': a set of questions compiled to give PhD candidates a sense of what they should know by the end of their first year. In this week we look at the security definition for signatures.

So Number 16 gave the details of the DSA, Schnorr and RSA-FDH signature schemes, but what is a signature scheme and what security properties should it achieve?

A signature scheme $\mathsf{S}$ is a tuple of algorithms $(\mathsf{KG},\mathsf{Sign},\mathsf{VRFY})$ such that:


  • $\mathsf{KG}$ is a randomised algorithm which outputs a secret key $sk$ and a public key $pk$.
  • $\mathsf{Sign}$ is a (possibly) randomised algorithm which, on input $sk$ and a message $m$, outputs a signature $\sigma$
  • $\mathsf{VRFY}$ is a deterministic (non-stateful) algorithm which takes in the public key $pk$, a message $m$ and a signature $\sigma$, and returns 1 if $\sigma$ is a valid signature on $m$ and 0 otherwise

Signature schemes are used to prove the origin of a message. If a message has a signature on it, signed by Alice's secret key then it must have come from Alice. The advantage of using a signature scheme over a MAC (assuming good public key infrastructure) is that it can be verified by anyone and does not need any shared secrets.

Now for the signature to prove the origin of a message, it needs to be the case that someone without the secret key cannot create a valid signature on a message he has not seen signed before. This is called UF-CMA security (existential unforgeability under chosen message attack).

The game works as follows:

  1. The game runs $\mathsf{KG}$ to get $(pk,sk)$
  2. The adversary $A$ is given $pk$ and can then send messages $m_i$ to the game and get back signatures $\sigma_i$ under the secret key $sk$
  3. $A$ must output a pair $(m^*,\sigma^*)$
$A$ is said to win the game if $\sigma^*$ is a valid signature on $m^*$ and $m^*$ is not the same as any of the $m_i$'s which $A$ asked the game to sign. The advantage of the adversary in the UF-CMA game is defined as the probability that $A$ wins the game. The signature scheme $\mathsf{S}$ is said to be UF-CMA secure if the advantage is suitably small.
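
To pin the game down, here is a small runnable version of it, instantiated with textbook RSA-FDH (toy primes, so utterly insecure; the harness, not the scheme, is the point):

# The UF-CMA game as code, with toy RSA-FDH playing the scheme.
import hashlib

def KG():
    p, q = 2003, 2011                    # toy primes; real keys need 1024+ bits
    N, e = p * q, 65537
    d = pow(e, -1, (p - 1) * (q - 1))
    return (N, e), (N, d)                # (pk, sk)

def H(m, N):                             # "full domain" hash, folded mod N
    return int.from_bytes(hashlib.sha256(m).digest(), "big") % N

def Sign(sk, m):
    N, d = sk
    return pow(H(m, N), d, N)

def VRFY(pk, m, sig):
    N, e = pk
    return pow(sig, e, N) == H(m, N)

def uf_cma_game(adversary):
    pk, sk = KG()
    queried = set()
    def sign_oracle(m):
        queried.add(m)
        return Sign(sk, m)
    m_star, sig_star = adversary(pk, sign_oracle)
    # the adversary wins iff the forgery verifies and m* was never queried
    return VRFY(pk, m_star, sig_star) and m_star not in queried

# replaying an oracle answer never wins: the message was queried
print(uf_cma_game(lambda pk, sign: (b"hello", sign(b"hello"))))   # False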

Tuesday, April 21, 2015

It's HUGE - RSA 2015

The state of the IT industry, and the security industry in particular, can be judged by just how big the RSA Conference is. This week RSA 2015 is being held in San Francisco, and it is certainly the largest I have ever seen it. As usual we are in the Moscone Centre in downtown San Fran, but the conference is taking over the entire centre. The tracks are being held in Moscone West, whilst the main keynotes and the expo are held in Moscone South and North. It is certainly the biggest security expo I have ever seen, and the place is packed with delegates.

As usual the conference started with some "star" giving a joke song about security. This year it was apparently someone from a programme called "Glee" singing a parody of Bowie's "Changes". We then had the usual keynotes. For those not used to the RSA Conference, these can range from terrible self-publicity for a company through to really interesting and thought-provoking philosophical discussions on all things security. The first keynotes this year were of the latter type.

Amit Yoran of RSA kicked off with a discussion of how our mindset of walls and perimeters is still dominant in the industry, despite having been known to be wrong for many years. His thesis was that we have the technology to prevent all the high-profile attacks of the last year, but we lack the mindset. I thought this was a very thought-provoking talk, and it is well worth catching up on the video if you were not there.

We then had Scott Charney of Microsoft, who concentrated on the idea that we not only want security, privacy and reliability, we also require control and transparency. This is particularly true of cloud services, since we are not expecting to give over control of our (individual and company) data to cloud providers. We want the controls on this data, and how those controls are exercised, to be handled in a transparent manner.

Taking up the idea that we simply have the wrong, old-fashioned mindset, the next talk, by Christopher Young of Intel, looked at how sporting teams have changed the way they work by examining scientific evidence. He concentrated on a team playing baseball (which I am led to believe is an inferior form of cricket, close to the "girlie game" of rounders). Apparently there is a film out about a baseball team which improved by using statistics. The key point is that the team improved by changing the standard way of working, and this was achieved by processing the large amount of data which was available.

This was followed by the usual highlight of the RSA Conference, namely the Cryptographers' Panel. This year Paul Kocher chaired, with the panel made up of Whit Diffie, Ron Rivest, Adi Shamir and Ed Giorgio (ex-NSA). Whit was in particularly good form, with a number of interesting viewpoints and remarks which drew chuckles from the audience. The panel considered a number of issues, from Snowden to Bitcoin to the IoT.

The award for Mathematical Excellence this year was shared between Ivan Damgård and Hugo Krawczyk. Ron gave a nice little talk linking the work of these two excellent cryptographers through their work on hash functions.

So what did I learn from the first day of RSA? Well, mainly that I am out of touch with modern culture: all the references to movies and TV programmes went over my head.

Friday, April 17, 2015

52 Things: Number 28: What is the IND-CCA security definition for public key encryption?

This is the latest in a series of blog posts to address the list of '52 Things Every PhD Student Should Know To Do Cryptography': a set of questions compiled to give PhD candidates a sense of what they should know by the end of their first year. We discuss IND-CCA security for public key encryption.

IND-CCA security stands for indistinguishability under chosen ciphertext attack. The idea behind it is that in a secure encryption scheme, given a ciphertext, an adversary should not be able to tell which message the given ciphertext encrypts. In this model, the adversary is allowed to call the encryption and decryption oracles both before and after seeing the challenge ciphertext (steps 3 and 6 below). The find-then-guess security game for public key IND-CCA is the following.

1. Generate the public and secret keys $(p_{k}, s_{k})$. The adversary A has access to the public key $p_{k}$

2. Assign $b \leftarrow \{ 0, 1\}$ privately

3. A is allowed to query the decryption oracle $Dec_{s_{k}}$ and the encryption oracle $Enc_{p_{k}}$

4. A then outputs a pair of messages $(m_{0}, m_{1})$

5. We output the encryption $c=Enc_{p_{k}}(m_{b})$

6. The adversary is allowed to enquire for more encryptions or decryptions, as in step 3, but he is not allowed to ask for the decryption of $c$

7. A outputs $b' \in \{ 0, 1 \}$. A wins if $b=b'$

We say the advantage of A is $Adv(A) = 2 \left| \Pr[A \text{ wins}] - \frac{1}{2} \right|$. A scheme is said to be IND-CCA secure if this advantage is negligible.

There is a different version of IND-CCA, real-or-random, mentioned by Gareth in last week's post. The difference is at step 5 above: instead of outputting the encryption of the message the adversary asks for each time, we output an encryption of a random $m'$ of length $|m|$ if $b=0$, and an encryption of $m$ otherwise. A must then distinguish whether he is in the "real" or the "random" world. Advantage and security are defined similarly.

The two definitions are equivalent in the sense that if a scheme is IND-CCA secure in the real-or-random sense against an adversary A, we can construct an adversary B for the find-then-guess game such that both advantages are equal. Similarly, if a scheme is find-then-guess secure against an adversary A, we can construct an adversary B such that $Adv_{\text{find-then-guess}}(A)=2 \cdot Adv_{\text{real-or-random}}(B)$.
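
Like the UF-CMA game in the post above, the find-then-guess game is easy to render as code. Here is a sketch instantiated with a deliberately broken "identity" scheme, so that a trivial adversary has advantage 1 (everything here is my own toy illustration of the game mechanics, not a real scheme):

# Find-then-guess IND-CCA game with an intentionally broken scheme.
import secrets

def keygen():   return ("pk", "sk")
def enc(pk, m): return m          # catastrophically insecure on purpose
def dec(sk, c): return c

def ind_cca_game(adversary):
    pk, sk = keygen()
    b = secrets.randbelow(2)
    state = {"challenge": None}
    def dec_oracle(c):
        if c == state["challenge"]:
            raise ValueError("may not decrypt the challenge")   # step 6 rule
        return dec(sk, c)
    m0, m1 = adversary.choose(pk, dec_oracle)       # steps 3-4
    state["challenge"] = enc(pk, (m0, m1)[b])       # step 5
    return adversary.guess(state["challenge"], dec_oracle) == b

class TrivialAdversary:
    def choose(self, pk, dec_oracle):
        self.m0 = b"zero"
        return self.m0, b"one!"        # equal-length messages
    def guess(self, c, dec_oracle):
        return 0 if c == self.m0 else 1

print(ind_cca_game(TrivialAdversary()))   # always True: advantage 1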

Friday, April 10, 2015

52 Things: Number 27: What is the AEAD security definition for symmetric key encryption?

This is the latest in a series of blog posts to address the list of '52 Things Every PhD Student Should Know To Do Cryptography': a set of questions compiled to give PhD candidates a sense of what they should know by the end of their first year. This post kicks off the 'Security Definitions and Proofs' section with a brief overview of Authenticated Encryption.

In a recent post Luke described a number of well-used modes of operation (ECB, CBC and CTR) for blockciphers, modes that provide privacy (confidentiality) only. We may also want integrity from our encryption mechanism, meaning that the recipient is assured that the message it receives is the one sent, without accidental changes or intentional tampering, and authenticity, meaning that the receiver is convinced of the origin of the message. To achieve these additional goals we often use a message authentication code (MAC), and the most widely used are those based on blockciphers and those based on hash functions (HMAC). Putting these two primitives together is non-trivial: to get an IND-CCA secure scheme we need to follow the 'Encrypt-then-MAC' paradigm with a secure encryption scheme and a strongly unforgeable MAC, meaning we compute the MAC on the ciphertext (see here and here for more info on Encrypt-and-MAC and MAC-then-Encrypt, with a focus on why one should avoid them). The 'AD' refers to variable-length associated data such as packet headers, for which we normally expect authenticity and integrity but not confidentiality; this component is optional. For further reading and examples, see Adam Langley's blog on the topic.
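
As a sketch of the composition (with a toy SHA-256-counter stream cipher standing in for a proper blockcipher, since the Python standard library ships hashes but no blockcipher; do not mistake this for a vetted AEAD), Encrypt-then-MAC with independent keys looks like this:

# Encrypt-then-MAC: encrypt first, then MAC the ciphertext; on receipt,
# check the MAC and reject before doing any decryption work.
import hashlib, hmac, os

def keystream(key, nonce, n):
    out, ctr = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + nonce + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return out[:n]

def seal(ke, km, m):
    nonce = os.urandom(16)
    c = bytes(a ^ b for a, b in zip(m, keystream(ke, nonce, len(m))))
    tag = hmac.new(km, nonce + c, hashlib.sha256).digest()  # MAC the ciphertext
    return nonce + c + tag

def open_(ke, km, blob):
    nonce, c, tag = blob[:16], blob[16:-32], blob[-32:]
    if not hmac.compare_digest(tag, hmac.new(km, nonce + c, hashlib.sha256).digest()):
        raise ValueError("MAC check failed")
    return bytes(a ^ b for a, b in zip(c, keystream(ke, nonce, len(c))))

ke, km = os.urandom(32), os.urandom(32)   # independent keys
assert open_(ke, km, seal(ke, km, b"attack at dawn")) == b"attack at dawn"

Associated data would simply be fed into the MAC alongside the nonce and the ciphertext.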

Next week's blog post will see an in-depth overview of IND-CCA2 security in the context of public-key encryption. The 'real-or-random' definition of IND-CCA2 (and IND-CCA1) gives the adversary access to an encryption oracle, which has an encryption key hardwired and on input message $m$ returns either a 'real' encryption $\mathsf{E}_k (m)$ or 'fake' $\mathsf{E}_k (\$^{|m|})$, and a decryption oracle that given a ciphertext $c$ will return $\mathsf{D}_k (c)$ - the adversary is then asked to distinguish which world he is in. In 2004 Shrimpton showed that a new notion dubbed IND-CCA3, where the decryption oracle in the 'fake' world is replaced by an oracle that always returns the invalid symbol $\perp$, is equivalent to the previously considered notion of AE, where the notions of privacy and authenticity/integrity are looked at separately. This observation was incorporated into Rogaway and Shrimpton's paper on the keywrap problem and Deterministic Authenticated Encryption. For more information on the impact of associated data, see here and here.

In practice, a large proportion of traffic uses CCM mode, which combines a blockcipher in counter mode with CBC-MAC in the MAC-then-Encrypt approach, and GCM, which uses Encrypt-then-MAC with a blockcipher in counter mode and a polynomial-based hash function called GHASH. CCM is comparatively inefficient, as it requires two blockcipher calls per message block and is not online (the message length needs to be known before processing can occur), and, as this paper by Saarinen shows, GCM has some weak keys.

The CAESAR competition is currently in progress, with the aim of selecting a portfolio of authenticated ciphers for recommendation based on thorough academic public scrutiny. One of the main aims is to get more researchers thinking about such a vital topic, and the large number (and varied nature) of first round submissions indicates this goal has already been achieved. The second round candidates are expected to be announced next week, and an overview of the submissions can be found at the AE Zoo which is run by a number of researchers from DTU.

ECDLP Over Fields of Small Characteristic

Last week Igor Semaev distributed a paper which seems to give a new, more efficient algorithm to solve the Elliptic Curve Discrete Logarithm Problem (ECDLP) on curves defined over fields of small characteristic. This is important, as such curves are used in a number of protocols and products. Whilst not devastating (yet), the algorithm does vindicate the community's preference for curves defined over large prime fields.

Recall the ECDLP is to solve the following equation for $z$,
$$ Q = [z] P $$
where $P$ and $Q$ are given points on the curve $E({\mathbb F}_q)$ over the field ${\mathbb F}_q$.

Like many modern algorithms for the ECDLP over curves of small characteristic, the algorithm makes use of the "summation polynomial". In particular it makes use of the third summation polynomial $S_3(x_1,x_2,x_3)$. This polynomial has a zero $(x_1,x_2,x_3)$ if and only if there exist $y$-coordinates $y_1,y_2$ and $y_3$ such that $P_i=(x_i,y_i)$ is a point on the curve and
$$ P_1+P_2+P_3 = {\mathcal O}$$.
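
For reference, over a field of odd characteristic and a curve in short Weierstrass form $y^2 = x^3 + ax + b$ (the polynomial for the binary-field curves targeted by the paper looks different), the third summation polynomial can be written as
$$ S_3(x_1,x_2,x_3) = (x_1-x_2)^2 x_3^2 - 2\left( (x_1+x_2)(x_1 x_2 + a) + 2b \right) x_3 + (x_1 x_2 - a)^2 - 4b(x_1+x_2), $$
a symmetric polynomial of degree 2 in each variable, which is what makes chains of $S_3$ equations attractive for Gröbner basis methods.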

The algorithm is very simple, and is given by the following steps (see page 4 of Semaev's paper).
  1. Pick an integer value $m$ and a subset $V$ of ${\mathbb F}_q$ of size $q^{1/m}$.
  2. Generate a set of "relations" as follows:
    1. Pick random  integers $u$ and $v$ and compute the point $R= [u] P + [v] Q$.  If $R={\mathcal O}$ then we immediately solve the ECDLP, so assume $R \ne {\mathcal O}$.  Write $R$ as $(R_X,R_Y)$
    2. If $R_X \in V$ then we have a relation $[u] P + [v] Q - R = {\mathcal O}$.
    3. For $t=2,\ldots,m$ try to solve the following system \begin{align*} S_3(u_1,x_1,x_2) &= 0 \\ S_3(u_i,u_{i+1},x_{i+2}) &=0 \mbox{  for } 1 \le i \le t-3 \\ S_3(u_{t-2},x_t,R_X)&=0 \end{align*} for $x_1,\ldots,x_t \in V$ and $u_1,\ldots,u_{t-2} \in {\mathbb F}_q$.
    4. If successful we can find $y_1,\ldots,y_t \in {\mathbb F}_q$ such that $$(x_1,y_1) + (x_2,y_2) + \ldots + (x_t,y_t) + [u] P + [v] Q = {\mathcal O}$$.
  3. As soon as we have enough relations we can solve for $z$ using linear algebra. 
The last step is standard in so-called "index calculus" algorithms. The key innovation is in the second step. The idea is that we try to solve the system defined by the $S_3$ summation polynomials, which have low degree, rather than attack a single large summation polynomial in one go. Semaev provides strong evidence that this step will be amenable to so-called Gröbner basis algorithms, by analysing the "first fall degree" assumption for such a system.
The expectation is that this method will be able to "break" various small-characteristic curves currently defined in cryptographic standards. I expect experimental validation of this expectation, or a reason why such experiments fail, to come in the next few months.

UPDATE (23/04/2015):

More updates on this potential algorithm:


Friday, April 3, 2015

52 Things: Number 26: Describe the NAF scalar multiplication algorithm.

This is the latest in a series of blog posts to address the list of '52 Things Every PhD Student Should Know To Do Cryptography': a set of questions compiled to give PhD candidates a sense of what they should know by the end of their first year. This week, we describe the NAF scalar multiplication algorithm.

The NAF scalar multiplication algorithm is an enhanced form of scalar multiplication, in which the use of Non-Adjacent Form (NAF) representation decreases the expected running time of the algorithm. In more detail:

Let $k$ be a positive integer and $P$ a point on an elliptic curve $E$ over the field $\mathbb{F}_{q}$. The operation that computes the multiplication $Q = k\cdot P$ is known as elliptic curve scalar multiplication (or point multiplication) and dominates the execution time of elliptic curve cryptographic schemes. One of the most basic methods to compute a scalar multiplication is based on a double-and-add variant of Horner's rule, known as the binary method. As the name suggests, the two most prominent building blocks of this method are the point doubling and point addition primitives. More particularly, to calculate $k\cdot P$, the integer $k$ is represented as $k=k_{n-1}2^{n-1}+k_{n-2}2^{n-2}+ \cdots +k_{1}2+k_{0}$, where $k_{i} \in \{0,1\}$, $i= 0,1,2,\ldots,n-1$ (binary representation). The following algorithms are the additive versions of the basic repeated-square-and-multiply methods for exponentiation [1,2].

INPUT: $k = (k_{t-1},\ldots,k_1,k_0)_2$, $P \in E(\mathbb{F}_{q})$.
OUTPUT: $k \cdot P$.
       $Q \leftarrow \infty$.
       For $i$ from $0$ to $t-1$ do
           If $k_i = 1$ then $Q \leftarrow Q+P$.
           $P \leftarrow 2P$.
Return($Q$).

INPUT: $k = (k_{t-1},\ldots,k_1,k_0)_2$, $P \in E(\mathbb{F}_{q})$.
OUTPUT: $k \cdot P$.
       $Q \leftarrow \infty$.
       For $i$ from $t-1$ down to $0$ do
             $Q \leftarrow 2Q$.
             If $k_i = 1$ then $Q \leftarrow Q+P$.
Return($Q$).

The first algorithm processes the bits of $k$ from right to left, while the second processes the bits from left to right. The expected number of ones in the binary representation of $k$ for both algorithms is $t/2 \approx m/2$, therefore the expected running time of both algorithms is approximately $m/2$ point additions and $m$ point doublings [1]:

$\frac{m}{2} \cdot A + m \cdot D$.

In 1951, Booth [3] proposed a new signed binary representation for scalars, and later Reitwiesner [4] proved that every integer can be uniquely represented in this form [5]. More particularly, if $P=(x,y) \in E(\mathbb{F}_{q})$ then $-P=(x,x+y)$ if $\mathbb{F}_{q}$ is a binary field, and $-P=(x,-y)$ if $\mathbb{F}_{q}$ has characteristic greater than 3. Thus subtraction of points on an elliptic curve is just as efficient as addition. This motivates us to use a signed digit representation $k=\sum_{i=0}^{l-1} k_i \cdot 2^i$, where $k_i \in \{0, \pm 1\}$. A particularly useful signed digit representation is the non-adjacent form (NAF). A NAF representation of a positive integer $k$ is an expression $k=\sum_{i=0}^{l-1} k_i \cdot 2^i$, where $k_i \in \{0, \pm 1\}$, $k_{l-1} \ne 0$, and no two consecutive digits $k_i$ are non-zero. The length of the NAF is $l$ [1].

Properties of NAF [1]:
  1. k has a unique NAF denoted NAF(k). 
  2. NAF(k) has the fewest non-zero digits of any signed digit representation of k. 
  3. The length of NAF(k) is at most one more than the length of the binary representation of k. 
  4. If the length of NAF($k$) is $l$, then $2^{l}/3 < k < 2^{l+1}/3$.
  5. The average density of non-zero digits among all NAFs of length l is approximately 1/3. 
NAF(k) can be efficiently computed using the following algorithm [1]:

INPUT: A positive integer $k$.
OUTPUT: NAF($k$).
       $i \leftarrow 0$.
       While $k \geq 1$ do
             If $k$ is odd then: $k_i \leftarrow 2-(k \bmod 4)$, $k \leftarrow k-k_i$;
             Else: $k_i \leftarrow 0$.
             $k \leftarrow k/2$, $i \leftarrow i+1$.
Return($k_{i-1},k_{i-2},\ldots,k_1,k_0$).

The last algorithm shows how we can modify the left-to-right binary method for scalar multiplication to use NAF($k$) instead of the binary representation of $k$ [1]:

INPUT: A positive integer $k$, $P \in E(\mathbb{F}_{q})$.
OUTPUT: $k \cdot P$.
       Compute NAF($k$) $=\sum_{i=0}^{l-1} k_i \cdot 2^i$ using the previous algorithm.
       $Q \leftarrow \infty$.
       For $i$ from $l-1$ down to $0$ do
             $Q \leftarrow 2Q$.
             If $k_i = 1$ then $Q \leftarrow Q+P$.
             If $k_i = -1$ then $Q \leftarrow Q-P$.
Return($Q$).
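
Since pseudocode like the above is easy to get subtly wrong, here is a direct Python transcription (my own; the group operations are passed in as functions, and the sanity check runs in the additive group of integers, where $k \cdot P$ is ordinary multiplication, so no elliptic curve code is needed):

# NAF computation (the algorithm above) and left-to-right NAF
# double-and-add with pluggable group operations.
def naf(k):
    # digits of NAF(k), least significant first, each in {-1, 0, 1}
    digits = []
    while k >= 1:
        if k % 2 == 1:
            d = 2 - (k % 4)     # in {-1, 1}; forces the next digit to be 0
            k -= d
        else:
            d = 0
        digits.append(d)
        k //= 2
    return digits

def naf_scalar_mult(k, P, add, neg, zero):
    Q = zero
    for d in reversed(naf(k)):  # most significant digit first
        Q = add(Q, Q)           # "point doubling"
        if d == 1:
            Q = add(Q, P)
        elif d == -1:
            Q = add(Q, neg(P))  # subtraction is as cheap as addition
    return Q

# sanity checks in the integers under addition
assert naf_scalar_mult(314159, 7, lambda a, b: a + b, lambda a: -a, 0) == 314159 * 7
ds = naf(12345)
assert all(not (ds[i] and ds[i + 1]) for i in range(len(ds) - 1))  # non-adjacency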

Based on the third and fifth properties of NAF, the expected running time of the NAF scalar multiplication algorithm is approximately [1]:

$\frac{m}{3} \cdot A + m \cdot D$.

[1] Hankerson, Darrel, Scott Vanstone, and Alfred J. Menezes. "Guide to elliptic curve cryptography". Springer Science & Business Media, 2004.
[2] Jonathan Taverne, Armando Faz-Hernández, Diego F. Aranha, Francisco Rodríguez-Henríquez, Darrel Hankerson, Julio López. "Speeding scalar multiplication over binary elliptic curves using the new carry-less multiplication instruction", Journal of Cryptographic Engineering, Vol. 1, No 3, pp. 187-199, 2011.
[3] A.D. Booth, “A signed binary multiplication technique”, Journal of Applied Mathematics, Vol. 4, No. 2, pp. 236-240, 1951.
[4] G.W. Reitwiesner, “Binary Arithmetic”, Advances in Computers, Academic Press, Vol. 1, pp. 231-308, 1960.
[5] Karthikeyan, E., “Survey of elliptic curve scalar multiplication algorithms”, International Journal of Advanced Networking and Applications, Vol. 4, No. 2, pp. 1581-1590, 2012.

Wednesday, April 1, 2015

TCC 2015: Separations in Circular Security for Arbitrary Length Key Cycles

Alice and Bob are in a committed relationship, so as the kids are doing these days they decide to share their secret keys as a totally healthy and not at all creepy or misguided expression of trust. Since millennials understand the importance of cybersecurity, they don't want to send these keys in the clear. So Alice encrypts her secret key under Bob's public key, and he encrypts his secret key under her public key. This is called a key cycle, and it's a weird situation not covered by the standard IND-CPA security game.

Is this a problem though? Can you spot a key cycle in an IND-CPA secure scheme? The notion of circular security asks an adversary to distinguish a key cycle from encryptions of zero under each of the keys. Alice and Bob might think they're safe encrypting each other's keys. But at Eurocrypt in 2010 Acar et al. showed a scheme that is IND-CPA secure, but isn't circular secure for a key cycle of length 2. So IND-CPA security doesn't imply 2-circular security, and Alice and Bob are going to need an encryption scheme specifically designed to have circular security.

What about for longer key cycles? Luckily this is a TCC blog post, so we can leave Alice and Bob and their ridiculous motivating scenario behind, and simply ask theoretical questions because we're curious about the answers. Koppula, Ramchen and Waters settled this particular question once and for all by showing that for any n there exists an encryption scheme that's IND-CPA secure but not n-circular secure. Of course the solution involves indistinguishability obfuscation.

They take an IND-CPA secure scheme and modify encryption to include an obfuscation of a program that detects key cycles, giving a scheme that is clearly not circular secure. To prove that IND-CPA security still holds, they modify the scheme again so that secret keys additionally contain a random value and public keys contain the evaluation of a PRG at that value. This separates keys into real and fake public keys, which are indistinguishable without seeing the corresponding secret keys. The obfuscated cycle-finding program can then test for a cycle ending in a real key. Hop from the real scheme to one with fake keys using PRG security, and from the obfuscated cycle-detection program to an obfuscation of a program that always outputs 0 using iO security, and then reduce to the IND-CPA security of the underlying scheme. The paper explains all this in detail and would be a nice first proof if you want to get to grips with constructions involving iO.