Thursday, April 21, 2016

Study group: Robustness of the Learning with Errors Assumption

MathJax TeX Test Page

In the paper Robustness of the Learning with Errors Assumption, Goldwasser, Kalai, Peikert and Vaikuntanathan show that the standard LWE assumption implies the hardness of LWE even if the secret is drawn from an arbitrary distribution with sufficient min-entropy (and also given an arbitrary hard-to-invert function of the secret, but we won't focus on this point here since we didn't cover it during the study group). A noticeable application of this result is a symmetric-key encryption scheme which is provable secure in the presence of leakage but for which no maximum amount of leakage is foreseen in the parameters. In fact, leakage-resilient schemes usually guarantee security up to, say, $\lambda$ bits of leakage, hence using constructions to tolerate such a loss at the cost of some overhead. However, think of the extreme scenario where none of those $\lambda$ bits are actually leaked. This results in the overhead being still there, and hence in the scheme being less efficient, without the constructions adding any security, since the unprotected version was secure anyway.

We adopt the following conventions: even though they aren't the most general possible, they make the explanation of the work easier. The Decisional Learning With Errors problem of parameters $n$, $q$ and $\alpha$ ($DLWE_{n,q,\alpha}$) asks to distinguish between samples of the form $<\mathbf{a}_i,\mathbf{s}>+x_i$ from random where $\mathbf{a}_i \leftarrow U(\mathbb{Z}_q^n)$ are known and randomly chosen vectors, $\mathbf{s} \leftarrow U(\mathbb{Z}_q^n)$ is a secret randomly chosen vector and $x_i \leftarrow \psi_\alpha$ is an error vector chosen from the Gaussian distribution over $\mathbb{Z}_q$ of mean 0 and standard deviation $\alpha$. If we collect $m$ of such samples, the above can be rewritten using matrices as: $$ (\mathbf{A},\mathbf{A}\mathbf{s}+\mathbf{x}) \approx (\mathbf{A},\mathbf{u}) $$ where $\mathbf{A} \leftarrow U(\mathbb{Z}_q^{m\times n})$, $\mathbf{x} \leftarrow \psi_\alpha^m$ and $\mathbf{u}$ is random. We use the symbol $\approx$ to mean (statistical) indistinguishability. Finally, we denote by $DLWE_{n,q,\alpha}(\mathcal{D})$ the same problem with the exception that the secret $\mathbf{s}$ is randomly drawn from the distribution $\mathcal{D}$ instead of from the uniform one.

Then, the main theorem, stated using the above formalism, looks like $$ DLWE_{\ell,q,\gamma} \Rightarrow DLWE_{n,q,\beta}(\mathcal{D}) $$ where $\mathcal{D}$ is a distribution over $\{0,1\}^n$ with min-entropy at least $k$, $\ell \leq (k-\omega(\log{n}))/\log{q}$ and $\gamma,\beta > 0$ are such that $\gamma/\beta$ is a negligible function of $n$. The requirement of $\mathcal{D}$ being over $\{0,1\}^n$ instead of over $\mathbb{Z}_q^n$ is made just for simplicity and can be dropped at the cost of a less neat proof. In essence, we want to prove that $$ (\mathbf{A},\mathbf{A}\mathbf{s}+\mathbf{x}) \approx (\mathbf{A},\mathbf{u}) $$ assuming $DLWE_{\ell,q,\gamma}$ and where $\mathbf{A} \leftarrow U(\mathbb{Z}_q^{m\times n})$, $\mathbf{s} \leftarrow \mathcal{D}$ and $\mathbf{x} \leftarrow \psi_\beta^m$.

We just sketch the main idea of each step here. First of all, we draw $\mathbf{B} \leftarrow U(\mathbb{Z}_q^{m\times \ell})$, $\mathbf{C} \leftarrow U(\mathbb{Z}_q^{\ell\times n})$ and $\mathbf{Z} \leftarrow \psi_\gamma^{m\times n}$ and define $\mathbf{A}' = \mathbf{B}\mathbf{C} + \mathbf{Z}$. Each column of $\mathbf{A}'$ is of the form $$ \mathbf{B}\mathbf{c}_i + \mathbf{z}_i $$ which can be thought as a bunch of $DLWE_{\ell,q,\gamma}$ samples. Hence, by that assumption, we can assume $\mathbf{A}' \approx \mathbf{A}$ since $\mathbf{A}$ was an uniformly random matrix.

The proof then reduces to show that $$ (\mathbf{A}',\mathbf{A}'\mathbf{s}+\mathbf{x}) \approx (\mathbf{A}',\mathbf{u}) $$ which can be rewritten as $$ (\mathbf{B}\mathbf{C} + \mathbf{Z},\mathbf{B}\mathbf{C}\mathbf{s}+\mathbf{Z}\mathbf{s}+\mathbf{x}) \approx (\mathbf{B}\mathbf{C} + \mathbf{Z},\mathbf{u}). $$ Instead, the author prove the following, stronger statement $$ (\mathbf{B},\mathbf{C},\mathbf{Z},\mathbf{B}\mathbf{C}\mathbf{s}+\mathbf{Z}\mathbf{s}+\mathbf{x}) \approx (\mathbf{B},\mathbf{C},\mathbf{Z},\mathbf{u}). $$

Next step is getting rid of the term $\mathbf{Z}\mathbf{s}+\mathbf{x}$ by noticing that since $\mathbf{Z}$ was drawn from a Gaussian distribution with standard deviation $\gamma$, $\mathbf{s}$ is a binary vector, $\mathbf{x}$ was drawn from a Gaussian distribution with standard deviation $\beta$ and $\gamma \ll \beta$, then the term $\mathbf{Z}\mathbf{s}$ has a negligible impact on $\mathbf{x}$. This means that there exists $\mathbf{x}' \leftarrow \psi_\beta^m$ such that $$ (\mathbf{Z},\mathbf{s},\mathbf{Z}\mathbf{s}+\mathbf{x}) \approx (\mathbf{Z},\mathbf{s},\mathbf{x}'). $$ The proof is then reduced to show that $$ (\mathbf{B},\mathbf{C},\mathbf{B}\mathbf{C}\mathbf{s}+\mathbf{x}') \approx (\mathbf{B},\mathbf{C},\mathbf{u}). $$

You can probably notice that we are getting closer to a $DLWE$ instance again. Eventually we will be able to use our assumption. Before, we note that, for the leftover hash lemma $$ (\mathbf{C},\mathbf{C}\mathbf{s}) \approx (\mathbf{C},\mathbf{t}) $$ because of the min-entropy of $\mathbf{s}$ and where $\mathbf{t} \leftarrow U(\mathbb{Z}_q^\ell)$ is a random vector. This lets us rewrite the statement we want to prove as $$ (\mathbf{B},\mathbf{B}\mathbf{t}+\mathbf{x}') \approx (\mathbf{B},\mathbf{u}). $$

We have finally obtained a proper $DLWE_{\ell,q,\beta}$ equation whose truth then follows from our $DLWE_{\ell,q,\gamma}$ assumption and by noting that $DLWE_{\ell,q,\gamma} \Rightarrow DLWE_{\ell,q,\beta}$. $\Box$

Thanks to the above result, the author define a symmetric-key encryption scheme whose parameters are just $q = q(n)$, $\beta = q/poly(n)$ and $m = poly(n)$. As you can see, nothing is said about any anticipated amount of leakage: the scheme does not introduce any overhead just to protect against leakage. However, it is proved secure under the $DLWE_{\ell,q,\gamma}$ assumption with respect to the following definition of security.

We say that a symmetric encryption scheme is CPA secure w.r.t. k(n)-weak keys if for any distribution $\{\mathcal{D}_n\}_{n\in \mathbb{N}}$ with min-entropy k(n), the scheme is CPA secure even if the secret key is chosen according to the distribution $\mathcal{D}_n$.

A couple of conclusive remarks.

  • Our discussion was informal and didn't go into too many details on purpose. For instance we didn't mention how this proof affects the parameters involved. For a more insight we refer to the paper itself. Instead, if you prefer a "less math, more ideas" approach, we refer to a post on the ECRYPT-EU blog;
  • I find really interesting that during the study group it was pointed out how saying "no overhead is introduced" is not correct. Some overhead is indeed introduced, but in the initial parameters (to compensate for a possible loss by means of leakage) rather than through specific constructions. This has the advantage that, if no leakage occurs, the scheme is simply more secure because the parameters have been raised.

Friday, April 15, 2016

Study group: Protecting Obfuscation against Algebraic Attacks

The study group this week was on the paper 'Protecting Obfuscation against Algebraic Attacks') by Boaz Barak, Sanjam Garg, Yael Tauman Kalai, Omer Paneth and Amit Sahai.

In this blog post we will consider a slimmed-down version of the accomplishments of the paper for the sake of brevity. Moreover, while the idea presented is complicated, it is not particularly complex, and in such circumstances it is often better to favour concision over precision.

Obfuscation is the idea of mixing up a program so that it becomes unintelligible but functionality is preserved. Essentially, a person's access to a function is replaced with access to an oracle which computes that function.

Common examples of its use are:
  • Crippleware: software can be obfuscated so that it is difficult to reverse engineer and so that premium parts of software can be packaged with non-premium parts by 'crippling' them with obfuscation, so that a product key can later be used to unlock the premium content.
  • Public key encryption from private key encryption: Given a private key encryption scheme, one can obfuscate an encryption circuit with the secret key embedded and release this publicly. This (roughly) gives a public key scheme.

What this paper achieves
In this paper, the authors build on the core of an older and well-known breakthrough obfuscation construction of 2013 by Sanjam Garg, Craig Gentry, Shai Halevi, Mariana Raykova, Amit Sahai and Brent Waters. In doing so, they present a new obfuscator; while the starting point of the new construction is exactly the same as the old, they depart from it when providing countermeasures against the (speculated) attacks against the scheme.

This construction is in the multi-linear model, which roughly translates to assuming the existence of Graded Encoding Schemes (GESs). (Note that in the literature, the terms 'Multi-linear Maps' and 'Graded Encoding Schemes' are used interchangeably, though there are important differences. In this discussion, we are considering Graded Encoding Schemes.)

GESs provide a way of encoding (NB that it is not encrypting) elements of some plaintext space so that they have an associated level, which is any subset of $\{1,...,k\}$ for some integer $k$. Intuitively, they offer a good way of preventing an adversary from doing 'too much' with the data he is given (in particular, 'too many' multiplications). Importantly, they allow multi-linear operations, and therefore, in particular, matrix multiplication. The properties they have which are pertinent for this post are the following:
  • Encodings may be added only if they are encoded at the same level.
  • Encodings may be multiplied only if the levels at which the elements are encoded are disjoint. When a multiplication is performed, the resulting element is encoded at the union of these disjoint levels.
  • One can check if a top level encoding (i.e. an element encoded at level $\{1,...,k\}$) is an encoding of zero.

Since their inception, GESs have been rife with problems: no current construction is secure for all desired applications. Fortunately, for their use in obfuscation, all known attacks on GESs (at least for the CLT15 GES at the time of writing) require more information than an obfuscation of a function provides. With a relatively modest overhead (compared with the notorious time and space requirements for GESs), a recent paper propounded the use of random matrices to achieve a similar result to the paper we are discussing without the use of a GES.

Starting point
We start with a circuit which we want to obfuscate; i.e., a Boolean function $C:\{0,1\}^n\to\{0,1\}$. From this, we create a matrix branching program: this is a sequence of pairs of matrices where each pair has an associated input bit. (We will not cover here how exactly one does this, but a good starting point for more information would be the paper from 1986 by David Barrington.) We denote this sequence of pairs by $(B_{i,0},B_{i,1})_{\textsf{inp}(i)}$ where $\textsf{inp}(i)$ is a function describing which bit of the input $x$ the $i$th point in the branching program inspects. A single input bit can be inspected multiple times throughout the program. To evaluate the program on an input $x$, for each point in the program $i$, we check the value $b$ of the $\textsf{inp}(i)$th input bit of $x$ and select the matrix $B_{i,b}$. We then multiply all of these matrices in order and look at the result, $\prod_{i=1}^{l} B_{i,x_{\textsf{inp}(i)}}$ where $l$ is the length of the branching program and $x_{\textsf{inp}(i)}$ is the $\textsf{inp}(i)$th bit of input $x$. If the product is the identity matrix, we interpret this as $C(x)=1$, and otherwise $C(x)=0$.

The end goal is to 'mix up' matrices so that, if we give them away, they don't reveal information about the underlying circuit but they still allow evaluations of the original function.

We start by encoding (using the GES) the $i$th pair of matrices under the singleton level $\{i\}$. This is useful because, as previously noted, we can still perform multi-linear operations like matrix multiplication, but not do 'too many' operations with them (for example, we cannot compute powers of matrices). The product received from an evaluation of the branching program will be a top-level encoding, to which we can then apply our zero-testing procedure (which, recall, requires that an element be top-level in order to check whether or not it encodes zero).

Attacks and Preventative Measures
Attacks on an obfuscation scheme are when an adversary seeks to glean some auxiliary information about the structure of the branching program by seeing what happens when computations are done erroneously. Attacks fall into four categories, and we will see how this paper deals with each of them. The first two are covered in the same way as the original construction on which this paper builds.
  1. Non-algebraic attacks: performing non-multi-linear operations on the matrix elements, or failing to respect the algebraic structure of the matrices as matrices over the plaintext ring. These attacks are largely dealt with by encoding with the GES.
  2. Out of order attacks: taking one matrix from each pair from the branching program and finding the product without respecting their branching program order. This is actually very simple to deal with, and involves a trick by Joe Kilian which he described in his paper of 1988. We choose a set of random matrices $\{R_0,\ldots,R_{l-1}\}$ (where $l$ is the length of the branching program), set $R_l=R_0$, and then define: \begin{eqnarray} \widetilde{B_{i,0}}&=R_{i-1}B_{i,0}R_{i}^{-1}\\ \widetilde{B_{i,1}}&=R_{i-1}B_{i,1}R_{i}^{-1} \end{eqnarray} for $i=1,\ldots,l$. This randomised version of the branching program computes the same output as the original, but now products that don't respect the branching program order will be encodings of random elements.
  3. Mixed Input attacks: it was stated that more than one point in the branching program may inspect the same input bit (i.e. $\textsf{inp}$ is not necessarily injective). This attack involves performing an unfaithful evaluation by switching from reading, say, the $j$th bit of the input as 0 at one point in the program and then as 1 at some later point. This attack is dealt with using the innovative idea of straddling set systems. Encoding matrices of a branching program at levels given by sets in these systems, instead of at singleton sets, effectively locks together matrices at different points in the branching program. This is exactly what we need in order to prevent this kind of attack (and also the next).

    We provide an elucidating example rather than the technical definition (details can be found in the paper, linked above): Let $S=C\cup D$ where $C=\{\{1\},\{2,3\},\{4,5\}\}$ and $D=\{\{1,2\},\{3,4\},\{5\}\}$. We call $S$ a straddling set system for the universe $U=\{1,2,3,4,5\}$. Note that $C$ and $D$ are the only exact covers of $U$ in $S$.

    Now, suppose that we had a branching program which considered input bit 1 at positions 1 and 3 in the branching program, and input bit 2 at positions 2 and 4. I.e., we have
    $1$ $2$ $3$ $4$
    $\textsf{inp}(1)=1$ $\textsf{inp}(2)=2$ $\textsf{inp}(3)=1$ $\textsf{inp}(4)=2$
    $B_{1,0}$ $B_{2,0}$ $B_{3,0}$ $B_{4,0}$
    $B_{1,1}$ $B_{2,1}$ $B_{3,1}$ $B_{4,1}$
    We define a straddling set system for each input bit: $S_1$ is
    $C_1=\{\{1\},\{2,3\}\}$ and $D_1=\{\{1,2\},\{3\}\}$
    and $S_2$ is
    $C_2=\{\{4\},\{5,6\}\}$ and $D_2=\{\{4,5\},\{6\}\}$

    Then denoting by $(M,A)$ an encoding of the matrix $M$ at level $A\subset \{1,\ldots,k\}$, we encode as follows:
    $1$ $2$ $3$ $4$
    $\textsf{inp}(1)=1$ $\textsf{inp}(2)=2$ $\textsf{inp}(3)=1$ $\textsf{inp}(4)=2$
    $(B_{1,0},\{1\})$ $(B_{2,0},\{4\})$ $(B_{3,0},\{2,3\})$ $(B_{4,0},\{5,6\})$
    $(B_{1,1},\{1,2\})$ $(B_{2,1},\{4,5\})$ $(B_{3,1},\{3\})$ $(B_{4,1},\{6\})$
    Since $C_1$ and $D_1$ are the only exact covers of $U_1$ and $C_2$ and $D_2$ are the only exact covers of $U_2$, if we want to obtain a top-level product at the end, choosing, for example, $B_{1,0}$ forces us also to choose $B_{3,0}$ (and we cannot choose $B_{3,1}$). This exactly prevents mixed-input attacks.
  4. Partial evaluation attacks: computing subproducts of matrices and comparing them. These attacks are dealt with using more complicated straddling set systems, but the idea of 'locking' matrices together is essentially the same. The branching program is extended so that each point is dependent on two input bits, and several redundant matrices are inserted into the program so that for every pair of input bits, there is a point in the program which inspects both of these bits. The straddling sets can then be constructed so that each input bit affects the matrix the evaluator must choose at every point in the program. Thus it greatly restricts the operations that can be performed, and stops these partial evaluation attacks.
The paper concludes with a proof that this obfuscator achieves what is known as virtual black-box obfuscation in the ideal graded encoding model. The really interesting thing in this author's opinion is the aforementioned translation of the straddling set technique to the obfuscator (for which, admittedly, a security proof has yet to materialise) which doesn't require a GES.

Thursday, March 24, 2016

Study group: The moral character of cryptographic work

In "The Moral Character of Cryptographic Work" [pdf], presented as an invited talk at Asiacrypt 2015 and the subject of this week's study group, Phillip Rogaway offers something of a critical theoretic perspective [1] on cryptography as a field and its impact on the wider world.

His premise is that cryptography rearranges power, for good or for ill, and is therefore inescapably political. As such, researchers have a moral responsibility to identify the direction of influence of our current and potential future work and to make value-aligned decisions accordingly.

He begins with a broad-brush overview of the wider context: the social responsibility of scientists and engineers generally. This was thrown into sharp relief by the second world war and its aftermath. The invention of the atom bomb, as well as the convictions of many Nazi-serving scientists in the Nuremberg Trials (where 'following orders' was not considered an exculpating defense) left little room for the idea that science was somehow neutral. Many in the academic community were at that time actively campaigning for an ethic of responsibility — for example, Bertrand Russell and Albert Einstein, who collected signatures from 11 leading scientists (9 Nobel Prize recipients) on a statement urging world leaders away to seek peaceful resolution in the face of growing nuclear threat (1955) and physicist Joseph Rotblat, who founded the Pugwash Conferences to proactively work towards that same end. Rogaway also cites the growing environmental movement as a catalyst for heightened ethical awareness in the four decades or so following the war — after which he suggests that science started to lose the sense of its responsibility, hinting towards individualism but expanding on 'extreme technological optimism' as causal components.

Part two expands on the political character of cryptographic work in particular. There are two different prevailing cryptographer "archetypes": the spy — explicitly political, often heroic — of fiction and history, and (as we are more inclined to view ourselves) the scientist — theoretical, politically neutral, going about our business quietly. But in its early days as an academic field cryptography was not so detached from socio-political concerns. Diffie and Hellman have long been explicitly engaged, e.g. in their criticism of DES's key length, and work such as Chaum's on private communication embedded a concern for democracy and individual autonomy. But, Rogaway argues, the field has since fragmented: secure messaging and other privacy applications have been sidelined by the IACR community as "security" rather than "cryptography", and have spawned their own far more socio-politically aware community (see, e.g. PETS) whilst cryptography has tended further towards an assumption of neutrality.

At the other extreme, the loosely organised "Cypherpunk" movement has been pressing for the use of cryptography as a tool for protecting individual autonomy against abuses of power since the 1980s. Projects such as Bitcoin, PGP, Tor and WikiLeaks are consciously targeted to challenge authority and nurture basic human freedoms (e.g. of speech, movement, and economic engagement).

Rogaway points out, though, that cryptography doesn't always favour "ordinary people" as the cypherpunks sometimes seem to suppose. Depending on the involved mechanics, it can just as easily be found to entrench power and erode autonomy. IBE, for example, assumes a trusted central authority, and so naturally embeds key escrow. Differential privacy frames the database owner as the honest party and assumes mass data collection to be a public good. But the former is seldom unambiguously the case and the latter is highly debated. And FHE, he argues, is in danger of becoming a utopian distraction from real, more 'practical' issues, and a convenient cover for defense and intelligence agencies.

However, perhaps as big a worry for the academic community (in Rogaway's eyes) is our lack of any sort of impact. Agencies show tellingly little interest in our activity (see, e.g., the dismissive review of Eurocrypt 1992 in an NSA newsletter, which was followed by two decades worth of silence on the subject). Our research divides, he proposes, into three branches: crypto-for-security, impacting the commercial domain, crypto-for-privacy, pertaining to socio-political matters, and crypto-for-crypto, exhibiting no ostensible benefits beyond academic interest -- the argument being that we are operating far too much along this third branch and not enough along the second.

This under-attention to crypto-for-privacy has helped maintain the increasing trend in pervasive surveillance which, in part 3, is highlighted as a significant threat to human autonomy and social progress. Rogaway compares what law enforcement would have us believe about surveillance with alternative perspectives from the surveillance-studies community. Law enforcement frame privacy and security as in conflict: a personal and a collective good respectively, in trade-off with one another and thrown out of balance by modern technology. The current disproportionate strength of privacy (e.g. widespread encryption) gives the edge to the ‘bad guys’ — terrorists, child abusers and drug dealers — and we must sacrifice some privacy to avoid ‘Going Dark’. Conversely, surveillance studies views privacy as a social good, which enhances security as much as it conflicts with it. Mass surveillance, increasingly enabled by modern technology, is an instrument of power which has, historically, been used to protect the status quo and stifle change (see, for example, the FBI’s use of surveillance data to try to induce Martin Luther King Jr. to commit suicide). Today, dissidents are imprisoned, drone targets assassinated, and investigative journalism supressed via the use of phone and internet monitoring. Cryptography is (or can be) a tool to stem surveillance creep and the uniformity, the erosion of autonomy and the social stagnation it engenders.

Part 4 offers some proposals for action. In the first place, cryptographers should gear problem selection more towards crypto-for-privacy. Rogaway gives some examples of his own ongoing projects of this nature, including (game based provably) secure messaging in an untrusted server model, and ‘bigkey’ cryptography where the total key material is too huge to exfiltrate (megabytes to terabytes) and random extractions are made for use in protocols requiring ‘normal’ length keys, in such a way that the derived key is indistinguishable from uniformly random even if the adversary gets the randomness and a large amount of information about the bigkey. Promising projects from other researchers include ‘Riposte’, a protocol for anonymous file sharing in the presence of pervasive monitoring, and scrypt and Argon, hash functions designed to slow password checking and avoid dictionary attacks which consume lots of memory as well as time so that they can’t be accelerated by custom hardware.

More generally, Rogaway suggests, research should be slower paced and value based. Provable security should be more practice-oriented, keeping enthusiasm for definitions and proofs in submission to the end goals of utility, and should be extended to the realm of mixnets and secure messaging. We should be discerning about where our funding comes from, and only work with organisations whose values align with our own (which perhaps sounds "easier said than done" to not-yet-established researchers). We should exercise our academic freedom in speaking our minds; resist dogma and be open to new models and approaches; foster a more expansive systems-level views, and actually learn to use some privacy tools for ourselves; replace cutesy cartoonised adversaries with suitably frightening pervasive threats — both in our imaginations and on our slides; choose our language with a considered awareness of the impression it makes; and boost commonly available resources of information (starting with an overhaul of relevant Wikipedia pages) and tools. Above all, he urges that this is a collective responsibility: we need to change what is systematically valued and prioritised so that the movement of the whole field is towards a considered, socially beneficial exercise of the power that cryptography has whether we like it or not.

[1] Social critical theory being the philosophical project to understand society with a view to "liberating human beings from the circumstances that enslave them." (Horkheimer, Max. 1982. Critical Theory Selected Essays.)

Thursday, March 3, 2016

Cryptographic Reverse Firewalls

Since Edward Snowden informed us our internet traffic is under constant surveillance and that the US government has likely backdoored a standardised algorithm, threat models in provable security have evolved to consider an adversary within the user's machine. One of the first works addressing this was the substitution attack model of Bellare, Paterson and Rogaway that considered an adversary trying to corrupt an implementation to reveal a secret undetected by the user. A potential attack vector identified here was to use the randomness involved in operations to leak information while still appearing to be an honest implementation.

Today's study group continued in this area, taking a look at Mironov and Stephens-Davidowitz's paper on cryptographic reverse firewalls. While a traditional firewall sits between your machine and the network deciding what traffic gets through to you, protecting you from an outside threat, a cryptographic reverse firewall sanitises your incoming and outgoing traffic to prevent you from your own machine's misbehaviour. A reverse firewall does not need to know Alice's secret keys and so can be run by anyone, and in addition can be "stacked" so Alice can hedge her bets with multiple reverse firewalls in place at a time.

The desired properties of a reverse firewall are as follows.

  1. Maintains functionality - If a protocol is implemented correctly then when the firewall is in place it should continue to provide the same functionality.
  2. Preserves security - If a protocol provides a security guarantee then it should continue to do so when the firewall is in place.
  3. Resists exfiltration - The firewall should prevent a tampered implementation leaking information to the outside world. This is modelled strongly by asking that an adversary cannot tell the difference between a tampered and an honest implementation behind the firewall.

The original paper goes on to consider reverse firewalls for multi-party computation, but we looked at the simpler case of signature schemes as studied by Ateniese, Magri and Venturi. For signature schemes maintaining functionality is straightforward to define - honestly generated signatures modified by the firewall should still verify.

Preserving security is slightly more complicated since the natural model of asking the adversary to forge a signature with access to a normal signing oracle and to an oracle that executes tampered signing algorithms of his choice then processes the output through the firewall admits a generic attack. To launch this attack the adversary submits a tampered signing algorithm that iterates over the bits of the secret key, returning an honest signature when the bit is 1 and a string of zeroes when the bit is 0, eventually recovering the secret key. To avoid this attack the firewall is allowed to output a special symbol that renders the oracle useless from that point on. This can be seen as the tampered implementation being detected and shut down.

Resisting exfiltration for signature schemes is impossible if arbitrary tampering is allowed, since for example the tampered algorithm can output a string of zeroes which the firewall has no way of turning into something indistinguishable from a valid signature.

As mentioned before a potential conduit for leaking information is the randomness used in an algorithm. The functionality maintaining and security preserving firewall given for signature schemes meeting a re-randomisability condition prevents such attacks by re-randomising valid signatures and returning the special symbol when given a signature that fails to verify.

Friday, February 12, 2016

Cryptocurrencies and proofs of space

A public ledger, especially one without a trusted party, requires computational techniques in order to ensure that the record is valid and cannot be controlled by malicious parties. We need to ensure that transactions can't be swapped, reversed, cloned, removed, and that new coins are generated on solid grounds.

Transaction consistency
All these features can be stated in terms of logical properties (e.g. consistency) on a chain of transactions. However, several consistent chains may still be inconsistent with each other, so we need in addition to ensure that a majority of parties can agree on a single valid chain.

Spawn it if you can 
If we have a chain recording transactions in a payment system, the problem is that several of the parties that are supposed to make the system work (aka miners) can collude and spawn a chain that records an alternative reality. Moreover, they may be able to convince other parties that this is actually the real chain, forming a new consensus that invalidates previous transactions.

To deal with this problem, Bitcoin makes it computationally hard for miners to add blocks of transactions to the chain, i.e. they have to provide a proof of work. Moreover, the challenge for the proof of work is derived from the block chain, this way gluing the new transactions into the chain. After a few such chain extensions a time T, it is getting harder and harder for miners to spawn a chain that shows alternative transactions at time T or earlier.

From time to space
A proof of work takes time and energy, so, are there any other options available to authenticate the transaction chain? I'll summarize the papers [1] and [2], where they show how to do it with proofs of space, which look like this:
Setting: a prover P has to convince a verifier V that it allocated space of size N
  • P stores a structure S of size N 
  • V obtains from P a space commitment comm(S)
  • V challenges P for a proof that it indeed stored S of size N
  • V can accept or reject
Desired properties
  1. Completeness: V always accepts the proof of a P that is honest, i.e. if it stored the required O(N) space ;  
  2. Soundness: a dishonest P has to spend at least O(N) time during execution in order for its proof to be accepted by V ;  
  3. Efficiency: V has to run in time at most O(log(N)) and polynomial in the security parameter (in both phases), while for P we allow O(N) time during initialization
The proofs of space that aren't 
Consider the following simple protocols for proving space:
  1. V sends a random file of size N to P at initialization and at execution queries for some of its random bits. This protocol does not satisfy efficiency, since the verifier runs in time O(N).
  2. For a difficult to invert function F, P  stores (1, F(1)), ... (N, F(N)) at initialization and V queries for the inverse of a random F(r) at execution. This protocol does not satisfy soundness, since it is known that a prover can answer such a challenge in time O(N^(2/3)) by using only O(N^(2/3)) space [3].    
  3. At initialization, P constructs a Merkle tree ( storing values w_1,...,w_N and V gets the root of this tree. At execution, V requests P to open some random branches of the tree. Is this a proof of space? Probably not. In fact, the main construction of [1] adds an additional constraint on the values w_1,...,w_N that are to be stored in the Merkle tree.    
Crypto graphs: robust expanders, superconcentrators, dense long pathers
The values w_1,...,w_N in the Merkle tree will be labels of nodes in a direct acyclic graph G, computed using the following equation
where h is a hash function, w is the label for the node v, and w_1,...,w_n are the labels for the parents of v. Moreover, G is not just any graph, but a graph with a small number of edges (e.g. in-degree of O(log N) or less) that should be hard to pebble: if the prover P stores only a few labels of the graph, it should be computationally hard for P to get the label of a challenge node.

The hard to pebble requirement for the graph amounts to having enough precedents for a random node in the graph, even if some of its nodes are removed. The main construction of such a graph proposed in [1] is based on a chain of bipartite graphs that are robust expanders [4], superposed with a sparse graph with dense long paths [5], and connected to a superconcentrator [6]. It has in-degree of O( log log N).

On the blockchain, time binds stronger than space
Goin back to our chain of transactions, interesting things are revealed when we replace a proof of work with a proof of space. The blocks in Bitcoin are chained as follows:
[ trans , hash , key , pw ]  ----->  [ trans', hash', key', pw' ]
where hash' = h(trans', hash, key', pw') 
and pw' is such that hash' < L

For a random hash function h, it is difficult to find a value pw' satisfying the constraints above, when L is chosen to be small enough - therefore pw' is a proof of work connecting a block to its predecessor.   

In a proof of space, let us denote by proof( chal, comm(S) ) the proof that P supplies to the verifier after having commited comm(S) in the initialisation phase and being challenged with chal in the execution phase. We could then have the blocks chained as follows:  
[ trans, hash, key, ps ]  ----->  [ trans', hash', key', ps' ]
where hash' = h(trans', hash, key', ps') 
and ps' = [ comm(S') , proof( hash, comm(S') ) ]
and S' is some allocated space for the mined block

For some reason, [2] requires miners to publish the space commitment comm(S') in an earlier transaction that is already on the block chain (they have special transactions for this). More seriously, because proofs of space are computationally easy to produce, we have the problem of grinding, where a miner can try out different values hash' (by trying different sets of transactions trans' as inputs) and choose one that will make it easier to have his blocks accepted on the chain in future - a malicious miner can hijack the chain this way. To address this, [2] proposes to remove the transactions from the input of the hash, and instead add two signatures to the block:  
s_1' = sign(trans', sk(key') )
    s_2' = sign( (s_1,s_2), sk(key'))
one on transactions in the new block, and one on signatures in the previous block.

The challenge of more  
There is also a problem with using the hash of the previous block as a challenge for the proof. This means that, if there are several competing chains, the challenges for the newly added block will be different in each chain. Again, because proofs of space are easy to compute, self-interested miners can estimate the value of adding their blocks to various chains (for different challenges, the value of blocks will be different - according to definitions in [2]), and this leads to problems in reaching consensus on a single chain. The paper proposes several ideas for finding a good single challenge (from further in the past, from an unpredictable beacon, from other miners), but none of them is ideal. Another problem to reaching consensus is that the miners can extend multiple chains, and not, as they're supposed to, concentrate only on the best one. To address this, [2] introduces a new type of transactions that allows a miner to penalize another who worked on multiple chains, and collect some of its coins.

[1] Proofs of space
Stefan Dziembowski and Sebastian Faust and Vladimir Kolmogorov and Krzysztof Pietrzak [CRYPTO 2015]

[2] Spacemint: a cryptocurrency based on proofs of space
Sunoo Park and Krzysztof Pietrzak and Albert Kwon and Joël Alwen and Georg Fuchsbauer and Peter Gaži
[3] A cryptanalytic time-memory trade-off
Martin Hellman
Information Theory, IEEE Transactions on 26.4 (1980): 401-406.

[4] Pseudorandomness
Salil P. Vadhan
Foundations and Trends in Theoretical Computer Science, Vol. 7, pp 1-336

[5] On sparse graphs with dense long paths.
Erdoes, Paul, Ronald L. Graham, and Endre Szemeredi.  
Computers & Mathematics with Applications 1.3 (1975): 365-369.

[6] Graph-theoretic properties in computational complexity. 
Valiant, L. G. (1976). 
Journal of Computer and System Sciences, 13(3), 278-285.

Thursday, January 28, 2016

A Modular Framework for Building Variable-Input-Length Tweakable Ciphers

For this weeks study group I presented a paper by Shrimpton and Terashima from AsiaCrypt 2013[1]: A Modular Framework for Building Variable-Input-Length Tweakable Ciphers. Starting from the bottom, the authors take a modular approach to build an Authenticated Encryption (AE) scheme, starting with some relatively simple primitives and extending their functionality until full AE is supported. So, the paper can be broken up into 4 sections:
  1. Introduce our primitives: Tweakable Block Ciphers (both Beyond Birthday fixed-input-length and Variable-input-length)
  2. Combine these with their new Protected IV (PIV) to form an Arbitrary Input Length Tweakable Block cipher (AIL TBC)
  3. Provide two explicit examples of such constructions
  4. Build a secure AE scheme out of a VIL TBC
From a practical point of view, Part 3 is arguably the most interesting, because it provides explicit examples of secure constructions and (when combined with 4) yields an AE scheme that may be secure beyond the birthday bound (which is the point at which most symmetric security proofs break down).
This blog will mainly focus on Part 2, and readers are encouraged to read the full paper for more details on .


Very briefly, let us sketch a few key notions:
  • A TBC (Tweakable Blockcipher) acts like a family of block ciphers, one for each tweak. If it is secure, then any change in the tweak should make the TBC act completely differently, which we call an STPRP (for strong tweakable pseudo-random permutation).
  • A primitive is VIL (Variable Input Length) secure if it is secure when queried on messages of different lengths
  • A primitive is AIL (Arbitrary Input Length) secure if it is secure when queried with message of any (single) length
  • Authenticated Encryption was discussed in a blog post last year[2] and (roughly) corresponds to secure communication between two people sharing a key.
  • The Birthday Bound on n bits is roughly $q^2/2^n$, and is the point many symmetric security results break down. A scheme that is still secure for $q>2^{n/2}$ is known as Beyond Birthday Bound (BBB) secure.

The Protected IV (PIV) construction

The key aim of the paper was to build a secure VIL TBC (ie an STPRP) from a fixed-width TBC with variable length tweak (F) and a VIL TBC (V). To do so, the authors describe the PIV (for Protected-IV) scheme. A diagram of the construction is given to the right, and we thank the authors for permission to reproduce their graphic. It can be seen as an extension of the SIV scheme[3], except that by re-encrypting the keeping the IV secret ("protecting" it) and letting it carry some information about the plaintext, the authors have managed to remove the ciphertext expansion required for SIV security.
The most interesting thing about the scheme is that V does not have to be secure as a VIL TBC: V only has to be secure if the tweak is never repeated (similar to the idea of a nonce-based authenticated encryption scheme). This makes V much easier to construct with (for example) a slight variant of counter mode sufficing.
The idea behind the proof is relatively intuitive, built around the fact that (because F is secure) the IV is random and doesn't repeat (up to a birthday bound term on |IV|). So, V is always called with a unique tweak, securely encrypting the X_r (or decrypting the Y_r) content, and so the output is nicely random, making the whole scheme a secure STPRP. Thus security of the scheme reduces to the security of F, V and of a birthday attack on the IV.

Instantiations and Building Authenticated Encryption

To close, the paper provided some instantiations, and explains how to extend the Encode-then-Encipher[4] concept and proof to achieve strong Authenticated Encryption from a STPRP. We didn't have time to discuss these elements in detail, but observed that to achieve Beyond Birthday security, the IV had to be twice as wide as the birthday bound we seek to pass.


  1. A Modular Framework for Building Variable-Input-Length Tweakable Ciphers, Shrimpton & Terashima
  2. 52 Things #27: What is AEAD?, from this blog
  3. Deterministic Authenticated-Encryption: A Provable-Security Treatment of the Key-Wrap Problem Rogaway & Shrimpton
  4. Encode-then-encipher encryption: How to exploit nonces or redundancy in plaintexts for efficient cryptography BEllare & Rogaway

Monday, January 18, 2016

Sixth Bar-Ilan University Winter School on Cryptography

This is the first of a two-part blog post has been collaboratively written for the ECRYPT-EU blog by Eduardo (University of Bristol), Marie-Sarah, Matthias and Ralph. Part 2 can be found here.

Earlier this month, from 4-7 January, a few ECRYPT-NET fellows and about a hundred others attended the Bar-Ilan University winter school on cryptography. It took place in Ramat Gan, a suburb of Tel Aviv, at the Kfar Maccabiah hotel and conference centre (named after the Maccabiah, or Jewish Olympics, that take place there every four years). The school was intense, but well organized. It was split into two parts, verifiable computation and special encryption, and so will be our coverage of it.

Part 1: Verifiable Computation

Michael Walfish, Yael Tauman Kalai, and Eran Tromer guided us through methods for verifying the outsourced computation of functions. We particularly appreciated the crystal-clear overview of all sessions, given by Michael Walfish, that emphasized how the content of the talks fit together.

Let's set the scene for verifiable computation: a client (the verifier) wants to outsource the computation of a function f to a server (the prover) who has more computing resources. But how does the verifier know that the value returned by the prover is actually the result of applying the function f to the purported inputs? A malicious or a lazy server could indeed modify the process to gain some advantages as, for instance, reducing the cost of operating a system.

Verifiable computation of f(x) comprises the following two phases:
  1. Program representation (or arithmetization): The verifier (or a party he trusts) expresses the function f as a set of arithmetic constraints over a field, in terms of the input x, output y, and intermediate variables z. Each of these xy, and z may be vectors, e.g., x=(x1,x2,x3). Typically, the format these constraints needs to have is that of degree-2 polynomials that equal 0 when they (and the function f) are satisfied.
  2. Solving and proving: The server must prove to the client that the solution it returns, y, is correct. The landscape of proof protocols shows a trade-off between efficiency, expressiveness and additional properties like zero-knowledge or non-interactivity. The speaker himself recently wrote a survey which is a very nice introduction to the state of the art and these trade-offs.

Yael Kalai's talks took a more theoretical approach, guiding us through the evolution of Probabillistically Checkable Proofs (PCPs). She emphasized the importance of "good" security assumptions, where "good" requires at least, and according to her point of view, that the underlying assumptions can be efficiently falsified. These theoretical worries were well founded, as most of today's verifiable computation protocols rely on SNARKs (standing for Succint Non-interactive ARguments of Knowledge) which cannot be proved secure via black-box reductions from (efficiently) falsifiable assumptions. 

Yael's talks provided also very interesting and intuitive examples. To give one, suppose that Peggy and Victor are playing chess. After a number of moves, Peggy (the prover) wants to prove to Victor (the verifier) that she has a checkmate. If Victor fails to see it, it is for Peggy easier to convince him by continuing the game (an interactive proof) until he does, rather than to explain all the possible combinations of moves without moving any piece (a non-interactive proof). This intuition of the power of interaction extends to the rest of the proof systems. Finally, she even showed us how the subject fits in the quantum framework, introducing us to the notion of non-signalling adversaries.

Eran Tromer recovered the line of Michael Walfish and focused on the details of SNARKs and how they are actually constructed. Among others, he has been writing libsnark, a C++ library that is used a lot for verifiable computation systems relying on SNARKs. He also showed us a potential application for them, called Zerocash. Zerocash is a protocol that provides a privacy-preserving version of Bitcoin. In contrast to Bitcoin, where all the transactions are public in the block chain, Zerocash does not contain information about the payment’s origin, destination or amount. The correctness of the transaction is guaranteed via a zero-knowledge proof. More details can be found here.

Part 1.5: Excursion to Caesarea and Binyamina Winery

Sun starting to go down in Caesarea.

Tuesday afternoon, we made an excursion to the remains of Caesarea, a Roman city on the Mediterranean coast that was built by Herod over 2000 years ago. To say that it had a tumultuous history would be an understatement. Our tour included a walk through a "graveyard" of columns and capitals, the amphitheatre (whose first row is still intact), and the hippodrome.

Next, we took an informative tour of the Binyamina winery. We learned that grapes are crushed with a flexible rubber material to simulate the skin of feet. For red wine, the grapes are fermented (skin, seeds, and all) before being crushed. For white wine, seeds and skin are removed (by sedimentation) after pressing, then fermented. The tannin (bitter-tasting substances) in wine comes from the seeds, skin, and maybe the material of the barrel in which it is matured. Whether wine is aged or matured in an American oak (sweeter) or French oak (more tannins, adds more complex flavours) barrel affects the final product. Tannins prevent oxidation, so red wine (with more tannins) can be matured longer. Stopping fermentation early makes wine more sweet.

After learning how wine is made, we concluded the day by learning how it tastes (using our five senses!) and enjoying a generous dinner.

The second and last part of the post can be found in the ECRYPT-EU blog.