Bristol Cryptography Blog: November 2014

Tuesday, November 25, 2014

Post-Quantum Signatures

This week's Study Group was led by Peter Scholl who spoke about hash-based signatures and in particular a recent paper by Bernstein, Hopwood, Hulsing, Lange, Niederhagen, Papchristodoulou, Schwabe and Wilcox O'Hearn, called SPHINCS: practical stateless hash-based signatures [1].

Most real-world signature schemes like RSA and DSA can be broken by a quantum computer, due to Shor's algorithm [2]. One could use lattice-based signatures but their security is not well understood: the authors of [1] note that "their quantitative security levels are highly unclear" even in the pre-quantum setting.

On the other hand, hash-based signature schemes using 'quantum-resistant' one-way hash functions are secure in the quantum setting [3], although the symmetric security parameter needs to be doubled for quantum resistance due to Grover's algorithm [4]. The trouble is that such schemes tend to be inefficient or stateful, where the latter means one cannot have a signing key shared across multiple devices and the scheme may be vulnerable to 'restart attacks' where the scheme is forced to re-use a secret key, compromising security. The paper of Peter's talk seeks to address this by constructing a (relatively) efficient, stateless, hash-based signature scheme called SPHINCS. The construction is proved secure “based on weak standard-model assumptions, avoiding collision resistance and the random-oracle model”.

One Time Signature Schemes

First, we revisited the Lamport One Time Signature (OTS) scheme. (This isn't the OTS used in the paper but it will serve as a reasonable approximation.) Here your secret key is a sequence of pairs of bitstrings $(x_0, y_0), ..., (x_n, y_n)$ and the public key is the sequence of pairs of hashes $(H(x_0), H(y_0)), ..., (H(x_n), H(y_n))$. To sign a message $m$ one takes each bit $m_i$ of $m$ and selects $x_i$ or $y_i$ according to whether $m_i$ is 0 or 1. To verify a signature one hashes each $x_i$ or $y_i$ in the signature and checks the output against the public key. This is called “One Time” since the secret key can only sign one message before security is seriously compromised: your signature reveals half of the elements of the secret key.

From 'One Time' to 'Many Times'

A natural way to build “many time” signature schemes is to iterate OTS schemes. The trouble is that doing this in a naive way means the verification algorithm needs a huge number of public key components, enough for all the bits of all the messages you ever want to sign! Instead, we compress each OTS public key using a “Merkle tree” where the (original) OTS public keys are the leaf nodes and parents are constructed by hashing the concatenation of the children (together with a bitmask for added security). We want the (new) public key to be the root of this tree and so our signature must supply enough information (in as brief a way as possible) for the verifier to recover the root. This is how we do it:

Given the path from a leaf to the root, let the sequence of siblings of each node on the path be called the authentication path of the leaf.
In a signature, supply the index of the leaf node used and its authentication path.
With the leaf and its sibling, the verifier can construct the parent.
Then with the sibling of the parent, the verifier can construct the grandparent.
The verifier continues like this, using the siblings given in the authentication path to recover the root node i.e. the public key.

So we now have a scheme with a practical-sized public key. We can also shrink the secret key by just storing a seed to a Pseudo-Random Generator which will then output the many OTS secret keys which, when hashed, give the leaves of the Merkle tree. However, we haven't “eliminated the state”, to use the authors' phrase: one must store a counter recording which OTS secret keys have been used so far.

"Eliminating the State"

Goldreich's [5] answer to this problem was to deterministically choose which OTS key to use next, rather than just use them in order, i.e. some hash of the message determines the index of the OTS secret key to use in signing. Of course, now one needs a much bigger tree in order to avoid accidentally using a “one time” key more than once. In fact, for security, there needs to be exponentially many OTS key pairs, so we can't just have one leaf of the tree for each OTS secret key or the tree would be far too big for efficient signing. Instead, we associate every node with an OTS key pair and at each step from a leaf to the root, sign the hash of the public keys of the child nodes with the secret key of the parent node. The new (longer) signature contains the index of the leaf node used and the OTS signatures of all the nodes on the path to the root. The trouble with this scheme is that the length of the signature is cubic in the security parameter. This is where the authors of [1] come in.

From 'One Time' to 'A Few Times'

The main idea of the paper is to use a 'few times signature' scheme (instead of an OTS) at the bottom of the tree to reduce the number of leaves needed for security and hence the overall height of the tree, thus shortening the signature. Their choice of scheme for the bottom of the tree is called HORS: Hash to Obtain Random Subset. In HORS, the secret key is the tuple $(s_1, ..., s_t)$, the public key is the hashes $(H(s_1), ..., H(s_t))$ and a message $m$ determines a (“random”) subset $S$ of $\lbrace1, 2, ..., t\rbrace$ with fixed size $k$ (much smaller than $t$). Then the signature for $m$ is the set of secret key components corresponding to $S$, i.e. $\lbrace s_i | i \in S\rbrace$. Now we can use the secret key 'a few times' before security is compromised as only a small number of the components $s_i$ of the secret key are revealed in each signature. But again we have the problem that the public key needs to be very large and so a Merkle tree is once again employed: the new public key becomes the root node, recoverable from the index of a leaf node and the corresponding authentication path. Notice that this means the bottom of the tree in SPHINCS, the construction proposed in [1], is itself a tree. So SPHINCS consists of a hyper-tree (a tree of trees).

The SPHINCS Tree of Trees

The SPHINCS hyper-tree has total height $h$ consisting of $d$ layers, each of height $h/d$. The index from the hash of the message determines a HORS tree on the bottom layer of the hyper-tree and a leaf of this HORS tree from which we compute the HORS signature of the message. Then this signature is signed according to the OTS scheme on the next layer (where the initial index in some way determines the tree and the leaf to use). We repeat this OTS signing on each layer and finally output the SPHINCS signature consisting of the index, the HORS signature (which contains all the information needed to recover the HORS public key) and each OTS signature and each authentication path needed to recover the root at each layer.

Real World Considerations (in the Quantum Computing World!)

After proving its security, the authors demonstrate the practicality of SPHINCS with certain choices of parameters: the hyper-tree has total height 60 consisting of 12 layers (each of height 5), the number of HORS secret key elements is $2^{16}$ and 32 of these are revealed in each HORS signature (i.e. $t=2^{16}$, $k = 32$). There is also a parameter that affects the OTS scheme but we haven't detailed it here. With these choices, the signatures have size 41KB, the keys have size around 1KB and one can sign hundreds of messages per second on a modern quad-core computer.

By most accounts, quantum computers are something of a pipe-dream at the moment. Nevertheless, it's reassuring to know that security is still achievable - and indeed practical - against adversaries who can exploit the enormous power of quantum computers, whenever that day comes.

[1] - http://eprint.iacr.org/2014/795.pdf

[2] - http://en.wikipedia.org/wiki/Shor%27s_algorithm

[3] - http://eprint.iacr.org/2014/709.pdf
[4] - http://en.wikipedia.org/wiki/Grover%27s_algorithm

[5] - Foundations of Cryptography: Volume 2, Basic Applications (2004)

Monday, November 24, 2014

52 Things: Number 7: How does randomness help in computation, and what is the class BPP?

This is the latest in a series of blog posts to address the list of '52 Things Every PhD Student Should Know To Do Cryptography': a set of questions compiled to give PhD candidates a sense of what they should know by the end of their first year. This blog post concentrates on the Complexity Class BPP.

So far, during this blog series Ryan has introduced us to complexity classes, and in particular to P:

P is the class of languages decidable in polynomial time by a deterministic Turing machine.

Then, Guy introduced us to complexity class NP:

NP is the class of languages decidable in polynomial time by a nondeterministic Turing machine.

This week I am going to introduce the complexity class BPP:

BPP is the class of languages that are recognised by a probabilistic polynomial time Turing machine with an error probability of $\frac{1}{3}$.

Probabilistic Turing Machine

A probabilistic Turing Machine[1] is a type of nondeterministic Turing Machine which randomly chooses between the available transitions at each point according to a probability distribution. What this means is that a probabilistic Turing machine can have stochastic results. On the same input, it could have different run times, accept it on one occasion and reject it on another. It could also never stop. This Turing Machine gives rise to other complexity classes such as RP, ZPP and, the one we're discussing in this post, BPP.

A little about the complexity class BPP

So as we have seen from the definition, the class BPP (Bounded-Error probabilistic polynomial time) contains the decision problems that are solvable in polynomial time by a probabilistic Turing machine with error probability $\frac{1}{3}$. Note that this error probability can be chosen to be any value strictly between $0$ and $\frac{1}{2}$ due to a result named the amplification lemma which I will not discuss further here. The class BPP contains P, the class of problems solvable in polynomial time by a deterministic Turing Machine, this is due to the fact that a deterministic Turing Machine is a special case of the probabilistic Turing Machine (taking the only possible path with probability 1). As talked about in Guy's post, there is an open (Millennium) problem conjecturing as to whether P = NP. There is a similar question with BPP, being P = BPP? The number of problems known to be in BPP but not known to be in P is decreasing.

An example of a BPP Problem

One of the most famous problems known to be in BPP but not known to be in P was determining whether a number was prime. However, recently (2002) a deterministic polynomial time algorithm was found[2] for this problem meaning that it is indeed in P. Another problem known to be in BPP and currently still not known to be in P is polynomial identity testing[3], the problem of determining whether a polynomial is identically equal to the zero polynomial.

There are still many very important unanswered questions on the topic of Complexity Classes. Some of which, if answered, could have a large impact on shaping the future of Cryptography and Computer Science.

[1] - http://en.wikipedia.org/wiki/Probabilistic_Turing_machine

[2] - http://en.wikipedia.org/wiki/AKS_primality_test

[3] - http://en.wikipedia.org/wiki/Schwartz%E2%80%93Zippel_lemma

Tuesday, November 18, 2014

Study group: Network Security Risk Assessment Using Bayesian Belief Networks.

This weeks study group was given by Shan on assessing network security risk using Bayesian Belief Networks, following the paper of Kondakci titled "Network Security Risk Assessment Using Bayesian Belief Networks". The model developed can be applied to a variety of security evaluation tasks, risk assessment and other decision making systems. The most important thing to understand on this topic is exactly how Bayesian Belief Networks (BBN's)[1] work and how they can be used to calculate security risks caused by various threat sources. To achieve this understanding, we need to talk a little about conditional probability[2]. We know that given an event $A$, the probability of an event $B$ occurring is
$$\mathrm{P}(B|A) = \frac{\mathrm{P}(B \cap A)}{\mathrm{P}(A)} $$
We can use this to consider an attack on two systems $A$ and $B$. Let us assume that the probability of an attack $\mathrm{P}(N)$ on either system is 0.1 (meaning the probability of no attack $\mathrm{P}(N')$ is 0.9). Let us also assume that the probability of system $A$ failing if there is an attack is 0.8 and if there isn't an attack 0.1. If there is an attack on system $B$ it fails with probability 0.6 and if no attack 0.5.
From these values we can very easily calculate the chances of the various systems failing. The probability of A failing $\mathrm{P}(A)$ is calculated as follows
$$\mathrm{P}(A) = \mathrm{P}(A|N)\mathrm{P}(N) + \mathrm{P}(A|N')\mathrm{P}(N') = 0.1$$
Similarly we can show the probability of B failing as 0.53. Using Bayes Theorem[3] the probability that an attack has occurred given that system A has failed can be calculated as
$$\mathrm{P}(N|A) = \frac{ \mathrm{P}(A|N)\mathrm{P}(N) }{ \mathrm{P}(A)} = \frac{(0.8 * 0.1)}{0.17} = 0.47 $$
As you would expect, the observation that System A has failed has increased the probability that an attack has occurred. Let $M = N|A$ be the event of there being an attack given that $A$ has failed and $M' = N'|A$ the event of there not being an attack given that $A$ has failed. Using the law of total probability[4], the probability that system B has failed can be calculated as
$$\mathrm{P}(B) = \mathrm{P}(B|M)\mathrm{P}(M) + \mathrm{P}(B|M')\mathrm{P}(M') = 0.55 $$
Again, observing that System $A$ has failed increases the probability that System $B$ has also failed.

This simple example shows how Bayesian Belief Networks can be used to get a better understanding of the state of systems. As a real world example, consider mobile phone applications. If we know that an application has failed on a certain phone then we can calculate the failure risk on other phones. Obviously these models can easily increase in size and complexity, the hardest thing about this model is building the conditional probability table associated with the various systems involved, in our case, finding the values of $ \mathrm{P}(A|N)$ etc. These are built using historical data. Systems can be threatened by various types of threats, the first step is to create a directed acyclic graph (DAG)[5] showing all the relations involved in the system. This can then be converted into a BBN. This way we can combine certain threats into a joint impact parameter, for example we could have external threats as well as internal threats and within that we can have different types of internal threats for the system. The model presented is a clever, generic way of computing the risk of the various classifications of threats of IT systems and computing environments using conditional probability through Bayesian Belief Networks.

[1] - http://en.wikipedia.org/wiki/Bayesian_network
[2] - http://en.wikipedia.org/wiki/Conditional_probability
[3] - http://en.wikipedia.org/wiki/Bayes'_theorem
[4] - http://en.wikipedia.org/wiki/Law_of_total_probability
[5] - http://en.wikipedia.org/wiki/Directed_acyclic_graph

Friday, November 14, 2014

52 Things: Number 6: How can we interpret NP as the set of theorems whose proofs can be checked in polynomial time?

This is the latest in a series of blog posts to address the list of '52 Things Every PhD Student Should Know' to do Cryptography: a set of questions compiled to give PhD candidates a sense of what they should know by the end of their first year. We continue the Complexity Theory section with an alternative definition of NP...

This question is very much a follow up question to the previous one. Last week Guy answered the question of ``What is meant by the complexity class $\mathbf{NP}$?'', while this week I will be answering the related question of ``How can we interpret $\mathbf{NP}$ as the set of theorems whose proofs can be checked in polynomial time?''.

Now to me this is a more intuitive definition of what it means for a problem to be in $\mathbf{NP}$. Not only is it a more intuitive definition but it should (hopefully) also be clearer as to why this is a complexity class that is useful both for cryptography and the wider world. Now before we go into what we can use the class of problems for, the definition is as follows:

$\mathbf{NP}$ is the class of languages that have polynomial time verifiers.

OK but what does this actually mean? Basically if we have an element $x$ and we want to know if $x\in L$ (where $L$ is some $\mathbf{NP}$ language) we have a prover $P$ which given $x$ outputs a witness $w$, this may take exponential time to find $w$ given $x$. Then if we give $x$ and $w$ to our verifier $V$, $V$ can tell if $x\in L$ in polynomial time.

This definition seems very different from the one given last week but they are in fact equivalent. Informally they are equivalent because the witness $w$ can just be the sequence of decisions the NDT made at each branching point to show that $x\in L$. For a (slightly) more formal proof of their equalence [1] is a reasonable online source (with references to textbooks).

So why might this be useful in cryptography? Well essentially we have a class of languages which can take exponential time to check if you do not know a witness but with a witness it can be done in polynomial time. This has the ``feel'' of a lot of cryptographic algorithms - take Encryption (formalisation to follow in future weeks' blogs) for example; you want it to be exponentially hard to get the message from ciphertext without the key (similar to a witness) but with the key you want it to be efficient (polynomial time) to extract the message.

A warning: While it sounds like a good move to use an $\mathbf{NP}$ problem for cryptography it may not be as simple as it first appears. This is because languages are in $\mathbf{NP}$ based on the worse case complexity where as for encryption we need it to be hard on average. For example we may have an $\mathbf{NP}$ language which has one element that takes exponential time to solve but all others are really efficient to solve - this will not make a good basis for an encryption scheme because we want encryption to be secure for all messages not just one.

Now I am aware that integer factorization isn't known to be $\mathbf{NP}$-complete and isn't known to be in $\mathbf{P}$ (See Ryan's blog here for a description) but it makes for a nice example of the point I am trying to make about choosing your $\mathbf{NP}$ instances carefully. In general finding a factor of a number is easy (half of them are divisible by two!) but if we choose something sensible we can make it hard to factor. Let us focus on numbers of the form $N=p\cdot q$ for $p,q$ prime (a.k.a numbers with only two proper factors). Now if either of these numbers is small then it is again easy, so we want the numbers to be of equal size (this is what we do for RSA). From this it is possible to build an encryption scheme over the top.

[1] - http://en.wikipedia.org/wiki/NP_(complexity)#Equivalence_of_definitions

Wednesday, November 5, 2014

52 Things: Number 5: What is meant by the complexity class NP?

Last week, Ryan introduced us to complexity classes, and in particular to P:

P is the class of languages decidable in polynomial time by a deterministic Turing machine.

This week, we introduce another complexity class:

NP is the class of languages decidable in polynomial time by a nondeterministic Turing machine.

What is a Nondeterministic Turing Machine (NDTM)?

Well, an NDTM is a Turing Machine for which the transition function can have multiple values for the same tape value/state pair (meaning it is not technically a function any more, so the correct thing would be to call it a transition relation). Thus evaluation of an NDTM on any particular input can be thought of as a tree, branching at each point the transition function provides more than one possible value. Now, an NDTM evaluates all of these possible branches at once, and accepts if at least one of these computations halts in the accept state. This definition generalizes from language membership to decision to computational problems in the same way as P did in last weeks blog.

Some Examples of NP problems

We begin with an easy example: path finding. Given a directed graph (on n vertices) is there a path from vertex A to vertex B. How do we know this is in NP? Well, there is an NDTM that solves it, which basically works by simply trying every route by branching each time there is a junction. If one of these branches (ie one of the trial routes) reaches B, then that branch terminates in the accept state. Any branch that does not reach B within n steps (ie after traversing n edges) terminates in the reject state. Since any path will involve at most n-1 edges, any valid route will be detected, and so this machine correctly decides whether the path exists.

One of the key examples of an NP problem is the satisfiability problem:

SAT: Given an expression in n boolean variables, is there an assignment of the variables such that the expression evaluates to True?

For example, the expression $(A \land B) \lor (A \land \lnot B)$ is satisfiable, with one valid assignment being A=B=True. Note that in it's standard form, SAT is a decision problem: it suffices to decide whether such an assignment exists, we don't have to find it.

Ok, so there are problems in NP. What is interesting about it?

Firstly, we see that P⊆NP since a DTM is an NDTM that just happens not to ever branch (in fact, our first example can actually be solved by a DTM and so is within P). So, the real question is what sort of things can we do in NP that we couldn't do in P? Well, this is the root of the P$\overset{?}{=}$NP millenium problem, which is a major open problem. There are certainly problems that we have found that are known to be in NP that we do not know to be in P, but perhaps future research will show that these are also in P.

Lots of interesting cryptographic systems (particularly in the public key setting) are secure based on the assumption that a computational problem is "hard", which generally means "at least as hard as any problem in NP". That is, many schemes are based on problems which we think are difficult to solve, and that if you can create an algorithm that solves them you could also use this algorithm to solve lots of other problems that we currently believe to be difficult.

The Cook-Levin theorem provides an interesting result in this direction: No problem in NP is any harder than SAT. What his means is that if I had an oracle (basically an algorithm that I can see the input/output of but none of the workings) that solved SAT, by asking it to solve cleverly constructed queries I could use it to solve any other NP problem. This made SAT the first example of an NP-complete problem. So, to show that a problem X is at least as hard as solving an NP problem (it is NP-hard), it suffices to show that if I could solve X, I could use it to solve SAT.