Bristol Cryptography Blog: 2014

Wednesday, December 24, 2014

52 Things: Number 12: What is the elliptic curve group law?

This is the latest in a series of blog posts to address the list of '52 Things Every PhD Student Should Know' to do Cryptography: a set of questions compiled to give PhD candidates a sense of what they should know by the end of their first year. We continue the Mathematical Background section by introducing the Elliptic Curve Group Law...

The Elliptic Curve group law is a method by which a binary operation is defined on the set of rational points of an elliptic curve to form a group. Now, lets go through what that actually means, and what it's used for. Thanks to Dr Dan Page for providing the group law diagram.

An Elliptic Curve and its rational points

An Elliptic Curve is a cubic equation in two variables over some mathematical field. They can be written in various forms, but over most fields¹ can be written in short Weierstrass form:

$$E : y^2 = x^3 + ax + b $$

For now we will assume that we are working in the field of real numbers, and ignore any complications that come from using finite fields. With some simple requirements on $a,b$ (specifically, that $27b^2\neq -4a^3$) this is an elliptic curve.

The set of points that will be elements of our group are going to be the rational points of the elliptic curve. This is simply the collection of points $(x,y)$ that satisfy the curve equation where both $x,y$ are rational. So, that's the set of $(x,y)\in \mathbb{Q}$ where $y^2=x^3+x+b$. For reasons that will become clear, we also include a point at infinity².

Adding a Group Law to an elliptic curve

The simplest way to describe the relation we're going to add to the set of rational points is with a diagram:

Elliptic Curve (blue) with two points (P,Q) and their sum (P+Q) plotted, along with construction lines (red)

So, to add together $P$ and $Q$, we draw a line through $P$ and $Q$, and make $T=(T_x,T_y)$ the third point this line intersects the curve. Then, $P+Q=(T_x,-T_y)$. To add a point to itself, we take the tangent at that point. Now, the surprising fact is that this operation defines a group, with the point at infinity the neutral element.

Most of the requirements of being a group are easy to see geometrically³. For example, it is easy to find the inverse of an element. In the diagram above $(P+Q)+T=0$, because the line from $T$ to $(P+Q)$ has it's third intersection at infinity, and so $(P+Q)=-T$. In fact, for any elliptic curve in short Weierstrass form, to negate a point we simply change the sign of it's y-coordinate.

Is that all there is to it?

Pretty much yes. The same method holds to over finite fields, although in this case it tends to be simpler to think of the group's operation as being an algebraic construct rather than geometrical, since Elliptic Curves over finite fields do not have such an intuitive structure. Also, we don't need to view curves in short Weierstrass form, since there are many different coordinate schemes and equations that represent the same curve. Indeed, some choices of curve and coordinate system assist us in doing certain types of computation.

What's that got to do with Cryptography?

It turns out that over certain finite fields the Elliptic Curve Group has several nice properties for cryptographers. There are a surprisingly large number of curve and field pairs where it's not too costly to do group computations⁴, but for which the various discrete log or DH problems (see last week's blog) are hard. Moreover, compared to using large multiplicative groups (eg RSA groups) the variables computed with are much smaller. Putting all these together, elliptic curves allow cryptographers to efficiently calculate ciphertexts that are much smaller than those created by alternative means without reducing security.

Specifically, fields of characteristic not equal to 2,3. That is, fields where $2\neq0$ and $3\neq 0$. Unfortunately, this obviously means that the results we discuss won't hold in binary fields, but that is rather beyond the scope of this talk.
Justification for this comes from considering the elliptic curve as a curve in projective space, but for now it suffices that such a point exists.
Associativity is by far the most complicated to show. This diagram on wikipedia explains the concept behind the proof, although the details are rather involved.
Even as I write this, I'm sure someone will question the validity of this claim, but it is true that compared to many groups that one could construct in which the required problems are sufficiently hard, point arithmetic on an elliptic curve is comparatively tractable.

Friday, December 19, 2014

Study Group: Efficient Smart Phone Forensics Based on Relevance Feedback

This week's study group was given by Panos on the subject of Efficient Smart Phone Forensics Based on Relevance Feedback[1].

The Digital Forensics Procedure has 4 major steps: Device Secure, Data Extraction, Data Analysis and Data Representation. To improve the efficiency of data analysis, a sub procedure known as forensic triage is involved to mark out the most relative information among in the phone. In this paper, a ranking system called LIFTR was presented to prioritize the information recovered from Android phones where the amounts of data are huge comparing to feature phones.

LIFTR is designed under following assumptions:

Physical image of phone is acquired.
The data is not encrypted. They are filtered binary files (which are the output of previous procedure).
The format of data are list of strings, e.g. the result of DEC0DE or strings command.

LIFTR then takes the information parsed by recovery engine as the input alongside with optional suspect information, and then outputs a ranked list of the most important NAND pages.

There are two major steps in the design of LIFTR:

Initial Ranking. The input of LIFTR are unranked pages. The initial ranking will mark the pages based on a relevance metric. After this procedure, the pages will be ranked for the next step.
Relevance Feedback. After the initial ranking, LIFTR will present some possibly relevant fields of pages to the investigator and ask for the labelling, which is usually manually, of true and false positive. LIFTR then resorts the ranking by the labelling information assigning relevant pages with higher ranks and irrelevant pages lower. This process can be repeated multiple time to reach a better performance.

The experimental result shows that LIFTR can efficiently mark out the relevant pages. However, it does not guarantee the target information will be included in its result; therefore all data should still be analyzed during an investigation. However, LIFTR is still an effective tool to improve the efficiency of the investigation.

[1]Saksham Varma, Robert J. Walls, Brian Lynn, and Brian Neil Levine. 2014. Efficient Smart Phone Forensics Based on Relevance Feedback. InProceedings of the 4th ACM Workshop on Security and Privacy in Smartphones & Mobile Devices (SPSM '14). ACM, New York, NY, USA, 81-91. DOI=10.1145/2666620.2666628 http://doi.acm.org/10.1145/2666620.2666628

52 Things: Number 11: What are the DLP, CDH and DDH problems?

This is the latest in a series of blog posts to address the list of '52 Things Every PhD Student Should Know To Do Cryptography': a set of questions compiled to give PhD candidates a sense of what they should know by the end of their first year. This blog is the second in the section on Mathematical Background and concerns how group operations can be used to design cryptographic primitives.

As you probably know by now, cryptography often relies on 'hard problems'. That is, we design cryptographic protocols whose security can be proven if we assume that the adversary is unable to solve a certain (mathematical) problem in a reasonable amount of time. This blog post introduces three such problems which are widely used in security proofs. Luckily for me, a) this is really just Group Theory, not Computer Science and b) just two days before writing this I attended a very decent guest lecture by fellow Bristol Crypto researcher Susan Thomson on this exact topic. (That is to say, any inaccuracies in the following can and should be blamed on her!)

The Discrete Logarithm Problem (DLP)

Okay, let $G$ be an abelian group. First we write the operation in $G$ multiplicatively. For any $g \in G$ and any integer $a>1$ let $g^a$ denote $g * g * ... * g$ with $a$ occurrences of $g$. The discrete logarithm problem (DLP) is:

Given $G$, $g$ and $h=g^a$, find $a$.

Here $a$ is called the discrete logarithm of $h$ with base $g$, hence the name of the problem.

Is the discrete logarithm problem hard? Sometimes, maybe. As a counter-example, let $G$ be the integers under addition. So now it makes sense to write the group operation additively, not multiplicatively. So the same process of repeating the group operation with the same element $g$ is now written $g + g + ... + g$ with, say, $a$ summands and, since we're working in the integers, this sum, call it $h$, is just equal to the integer $ag$. Therefore $a$, the discrete logarithm of $h$ to the base $g$, can be found by just dividing $h$ by $g$. For example, if I say to you, "find the discrete logarithm to the base 3 of 18 in the integers under addition", you'd just write $3 + 3 + ... + 3 = 18$ with $a$ summands on the left, simplify this to $3a = 18$ and find that $a = 6$. I could change the group to integers modulo $N$ for some integer $N>1$ (still under addition) but the problem wouldn't be much harder: one has to solve an equation like $ag \equiv h \: (\mathrm{mod} \: N)$ which is solved by performing the Extended Euclidean Algorithm to find $g^{-1}\: (\mathrm{mod} \: N)$ (if it exists), multiplying this by $h$ and reducing modulo $N$ to obtain $a$. All of this is can be done in polynomial time - no good for building secure cryptographic primitives.

On the other hand, the DLP in a finite field of prime order viewed as a group under multiplication (after throwing out the zero element) or in elliptic curve groups (more on those next week) is believed to be hard. That is, we do not yet know of any polynomial time algorithms for finding discrete logarithms in these groups. As a concrete example, suppose I ask you, "find the discrete logarithm to the base 3 of 5 in the multiplicative group of the integers modulo 7". This means find an integer $a$ such that $3^a \equiv 5\: (\mathrm{mod} \: 7)$. Now that we are in the multiplicative group, not the additive one and so we really do have to 'take logarithms' somehow, not just multiply by the inverse of 3. In this case, since 7 is fairly small, we can find the answer by just trying all the possibilities one at a time until we find a solution:

$3^1 = 3 \not\equiv 5\: (\mathrm{mod} \: 7)$
$3^2 = 9 \equiv 2 \not\equiv 5 \: (\mathrm{mod} \: 7)$
$3^3 = (3^2)\times3 \equiv 2\times3 = 6 \not\equiv 5\: (\mathrm{mod} \: 7)$
$3^4 = (3^3)\times3 \equiv 6\times3 = 18 \equiv 4 \not\equiv 5\: (\mathrm{mod} \: 7)$
$3^5 = (3^4)\times3 \equiv 4\times 3 = 12 \equiv 5\: (\mathrm{mod} \: 7)$

so $a = 5$. The way we 'bounce around' the integers modulo 7 by repeatedly multiplying by the base 3 (getting 3, 2, 6, 4 and then 5) should give you some intuition as to why DLP seems hard in this setting. If our prime was much larger than 7, say thousands of bits long in binary, even a computer would take a very long time to solve DLP this way (though there are better, sub-exponential time algorithms and it is not proven that no polynomial time algorithm exists for solving DLP in this kind of group).

The Computational Diffie-Hellman Problem (CDH)
A problem related to DLP is named after Whit Diffie and Martin Hellman who devised a way of two parties agreeing on a secret key over a public channel without revealing it:

Alice and Bob publicly agree on a cyclic group $G$ and generator $g$.
Alice chooses a random secret integer $a$ and Bob chooses a random secret integer $b$.
Alice computes $g^a$ and publicly sends this to Bob. Bob computes $g^b$ and publicly sends this to Alice.
Alice and Bob both compute $g^{ab}=(g^a)^b=(g^b)^a$ by raising what they received from the other party to power of their own secret integer.

Now $g^{ab}$ is a secret key that can be used for symmetric encryption and decryption by Alice and Bob. But someone listening in to the exchange has in their possession $G$, $g$, $g^a$ and $g^b$. So secrecy of the key $g^{ab}$ depends on this problem, called the Computational Diffie-Hellman Problem (CDH):

Given $G$, $g$, $g^a$ and $g^b$, find $g^{ab}$.

CDH is clearly related to DLP, but which is harder? Well, if I can solve DLP then I can efficiently compute the secret integer $a$ from $g^a$ and then find $g^{ab}$ by raising $g^{b}$ to the power $a$ in the same way Alice does, therefore solving CDH. So anyone who can solve DLP can also solve CDH, meaning DLP is at least as hard as CDH.

The Decisional Diffie-Hellman Problem (DDH)
This is another 'discrete logarithm' style problem used to prove indistinguishability properties. Say Alice and Bob perform the Diffie-Hellman key agreement protocol as above so that $G$, $g$, $g^a$ and $g^b$ are all public and $g^{ab}$ is the shared secret key. Intuitively, the Decisional Diffie-Hellman Problem (DDH) asks whether an adversary can distinguish Alice and Bob's secret key $g^{ab}$ from a random group element of $G$. Formally:

Given $G$, $g$, $g^a$, $g^b$ and $T_x$ such that $T_0$ is a random element of $G$, $T_1 = g^{ab}$ and $x$ is chosen uniformly at random from $ \lbrace 0,1 \rbrace $, find $x$.

If an adversary can solve DDH (i.e. output the correct value of $x$ with probability greater than $\frac{1}{2}$), then $G$, $g$, $g^a$ and $g^b$ must leak some information about the secret key $g^{ab}$ that distinguishes it from a random group element, even if it can't be computed directly. What should be clear is that if the adversary can solve the computational Diffie-Hellman problem, then they can actually compute $g^{ab}$ and hence trivially distinguish this element from a random group element, thereby solving the decisional Diffie-Hellman problem. So anyone who can solve CDH can also solve DDH, meaning CDH is at least as hard as DDH.

These are the three problems we wanted to talk about and we've given a sketch proof of their ordering in terms of hardness: DLP is the most hard, then CDH and then DDH. As we've seen, DLP is sometimes easy, making CDH and DDH easy. So the choice of group $G$ and generator $g$ is very important when doing cryptography!

Wednesday, December 17, 2014

Study Group: Attack on XLS

This week's study group, delivered by Guy, featured a paper by Mridul Nandi attacking the XLS domain completion method. This attack was initially given at DIAC in August, and was formally presented earlier this month at Asiacrypt 2014 in Taiwan. The idea of a domain completion method is to preserve length in an encryption scheme, so that the ciphertext is of the same length as the message. More formally, this means turning an encryption scheme that works on blocks of size n to a cipher that works on inputs of size m ≥ n. This contrasts to the strategy of padding (for example to the block length of a block cipher) that infers ciphertext expansion, which is used in many applications and standards. A number of approaches have appeared in the literature including ciphertext stealing, hash-then-counter (XCB, HCTR, HCH) and applying a block cipher twice to create a one-time pad for the final block (EME, TET, HEH) - but these methods are not applicable to generic blockciphers and lack formal security analysis.

At FSE 2007, Ristenpart and Rogaway presented XLS (eXtended Latin Squares), a domain completion method that inherits the assumed strong pseudorandom permutation property of the underlying blockcipher. The method is conceptually straightforward, utilising a (linear) mixing permutation and bit-flipping and the security proof is, as you would expect from the authors, relatively easy to follow despite its numerous intricacies. No fewer than six teams have incorporated XLS into their CAESAR submissions.

Despite this acquired confidence, Nandi presents an attack to demonstrate that XLS is in fact not a strong pseudorandom permutation (SPRP), giving an attack that makes just three encryption/decryption oracle queries and succeeds with probability 1/2. In addition, the author shows that replacing the mixing function defined in the original paper with some other stronger linear permutation(s) does not help, and gives a distinguishing attack (with success probability 1/4) against this generalised version of XLS.

Nandi does not precisely pinpoint any incorrect assertions in the proof nor provide any indication to give hope that XLS is fixable, and it will be interesting to see how the authors of the original paper (and the teams relying on XLS in their CAESAR submissions) respond in the coming months.

Friday, December 12, 2014

52 Things: Number 10: What is the difference between the RSA and strong-RSA problem?

Cryptography relies heavily on the assumption that certain mathematical problems are hard to solve in a realistic amount of time. When looking at Public-Key (Asymmetric) Cryptography, which is what we'll be focusing on in this blog post we use the assumed existence of One-Way functions, i.e. functions that are easy to compute one way but are difficult to invert. We use problems from algorithmic number theory to produce these functions.

Factoring

The first difficult problem from number theory to talk about is factoring. Given a composite integer $N$ the factoring problem is to find positive integers $p,q$ such that $N = pq$. Although on the face of it this seems like a very simple problem, this is in fact a very tough, well studied problem. This can be solved in exponential time by checking all the numbers $p = 2, \ldots, \sqrt{N}$. However, solving a problem in exponential time is not fast enough. No polynomial time algorithm has been developed to solve the factoring problem, despite many years of research. Clearly there are examples of $N$ for which this is very easy to solve, for example whenever $N$ is even. Therefore, when starting to think about using this in a Cryptographic construction we consider $N$ as very large and being constructed by 2 large primes $p,q$.

The RSA Problem

In RSA public-key encryption [1] Alice encrypts a plaintext $M$ using Bob's public key $(n,e)$ to ciphertext $C$ by $C = M^e (\textrm{mod } n)$ where $n$ is the product of two large primes and $e \geq 3$ is an odd integer that is coprime to the order of $\mathbb{Z}_n^{*}$, the group of invertible elements of $\mathbb{Z}_n$. Bob knows the private key $(n,d)$ where $de = 1 (\textrm{ mod } (p-1)(q-1))$ meaning he can compute $M = C^d (\textrm{mod } n)$. An adversary can eavesdrop $C$ and can know the public key $(n, e)$ however to calculate $M$ the adversary must find the factors of $n$. Therefore, this means the RSA problem is no harder than integer factorisation but is still a very hard problem to solve provided a suitable $n$ is chosen.

The Strong RSA Assumption

The strong RSA assumption differs from the RSA assumption in that the adversary can choose the (odd) public exponent $e \geq 3$. The adversary's task is to compute the plaintext $M$ from the ciphertext given that $C = M^e (\textrm{mod } n)$. This is at least as easy as the RSA problem meaning that the strong RSA assumption is, unsurprisingly, a stronger assumption. The RSA problem is now over a quarter of a century old. Public key encryption schemes have been developed that derive their strength fully from the RSA problem.

[1] - http://people.csail.mit.edu/rivest/RivestKaliski-RSAProblem.pdf

Thursday, December 11, 2014

Password-Protected Secret Sharing

Hugo Krawczyk opened the final day presenting a paper on password-protected secret sharing. He began with a motivating example of protecting a high value secret such as a Bitcoin wallet. Typical storage methods using password authentication at a server are vulnerable to a single point of failure - an attacker just needs to break into one server, and can then mount an offline dictionary attack to guess the user's password and recover the key. One natural cryptographic solution to this problem is to secret share the wallet key across multiple servers (this approach is taken in the commercial solution offered by Dyadic Security), ensuring that an attacker must break into several servers to recover the key. The problem now is that in order to use the Bitcoin wallet, you must authenticate yourself to each of your servers - if this is done with a single password then there are now many points of failure for the system! An offline dictionary attack can now be performed against any one of the servers. Of course, one could choose different, strong passwords for each server, but this is unlikely to be practical for most users.

The notion of password-protected secret sharing (PPSS) gets around this. A PPSS scheme allows a user to secret share a secret among n servers (with threshold t) and store just a single password for authentication. Roughly speaking, the security guarantee is that any attacker breaking into up to t servers and obtaining all of their secret information (shares, long-term keys, password files etc) cannot learn anything about the secret or the password. This implies that the adversary's only hope is to try and guess the password in an online attack: offline attacks with fewer than t+1 servers are useless.

The main contribution of the paper is a new PPSS scheme, which only requires authenticated channels between the servers (except for a one-time setup phase) and is very efficient: authentication is done in one round of communication (2 messages), requiring 2t + 3 exponentiations for the client and 2 for each server. Previous schemes were much less efficient and often required additional assumptions such as a PKI.

The main building block of their construction is an Oblivious PRF (OPRF). Here a server holds a PRF key $k$, the user holds the input $x$ and obtains the output $f_k(x)$, whilst the server learns nothing about the input. This can be done very efficiently with a 'hashed Diffie-Hellman' technique, where the output is defined by $f_k (x) = H(H(x)^k)$ and can be computed with a standard DH exchange requiring just 2 exponentiations per party.

For the PPSS scheme, each server $S_i$ has an OPRF key $k_i$. The user chooses a password $p$, secret shares their secret $s$ into shares $s_i$ and runs the OPRF with each server to obtain $r_i = f_{k_i}(p)$. They then encrypt $s_i$ as $c_i = s_i \oplus r_i$, store $c_i$ at server $S_i$, erase all intermediate information and store the password $p$. To reconstruct, the user retrieves $c_i$ from $S_i$ and runs the OPRF to recover the shares $s_i$. The protocol is nice and simple to describe, but there are many subtle issues that arise with defining and proving security for the OPRF used in the construction: see the paper for details.

Another application of PPSS is for threshold PAKE (password authenticated key exchange), which allows a user to securely exchange keys with any subset of n servers using a single password, provided no more than t servers are corrupted. They prove a generic composition theorem showing that PPSS can be combined with key exchange to give threshold PAKE in just one round, which was never achieved by any prior work.

Wednesday, December 10, 2014

The Legal Infrastructure Around Information Security in Asia

The second invited talk at Asiacrypt strikingly stood out. It was given by Helaine Leggat, an Australian lawyer and information security professional, and it touched upon cryptography only by several mentions of hash functions. The speaker started by elaborating on her motivation to get involved with legal aspects of information security, those being the advent of international cyber warfare and mass surveillance as well as the following shifts in power: from the West to the East, from nation states to the society, and from real assets to information assets. All those are facilitated by the easiness of communication that the internet provides.

For the rest of her talk, Helaine highlighted various elements of the legal framework concerning information security, with a focus on Asia and Oceania. On the international level, there are model laws by United Nations Commission on International Trade Law (UNCITRAL) such as the UNCITRAL Model Law on Electronic Commerce from 1996 and the UNCITRAL Model Law on Electronic Signatures from 2001. Both serve as templates for national law, that is, they are not applicable in their own right. Furthermore, there are conventions such as the United Nations Convention on the Use of Electronic Communications in International Contracts from 2005 and the Convention on Cybercrime by the Council of Europe, which has been adopted by various countries in the Asia-Pacific.

For the national level, the speaker mostly elaborated on laws governing the relationship between governments and citizens with respect to access to information. There are two aspects of this relationship: citizens' access to government information, and the governments' access to information of their citizens. The former is regulated under freedom of information acts in many countries. Those law usually contain exceptions for state secrets.

Even more controversial is the legislation on privacy. It can be seen as being at the core of the trade-off between freedom and protection. Even though there is a lot of commercial interest in personal data, it seems that nation states lead a war on privacy by developing the most sophisticated malware. Moreover, Helaine mentioned historic privacy legislation that makes a difference between telecommunication and broadcasting and that distincts between telephones and computers. In the age of convergence in the form of smart devices, this makes little sense.

Finally, the speaker turned her attention to a more informal kind of legislation such as standards and best practice codes. It it note-worthy that, despite their informal nature, standards are sometimes referred to by laws, for example India's Information Technology act or New Zealand's Privacy Act of 1993, which refers to an ISO standard on privacy.

In the Q&A, Adi Shamir brought up the recent case where the US government asked Microsoft to disclose personal data that is stored in Ireland. Microsoft argues that this data is protected under EU privacy law. In her answer, Helaine pointed to the contradiction between the PATRIOT act and EU law, and to the fact that different parts of the world have different attitudes towards privacy. This makes corporations operating in Europe care more about their customers' privacy there, at least superficially.

Tuesday, December 9, 2014

Large-scale Computation and Exploitation of RC4 Biases

The excellent first invited talk at Asiacrypt 2014 in Kaohsiung, Taiwan was given by Kenny Paterson (@kennyog) from Royal Holloway, entitled "Big Bias Hunting in Amazonia: Large-scale Computation and Exploitation of RC4 Biases". Kenny outlined the progress of himself and his co-authors in analysing the single and double byte biases in the RC4 keystream in detail, and using the resulting information to construct attacks on cryptographic protocols such as TLS and WPA/TKIP.

Keystream bias and plaintext recovery

The fundamental flaw with RC4 is that (at least) the first 256 bytes of the keystream are not uniformly distributed: the probability of each byte taking each of the 256 possible values 0x00,..,0xFF being 1/256, the actual probabilities exhibit small differences (biases) away from this number. I find that the visual representation of these biases gives the best indication of the nature of the problem---check out the graphs here.

These biases are problematic in situations where a fixed plaintext (think cookie, or password) is repeatedly encrypted under new RC4 keys. In the ideal world, the distribution of the resulting ciphertexts for these plaintext should be uniform, but because the keystream is biased, it isn't. This allows an adversary who sees enough of these ciphertexts to use a fairly simple Bayesian analysis to learn about portions of the plaintext. Protocols sitting in this repeated plaintext context include TLS and WPA/TKIP.

The talk went into detail on how these biases can be exploited to attack these two protocols, but for the purposes of this blog I'm going to stick to discussing how these biases were estimated. For more on the actual attacks, have a look at the RHUL homepage, or a couple of blog posts from Matt Green here and here, or our own blog post here.

Generating the biases

The final section (and motivation for the title) of Kenny's talk outlined how the keystream distribution information was estimated. Estimating the distribution is functionally simple: simply generate lots and lots of random 128-bit RC4 keys, record the first 256 bytes of the keystream generated using each key, and sum up the resulting values of each byte (or pair of bytes in the case of double-byte biases) to estimate the probability of each byte value occurring.

The key here is the "lots and lots" part of the above paragraph. The biases are small---the uniform probability would be 0.00390625, and from a rough glance at the estimated distributions a typical deviation might be as small as 0.00001. As a consequence to eliminate the variance in the estimation process and to estimate the distributions with sufficient precision to allow a plaintext recovery attack, a large number of random keys are needed.

How many? In the case of estimating single-byte biases under the WPA/TKIP protocol, the number of keystreams required was 2^48, and for the double-byte biases 2^46 keystreams. Running this kind of computation using standard desktop-level resources is infeasible (unless you have a lot of time on your hands), and so the authors turned to the computational resources available through cloud computing services.

The authors used Amazon's EC2 compute cluster to deploy 128 nodes, each containing Intel Xeon E5-2680 v2 processors clocked at 2.8 GHz, giving a grand total of 4096 Ivy Bridge cores, totalling 8192 virtual cores in the presence of hyperthreading. Kenny gave some interesting anecdotes about issues with managing this amount of hardware, including having to ask Amazon to manually disable controls stopping users from spinning up more than 10 nodes at a time, and maxing out the capacity of the US West data centre.

As well as the computation required to estimate the distribution, storing the distribution is also a challenge---as always, data I/O is lurking in the background ready to pounce just as you think the difficult part is over. The single-byte distribution requires 32 GiB of storage (which I'd assume to be double-precision floating point values), and the double-byte requires a massive 8 TiB. If you're using a remote compute service, then transferring that kind of data onto local storage for further analysis requires significant effort.

Finally, the cost of the computational effort is interesting---clearly, renting compute cycles is cheaper than purchasing the necessary hardware yourself. The total cost was 43K USD; given the total compute time of 63 virtual core years, we can arrive at a cost of around 7 cents per virtual core hour.

The take-home message

For me, a few points stood out as particularly noteworthy. The maxim "attacks always get better" is definitely applicable here: from the initial observation that plaintext recovery might be possible, we've arrived at the point at which widely-used secure protocols can be considered to be broken when used in tandem with RC4. The results from the team at RHUL and elsewhere also demonstrate how you should keep pushing further with investigations into vulnerabilities, and that is certainly worth investing the time and resources necessary to fully experiment with the nature of the vulnerability.

Secondly, we again return to the problems that arise at the point of intersection between applied cryptography and real-world deployments. Kenny talked about the "inertia" of implementations of protocols: from the high of 50%+ usage of RC4 in TLS, we're still seeing a usage of 33%. Having the agility to rapidly patch/update all the necessary hardware, firmware and software to fully eliminate the usage of a broken ciphersuite is clearly very difficult. Moti Yung made a good point about the relative seriousness of these cryptographic flaws---whilst a plaintext recovery attack requiring 2^20-30 TLS sessions is clearly bad and necessitates a fix, the kind of software flaw vulnerabilities we've recently observed in almost all of the most popular TLS/networking stacks are far more impactful.

Monday, December 8, 2014

Bivariate Polynomials Modulo Composites and their Applications

So this week is AsiaCrypt 2014 in Kaosiung, Taiwan and as such this will be one of a series of blog posts appearing over the course of the next few days. Unusually for me I have not chosen a paper on theoretical leakage resilience or anything close to what I usually work on. I have chosen to blog about a talk from the second session "New Proposals" entitled "Bivariate Polynomials Modulo Composites and their Applications" by Dan Boneh and Henry Corrigan-Gibbs, with Henry giving the talk.

So before I start talking about the presentation I just want to recap on RSA and its related problem. Given (equal size) primes $p,q$ we define $N=p\cdot q$, then the RSA problem is given $N$ recover $p,q$ and we assume that this is hard. Certainly we believe that factoring integers is hard and if we can solve the RSA problem, and years of using RSA at a large scale says this does appear to (currently) be a hard problem.

Using the RSA problem we can define a class of problems defined by a function $f$ where given $f(x)\mod{n}$ the goal is to find $x$. If $f(x)=x^2$ we get a problem similar to quadratic resoprocity, while if $f(x)=x^e$ we get RSA and if $f$ is random such that $f(x)=0\mod{N}$ then this is equivalent to factoring $N$.

The question that arose in this presentation is what if the function $f$ is replaced with a bivariate polynomial instead, can we get security and does this have any use? The three notions of security that were looked into were one way-ness, second preimage resistance and collision resistance.

The trick used by the authors is to look at $f(x,y)=c\in\mathbb{Q}$ to understand $f(x,y)\mod{N}$. Since if the problem if easy to solve in $\mathbb{Q}$ then it is easy to solve modulo $N$ (loosely speaking you just convert the answers as you would expect). Currently this is the only method known for solving these but it does raise the question of is it the only method?

Oneway-ness: It can be shown that (in $\mathbb{Q}$) $f(x,y)$ can not be one way if it is of genus 0, it is currently unknown how are (in general) it is to invert for a higher genus.

Second preimage resistant: For $f$ to be second preimage resistant, $f(x,y)=c\in\mathbb{Q}$ must have no non-trivial rational mapping to itself. If there is such a mapping then this mapping can be used to easily break the second preimage resistant property.

Collision resistance: For $f$ to be collision resistant, having $f$ be injective will give this. This gives us that if $f(x,y)$ is not injective then $f(x,y)\mod{N}$ is not collision resistant but it is an open question if $f(x,y)$ being injective automatically implies that $f(x,y)\mod{N}$ is collision resistent.

Before we even consider collision resistant functions, we have to ask if there even exists low degree injective polynomials over the rationals. Well it turns out that this is an open question in Number Theory! On the bright side $f_z(x,y)=x^7+3\cdot y^7$ has been conjectured to be injective over the rationals for 15 years now (can this be proved in the generic ring model?). From this point forward the authors use the assumption $f_z$ is in fact collision resistant and then ask what constructions they can get from this.

A commitment scheme

Commit(m): Choose a random $r$ modulo $N$ and return $c=f_z(m,r)\mod{N}$

Open(c,m,r): Check $c=f(m,r)\mod{N}$

Hiding follows from the fact that $m$ is blinded by a random $r$, while blinding follows from the fact that breaking blinding breaks the assumption that $f_z$ is collision resistant.

They also describe how to make a Chameleon hash and have a really novel use in which using succinct zero knowledge you can prove that 3 Pedersen commitments open to values which are the commitment (and message) of value for one of their commitment scheme values. While zero knowledge "proof that a commitment is of a commitment triple" sounds like a strange use it is actually used in anonymous bitcoin.

To conclude they show that using bivariate polynomials with the RSA problem leads to some interesting schemes and that it is certainly worhty of further study.

Sunday, December 7, 2014

52 Things: Number 9: What are Shannon's definitions of entropy and information?

Information Theory was founded by Claude E. Shannon in 1948. It was originally developed to study signal processing but its application has been broadened to various disciplines through decades. This blog is intended to briefly introduce two fundamental concepts, entropy and information. If you are interested in more details, I personally suggest you to find more in [1].

Entropy

Entropy is a measurement to evaluate the uncertainty[3] of one or more variables.

Assume we are investigating the first web page people visit when they open a web browser. We use samples from two groups of people: 4 cryptographer from Bristol Cryptogroup and 4 passenger grabbed at Bristol coach station. Let's make a more ambitious assumption that the cryptographers always visit http://bristolcrypto.blogspot.co.uk/ first when they open a web browser, while the others each visits a different portal.

Now let's evaluate the uncertainty of their answers: apparently, the answers from cryptographers are quite certain (low uncertainty) whilst it can hardly be guessed if an answer is from a passenger (high uncertainty). In other words, we say the answers in the cryptographers group has a low entropy but that in the passengers group has a high entropy.

So one of the most remarkable inventions of Shannon's is his definition of entropy(Shannon's Entropy):

\begin{equation}

H = -{\sum}_{i}{p_i \log_b{p_i}}

\end{equation}

where $p_i$ is the probability of a message (an answer in previous example) appears. In computer science, we usually use $b = 2$ (bits).

If we compute the entropy in our example, we will have:

\begin{eqnarray}

H_{cryptographer} = -{\sum}_{1}^{4}{1 \log_2{1}} = 0\\

H_{passenger} = -{\sum}_{1}^{4}{{1 \over 4} \log_2{1 \over 4}} = 2

\end{eqnarray}

So the passengers' answer do have a higher entropy than the cryptographers'!

Information

Formally, the definition of Shannon's information is given in [2] as:

" Information is a measure of one's freedom of choice when one selects a message."

To explain this, let's make a minor modification to our previous example. Let's grab another 4 passengers from Bristol train station and assume their answers are also random portals just like the passengers' in coach station.

Here is the question: Given an answer $y$, can you tell which group the answer is from?

We can instantly tell that the answer is from our cryptographer group if $y$ is http://bristolcrypto.blogspot.co.uk/. However, we will be struggling if $y$ is a portal; therefore, we could say the message of http://bristolcrypto.blogspot.co.uk/ contains more information (less freedom) than a portal (more freedom).

So how does it relate to entropy?

Extending the definition of entropy, we define the Conditional Entropy as:

\begin{equation} H(Y|X) = \sum_{x \in X}{p(x)H(Y|X = x)} \end{equation}

which describes the entropy of $Y$ when given condition $X=x$. To make it more explicitly, since entropy is the uncertainty of a variable; hence the previous definition of conditional entropy is in fact the uncertainty of $Y$ when given the "clue"(condition) $X$.

Observation: consider two variables $X$ and $Y$. If $X$ contains only a minimal information of $Y$, then given an exact value of $X$ should not help us much on deducing the value of $Y$, that is, it does not obviously reduce the uncertainty of $Y$; on the other hand, if $X$ contains essential information of $Y$, then the entropy of $Y$ is expected to be low when $X$ is given. Therefore, the conditional entropy can be viewed as a rational measurement to the information $X$ has of $Y$!

Another important measurement is called Mutual Information. It is a measure of mutual dependency among two variables. One way to define it is the loss of entropy(uncertainty) when given a condition:

\begin{equation} I(X;Y) = H(X) - H(X|Y) = H(Y) - H(Y|X) \end{equation}

Cryptographic Example

The concepts of information theory is widely used in cryptography. A classic example is to view a cryptographic process as a channel with plaintext and ciphertext being input and output respectively. Research of side channel analysis also benefits from the usage of information theory.

References

[1] Thomas M. Cover and Joy A. Thomas. Elements of Information Theory
    2nd Edition. Wiley-Interscience, 2 edition, July 2006.

[2] S. Vajda, Claude E. Shannon, and Warren Weaver. The mathematical
    theory of communication. The Mathematical Gazette, 34(310):312+,
    December 1950.

[3] http://en.wikipedia.org/wiki/Entropy_%28information_theory%29

Monday, December 1, 2014

52 Things: Number 8: How does interaction help in computation, and what is the class IP?

To answer the questions, we first give give a brief introduction of the interactive proof systems. As we know, zero-knowledge proofs currently play an important role in the study of cryptographic protocols. This concept was introduced by Goldwasser, Micali and Rackoff in [1]. Such proofs have the fascinating property of revealing nothing other than the validity of assertions being proven. In order to achieve this, Goldwasser, Micali and Rackoff developed the interactive proof systems by adding two ingredients to the classical proof systems. The first ingredient is randomisation, the verification of proofs can be probabilistic and the verifier is allowed to err with small probability. The second one is interaction, the static proof is replaced by dynamic prover who will interact with the verifier to convince it the assertion is true. Combining the classical proof systems with the two ingredients has a huge effect: the set of languages of interactive proof systems is in a large complexity class IP.

What is a proof?

Loosely speaking, a proof is a way for a party to convince another party that a certain assertion is true. The two parties involved in a proof system are called a prover and a verifier.

Classical proofs

A classical mathematical proof is a fixed sequence of statements, which can be all written down by the prover then the verifier can easily check the validity of the assertion. There is no interaction between the prover and the verifier.

Any proof system should have the following properties:

Efficiency: the verification procedure can be carried out efficiently.
Soundness: for any false assertion, invalid proofs are hard to find.
Completeness: for any true assertion, there exists a proof.

Recall the complexity class NP can be viewed as a class of languages whose members all have certificates that can be easily checked (the non-members do not have certificates). Hence we have NP is exactly the class of languages of classical proofs.

Interactive Proofs

In an interactive proof system, the prover and verifier are allowed to interact with each other by exchange of messages. Before introducing the concept of interactive proofs, we first give an example (which can be found in [2]) to show how does interaction help in computation.

Example: Graph Isomorphism and Graph Non-isomorphism.

Two graphs G and H are called isomorphic if the nodes of G can be reordered so that it is identical to H. We define the language:

ISO = {<G,H>|G and H are isomorphic graphs}

It is clear that ISO is in NP. Even though the number of nodes can be very large, the membership of ISO can be easily verified by given the instructions of reordering.

Then we consider the complement problem of Graph Isomorphism, namely Graph Non-isomorphism. We define the language:

NOISO = {<G,H>|G and H are not isomorphic graphs}

And the question is, using a classic proof, how can a prover convince a verifier that two graphs are not isomorphic? We don't know how to provide short proofs that the graphs are not isomorphic and the verifier can't check every possibilities in polynomial time. Thus, we don't know how to prove that NOISO is in NP. Nevertheless, consider the following interactive protocol, the prover can convince the verifier the fact that the given two graphs are not isomorphic.

Both the prover and the verifier have a pair of graphs $(G_1, G_2)$ as common input. The verifier randomly choose a random bit $b \in \{0,1\}$ and a permutation $\pi$. Then it applies $\pi$ on $G_b$ to get a graph $H$. The verifier sends $H$ to the prover. Next, upon received $H$, the prover sends $b' \in \{0,1\}$ to the verifier. Finally, the verifier accepts if and only if $b'=b$.

The idea behind the protocol is, if the given graphs $(G_1,G_2)$ are not isomorphic, then the prover should be able to identify $H$ is from either $G_1$ or $G_2$. However, if the input graphs are isomorphic, even with unlimited computational power, the best choice of the prover is to make $b'$ a random guess. In this case, the prover accepts at most $\frac{1}{2}$.

From the above example, we conclude that NOISO cannot be proved to the verifier via a classic proof, whereas it could be proved via an interactive proof(protocol). We can see there is some power in interaction.

Now we give the formal definition of interactive proofs and the complexity class IP.

Interactive proof system: A pair of interactive machines $(P,V)$ is called an interactive proof system for a language L if machine V is polynomial-time and the following conditions hold:

Completeness: For ever $x \in L$,

$Pr[<P,V>(x)=1] \geq \frac{2}{3}$

Soundness: For every $x \notin L$

$Pr[<P,V>(x)=1] \leq \frac{1}{3}$

The class IP: The class IP consists of all languages having interactive proof systems.

By the definition, it is obvious that any language in BPP is in IP. And if we restrict the exchange of messages between the machines to be $1$, then we have NP is in IP. Actually, IP is a surprisingly large class. In 1992, Shamir revealed that PSPACE=IP [3].

In addition, notice that in the protocol, the prover has the ability to toss a private-coin. If the prover is allowed to access to the verifier's random string leads to the model of interactive proofs with public-coins, which is related to a similar complexity class AM [4].

[1] http://dl.acm.org/citation.cfm?id=63434

[2] http://www.amazon.co.uk/Introduction-Theory-Computation-Michael-Sipser/dp/0619217642

[3] http://dl.acm.org/citation.cfm?doid=146585.146609

[4] http://en.wikipedia.org/wiki/Arthur%E2%80%93Merlin_protocol

Tuesday, November 25, 2014

Post-Quantum Signatures

This week's Study Group was led by Peter Scholl who spoke about hash-based signatures and in particular a recent paper by Bernstein, Hopwood, Hulsing, Lange, Niederhagen, Papchristodoulou, Schwabe and Wilcox O'Hearn, called SPHINCS: practical stateless hash-based signatures [1].

Most real-world signature schemes like RSA and DSA can be broken by a quantum computer, due to Shor's algorithm [2]. One could use lattice-based signatures but their security is not well understood: the authors of [1] note that "their quantitative security levels are highly unclear" even in the pre-quantum setting.

On the other hand, hash-based signature schemes using 'quantum-resistant' one-way hash functions are secure in the quantum setting [3], although the symmetric security parameter needs to be doubled for quantum resistance due to Grover's algorithm [4]. The trouble is that such schemes tend to be inefficient or stateful, where the latter means one cannot have a signing key shared across multiple devices and the scheme may be vulnerable to 'restart attacks' where the scheme is forced to re-use a secret key, compromising security. The paper of Peter's talk seeks to address this by constructing a (relatively) efficient, stateless, hash-based signature scheme called SPHINCS. The construction is proved secure “based on weak standard-model assumptions, avoiding collision resistance and the random-oracle model”.

One Time Signature Schemes

First, we revisited the Lamport One Time Signature (OTS) scheme. (This isn't the OTS used in the paper but it will serve as a reasonable approximation.) Here your secret key is a sequence of pairs of bitstrings $(x_0, y_0), ..., (x_n, y_n)$ and the public key is the sequence of pairs of hashes $(H(x_0), H(y_0)), ..., (H(x_n), H(y_n))$. To sign a message $m$ one takes each bit $m_i$ of $m$ and selects $x_i$ or $y_i$ according to whether $m_i$ is 0 or 1. To verify a signature one hashes each $x_i$ or $y_i$ in the signature and checks the output against the public key. This is called “One Time” since the secret key can only sign one message before security is seriously compromised: your signature reveals half of the elements of the secret key.

From 'One Time' to 'Many Times'

A natural way to build “many time” signature schemes is to iterate OTS schemes. The trouble is that doing this in a naive way means the verification algorithm needs a huge number of public key components, enough for all the bits of all the messages you ever want to sign! Instead, we compress each OTS public key using a “Merkle tree” where the (original) OTS public keys are the leaf nodes and parents are constructed by hashing the concatenation of the children (together with a bitmask for added security). We want the (new) public key to be the root of this tree and so our signature must supply enough information (in as brief a way as possible) for the verifier to recover the root. This is how we do it:

Given the path from a leaf to the root, let the sequence of siblings of each node on the path be called the authentication path of the leaf.
In a signature, supply the index of the leaf node used and its authentication path.
With the leaf and its sibling, the verifier can construct the parent.
Then with the sibling of the parent, the verifier can construct the grandparent.
The verifier continues like this, using the siblings given in the authentication path to recover the root node i.e. the public key.

So we now have a scheme with a practical-sized public key. We can also shrink the secret key by just storing a seed to a Pseudo-Random Generator which will then output the many OTS secret keys which, when hashed, give the leaves of the Merkle tree. However, we haven't “eliminated the state”, to use the authors' phrase: one must store a counter recording which OTS secret keys have been used so far.

"Eliminating the State"

Goldreich's [5] answer to this problem was to deterministically choose which OTS key to use next, rather than just use them in order, i.e. some hash of the message determines the index of the OTS secret key to use in signing. Of course, now one needs a much bigger tree in order to avoid accidentally using a “one time” key more than once. In fact, for security, there needs to be exponentially many OTS key pairs, so we can't just have one leaf of the tree for each OTS secret key or the tree would be far too big for efficient signing. Instead, we associate every node with an OTS key pair and at each step from a leaf to the root, sign the hash of the public keys of the child nodes with the secret key of the parent node. The new (longer) signature contains the index of the leaf node used and the OTS signatures of all the nodes on the path to the root. The trouble with this scheme is that the length of the signature is cubic in the security parameter. This is where the authors of [1] come in.

From 'One Time' to 'A Few Times'

The main idea of the paper is to use a 'few times signature' scheme (instead of an OTS) at the bottom of the tree to reduce the number of leaves needed for security and hence the overall height of the tree, thus shortening the signature. Their choice of scheme for the bottom of the tree is called HORS: Hash to Obtain Random Subset. In HORS, the secret key is the tuple $(s_1, ..., s_t)$, the public key is the hashes $(H(s_1), ..., H(s_t))$ and a message $m$ determines a (“random”) subset $S$ of $\lbrace1, 2, ..., t\rbrace$ with fixed size $k$ (much smaller than $t$). Then the signature for $m$ is the set of secret key components corresponding to $S$, i.e. $\lbrace s_i | i \in S\rbrace$. Now we can use the secret key 'a few times' before security is compromised as only a small number of the components $s_i$ of the secret key are revealed in each signature. But again we have the problem that the public key needs to be very large and so a Merkle tree is once again employed: the new public key becomes the root node, recoverable from the index of a leaf node and the corresponding authentication path. Notice that this means the bottom of the tree in SPHINCS, the construction proposed in [1], is itself a tree. So SPHINCS consists of a hyper-tree (a tree of trees).

The SPHINCS Tree of Trees

The SPHINCS hyper-tree has total height $h$ consisting of $d$ layers, each of height $h/d$. The index from the hash of the message determines a HORS tree on the bottom layer of the hyper-tree and a leaf of this HORS tree from which we compute the HORS signature of the message. Then this signature is signed according to the OTS scheme on the next layer (where the initial index in some way determines the tree and the leaf to use). We repeat this OTS signing on each layer and finally output the SPHINCS signature consisting of the index, the HORS signature (which contains all the information needed to recover the HORS public key) and each OTS signature and each authentication path needed to recover the root at each layer.

Real World Considerations (in the Quantum Computing World!)

After proving its security, the authors demonstrate the practicality of SPHINCS with certain choices of parameters: the hyper-tree has total height 60 consisting of 12 layers (each of height 5), the number of HORS secret key elements is $2^{16}$ and 32 of these are revealed in each HORS signature (i.e. $t=2^{16}$, $k = 32$). There is also a parameter that affects the OTS scheme but we haven't detailed it here. With these choices, the signatures have size 41KB, the keys have size around 1KB and one can sign hundreds of messages per second on a modern quad-core computer.

By most accounts, quantum computers are something of a pipe-dream at the moment. Nevertheless, it's reassuring to know that security is still achievable - and indeed practical - against adversaries who can exploit the enormous power of quantum computers, whenever that day comes.

[1] - http://eprint.iacr.org/2014/795.pdf

[2] - http://en.wikipedia.org/wiki/Shor%27s_algorithm

[3] - http://eprint.iacr.org/2014/709.pdf
[4] - http://en.wikipedia.org/wiki/Grover%27s_algorithm

[5] - Foundations of Cryptography: Volume 2, Basic Applications (2004)

Monday, November 24, 2014

52 Things: Number 7: How does randomness help in computation, and what is the class BPP?

So far, during this blog series Ryan has introduced us to complexity classes, and in particular to P:

P is the class of languages decidable in polynomial time by a deterministic Turing machine.

Then, Guy introduced us to complexity class NP:

NP is the class of languages decidable in polynomial time by a nondeterministic Turing machine.

This week I am going to introduce the complexity class BPP:

BPP is the class of languages that are recognised by a probabilistic polynomial time Turing machine with an error probability of $\frac{1}{3}$.

Probabilistic Turing Machine

A probabilistic Turing Machine[1] is a type of nondeterministic Turing Machine which randomly chooses between the available transitions at each point according to a probability distribution. What this means is that a probabilistic Turing machine can have stochastic results. On the same input, it could have different run times, accept it on one occasion and reject it on another. It could also never stop. This Turing Machine gives rise to other complexity classes such as RP, ZPP and, the one we're discussing in this post, BPP.

A little about the complexity class BPP

So as we have seen from the definition, the class BPP (Bounded-Error probabilistic polynomial time) contains the decision problems that are solvable in polynomial time by a probabilistic Turing machine with error probability $\frac{1}{3}$. Note that this error probability can be chosen to be any value strictly between $0$ and $\frac{1}{2}$ due to a result named the amplification lemma which I will not discuss further here. The class BPP contains P, the class of problems solvable in polynomial time by a deterministic Turing Machine, this is due to the fact that a deterministic Turing Machine is a special case of the probabilistic Turing Machine (taking the only possible path with probability 1). As talked about in Guy's post, there is an open (Millennium) problem conjecturing as to whether P = NP. There is a similar question with BPP, being P = BPP? The number of problems known to be in BPP but not known to be in P is decreasing.

An example of a BPP Problem

One of the most famous problems known to be in BPP but not known to be in P was determining whether a number was prime. However, recently (2002) a deterministic polynomial time algorithm was found[2] for this problem meaning that it is indeed in P. Another problem known to be in BPP and currently still not known to be in P is polynomial identity testing[3], the problem of determining whether a polynomial is identically equal to the zero polynomial.

There are still many very important unanswered questions on the topic of Complexity Classes. Some of which, if answered, could have a large impact on shaping the future of Cryptography and Computer Science.

[1] - http://en.wikipedia.org/wiki/Probabilistic_Turing_machine

[2] - http://en.wikipedia.org/wiki/AKS_primality_test

[3] - http://en.wikipedia.org/wiki/Schwartz%E2%80%93Zippel_lemma

Tuesday, November 18, 2014

Study group: Network Security Risk Assessment Using Bayesian Belief Networks.

This weeks study group was given by Shan on assessing network security risk using Bayesian Belief Networks, following the paper of Kondakci titled "Network Security Risk Assessment Using Bayesian Belief Networks". The model developed can be applied to a variety of security evaluation tasks, risk assessment and other decision making systems. The most important thing to understand on this topic is exactly how Bayesian Belief Networks (BBN's)[1] work and how they can be used to calculate security risks caused by various threat sources. To achieve this understanding, we need to talk a little about conditional probability[2]. We know that given an event $A$, the probability of an event $B$ occurring is
$$\mathrm{P}(B|A) = \frac{\mathrm{P}(B \cap A)}{\mathrm{P}(A)} $$
We can use this to consider an attack on two systems $A$ and $B$. Let us assume that the probability of an attack $\mathrm{P}(N)$ on either system is 0.1 (meaning the probability of no attack $\mathrm{P}(N')$ is 0.9). Let us also assume that the probability of system $A$ failing if there is an attack is 0.8 and if there isn't an attack 0.1. If there is an attack on system $B$ it fails with probability 0.6 and if no attack 0.5.
From these values we can very easily calculate the chances of the various systems failing. The probability of A failing $\mathrm{P}(A)$ is calculated as follows
$$\mathrm{P}(A) = \mathrm{P}(A|N)\mathrm{P}(N) + \mathrm{P}(A|N')\mathrm{P}(N') = 0.1$$
Similarly we can show the probability of B failing as 0.53. Using Bayes Theorem[3] the probability that an attack has occurred given that system A has failed can be calculated as
$$\mathrm{P}(N|A) = \frac{ \mathrm{P}(A|N)\mathrm{P}(N) }{ \mathrm{P}(A)} = \frac{(0.8 * 0.1)}{0.17} = 0.47 $$
As you would expect, the observation that System A has failed has increased the probability that an attack has occurred. Let $M = N|A$ be the event of there being an attack given that $A$ has failed and $M' = N'|A$ the event of there not being an attack given that $A$ has failed. Using the law of total probability[4], the probability that system B has failed can be calculated as
$$\mathrm{P}(B) = \mathrm{P}(B|M)\mathrm{P}(M) + \mathrm{P}(B|M')\mathrm{P}(M') = 0.55 $$
Again, observing that System $A$ has failed increases the probability that System $B$ has also failed.

This simple example shows how Bayesian Belief Networks can be used to get a better understanding of the state of systems. As a real world example, consider mobile phone applications. If we know that an application has failed on a certain phone then we can calculate the failure risk on other phones. Obviously these models can easily increase in size and complexity, the hardest thing about this model is building the conditional probability table associated with the various systems involved, in our case, finding the values of $ \mathrm{P}(A|N)$ etc. These are built using historical data. Systems can be threatened by various types of threats, the first step is to create a directed acyclic graph (DAG)[5] showing all the relations involved in the system. This can then be converted into a BBN. This way we can combine certain threats into a joint impact parameter, for example we could have external threats as well as internal threats and within that we can have different types of internal threats for the system. The model presented is a clever, generic way of computing the risk of the various classifications of threats of IT systems and computing environments using conditional probability through Bayesian Belief Networks.

[1] - http://en.wikipedia.org/wiki/Bayesian_network
[2] - http://en.wikipedia.org/wiki/Conditional_probability
[3] - http://en.wikipedia.org/wiki/Bayes'_theorem
[4] - http://en.wikipedia.org/wiki/Law_of_total_probability
[5] - http://en.wikipedia.org/wiki/Directed_acyclic_graph

Friday, November 14, 2014

52 Things: Number 6: How can we interpret NP as the set of theorems whose proofs can be checked in polynomial time?

This is the latest in a series of blog posts to address the list of '52 Things Every PhD Student Should Know' to do Cryptography: a set of questions compiled to give PhD candidates a sense of what they should know by the end of their first year. We continue the Complexity Theory section with an alternative definition of NP...

This question is very much a follow up question to the previous one. Last week Guy answered the question of ``What is meant by the complexity class $\mathbf{NP}$?'', while this week I will be answering the related question of ``How can we interpret $\mathbf{NP}$ as the set of theorems whose proofs can be checked in polynomial time?''.

Now to me this is a more intuitive definition of what it means for a problem to be in $\mathbf{NP}$. Not only is it a more intuitive definition but it should (hopefully) also be clearer as to why this is a complexity class that is useful both for cryptography and the wider world. Now before we go into what we can use the class of problems for, the definition is as follows:

$\mathbf{NP}$ is the class of languages that have polynomial time verifiers.

OK but what does this actually mean? Basically if we have an element $x$ and we want to know if $x\in L$ (where $L$ is some $\mathbf{NP}$ language) we have a prover $P$ which given $x$ outputs a witness $w$, this may take exponential time to find $w$ given $x$. Then if we give $x$ and $w$ to our verifier $V$, $V$ can tell if $x\in L$ in polynomial time.

This definition seems very different from the one given last week but they are in fact equivalent. Informally they are equivalent because the witness $w$ can just be the sequence of decisions the NDT made at each branching point to show that $x\in L$. For a (slightly) more formal proof of their equalence [1] is a reasonable online source (with references to textbooks).

So why might this be useful in cryptography? Well essentially we have a class of languages which can take exponential time to check if you do not know a witness but with a witness it can be done in polynomial time. This has the ``feel'' of a lot of cryptographic algorithms - take Encryption (formalisation to follow in future weeks' blogs) for example; you want it to be exponentially hard to get the message from ciphertext without the key (similar to a witness) but with the key you want it to be efficient (polynomial time) to extract the message.

A warning: While it sounds like a good move to use an $\mathbf{NP}$ problem for cryptography it may not be as simple as it first appears. This is because languages are in $\mathbf{NP}$ based on the worse case complexity where as for encryption we need it to be hard on average. For example we may have an $\mathbf{NP}$ language which has one element that takes exponential time to solve but all others are really efficient to solve - this will not make a good basis for an encryption scheme because we want encryption to be secure for all messages not just one.

Now I am aware that integer factorization isn't known to be $\mathbf{NP}$-complete and isn't known to be in $\mathbf{P}$ (See Ryan's blog here for a description) but it makes for a nice example of the point I am trying to make about choosing your $\mathbf{NP}$ instances carefully. In general finding a factor of a number is easy (half of them are divisible by two!) but if we choose something sensible we can make it hard to factor. Let us focus on numbers of the form $N=p\cdot q$ for $p,q$ prime (a.k.a numbers with only two proper factors). Now if either of these numbers is small then it is again easy, so we want the numbers to be of equal size (this is what we do for RSA). From this it is possible to build an encryption scheme over the top.

[1] - http://en.wikipedia.org/wiki/NP_(complexity)#Equivalence_of_definitions