# Bristol Cryptography Blog

A blog for the cryptography group of the University of Bristol. To enable discussion on cryptography and other matters related to our research.

## Tuesday, November 18, 2014

### Study group: Network Security Risk Assessment Using Bayesian Belief Networks.

This weeks study group was given by Shan on assessing network security risk using Bayesian Belief Networks, following the paper of Kondakci titled "Network Security Risk Assessment Using Bayesian Belief Networks". The model developed can be applied to a variety of security evaluation tasks, risk assessment and other decision making systems. The most important thing to understand on this topic is exactly how Bayesian Belief Networks (BBN's)[1] work and how they can be used to calculate security risks caused by various threat sources. To achieve this understanding, we need to talk a little about conditional probability[2]. We know that given an event $A$, the probability of an event $B$ occurring is

$$\mathrm{P}(B|A) = \frac{\mathrm{P}(B \cap A)}{\mathrm{P}(A)} $$

We can use this to consider an attack on two systems $A$ and $B$. Let us assume that the probability of an attack $\mathrm{P}(N)$ on either system is 0.1 (meaning the probability of no attack $\mathrm{P}(N')$ is 0.9). Let us also assume that the probability of system $A$ failing if there is an attack is 0.8 and if there isn't an attack 0.1. If there is an attack on system $B$ it fails with probability 0.6 and if no attack 0.5.

From these values we can very easily calculate the chances of the various systems failing. The probability of A failing $\mathrm{P}(A)$ is calculated as follows

$$\mathrm{P}(A) = \mathrm{P}(A|N)\mathrm{P}(N) + \mathrm{P}(A|N')\mathrm{P}(N') = 0.1$$

Similarly we can show the probability of B failing as 0.53. Using Bayes Theorem[3] the probability that an attack has occurred given that system A has failed can be calculated as

$$\mathrm{P}(N|A) = \frac{ \mathrm{P}(A|N)\mathrm{P}(N) }{ \mathrm{P}(A)} = \frac{(0.8 * 0.1)}{0.17} = 0.47 $$

As you would expect, the observation that System A has failed has increased the probability that an attack has occurred. Let $M = N|A$ be the event of there being an attack given that $A$ has failed and $M' = N'|A$ the event of there not being an attack given that $A$ has failed. Using the law of total probability[4], the probability that system B has failed can be calculated as

$$\mathrm{P}(B) = \mathrm{P}(B|M)\mathrm{P}(M) + \mathrm{P}(B|M')\mathrm{P}(M') = 0.55 $$

Again, observing that System $A$ has failed increases the probability that System $B$ has also failed.

This simple example shows how Bayesian Belief Networks can be used to get a better understanding of the state of systems. As a real world example, consider mobile phone applications. If we know that an application has failed on a certain phone then we can calculate the failure risk on other phones. Obviously these models can easily increase in size and complexity, the hardest thing about this model is building the conditional probability table associated with the various systems involved, in our case, finding the values of $ \mathrm{P}(A|N)$ etc. These are built using historical data. Systems can be threatened by various types of threats, the first step is to create a directed acyclic graph (DAG)[5] showing all the relations involved in the system. This can then be converted into a BBN. This way we can combine certain threats into a joint impact parameter, for example we could have external threats as well as internal threats and within that we can have different types of internal threats for the system. The model presented is a clever, generic way of computing the risk of the various classifications of threats of IT systems and computing environments using conditional probability through Bayesian Belief Networks.

[1] - http://en.wikipedia.org/wiki/Bayesian_network

[2] - http://en.wikipedia.org/wiki/Conditional_probability

[3] - http://en.wikipedia.org/wiki/Bayes'_theorem

[4] - http://en.wikipedia.org/wiki/Law_of_total_probability

[5] - http://en.wikipedia.org/wiki/Directed_acyclic_graph

## Friday, November 14, 2014

### 52 Things: Number 6: How can we interpret NP as the set of theorems whose proofs can be checked in polynomial time?

*This is the latest in a series of blog posts to address the list of '52 Things Every PhD Student Should Know' to do Cryptography: a set of questions compiled to give PhD candidates a sense of what they should know by the end of their first year.*We continue the Complexity Theory section with an alternative definition of NP...*This question is very much a follow up question to the previous one. Last week Guy answered the question of ``What is meant by the complexity class $\mathbf{NP}$?'', while this week I will be answering the related question of ``How can we interpret $\mathbf{NP}$ as the set of theorems whose proofs can be checked in polynomial time?''.*

*$\mathbf{NP}$ is the class of languages that have polynomial time verifiers.*

**A warning**: While it sounds like a good move to use an

*$\mathbf{NP}$*problem for cryptography it may not be as simple as it first appears. This is because languages are in $\mathbf{NP}$ based on the worse case complexity where as for encryption we need it to be hard on average. For example we may have an $\mathbf{NP}$ language which has one element that takes exponential time to solve but all others are really efficient to solve - this will not make a good basis for an encryption scheme because we want encryption to be secure for all messages not just one.

## Wednesday, November 5, 2014

### 52 Things: Number 5: What is meant by the complexity class NP?

*This is the latest in a series of blog posts to address the list of '52 Things Every PhD Student Should Know' to do Cryptography: a set of questions compiled to give PhD candidates a sense of what they should know by the end of their first year.*We continue the Complexity Theory section with NP...**P**:

**P***is the class of languages decidable in polynomial time by a deterministic Turing machine.*

**NP***is the class of languages decidable in polynomial time by a***nondeterministic**Turing machine.

###

What is a Nondeterministic Turing Machine (NDTM)?

Well,
an NDTM is a Turing Machine for which the transition function can have
multiple values for the same tape value/state pair (meaning it is not
technically a function any more, so the correct thing would be to call
it a transition relation). Thus evaluation of an NDTM on any particular
input can be thought of as a tree, branching at each point the
transition function provides more than one possible value. Now, an NDTM
evaluates all of these possible branches at once, and accepts if at
least one of these computations halts in the accept state. This definition generalizes from language membership to decision to computational problems in the same way as **P**did in last weeks blog.

### Some Examples of NP problems

One of the key examples of an NP problem is the satisfiability problem:

**SAT:***Given an expression in n boolean variables, is there an assignment of the variables such that the expression evaluates to True?*

### Ok, so there are problems in NP. What is interesting about it?

**P⊆NP**since a DTM is an NDTM that just happens not to ever branch (in fact, our first example can actually be solved by a DTM and so is within

**P**). So, the real question is what sort of things can we do in

**NP**that we couldn't do in

**P**? Well, this is the root of the

**P**$\overset{?}{=}$

**NP**millenium problem, which is a major open problem. There are certainly problems that we have found that are known to be in

**NP**that we do not know to be in

**P**, but perhaps future research will show that these are also in

**P**.

The Cook-Levin theorem provides an interesting result in this direction: No problem in NP is any harder than SAT. What his means is that if I had an oracle (basically an algorithm that I can see the input/output of but none of the workings) that solved SAT, by asking it to solve cleverly constructed queries I could use it to solve any other NP problem. This made SAT the first example of an

**NP-complete**problem. So, to show that a problem X is at least as hard as solving an NP problem (it is

**NP-hard**), it suffices to show that if I could solve X, I could use it to solve SAT.

## Friday, October 31, 2014

### 52 Things: Number 4: The Complexity Class P

*complexity class P*. Having recently joined the Cryptography group at Bristol as a Mathematician, I knew very little theoretical computer science when I first started my PhD and I'm sure there will be plenty of others in my situation, so this blog will start right from the beginning and you can skip over parts you know already. First, we'll give an overview of what

*complexity*means and why it matters, then we'll define

*Turing machines*, and finally arrive at the

*complexity class P*, concluding with an example.

*Introduction to the Theory of Computation*by Michael Sipser [1], which I have found hugely helpful.

**Section 1: Complexity and Big O Notation**

*number of operations*that a certain

*model*of a computer would take to do it. This is called

*(time) complexity theory*.

Typically, though, the number of operations required will depend on the

*input*to the task and may vary even with inputs of the same length. As a pertinent example, say we design a computer program which tells you whether or not an integer you input is prime. If we give as input the number 256, the program will probably output 'not prime' sooner than if we had given it the number 323 (even though they both have length 9 when written as binary integers, for example), since the first integer has a very small prime factor (2) and the second has larger factors (17 and 19). Therefore we usually opt for a

*worst-case analysis*where we record the longest running time of all inputs of a particular length. So we obtain an algebraic expression $t(n)$ that reflects the longest running time of all inputs of length $n$.

*dominant*term in the expression and also ignore any constant factors. This is called

*asymptotic analysis*; we assume $n$ is enormous and ask roughly how many steps the model of computation will take to 'finish' when given the worst possible input of length $n$, writing our answer in the form $\mathcal{O}\left(t\left(n\right)\right)$. For example, if we find that our process takes $6n^3 – n^2 + 1$ steps, we write that it is $\mathcal{O}\left(n^{3}\right)$, since all other terms can be ignored for very large $n$.

**Section 2: Turing Machines**

*alphabet*is a non-empty finite set and a

*string*is a finite sequence of elements (symbols) from an alphabet. A

*language*is simply a set of strings.

*Turing machine*models what real computers can do. Its 'memory' is an infinitely long tape. At any time, each square of the tape is either blank or contains a symbol from some specified alphabet. The machine has a tape head that can move left or right along the tape, one square at a time, and read from and write to that square. At first, the tape is all blank except for the leftmost $n$ squares which constitute the input (none of which can be blank so that it is clear where the input ends). The tape head starts at the leftmost square, reads the first input symbol and then decides what to do next according to a

*transition function*. The transition function depends on what it reads at the square it is currently on and the

*state*that the machine is currently in (like a record of what it has done so far) and returns

- a new state
- another symbol to write to the square it is on (though this symbol might be the same as what was already written there)
- a direction to move in: left or right.

*accept state*or

*reject state*.

*accepts its input*. Similarly it may

*reject its input*. In either case we say the machine

*halts*on its input. But note that it may enter a loop without accepting or rejecting i.e. it may never halt. If a Turing machine accepts every string in some language and rejects all other strings, then we say the machine

*decides*that language. We can think of this as the machine testing whether or not the input string is a member of the language. Given a language, if there is a Turing machine that decides it, we say the language is

*decidable*.

*Church-Turing thesis*[2]). We define the

*time complexity class*$\mathrm{TIME}\left(t\left(n\right)\right)$ to be the collection of all languages that are decidable by an $\mathcal{O}\left(t\left(n\right)\right)$ time Turing machine, then we turn computational problems into questions about language membership (is an input string a member of a certain language? e.g. does this string representing an integer belong to the language of strings representing prime integers?) and can partition computational problems into time complexity classes.

**Section 3: The Complexity Class P**

*polynomial time*. The

*complexity class P*is the class of all languages that are decidable in polynomial time by a Turing machine. Since $k$ could be very large, such Turing machines are not necessarily all practical, (let alone 'fast'!), but this class is a rough model for what can be realistically achieved by a computer. Note that the class P is fundamentally different to those languages where $t(n)$ has $n$ in an exponent, such as $2^n$, which grow much, much faster as $n$ increases – so fast that even if you have a decider for some language, you may find that the universe ends before it halts on your input!

*directed graph*(a set of nodes and edges where there is at most one edge between any pair of nodes and each edge has an arrow indicating a direction). Then if we encode the graph and the two nodes as a single string, we can form a language consisting of those strings representing a graph and two nodes such that it is possible to follow the edges from the first node and eventually arrive at the second. So a decider for this language will effectively answer the question of whether there is a path from the first node A to the second B, called the

*path problem*, by accepting or rejecting the graph and nodes you input. We give a decider for this language and show that it decides in polynomial time.

- First put a mark on A.
- Scan all the edges of the graph. If you find an edge from a marked node to an unmarked node, mark the unmarked node.
- Repeat the above until you mark no new nodes.
- If B is marked, accept. Otherwise, reject.

*the path problem is in P*.

## Thursday, October 30, 2014

### SPA and the AES key schedule

Today's study group was given by Valentina on the 2014 CHES paper titled "Simple Power Analysis on the AES Key Expansion Revisited" by Christophe Clavier, Damien Marion and Antoine Wurcker from the Universite de Limoges in France.

To briefly recap, "simple power analysis" (SPA) is the rather misleading moniker for the class of methods that analyse side-channel information contained within a small amount of side-channel *traces* captured during the encryption of a single pair of plaintext and key material. The misleading nature of the title is that these methods are anything but simple to perform---the amount of exploitable information leakage contained within a single trace corresponding to the encryption and decryption of a fixed input pair is far, far smaller than that which can be achieved by capturing a large amount of traces for a variety of input pairs to be exploited using differential power analysis (DPA).

In the side-channel community there's a growing shift in perception towards viewing side-channel analysis as an auxiliary phase in an enhanced global key search, rather than a stand-alone ‘win-or-lose’ attack. DPA attacks use the additional information available in the larger set of traces and aim to reduce the final enumeration effort to as small an amount as possible. SPA attacks instead face the challenge of having to exploit all the available information in the trace and perform a potentially demanding enumeration phase.

The work of Clavier *et. al.* explores attacks on the AES key scheduling algorithm. It is sufficient for the purposes of this blog to know that the AES key schedule takes (assuming AES-128) one 16-byte secret key and expands it to 11 16-byte round keys, the first of which is the same as the input secret key. The authors explore two *masked* implementations of the key schedule algorithm---masking aims to combine random values (masks) with sensitive intermediate values, to break any relationships between the intermediate values and observed side-channel information. The first is termed "11 byte entropy boolean masking" and generates 11 individual random bytes, each of which masks all of the bytes with each round key (the 16 bytes of a single round key are masked using the same random mask). The second is termed "16-byte entropy boolean masking", and essentially is orthogonal to the 11-byte scheme: each byte within a round key is masked with a different random byte, but each round key is masked by the same 16 random bytes. In an ideal case, all of the 11 x 16 = 176 sensitive bytes would be masked by a new random byte each time---the authors claim that their 11- and 16-byte schemes are relevant in scenarios where a device does not have enough memory or available entropy to store or generate this many random values.

SPA attacks on the key-schedule attempt to reduce the size of the set of possible secret keys as much as possible. In this work, the authors assume that they know the Hamming-weight (a common form of side-channel leakage is that the Hamming weight of an intermediate value is proportional to side-channel information) of the key byte XORed with the mask. This is a strongly simplifying assumption---in practice, an attacker would have to *profile* the key schedule on a second device to begin to work towards an approximation for the side-channel leakage.

The primary contribution of the paper is an adapted "guess and backtrack"-style algorithm for reducing the set of possible keys by exploiting relationships between key bytes within the key schedule, with full knowledge of the leakage corresponding to the targeted intermediate variable of the key byte XORed with its mask. The additional enumeration effort imposed by the presence of the masks is shown to be manageble, finding that with up to 30 trace measurements their attack succeeds in drastically reducing the remaining search space in a relatively short amount of time. The attacks are further analysed in the presence of a shuffling countermeasure (the order of execution of the combination of the key bytes with the masks is randomised within each 4-byte column), and the authors discover that they can adapt the algorithm to explore all permutations of the ordering, taking on average several hours to succeed. With an assumption of being able to introduce faults in specific parts of the algorithm, this time can be reduced to the order of minutes.

The techniques presented for exploiting the information leakage are intricate and clever. The work motivates the following question for this area of research: to discover methods for relaxing the assumption on the attacker needing full knowledge of the information leakage occurring.

## Friday, October 24, 2014

### Study group: Lattice-based Digital Signatures

The next steps towards the security of lattice-based signature schemes were taken in '08, when two independent works described schemes that are provably secure based on the hardness of standard lattice problems. The first is by Gentry, Peikert and Vaikuntanathan, which follows the hash-and-sign paradigm and includes a useful method to sample from the discrete Gaussian distribution, which is used in most modern lattice-based crypto schemes. As the word 'hash' implies, the security proof for this scheme is in the random oracle model. The second scheme is by Lyubashevsky and Micciancio in the standard model, but it only provides one-time secure signatures. These can be converted into fully secure signatures using a tree construction, which requires a logarithmic number (in the security parameter) of applications of the one-time scheme.

These two works inspired two different lines of research. One line focuses on getting the best efficiency in the random oracle model, whereas the other focuses on getting security in the standard model while maintaining a good asymptotic efficiency as well. The focus paper of the study group was in this second line of research: Improved Short Lattice Signatures in the Standard Model by Ducas and Micciancio from this year's Crypto. It combines the 'vanishing trapdoor' technique due to Boyen and the 'confined guessing' method due to Böhl et al. For their lattice trapdoors, they use the work of Micciancio and Peikert from Eurocrypt '12. This non-trivial combination leads to a scheme where the signatures are short (consisting of only one vector) at the cost of having keys consisting of a logarithmic number of vectors. They also propose a stateful scheme which reduces the key sizes to a log log number of vectors and also tightens the reduction, removing a factor introduced by the confined guessing stage as well as tightening the approximation factor of the underlying lattice problem. Interestingly, the schemes by Ducas and Micciancio require the additional algebraic structure of ideal lattices, whereas previous works only use this structure for the efficiency improvement.

In conclusion, the result is a new scheme that compares favourably to previous schemes by either allowing for smaller signatures or smaller keys. But things move fast in the lattice world, as there is already a new paper on the ePrint archive that reduces the keys to a constant number of vectors, at the cost of a bigger approximation factor in the underlying lattice problem. It is still possible to choose parameters such that this approximation is polynomial, but it is also possible to pick them less conservatively, resulting in a subexponential approximation factor. It will be interesting to see whether such choices survive future improvements to cryptanalysis.

## Wednesday, October 22, 2014

### 52 Things: Number 3: Computational and storage power of different form factors

**Q3: Estimate the relative computational and storage capabilities of**

- a smart-card
- a micro-controller (i.e. a sensor node)
- an embedded or mobile computer (e.g., a mobile phone or PDA)
- a laptop- or desktop-class computer.

*benchmarking*the performance of different devices on a variety of problem instances---see, for example, CompuBench. Fortunately the range of capabilities of the devices included in the question makes a sufficient answer less dependent on quantitative metrics.

A measure for the storage capabilities of each device is much simpler to find: we can simply compare the approximate number of bytes of information the device is capable of holding on permanent storage.

A smart-card is the least computationally powerful device: obviously clock speeds vary for different implementations, but one might expect to see around a 20 MHz core speed. In terms of storage, a typical smart-card might have around 2 kilobytes (KiB) available.

A microcontroller is "a small computer on a single integrated circuit containing a processor core, memory, and programmable input/output peripherals" [1]. The range of storage and compute capability available will vary significantly according to the exact definition of microcontroller, but taking the suggested sensor node as an example, a typical microcontroller is likely to have similar computational capabilities as a smart-card and slightly more storage available, perhaps in the order of a few KiB to a few megabytes (MiB).

A mobile computer such as a mobile phone has significantly more storage and computing power, and the amount of power available is rapidly increasing over time. Taking the 2008 iPhone and the 2013 Nexus 5 phone as an example, the iPhone used a 412 MHz 32-bit RISC ARM core, and the Nexus 5 CPU used is a 2.3 GHz quad-core processor. In terms of storage, if we ignore the ability of some phones to use removable storage, then a high-end phone in 2013 might expect to provide in the order of 16 to 32 gigabytes (GiB) of storage

Finally, most laptop or desktop class computers are likely to have more processing power than a mobile phone: the high-end Intel "Haswell" i7 4960K processor contains 4 cores each clocked at 4 GHz, and the AMD "Piledriver" FX-9590 CPU contains 8 cores at 4.7 GHz---note that a direct comparison between these two processors requires more than just assessing core counts and clock speeds! There are other factors that can affect the computing capabilities of a desktop or laptop computer---in particular, the addition of a graphics processing unit can, for certain problems, provide a large increase in performance. The storage capacity of a laptop or desktop can vary tremendously, but a typical amount of storage in a consumer machine might be between hundreds of gigabytes and several terabytes (TiB)---the largest single hard drive capacities are now around 8 TiB.

- [1] https://en.wikipedia.org/wiki/Microcontroller