Friday, February 24, 2012

Bar-Ilan Winter School on Cryptography - New trapdoors for Lattices. Simpler, Tighter, Faster, Smaller

This week some members of the group have been in Tel Aviv for the 2nd Bar-Ilan Winter School on Cryptography - Lattice-based Cryptography and Applications. The school was really well organized and useful: the programme was diversified (basics and hard problems on lattices, cryptographic constructions based on lattices, recent advances in fully homomorphic encryption), yet coherent and interesting for everyone.

In the session "Trapdoors and Application'', Chris Peikert presented some aspects of the joint work with Daniele Micciancio ([MP2012]) which will be published at Eurocrypt 2012.

The main idea of this work is a new method of generating random lattices  Λ┴(A) (defined by an uniformly random matrix A  in Z_q^{n x m}) together with an associated secret "strong''  trapdoor  ([GPV08]S in Z_q^{m x m}, i.e.  short basis for Λ┴(A).  Such strong trapdoors are then used to give specialized inversion algorithms for the functions

                 f_A (x) = Ax  mod q    and   g_A (s; e) = s^t A + e^t      mod q

where  q=poly(n)  and   x, e are short integer vectors.

Recall that many cryptographic applications need to sample preimages for the SIS function f_A and/or invert the LWE function g_ (Hash-and-sign digital signatures [GPV08], Standard Model signatures [CHKP10, R10Boy10 ], SM CCA-secure encryption [PVW08Pei09], SM (H)IBE [GPV08, CHKP10ABB10a , ABB10b]), and that general methods to solve these tasks are already known ([Bab85] for inverting  g_ and [Klein01Klein00, GPV08, Pei10] for preimage sampling of  f_A) but they are complex and they trade quality for efficiency.

The general idea of the new method of trapdoor generation is the following. They design a fixed well-structured  public lattice defined by a ``gadget'' matrix  G for which the associated functions f_G and g_G admit fast, parallel and high-quality inversion algorithms.
Then they randomize G in two steps:

  1. define a semi-random matrix  B = [ AG] for uniform A in  Z^{n x m'},  m' sufficiently large; 
  2. apply to this matrix  B a  "nice'' unimodular transformation T and let  A =  B T.
 At this point the unimodular transformation  will be a trapdoor for A, and preimage sampling for   f_A and inversion of  g_A reduces efficiently to the corresponding tasks for  f_G and g_G.

It is worth to note that for G=0 we get Ajtai's original method ([Ajt96]) for constructing a random A together with a ``weak'' trapdoor of one or more short vectors (but not a full basis), while the comparison with the GGH approach ([GGH97]) is not appropriate as in GGH the private and public
lattices are the same.

The new trapdoor is an easily generated unimodular transformation instead of a short basis of a lattice as in  Ajt99 and AP09, but is at least as powerful since we can efficiently construct a basis for Λ┴(A) from a basis of Λ┴(G) and T. This new method does not require computations of Hermite normal forms or matrix inverses but only one matrix multiplication and the  inversion operations drastically decrease in comparison with the prior and general algorithms for these tasks.

The slides and the video of  the lecture will be available in the school website

Study Group on Oblivious Transfer

This week study group by Essam and Gareth was on Oblivious Transfer (OT), which is a fundamental primitive in cryptographic protocols. The problem is as follows: we have a sender S and a receiver R. Sender has N items, say M1 , M2 , ..., MN. The receiver is interested to retrieve one of the items, say the item Mi, where the index i ∈ {1, ...., N}. Informally, any such OT protocol, denoted as OT1N, should satisfy the following two properties:

  1. Sender's security (SS): The receiver should not learn anything about the item Mj where j ≠ i.
  2. Receiver's security (RS): The sender should not learn about the index i.         
OT12 is a special form of OT1N where S has only two items. The problem was first formulated by M. Rabin in his seminal work in 1981. Rabin's proposal was for the OT12 (with a slight variation). This was followed by the work of Even et al., Brassard et al. and several other researchers and different variations of the problem, different notions of security and efficient solutions have been proposed over the past three decades. What makes OT so fundamental is that it is complete for any form of secure 2-party computation and in general for any secure n-party computation.

In the study group, we also discussed the adaptive variant of the OT, denoted by OTkN where the receiver can now retrieve k items out of the N items and that too adaptively. This means that the choice of the jth item to retrieve may depend on the previous (j - 1) items which have been retrieved by the receiver. This primitive can be further generalized to OTk x UN, where now U denotes the number of  receivers in the system. In the study group, we discussed the simple case of U = 1, implying that we have only 1 receiver.

Adaptive OT finds motivation from database applications. For example, consider a large medical database, where a user might be interested to retrieve several records and his choice of records may depend upon the records retrieved by him earlier. Structurally, any OTkN protocol consists of the following two phases:
  1. Initialization phase: this phase is invoked by the sender. Informally, here the sender does some form of commitment of his N inputs.
  2. Transfer phase: in this phase S and R interacts where R retrieves an item of his choice. This transfer phase will be executed k times.
In the literature, different notions of the security for the OT protocols have been considered. They are as follows:
  • Security in the presence of honest but curious adversary: here the sender's security (SS) and the receiver's security (RS) is guaranteed only in the presence of a passive adversary, who follows the protocol instruction but tries to gain extra information from the protocol transcript. For example, if the receiver is corrupted, then he will honestly follow the protocol, but at the same, would try to learn about the other items, in addition to the items retrieved by him. On the other hand, if the sender is honest, then he will honestly follow the protocol, but will try to learn about the items retrieved by the receiver.
  • Security with half simulation: here the adversary can be actively corrupted, who can force the corrupted party (either the sender or the receiver) to deviate from the protocol in an arbitrary manner. In this model, the receiver's security (RS) is ensured using the standard indistinguishability based arguments. But the sender's security (SS) is proved by using the simulation based approach, where we compare the ideal OT primitive and the real world OT protocol and show that for every adversary who corrupts the receiver in the real protocol, there exists a simulator in the ideal OT functionality, who corrupts the receiver, such that the view of the adversary and the simulator are computationally indistinguishable. 
  • Security with half simulation is good, but it does not capture all the corruption scenarios. For example, it may be possible that the sender aborts the transfer phase before the k executions of the transfer phase protocol, in a hope that he can retrieve information about the items retrieved by the sender. This is called selective abortion and this behavior by the sender is not captured by the notion of half simulation. To capture this, we go for security will full simulation, where we argue about both SS and RS using simulation based arguments. For this, we further enhance the ideal OT primitive and then the security is proved by showing that for every adversary in the real protocol (who corrupts the sender or the receiver), there exists a simulator in the ideal protocol, such that the view of the adversary and the simulator are computationally indistinguishable.
To get a feel of a  real implementation of the OT, Gareth presented the following simple OT12 protocol due to Even et al.

  • Setting: S has two items M0 and M1. R wants to retrieve the item Mb, where b ∈ {0, 1}.   
  • Initialization phase (executed by the S): S generates an RSA key pair (N, e, d) and randomly generates x0 and x1 (corresponding to M0 and M1 respectively). S then sends the RSA public key e and the random x0 and x1  to the R.
  • Transfer phase: 
        1. R picks a random k and blinds xb  by computing

             v = (xb + ke) mod N.
           R then sends v to the S. In some sense, this k hides the choice of the item
           (of the) receiver from the sender.

        2. On receiving v, S prepares the mask for M0 and M1 by computing the

          k0 = (v - x0)d mod N and k1 = (v - x1)d mod N

          From the point of the view of the S, only one of the k0 or k1 will be equal
          to k. But S could not figure out this. S then masks the item Mi using
          the mask ki as follows:

          c1 = M+ kand  c2 = M+  k2

            S then sends c1 and c2 to the R.

       3. Since R knows the k, he can retrieve Mb by computing

           Mb = c- k.

           R could not retrieve the other item M1-b because he could not retrieve
           the mask k1-b  from c1-b

The above OT12 protocol will be secure in the presence of an honest but curious adversary. Using this protocol as a black-box, we can design an OT1N protocol (proposed by Naor and Pinkas). The protocol was presented formally in the study group and I dont think that many people followed the protocol. To help them, I explain the idea of the protocol using concrete values.

Let N = 4 and hence S has items M0, M1, M2 and M3. The protocol assumes that N is a power of 2 and that is why I have selected N = 4. R will be interested to receive one out of the 4 items. Notice that the index of any item can be represented by 2 bits. For example, 3 = 11, 0 = 00 and so on. During the initialization phase, S will "some-how" encrypt all the items and will send the encrypted items to the R. The receiver will then "magically" retrieve the decryption key, corresponding to the item which we want to retrieve. Interestingly, this will be done in such a way that R will not learn the decryption keys corresponding to the other items. And at the same time, S will not learn which decryption keys are retrieved by the R. This can be viewed as some sort of "assisted decryption" service from the S. And this is the common principle underlying all the OT protocols.

In the current example, to do the encryption, S will select 4 PRF keys, namely
k0, 0,  k0, 1,   k1, 0  and k1, 1 , for a PRF, say F. The interpretation is that the key
 kb, j will be used for encrypting the item, whose index has bit b at position j in its binary representation, where b ∈ {0, 1} and j = 0, 1. This is demonstrated in the following table:

Item Index Binary Representation of the Index Keys Used for Encrypting Encryption of the Item
M0 0 00 k0, 0 and k0, 1 C0 = M0 ⊕ Fk0, 0 (0)⊕ Fk0, 1 (0)
M1 1 01 k1, 0 and k0, 1 C1 = M1 ⊕ Fk1, 0 (1)⊕ Fk0, 1 (1)
M2 2 10 k0, 0 and k1, 1 C2 = M2 ⊕ Fk0, 0 (2)⊕ Fk1, 1 (2)
M3 3 11 k1, 0 and k1, 1 C3 = M3 ⊕ Fk1, 0 (3)⊕ Fk1, 1 (3)

 During the initialization phase, S will send the cipher-texts C0, C1, C2 and C3 to the R. To retrieve an item, it is enough for the R to retrieve the keys, corresponding to the binary representation of the index of the item, which he wants to retrieve. Note that R needs to retrieve 2 PRF keys to retrieve an item and this will be done by invoking two instances of OT12 protocol. The inputs for the first instance will be the PRF keys k0, 0 and k1, 0 while the input for the second instance will be the PRF keys k0, 1 and k1, 1. 

Suppose R wants to retrieve the item M2. Then it is enough for R to retrieve the keys k0, 0  and k1, 1. So from the first instance of the OT12 protocol, R retrieves    the PRF key k0, 0  and from the second instance of the OT12 protocol, R retrieves  the PRF key k1, 1. Now R can unmask C2 and recover M2.

On a high level, the receiver's security (RS) follows from the security of the underlying OT12 protocols. The sender's security (SS) follows from the pseudo-randomness of the PRF, the security of the OT12 protocols and the basic fact that any two indices differ in at least one position in their binary representation. So for every item other than the item retrieved by the R, there will be at least one PRF key, which will be unknown to the R.

It is easy to see that the complexity of the above protocol is very high. It requires N log N evaluation of PRFs in the initialization phase and log N invocations of the OT12 protocols. Using more sophisticated primitives like blind signatures and variants of Elgamal encryption scheme, it is possible to design more efficient
OTk x UN   protocols, which was briefly discussed at the end of the study group.

An interesting problem is to design efficient UC-secure OT protocols based on static assumptions. There exist UC-secure OT protocols for certain settings, for example non-adaptive adversary, based on strong assumptions, etc. But getting an efficient UC secure protocol against an adaptive adversary, based on standard assumptions is an interesting problem.

Bar-Ilan Winter School on Cryptography ---- Day 1

The first day of the 2nd Bar-Ilan Winter school on Cryptography featured lectures by Oded Regev, Vadim Lyubashevsky and Chris Peikert. Oded gave an excellent introduction to the lattices, stating various properties of the lattices, the definitions of the "so called" hard computational problems like the Shortest Vector Problem (SVP), Shortest Independent Vectors problem (SIVP) and the Closest Vector Problem (CVP), along with the summary of the known results related to the computational aspects of these problems. This first lecture laid a great platform for the remaining talks of the day.

The second talk by Vadim gave an introduction to the Smallest Integer Solution (SIS) problem, which is as follows:

given n random vectors a1, ...., an, we have to find out a "non-trivial" solution z1, ...., zn, where each Zi belongs to {-1, 0, 1} such that, the following condition holds:

                 z1 a1 + z2 a2 + .... + zn an = 0 mod M

In some sense, the above problem can be viewed as an instance of the sub-set sum problem. Vadim then showed the relationship of SIS to the Lattice problems and presented a very simple collision resistant hash function, whose security is based on the difficulty of solving SIS. He then talked about the properties of the Gaussian distribution, which is very much used in al most any lattice based crypto primitive.

The last two lectures of the first day were given by Chris Peikert. His first lecture was on the introduction to the Learning with Errors (LWE) problem. This problem is a fundamental problem in lattices and used in many popular lattice based crypto primitives. The problem first introduced by Oded Regev can be viewed as a complementary problem to the SIS. The problem is as follows:

given random vectors a1, ..., an and the products b1, ...., bn, where each bi is of the following form:

                           bi = (ai, s) + ei,
 where (ai, s) denotes the dot product of ai and s, s is a secret and each "ei" is a small "Gaussian" noise (error), it is hard to find the secret s or the error vectors e1, ...., en. Anyone who is familar with Coding theory can easily see that this problem is related to the general problem of decoding, where ai's constitute the generator matrix of a linear code and s constitutes the message and ei's constitute the error vector and we have to find out the original message s. The LWE problem can be veiwed as a special form of the general decoding problem.

One very interesting property of the LWE problem in contrast to the classical hard problems (like CDH and the DDH problem) is that the search and the decision version of the LWE problem are equivalent. The decision version of the LWE problem is that we cannot distinguish between the pairs (a1, b1), ..., (an, bn) where bi's are random and the pairs (a1, b1), ..., (an, bn), where bi's are now of the form (ai, s) + ei. Chris then very nicely explained the Regev's public key crypto system based on LWE.

The second talk by Chris presented a very nice and new application of LWE, namely the construction of a PRF using LWE. This result is going to be presented in Eurocrypt 2012. Actually Chris has three papers in Eurocrypt 2012, related to the lattices and he presented two of them during the four days of the school. Though I did not completely understood the construction but I got a feel of the underlying idea and I found it very intersting. Goldreich, Goldwasser and Micali (GGM84) in their seminal work showed how to construct PRFs based on PRGs. However the algorithm requires k iterations for a k-length input. Naor and Reingold showed how to construct PRFs using synthesizers. Informally, a synthesizer S is a deterministic function from D x D to D (here D is some finite set or a domain), such that for any a1, ..., am and b1, ..., bm, selected uniformly from the set D, for some polynomial m, the m^2 outputs

                        {S(ai, bj)}

should be pseudorandom. Given a synthesizer, we can construct a PRF by using recursion as follows: let x = x1, ..., xk be the input for the PRF. Corresponding to each xi, we will select random si, 0 (corresponding to xi = 0) and si, 1 (corresponding to xi = 1). Now to get the value of the PRF for x1, ...., xk, we will construct a sort of binary tree. At the leaf levels, we have k nodes, corresponding to x1, ..., xk. At the next higher level, depending upon whether xi = 0 or xi = 1, we place si, 0 or si, 1. Then at the next higher level, we apply the synthesizer S on the elements at the lower level as follows: the first synthesizer S will be applied on s1, * and s2, * (here * will be either 0 or 1, depending upon the input for the PRF). The second synthesizer will be applied on s3, * and s4, * and so on and the last synthesizer will be applied on sk-1, * and sk, *. We recursively apply this logic to the next higher level, where we apply a synthesizer to the two childrens at the lower level and like this we reach the root level, whose value is the final output of the PRF. This construction is somewhat similar to the one used while constructing the Merkle tree. The pseudo-randomnss of the final output follows from the properties of the synthesizers. The idea used in the paper by Chris is to construct synthesizers based on LWE.

The slides and the videos of all the lectures will be available in the school website and I think that anyone who wants to get an introduction to the Lattice based cryptography should look into these materials. I also have the print outs of all the slides and if anyone is interested to look at them, I would be very happy to give the print outs.

Friday, February 17, 2012

Study Group: Measuring Side Channel Leakage from Web Applications

This weeks's study group was lead by Elisabeth and Luke, and focused on the side channel leaks from web applications.

First of all, a web application is a two-part program running in a browser (client side) and on a web server (server side). It can also be regarded as a finite state machine that changes from one state to another according to user input and computations that are processed by the server. The communication between the client and the server therefore plays an important role, however it could also lead to substantial information leakage.

Let's illustrate this with an example, say online tax payment. The classic scenario is that a user sends a request to the server, the sever then responds with an address where a form can be found, the user fills it in and sends it back to the server, and finally the server does the computations and responds with a new page according to the information that has been previously provided. Note that these steps can be repeated any number times, according to the particularities of each user and the detail depth that is required. Let's assume the first form the user had to fill in contains a question about the income, with a two choice answer: under 40k, which would imply no tax, and over 40k, which implies taxation. Although the communication is itself protected by protocols such as TLS/SSL, information about the user's choice and thus their income can still be inferred, since an attacker (eavesdropper) can get access to the state the web app goes into by analysing the traces, i.e. the size and number of packets that are exchanged between the client and the server. Basicly, these amounts differ for the two choices: the over 40k option is one byte larger than the other. Also, the size of the webpage response differs depending on the user input, so the number and size of packets originating from the server will again be higher, respectively larger. One way to protect against this would be randomizing packet sizes. Another option would be to increase the entropy (number of available choices), but this might further imply a usability cost. 

Other real life examples include healthcare,  investment or websearch. The first paper we looked at, "Side-Channel Leaks in Web Applications: a Reality Today, a Challenge Tomorrow" (Chen et al., May 2010) [pdf] reports some information leaks in several real-world web apps under pseudonyms.

The challenge of this paper is assessing how vulnerable a web app is, and clearly the main criterion so far is the distinguishability of the two options. A first attempt is introducing a rough heuristic, namely the density of a trace which is given by:

density(T) = |T| / (max(T)-min(T))

The same authors come up with a better attempt at quantifying in "Sidebuster: Automated Detection and Quantification of Side-Channel Leaks in Web Application Development" (Chen et al., Oct. 2010) [pdf]. Here they introduce a tool designed to work out which inputs leak information, and quantify entropy loss after an observation, where entropy is a measure of uncertainty of a random variable.

Going back to our example, supposing the user's income is larger than 40k, by selecting the appropriate option the web application has now transitioned from state S to state S' where the user again has to answer a question with two possible answers, e.g. "Are you self employed?". The entropy before SCA is H(A)=1, and the conditional entropy (after observing) H(A|O)=0, i.e. no remaining secrecy. Still, even assuming that the attacker can see the same observation O1 for whatever choice the user selects, if he sees O1 twice he will know which path has been selected and thus again gain access to information.

However, their white-box approach for leakage quantification requires that all traffic data is invariant, i.e. the complete input space is known.  In the black box setting for an attacker in practice this isn't usually going to be the case, and hence measurements he/she can make are likely to be noisy.

Finally, the third article we discussed was "Automated Black-Box Detection of Side-Channel Vulnerabilities in Web Applications" (Chapman and Evans, Oct. 2011) [pdf]. Here, the authors describe a black-box tool for detecting and quantifying the severity of side-channel vulnerabilities by analyzing network traffic over repeated crawls of a web application.

Four distance metrics are introduced, namely:
  • Total-Source-Destination - returns the summed difference of bytes transferred between each party, i.e. the difference in the number of bytes transfered to the client, added to the difference in the number of bytes transfered to the server.
  • Size-Weighted-Edit-Distance - adds robustness by tracking the control-flow of the transfered information. Every transfer is treated as a symbol in a string of the sequence of transfers, thus the sequence of the transferred data matters.
  • Edit-Distance - similar to previous metric; reveals how well an attacker can do against a perfect packet-padding strategy.
  • Random - serves as a baseline in order to judge the distinguishability gained from the distance metrics beyond the assumption that the adversary can distinguish page breaks.

Sunday, February 5, 2012

Secure metrology: From theory to practice

Here's another blog post about a talk at the "Is Cryptographic Theory Practically Relevant?" workshop. "Secure metrology: From theory to practice" by George Danezis was in the very spirit of this workshop. Danezis works for Microsoft Research and talked about his experiences from working on a current hot topic: smart metering.

Metrology is the science of measuring things, and the question here is whether this can be done securely. The main use case is electricity meters, which are planned to be substituted by smart meters in the near future in the US and in Europe (which includes the UK). These would allow to meter remotely, in more fine-grained intervals (which could give incentives to avoid peak times) and give the electricity provider a complete profile facilitating consumption forecasts.

The main challenge is privacy: if the current energy consumption is sent in real time and unsecured (as happened in Italy) this might be interesting information for burglars for example. While this can be achieved by encrypting the information sent to the provider, the consumption patterns of particular users should also be hidden from the provider.

Employing available cryptographic tools, solutions addressing all these concerns can be constructed, and even more: George talked about a scheme, where using e-cash, the providers would not even know how much a user pays, but still be assured that the user does pay his bills. More efficient (and realistic) schemes combine encryption, signatures and zero-knowledge proofs to ensure privacy and assurance.

But as often, what's possible in theory is not always applicable in practice: smart meters must be cheap (less than 20$), and so have very limited computational power; moreover, existing standards must be respected, which is increasingly important as the industry becomes more diverse with new suppliers and distributors.

Another issue is user self-determination, that is, the users should have control over their data and be able to give it to third parties. And of course there is the question of trust: do I trust my device to do the zero-knowledge proofs correctly? All in all, as there are so many goals to be achieved simultaneously, smart metering will certainly remain a hot topic at the confluence of theory and practice.

Friday, February 3, 2012

Analysis of Cryptographic Security APIs

During the last session of the first day of the workshop "Is Cryptographic Theory Practically Relevant?", Graham Steel (ENS Cachan) gave a talk titled ``Analysis of Cryptographic Security APIs``.

The talk focused on the security analysis of RSA public key Cryptography Standard (PKCS)#11 which is the most popular used standard for designing cryptographic device interfaces. Tamper-resistant devices such as USB keys and smartcards are often employed in insecure environments to carry out secure operations such as enabling authentication, protect sensitive data or facilitate secure login.  

Graham first gave a brief introduction to PKCS#11-based API. In a PKCS#11-based API, an host machine interacts with a cryptographic device and the sensitive cryptographic keys stored on the tamper-resistant support are supposed to remain protected  even when the host machine is compromised. Any application is supposed to access the objects stored on the device in a guarded way. Indeed, an object is referenced by means of an handle whose 
value does not reveal any information about the object. Objects also come with attributes
describing properties of the object. Moreover, new objects can be created by calling a key generation command and attributes may be set by calling a set attribute command.
When a function is called to use a given object,  the device first checks that the attributes of the object allow the function to be executed. Two further functionalities are considered: Wrap and Unwrap keys. Those functionalities are fundamental for the way PKCS#11 deals with key management, indeed they are used for secure transport of keys between devices or to protect them while in untrusted storage. 

Subsequently, Graham outlined the few hints the standard provides about PKCS#11 security. The standard states that in order to provide protection to the secret keys there are some special attributes: ``sensitive`` and ``extractable`` attributes. The standard requires that to protect a key from being revealed, the attribute sensitive must be set to true. Moreover, once an attribute sensitive is set to true, it cannot be reset to false.  Objects whose extractable attribute is set to false cannot be revealed out of the device even when encrypted. Also, once set to false the extractable attribute cannot be set to true.
Unfortunately, such an informal formulation of the security requirements allows to set key attributes in such a way that sensitive keys become compromised as shown by the attacks  proposed by Clulow in Ches 2003  and those reported by Delaune et al in CSF 2008.

Graham told to the audience that his team contacted manufacturers with the hope of cooperating on analyzing real implementations, but the responses were pretty disappointing. Basically, manufacturers answered that there was no need to be worried since despite the fact that PKCS#11 may be insecure if implemented in a naive way, they have expert working on this issues.  Graham's team at this point had two possible choices: drop the subject or try to make a tool in order to find vulnerabilities on real devices. Obviously, they made the second choice and built the tool called ``Tookan``.

Tookan first extracts the functionalities of the device by means of a reverse engineering process. The results is written in a meta-language for PKCS#11 models that Tookan uses to generate the input for a model-checker. Finally, the model-checker output is sent back to Tookan for testing it directly on the device. 

In the second part of the talk Graham showed a demo of Tookan. He run the tool on an Aladdin eToken PRO. 
While waiting for the tool to process, he pointed out that the reverse engineering process is not complete. On some instances, it could result in a too restrictive model to be able of finding an attack. However, experimental results show that Tookan performs pretty well. Indeed, it was able to find several real attacks on flawed devices. As for the possibility of proving that a configuration is indeed secure, Tookan also provides the option of performing an over-approximation for the generation of fresh handles and keys (whose number may be unbounded). This over-approximation allows Tookan to show some configurations to be actually secure but there are other configurations for which the tool was not able to show correctness.  

Afterwards, Graham showed experimental results from using Tookan to find attacks on commercially available devices. For each device he shows a summary of the configuration information obtained by means of the reverse engineering module and a collection of attacks found. From the results can be noticed that every device providing the wrapping of sensitive keys  functionality, is also vulnerable to attacks while all the other devices avoid such attacks at the price of removing such functionality. 

Finally, Graham reported about manufacturer reactions. RSA registered vulnerability with Mitre and issued security advising the same day Graham was presenting the work at CCS 2010. Alladin and Gemalto sent responses for website. They, essentially, claim that the vulnerabilities found, are not relevant for their costumers. Anyone else sent minimal responses such as requesting to know who else has vulnerabilities. On the positive side, Tookan is now being used by Boeing and a major UK-based bank. 

Recently, Tookan has been deployed in testing PKCS#11-compatible HSMs that have a more sophisticated attribute policies than other devices. Still Tookan found many attacks. 

After thanking the speaker, the audience raised several very interesting questions that yield discussions ranging from the limitations due to the kind of abstraction Tookan performs, the possibility of using theorem provers and the amount of effort needed to generalize Tookan in order to verify different APIs. Such discussions along with the entire talk can be listened online at

Cars and Voting Machines: Embedded Systems in the Field

The second talk on the final day of the “Is Cryptographic Theory Practically Relevant?” workshop  was about exploiting some weaknesses in embedded systems used in some real-life applications and was given by Hovav Shacham. The talk summarizes the result of a comprehensive study undertaken by many researchers and its aim was to investigate and exploit vulnerabilities in embedded systems used in real life; in particular, the study was concerned with automobiles and voting machines.

All modern cars rely on computerized units to achieve different functionalities related to efficiency, entertainment, safety and security. Although there exist attacks which require access to the car, the study was more concerned with vulnerabilities which could be exploited remotely (i.e. without requiring physical access to the car) via different communication channels such as Bluetooth, WiFi and other channels which are used for instance for crash warning etc.

The team involved in the study developed a software that listens to communication between the different components in the car or between the car and the outside world, and then injects some messages to force some malicious behaviour. Exploiting some vulnerabilities, the study showed that an attacker could take complete control of many components in the car. Such vulnerabilities allowed the attacker to, for instance, disable the brake system, control the readings of the speedometer and lock the driver in.

The study also investigated some vulnerabilities in the context of voting via voting marines. Such machines are required to incorporated some basic checks to ensure the correctness and the fairness of the outcome of a vote. For instance, a mechanism to ensure that a voter can only vote once is required.
The attacks mounted only required access to the machines themselves and did not require access to neither the underlying source code nor the vendor's manual.

Those vulnerabilities are serious because of the communication capabilities which could affect not only a single car itself but an entire network.

Most of the vulnerabilities are an obvious result of the designers failing to understand cryptography and therefore resulting them getting even the simplest requirements wrong. Besides other security flaws, the attacks exploited the absence of proper communication authentication and the lack of sufficient mechanisms to protect the memory content of the computerized components.

A long answer to the simple question, "Is TLS provably secure?"

On Tuesday afternoon Tom Shrimpton gave a talk on the security of TLS [video]. This was of particular interest to me because it was a protocol I had spent some time studying during my PhD. TLS or Transport Layer Security is the most widely used security protocol on the Web; most famously being used to secure on-line credit card purchases.

In the first part of the talk Tom presented some joint work with Kenny Paterson and Tom Ristenpart which was published at Asiacrypt last year [pdf]. He started by posing the question: "Is TLS provable secure?". The answer to this was yes but with three caveats: 1) w.r.t the right model, 2) with some correctly chosen parameters, and 3) sometimes this isn't the correct question to ask.

The TLS record layer is responsible for ensuring the confidentiality and integrity of messages being sent. It follows a MAC-then-encode-then encrypt (MEE) structure. What is meant by this is that first a MAC is calculated on the message and then appended. Next this concatenated message is encoded by adding some additional padding bytes. Finally the message is encrypted; the recommended encryption scheme is CBC mode. Previous attacks on MEE type schemes have shown vulnerabilities in the composition especially when CBC mode is the preferred choice of encryption scheme. In particular the padding oracle attack of Vaudenay [ps] and its later extension in the attack on TLS by Canvel et al. [ps] exploit the timing difference between padding errors and MAC errors in order to recover plaintext. Tom showed that in TLS whenever the combined size of a message and tag is less than n-8 (when the block-size is n) with variable length padding we can always perform a distinguishing attack in an Authenticated-Encryption (AE) security model where the adversary is given access to a left-or-right (LOR) encryption oracle.

Previous work has been performed on examining the security of TLS by Krawczyk [pdf] and, Maurer and Tackmann [pdf] but it is not clear exactly how these results apply to TLS in practice. Krawczyk's analysis fails to consider padding and Maurer and Tackmann's work provides security based on a "secure" channels notion but only when minimal padding is used. The new security model presented in the talk was for Length-Hiding Authenticated Encryption (LHAE) thus allowing the study of variable-length padding. In this model the adversary now no longer needs to send two messages of equal length to the LOR encryption oracle as is the case in the normal model. Furthermore the following relationship can easily be proved (this is an extension of the standard relationship for IND-CCA security proved by Bellare and Namprempre [pdf]):

Now to prove the security of TLS:
  • The IND-CPA of CBC implies the LH-IND-CPA of TLS.
  • We also know that any MAC-then-Encrypt scheme is INT-PTXT secure by the results of Bellare and Namprempre but this does not imply INT-CTXT.
  • It therefore remained to find what the missing notion was that would give INT-CTXT security.

This missing notion was Collision Resistant Decryption (CRD). The theorem presented states that if
  • the block cipher is a prp with block size n,
  • the MAC is a prf with tag size t, and
  • for all messages M we have |M|+t ≥ n.
then MEE using the TLS pad and CBC mode is CRD-secure (with a birthday type bound).

Putting all this together proves that TLS is LHAE secure.

In the second part of his talk Tom presented some work of an even more applied nature. Here he described joint work with Dyer, Coull and Ristenpart, entitled "Peek-a-Boo, I Still See You: Why Traffic Analysis Countermeasures Fail". The scenario here is that a user wants to hide which websites he visits by using a proxy. A provably secure AE scheme (such as TLS) is used to create a secure connection between the user and the proxy and the hope is that an adversary will not be able to determine the websites visited by eavesdropping on the traffic sent between the user and the proxy. In practice however, this is very hard to achieve. In 2006 Liberatore and Levine [pdf] showed that it was possible to correctly identify packets based on their size alone with 68% accuracy. In the new work presented by Tom they show that it is very easy for an adversary to identify which websites the user is visiting in the above scenario. In fact even with the current countermeasures websites can still be identified easily. With the obvious countermeasure of fixing the size of each packet to be equal this reduced the attackers accuracy to 6% but introduced a 365% overhead. This renders this countermeasure completely useless in practice. To prevent this type of traffic analysis it would be necessary to hide three things: download time, bandwidth and bursts of packets. The current countermeasures preventing analysis of these are all highly inefficient.

Study group: Virtualisation

Today's study group was led by Dan and Marcin on the topic of virtualisation: the creation of virtual environments to emulate the physical environments on which software is designed to run. In a traditional computer system, application multi-tasking is enabled by the OS, a software layer in between hardware and applications which manages shared resources and protects applications from one another. Modern virtualisation is the natural next step: allowing multiple OS's to share the same hardware (whilst remaining isolated from one another). This is particularly desirable in the context of cloud computing, facilitating dynamic resource provision which is highly flexible to current demand and permits sales in small units as multiple customers can share the same hardware resources.

However, it isn't trivial because each OS has the same 'ring zero' privilege level, so how does the server now decide which applications get what, and when? The solution to this is to insert another layer in between the hardware and the multiple operating systems, called a virtual machine manager (VMM), which is endowed with 'ring -1' privileges and tasked with the responsibility of allocating resources between OS's and of isolating them from one another. In particular, no VM should be able to access the data or software of another VM (either directly or via a side-channel) or to affect VM availability.

Economic motives have driven the increasing trend towards virtualisation, but whilst it makes 'good business sense' it introduces novel security problems which need to be understood and dealt with. The VMM is designed in such a way as to protect a VM from other potentially malicious VMs on the same server, but how can we be sure if this objective has been achieved?

The first paper we looked at, "Hey, You, Get Off of My Cloud: Exploring Information Leakage in Third-Party Compute Clouds" (Ristenpart et al., 2009) [pdf], explores the possible vulnerabilities of virtualised cloud services by way of a case study. Amazon EC2 has three levels of infrastructure: a region (e.g. US, Asia, etc), an availability zone (i.e. a data centre), and an instance type (e.g. Linux 32-bit). The user creates a VM image and asks Amazon to 'run' it, at which point it is placed onto a physical server (and acquires an internal and external IP address and domain).

The authors describe the three-part challenge facing a would-be attacker:
  1. To 'map' the address space of the cloud and instantiate a VM on the desired machine (i.e. the one hosting the target VM).
  2. To check for co-residency (i.e. confirm desired placement).
  3. To attack the target VM (for example, extract information).
Their subsequent investigation uncovers possible strategies for acheiving each of these goals, and leads naturally to straightforward recommendations for improved security:
  1. The address space can be 'mapped' simply by launching VMs with different parameters and seeing what IPs they are assigned. It turns out that 'similar' requests are placed in 'similar' areas of the map, so that carefully chosen parameters can increase the probability of being placed on the same server as the target VM - even more so if the launch can be timed to coincide. (Non-static IPs would remove the possibility to do this).
  2. The most conclusive check for co-residency is by observing the 'Dom 0' address of a packet sent to or from the target (each VMM has an IP address associated which will be attached to any packet passing through it). Alternatively, round-trip times for packets sent to the target can be measured: if these are small (or, if they are similar to those of packets sent from the adversary to itself), co-residency is likely. Lastly, numerically close IP addresses can also be an indicator. The authors suggest security policy mitigations to increase the challenge to the attacker.
  3. Having achieved and confirmed co-residency, the attacks they suggest relate to previously-discovered micro-architectural strategies, such as side-channel leakage from shared cache memory (see, e.g., "Cache-timing attacks on AES" (Bernstein, 2005) [pdf].
The second paper we looked at, "NoHype: Virtualized Cloud Infrastructure Without the Virtualization" (Keller et al, 2010) [pdf], proposed to mitigate the vulnerabilities of cloud infrastructure by getting rid of the virtualisation layer. They justified this by observing that the increasing trend in the number of cores-to-a-processor means the benefits of fine-grained provision will no longer rely on the ability to put multiple VMs on a single core. They thereby set out to demonstrate that some of the tasks performed in software by the VMM could be equally well performed in hardware, highlighting ways in which existing technology could be tweaked to achieve this as well as identifying gaps which would require new technologies to fill.

Theory and practice for hash functions

This week many members of the group have been in Cambridge for the Newton Institute workshop titled 'Is Cryptographic Theory Practically Relevant?' The workshop was my first encounter with the wider cryptographic community and I thoroughly enjoyed the talks and the atmosphere during my time at the Newton Institute. I particularly enjoyed Bart Preneel's discussion of hash functions [video], with particular focus on the SHA-3 competition. Hash functions are crucial to many cryptographic applications, so naturally the competition is being closely followed by many in the cryptographic community, but it was noted that the amount of active researchers in the field is surprisingly low, particularly when compared to the AES competition.

There are many requirements of a good hash function, but there is still a discussion on exactly what properties the winning entry should satisfy. It is vital that the security definitions are formulated correctly and that they accurately reflect practical needs, particularly when the hash function is iterated. In 2004 Maurer et al. [pdf] introduced indifferentiability as a way of using random oracles as a proof method, however in 2011 Ristenpart et al. [pdf] showed that this is not always enough when composition is concerned, and in keeping with the theme of the workshop the the gap between the theory and the practical applications was in fact considerable.

In 2008 60 submissions were made to the SHA-3 competition, and the winner from the final 5 entrants will be picked at some point in 2012. There are many attacks on Merkle-Damgard constructions [pdf] [pdf], so designing a new construction method has been a major feature of the competition so far. Bart Preneel noted that his student Bart Mennink has made some considerable progress in understanding the various constructions including the sponge [details], and more work in this area will help the community provide more precise and effective security definitions and create tight security reductions from which meaningful conclusions can be drawn. The point was made that industry was advised to stop using MD5 in the mid-1990s but this advice was largely ignored until 2009. Many papers attacking SHA-1 have been withdrawn, and in the community it is not an acceptable suggestion that this surprising lack of progress is due to the strength of the hash function and in fact more work needs to be done. These points were very effectively portrayed with images of the Golden Gate bridge shrouded in varying levels of fog.

Work analysing the SHA-3 candidates was highlighted ([pdf] , [pdf] , [pdf] ) and the concluding remarks of the talk noted the open questions such as:
  • Are standard model security proofs possible, and if so how will the new proof approaches help the rest of the cryptographic literature?
  • Are improvements to the current indifferentiability bounds possible?

Mathematical and practical security problems not directly related to cryptography

The third talk of the first day of the workshop "Is Cryptographic Theory Practically Relevant?" was given by David Naccache (a video of the talk is available online). In his abstract, he noted that the talk was intended for people bored by 'random oracles, lattices, security reductions and games'. As described by the title, the problems he discussed were not directly related to cryptography, but rather nice mathematical problems that arose from a number of practical security problems.

The first problem had to do with the depiction of data on radar charts. The point was to rearrange the axes of the chart to maximize the area depicting the attributes of ones employer, while minimizing the area depicting the attributes of their competitors. This problem was probably the least related to security, but a general theme throughout his talk was that many mathematical problems are interesting; you just need to convince people to give you funding for looking at them.

The second problem was related to physically protecting electronic devices or chips from tampering by embedding them in a 3D mesh of metal. This lead to the study of random Hamiltonian cycles in 3-dimensional planar graphs of a cube-like structure. The goal was to end up with a 'cage' with has the device embedded in the center, such that it is hard to drill towards the device. However, due to the particular structure of the graph, a single cage would always leave regularly spaced holes in between the edges. Fortunately, it is possible to interleave a smaller cage inside the bigger cage, thus neatly filling up the holes. These cages were constructed taking a regular Hamiltonian cycle in the graph and then transforming them randomly to get rid of the regular structure.

David went on to talk about the third problem, which arose from the probing of chips. It is possible to probe chips with special machines, in order to analyze what exactly happens on these chips. To somehow defend against this, David and his students looked at the problem of physically 'scattering' a secret on the chip, thus maximizing the physical distance between parts (such as shares from secret sharing). The chip was modeled as an n by n chessboard and the shares of different secrets were modeled as colored pawns. You have pawns of n colors and must place them on the board such that pawns of the same color are as far from each other as possible. This problem turned out to be related to the mathematical theory of tiling. David noted that the solutions obtained would always be periodic, which might be undesirable from a security point of view. Thus, some care must be taken to make periodic solutions non-periodic, which possible reduces the distances between pawns of the same color slightly. As someone in the audience pointed out, it is possible to extend this model by adding colorless pawns to model unimportant information that does not need to be scattered.

At this point, David was running out of time and offered to take questions. However, his next subject was titled "How to identify quickly" and the audience encouraged him to use the remaining two minutes to show how quickly one could identify. He used these last few minutes to describe how to speed up comparisons of a biometric scan to a database of biometric data for identification purposes. The trick was to add additional data such as the person's height and then sort the database entries by the distance from the measured height to the recorded heights. This could be extended to several types of data (he gave height and IQ as examples because they were independent) but this might require some extra statistics to remove dependence between these attributes. Unfortunately, he really was out of time at this point and did not get to talk about other problems that he had prepared. More information can be found once his slides become available, but he said that if people were interested in these problems they should send him an email as well.

The practical application of cryptography to international card payments

The following is a summary of the talk "The practical application of cryptography to international card payment" given by John Beric and Mike Ward from Mastercard International on day 2 of the workshop. The core topic of the talk was EMV, a global system for debit and credit card transactions based on smart cards.

After arguing for the necessity of card payments, the first speaker made it clear that the real world does not always implement what cryptographers like to see because of the long life cycles of the system. Then, he outlined the history of distance payments, in the form of cheques, magnetic stripe card transactions, and offline EMV transactions. Cheques provide physical evidence and are hard to forge, but they are slow to process and the signature check cannot be made at the time of the transaction. The only advantage of magnetic stripe cards over cheques is the faster processing, at the cost of no physical evidence and little protection against copying of the card. To make these acceptable to customers nevertheless, the banks operate a policy of zero liability on the side of customers, which in turn the facilites fraud.

Finally, offline EMV transactions offer evidence by a digital signature, and it is believed that the cost of breaking the scheme exceeds the possible profits. This contrasts the case of magnetic stripe cards, where the investments required to copy a card are rather low. Nonetheless, the world of EMV transactions is not perfect as there might be corrupted terminals intercepting the PIN or not displaying the correct amount of a transaction on the display. There are plans to mitigate this by employing the users' smartphones as part of the trusted computing base. But of course, cards and PIN can be stolen as well as smartphones.

The second speaker highlighted some technical details. The central theme in the implementation of EMV seems to be that the specification recommends a high level of security; however, the recommendations are not always followed in favour of the flexibility of a globally employed system. In some regions for example, the majority of cards do not support digital signatures but use a so-called static signature, which only provides authentication of the card instead of authorisation of a specific transaction. In the same spirit, the failure of a cryptographic check may not lead to the abortion of a transaction because a part of the system might simply not support the necessary cryptographic schemes. While the specification recommends the use of AES, implementations mostly use 3DES and even proprietary ciphers and signature schemes.

The talk was concluded by an outlook to a future specification, where elliptic curve cryptography (ECC) will be adopted. An open question related to this is whether Schnorr signatures will be used. In the question session, a member of the audience suggested that the emphasis should be on the simplification of the historically grown specification instead of the introduction of ECC, which he accounts more to the economic interests of the patent holders involved.

Cryptography with Work-based Corruptions and the Combinatorics of Anonymity

The third talk on Wednesday was given by Aggelos Kiayias (University of Connecticut). Protocols in the MPC setting  are designed to allow a group of n players compute a function on n arbitrary inputs. The protocol must at least ensure input privacy and output delivery to the clients. An adversary willing to make the protocol fail, can simply  break the assumption which relies on. But most likely it will try to corrupt more than t players (for some security threshold t). How is this corruption actually performed? Theoreticians just assume this is done instantly and comes at no cost. The reality is rather different: each player may have different resources or security parameters. 

The question Aggelos threw was whether it is possible to come up with an appropiate model. He proposed to use resource-based corruptions, where a symbolic amount of coins (or resources) is assigned to each player, e.g. an adversary will spend less time breaking weak passwords in dictionary attacks. The adversary can corrupt the player if it spends at least s coins in such operation. Things are more complex when all the players look identical for external observers (hidden diversity). It can be achieved by e.g. randomly permuting the position of the players in the cloud. 

He explained a game to measure the amount of coins that the adversary will need in order to corrupt (more than t) players. This is what he called 'Combinatoric Game of Anonymity': we have a system conformed with buckets and balls. Buckets are players, and their size is the number of balls required to fill them up (corrup them). Assuming the only feedback we get from the system is whether or not a bucket is filled, how do we know the amount of balls required to collapse the entire system? Clearly if we do know the exactly amount of balls, then we are back in the classical situation, and the players can be considered to be instantly corrupted by a sufficient enough resource-powered adversary. There are intermediate states of ignorance, though. For example the adversary may only know  n and that some (unknown) bucket has size s (size-only), or n and the maximal size ( max-only).

Another interesting question was whether we can discretize the amount of work needed to break a cryptosystem. He explained the concept of exact hardness for some threshold value epsilon. It is used to compare functions according to their inversion ( gives the number of steps needed to succesfully carry out the compilation with probability greater than epsilon). How easily can the exact hardness be computed? Bounds can be provided in the random oracle model, and ranges in the standard model.

 There are several interesting properties. For example, given a family of functions, find a value T such that it is not posible to amortize the exact hardness of any function in the family beyond T ( that is, the exact hardness of any function in the family for (any given arbitrary) epsilon, is greater than T times a finite linear combination of the exact hardness of arbitrary functions in the family). Another property is the indistinguishability, when an adversary can not infer which function is better to attack first.

Wednesday, February 1, 2012

Is Cryptographic Theory Practically Relevant? (Day 1)

Yesterday the workshop wondering about the practical relevance of cryptographic theory started at the reknowned Newton Institute in Cambridge. There was a staggering number of participants including people from both industry and from academia, covering a vast range of cryptographic theory and practice.

The workshop started with Serge Vaudenay's talk ominously titled "Privacy in Deniable Anonymous Concurrent Authentication with Setup is Impossible: Do we Care?". His choice of title had already received some critical comments in the comments section of Jon Katz's blog, so I was looking forward what the title referred to. Before Serge came to the main topic so to speak, he discussed another issue that fits very well with the Dagstuhl seminar from two weeks ago.

Last Eurocrypt he and his coauthors looked at statistical attacks on RC4.There they estimated certain complexities using assumptions about certain variables being independently distributed in some way. Since then, they have tried to validate these assumptions in practice by running experiments. It turned out that the variables behaved quite differently then assumed and expected, so a different model (related to modelling tornados) was needed. Turning the workshop's theme on his head, Serge concluded that practice can be relevant for theory.

He continued to talk about deniability in certain protocols. His main point was that a notion like deniability is extremely brittle. It is known how to achieve deniable authentication in the standard model, but as soon as a random oracle is added into the mix, the deniability no longer holds. Similarly, once the right kind of trusted hardware token is available, deniability can no longer hold. The point is that these extra powers are more useful to the adversary than to the protocol designer and from the literature on for instance receipt-free elections they don't come as an entire surprise, but it is still somewhat strange to see in action.

Serge's talk contained a third part on achieving different kinds of privacy in authentication protocols relevant for RFID tags, where he adapted a security definitions to bypass impossibility notions and realign with practice. The combination of the three parts of his talk was a clear reminder of the limitations and potential dangers of conducting cryptographic theory in complete isolation from practice, as one might well end up with unrealistic models or assumptions. In other words, a great start to the workshop.

Cryptomathic HSM Portal

Mike Bond from Cryptomathic gave a talk on their HSM Portal. A Hardware Security Module (HSM) is a certified physically tamperproof device that is used and often mandated in areas such as banking.
HSMs are ancient technology by today's standards, both in respect to the algorithms, parameters and modes in use and the APIs.

Yet an HSM is not "security" in and of itself but a module in a larger system: building such a system securely on top of a HSM is by no means easy.
HSM portal is a solution that offers an higher-level environment to build a HSM-based system. Customers can get the technical details (key length, key refresh time period, algorithms, modes of operation sorted by Cryptomathic and delivered in a configuration file). An API call might now look like "ENCRYPT file FROM alice TO bob", in other words something that non-cryptographers have a chance of getting right.

Three types of crypto-users were mentioned: home-growers, that know little of cryptographic theory and devide their own ad-hoc "cipher" when needed; passable theorists (among whom Mr Bond counts himself) who know the theory and core theorists who are "most likely to have written the theory and founded their own start-up company based on it".

Mr Bond made the point that industry does not care about security as much as compliance. This reduces the incentive to apply theoretical advances in cryptography. An algorithm, mode of operation or API is good if it is standardised and ticks certain boxes rather than if it has a formal proof of security.
In the case of API attacks, compliance boils down to specifying what an honest user should do with the API, not what a possibly dishonest or careless user CAN do.