Sunday, November 15, 2015

Games of Timing for Security in Dynamic Environments

Last week I had the pleasure of attending the 6th International Conference on Decision and Game Theory for Security, held at the (very swanky) Grand Royale Hotel next to Hyde Park. The first of these conferences occurred in 2010, and the series has continued to attract international scholars and researchers. Decision and game theory has proved to be a valuable and systematic framework for dealing with the intricacies involved in making rational decisions in security.

Games of timing have been of interest to game theorists for quite some time now. However, according to recent literature, most work has assumed that the cost and effectiveness of actions are time-independent. An example of this is the game of FlipIt [1], in which an attacker and a defender play for possession of a resource.
A talk at this year's conference given by Ben Johnson removes this assumption by setting up a model that captures the discovery, repair and exploitation of software vulnerabilities. An example given by Ben was Microsoft officially ending all support for Windows XP in 2014: security professionals conjectured that attackers would "stockpile" vulnerabilities until this date in order to exploit them fully.
The question that Johnson et al.'s paper [2] and his talk try to answer is whether the attacker should wait to exploit these vulnerabilities, given the risk of the internal security team discovering them first. If you require a more in-depth description of anything discussed here, please refer to the authors' paper [2].

The authors consider the life cycle of the software as finite, ending at time $t = T$. The rate of vulnerability discovery is an arbitrary function of time $V(t)$. The defender's strategy is a function $d(t):[0,T] \to \mathbb{R}_{\geq 0}$ representing the level of her security investment at each point in time. The attacker's strategy is a function $a(t):[0,T] \to \mathbb{R}_{\geq 0}$: the amount of time the attacker will wait to exploit a vulnerability discovered at time $t$.

The repair process for vulnerabilities is determined by an exponential decay function $e^{-\lambda \tau}$, which represents the probability that a vulnerability remains exploitable a time $\tau$ after its discovery. The authors consider their model in both continuous and discrete time; for the purpose of this blog post, I'll just discuss the continuous-time case.

The defender's total cost over the time interval $[0,T]$ is simply $\int^T_{t=0} d(t) dt$. The expected loss to the defender if the attacker waits for time $a(t)$ to exploit a vulnerability found at time $t$ is $e^{-\lambda a(t)} \frac{R}{d(t + a(t))}$, where $R$ is a scaling factor between security costs and losses. These can be combined to give the defender's total payoff
$$U_d = - \int^T_{t=0} \bigg( d(t) + V(t)\, e^{-\lambda a(t)} \frac{R}{d(t + a(t))} \bigg) dt.$$
Likewise, the attacker's payoff is given by
$$U_a = \int^T_{t=0} V(t)\, e^{-\lambda a(t)} \frac{R}{d(t + a(t))}\, dt.$$
The authors then turn to looking for conditions for useful Nash Equilibria[3] to exist.
Interestingly, it can be shown that the attacker will never wait if the vulnerability discovery function satisfies
$$\frac{V(t+a)}{V(t)} \geq e^{-\lambda a}$$
for every $t \in [0,T]$ and $a \in [0, T-t]$. This begins to provide guidelines for choosing vulnerability repair rates that stop the attacker from stockpiling vulnerabilities (a quick numerical check of the condition follows below). Further work suggested by Ben is to study the Stackelberg equilibria [4] of the game.
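To get a feel for the condition, here is a minimal numerical sketch in Python; the discovery function $V(t)$ and the rate $\lambda$ below are illustrative assumptions of mine, not values from the paper. It simply grids over pairs $(t, a)$ and checks whether the ratio $V(t+a)/V(t)$ ever drops below $e^{-\lambda a}$.

```python
# Numerical check of the no-waiting condition V(t+a)/V(t) >= e^(-lambda*a).
# V and lam are illustrative choices, not taken from the paper.
import math

lam = 0.5                                 # repair rate lambda
T = 10.0                                  # end of the software's life cycle
V = lambda t: 5.0 * math.exp(-0.1 * t)    # assumed declining discovery rate

def attacker_never_waits(V, lam, T, steps=100):
    """Return (True, None) if the condition holds on a grid, else a witness."""
    for i in range(steps + 1):
        t = T * i / steps
        for j in range(1, steps + 1):
            a = (T - t) * j / steps
            if V(t + a) / V(t) < math.exp(-lam * a):
                return False, (t, a)      # waiting can pay off at (t, a)
    return True, None

print(attacker_never_waits(V, lam, T))    # (True, None) for these numbers
```

Since this $V$ declines more slowly than the repair decay $e^{-\lambda a}$, the condition holds everywhere and the attacker has no incentive to stockpile.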

[1] - van Dijk, M., Juels, A., Oprea, A. and Rivest, R. L., "FlipIt: The Game of 'Stealthy Takeover'", Journal of Cryptology, 2013.
[2] - Johnson, B., Laszka, A. and Grossklags, J., "Games of Timing for Security in Dynamic Environments: Theory and Control of Vulnerability-Based Attacks", GameSec 2015.
[3] - Nash equilibrium: a strategy profile from which no player can gain by unilaterally changing their own strategy.
[4] - Stackelberg equilibrium: the solution concept for games in which a leader commits to a strategy first and a follower best-responds.

Tuesday, October 27, 2015

Sardinia eCrypt School, October 2015

As I stood in the sweltering heat, under the burning light from above, with the hustle and bustle of busy bodies rushing all around me, I thought to myself "in about three hours I'll finally get through Stansted airport security".

One of the reasons I applied for a PhD at the University of Bristol was to travel; to travel all around the world attending conferences, networking with other academics in my field and collaborating on applicable pieces of work. However, I hadn't anticipated that I would arrive at Bristol and meet my supervisors (Elisabeth Oswald and Daniel Page) for them to say "thank goodness you're here, now pack your bags and head to Sardinia for a week".

I didn't argue.

The school I'm referring to is the "School on Design and Security of Cryptographic Algorithms and Devices", and this year it was co-sponsored by the International Association for Cryptologic Research (IACR). Roughly 80 academics and professionals from all around the globe attended the school, many of whom were new PhD students (like me), and a few of whom were seasoned crypto school veterans. As a newbie I experienced the excitement of meeting all these new faces, and getting to know them over a few glasses of wine (the magnificent hotel resort was very generous with its distribution of wine, particularly to my table).

The Summer School took the form of a crash course in many different aspects of cryptography, focusing on covering a lot of ground with introductory lectures. For my first blog post I really wanted to focus on one particular talk, but after attending them all it became clear that I had quite a few favourites. For their content, Benedikt Gierlichs' "Introduction to Implementation Attacks" and Josep Balasch's "Introduction to Fault Attacks" were fascinating. With little background knowledge of practical attacks on embedded security, I was shocked to see just how easily one can manipulate theoretically secure devices, simply by making minor alterations to the embedded circuit (see the slides from Benedikt Gierlichs' "Introduction to Implementation Attacks" for more detail).

As for the presentation of the talks, I give a special mention to Gregor Leander. He was able to introduce lightweight block cipher design whilst holding the attention of his audience through quick wit; he told stories of multiple attempts to create nifty lightweight block ciphers, only for them to be proven weak and useless within several months of publication. My favourite was the story of KeeLoq - a proprietary hardware-dedicated block cipher that uses a non-linear feedback shift register. It was sold to Microchip Technology Inc in 1995 for $10 million (wow), and it was (maybe still is) used in many remote keyless entry systems by companies such as Chrysler, Daewoo, Fiat, GM, Honda, Toyota, Volvo, Volkswagen Group, Clifford, Shurlok and Jaguar. However, the KeeLoq algorithm does not use cryptographic nonces (for simplicity), nor does it include time stamping (due to clock drift). This means, of course, that the algorithm is vulnerable to replay attacks: simply jam the channel while intercepting the code, and you obtain a code that may still be usable at a later stage. If your car still uses KeeLoq, you should probably either invest in a steering wheel lock, or paint your car bright pink.

To summarise, the organisers of the School did a fantastic job of bringing everyone together, the speakers were all fantastic and engaging, and I learnt a great deal about cryptography whilst having the greatest week in Sardinia. I would strongly recommend this school to anyone who has just started a PhD in cryptography, regardless of their topic title. I will most certainly be getting involved in as many of these schools as I can!

Finally, a big shout out to all the friends I made at the School - you are all brilliant and I can't wait for the next one. Also, sorry for my atrocious dancing throughout the week. Blame the bar for playing 'Barbie Girl'.

Thursday, October 22, 2015

Obfuscation: Hiding Secrets in Software

Last week the HEAT-hosted summer school on FHE and multi-linear maps took place. On the final day we learned about indistinguishability obfuscation from the eminent cryptographer Amit Sahai. The talk was entitled 'Obfuscation: Hiding Secrets in Software' and a summary is presented here.

Suppose you want to keep a secret, but some adversary has taken over your brain: whenever you think about the secret, the adversary is able to read it. While this Orwellian nightmare is not realisable for us (at the moment!), in software this happens all the time. This is the fundamental question of obfuscation: can we keep a secret inside a program even when its entire functionality is exposed to the adversary? Indistinguishability obfuscation is an attempt to realise this.

One early idea on the way to program obfuscation was secure multi-party computation (MPC). The rough idea of secure MPC is to allow each member of a group of parties to compute some function on everyone's inputs without anyone learning too much about the input of any other person. In such a situation, an adversary may be one member of the group, or may even act as multiple members in order to try to work out something about the input of one of the other members. In this case the adversary has a lot of information he can exploit; for true obfuscation, however, we want nothing at all to be revealed to an adversary.

Now we ask the important question: which programs allow for secrets to be kept? Suppose we have a program with some secret $s$ that accepts an integer $x$ as input and outputs 1 if $x<s$ and 0 otherwise. By a simple binary search, the adversary can determine $s$. Plausible security requires unknowable queries, so if we are to succeed in obfuscation we need a 'virtual black box'.
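To see concretely why this program gives up its secret, here is a minimal sketch of the binary-search attack; the secret value and the range bound are illustrative assumptions.

```python
# Recover the secret s from the comparison program by binary search.
# The program outputs 1 if x < s and 0 otherwise; we assume 0 <= s < 2**32.

SECRET = 1_234_567                       # the value hidden in the program

def program(x: int) -> int:
    return 1 if x < SECRET else 0

def recover(bound: int = 2**32) -> int:
    lo, hi = 0, bound                    # invariant: lo <= s <= hi
    while lo < hi:
        mid = (lo + hi) // 2
        if program(mid) == 1:            # mid < s, so search above mid
            lo = mid + 1
        else:                            # mid >= s, so search at or below mid
            hi = mid
    return lo                            # found after ~log2(bound) = 32 queries

assert recover() == SECRET
```

Thirty-two queries suffice where brute force would need billions, which is exactly why such 'learnable' programs cannot be securely obfuscated.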

However, back in 2001, 'virtual black box' obfuscation was suggested and ruled out in the same paper: it cannot be achieved for all programs. The authors went on to describe and define indistinguishability obfuscation (iO): obfuscating two programs with the same functionality must yield programs that are computationally indistinguishable from one another. Interestingly, if P=NP then any circuit can be obfuscated in this sense, so iO cannot produce hardness by itself.

This well-known paper gave the first construction of iO for NC$^{1}$ circuits. The authors describe a useful possible application: crippleware. Suppose you are a software vendor who has written a program with lots of functionality, and you want to distribute a somewhat restricted trial version. In order to stop trial users from using content for which they have not yet paid, one possibility is to go through the code and manually remove the blocks of code for which a full licence should be required. In a large program this is very cost- and time-inefficient. However, if instead you simply restrict the functionality of the user interface and release an obfuscated version of the program, this version will not have any functionality beyond that which was intended.

Very recently, there have been many attacks on multi-linear maps, many of which rely on obtaining top-level encodings of zero. Because in obfuscation the adversary never has access to encodings of zero, it is resistant to these attacks. Thus the hope remains that obfuscation will not be broken.

Wednesday, October 21, 2015

Using Linearly-Homomorphic Encryption to Evaluate Degree-2 Functions on Encrypted Data

This post is about a talk in the final session of CCS 2015 that was presented by Dario Fiore. The presented work is a joint work with Dario Catalano and the full version of their paper can be found here.

Homomorphic Encryption (HE) provides a solution to the problem of privately outsourcing computations to a cloud. There are a number of Somewhat Homomorphic Encryption (SHE) schemes that support many additions but only a limited number of multiplications. Some examples of such schemes, in chronological order, are Goldwasser-Micali, ElGamal, Okamoto-Uchiyama, Paillier, Boneh-Goh-Nissim, and the underlying SHE schemes of all the Fully Homomorphic Encryption (FHE) schemes since Gentry's construction. Though SHE schemes can evaluate only a limited set of circuits compared to FHE schemes, it is still interesting to consider them for applications, mainly for efficiency reasons. SHE schemes that support only a single operation on ciphertexts - addition or multiplication - are called linearly homomorphic. That is, (additive) linearly-homomorphic encryption schemes can evaluate only linear functions on ciphertexts. (Here, by a "linear function" we mean that the function can be expressed as a polynomial of degree one in the input variables, with the scalars coming from a ring.)
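As a quick illustration of what "evaluating a linear function on ciphertexts" means in the additive case, here is a minimal sketch using the python-paillier (phe) package; the package choice and parameters are mine, not the authors'.

```python
# Homomorphically evaluating the linear function f(m1, m2) = 3*m1 + 2*m2
# on Paillier ciphertexts, without ever decrypting the inputs.
from phe import paillier

pub, priv = paillier.generate_paillier_keypair(n_length=2048)

c1, c2 = pub.encrypt(7), pub.encrypt(5)
c = 3 * c1 + 2 * c2                       # scalar multiplication and addition
assert priv.decrypt(c) == 3 * 7 + 2 * 5   # decrypts to 31
```

No multiplication of two ciphertexts is possible here, which is precisely the limitation the paper addresses.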

The main problem addressed in the current work is to enable evaluation of degree-two functions using linearly-homomorphic encryption schemes. To this end, the authors propose a generic transformation that extends an (additively-written) linearly-homomorphic scheme to support a single multiplication operation, so that it can evaluate a subset of functions of degree two. The only requirement on the underlying linearly-homomorphic scheme is that it be "public space", meaning that the message space must be a publicly known finite commutative ring from which it is possible to efficiently sample a random element. This property holds for nearly all of the known linearly-homomorphic encryption schemes, or they can easily be adapted to satisfy it.
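To make this concrete, here is a rough sketch of how a single multiplication can be bolted onto an additively homomorphic scheme by additively masking each message and keeping the mask encrypted. This is my illustrative reconstruction of the idea on top of python-paillier, not the authors' exact construction; the mask range is an arbitrary choice.

```python
# One-multiplication extension of an additive HE scheme (sketch).
# A level-1 ciphertext for m is (a, Enc(b)) with m = a + b and b random,
# so m1*m2 = a1*a2 + a1*b2 + a2*b1 + b1*b2; only the b1*b2 term is
# beyond the reach of additive homomorphism, and it is handled at decryption.
import secrets
from phe import paillier

pub, priv = paillier.generate_paillier_keypair(n_length=2048)

def enc1(m):
    b = secrets.randbelow(2**64)          # additive mask (illustrative range)
    return (m - b, pub.encrypt(b))        # (a, beta)

def mul(ct1, ct2):
    (a1, b1), (a2, b2) = ct1, ct2
    alpha = a1 * a2 + a1 * b2 + a2 * b1   # computable with additive ops only
    return (alpha, b1, b2)                # level-2 ciphertext

def dec2(ct):
    alpha, b1, b2 = ct
    return priv.decrypt(alpha) + priv.decrypt(b1) * priv.decrypt(b2)

x, y = enc1(6), enc1(7)
assert dec2(mul(x, y)) == 42
```

Note that decrypting a level-2 ciphertext costs three decryptions and one plaintext multiplication, independent of the function that was evaluated, which is the compactness point discussed next.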

Compactness is one of the three main requirements of a (fully) HE scheme: it says that the complexity of decryption must be independent of the complexity of the circuit evaluating the given function. The other two requirements are semantic security and circuit privacy for the underlying plaintext messages; the latter means that it must not be possible to learn anything about the messages in the ciphertexts input to a function. It is shown that the proposed transformation preserves semantic security, and that the transformed scheme is levelled circuit-private if the underlying scheme is circuit-private. The transformed scheme can compactly evaluate the subset of degree-two functions (represented as polynomials) $\mathcal{F}_2^*$ consisting of the polynomials $f$ of the form
$$f(m_1,\ldots,m_t) = P(m_1,\ldots,m_t) + \sum_{i=1}^{L} Q_i(m_1,\ldots,m_t) \cdot R_i(m_1,\ldots,m_t),$$
where $P$, $Q_i$, $R_i$ are linear polynomials. It turns out that the subclass $\mathcal{F}_2^*$ of quadratic polynomials is relevant for applications; for example, the SPDZ protocol for multi-party computation requires an HE scheme for $\mathcal{F}_2^*$ with $L=1$. Though there are many prior works that aim to construct efficient SHE schemes, they do not achieve full compactness even for this subclass of quadratic functions. In the current work, the limitation for quadratic functions outside $\mathcal{F}_2^*$ is overcome by working in a different model with two servers that neither communicate nor collude. In this model the authors give a transformation of the underlying HE scheme, similar to the previous one, with which any quadratic polynomial can be evaluated compactly. Interestingly, the former (single-server) transformation preserves other properties such as zero-knowledge proofs of plaintext knowledge, threshold decryption, and multi-key homomorphicity. The above results are also generalised to obtain a $d$-homomorphic to $2d$-homomorphic transformation, and a two-server protocol to evaluate all degree-three polynomials from the BGN HE scheme.

The authors have compared implementations of the Paillier scheme and the Joye-Libert scheme, transformed to support evaluation of degree-two functions, with the HElib implementation of the BGV FHE scheme instantiated with parameters to support a single multiplication, and also with an implementation of the BGN SHE scheme. Up to values of $L = 10$, the transformed Joye-Libert scheme has comparable decryption time, but with ciphertexts 25 times shorter, and encryption and homomorphic operations at least 30 times faster.

Tuesday, October 20, 2015

52 Things, Number 50: What is the BLS pairing-based signature scheme?

This week we look at what the BLS pairing-based signature scheme is. See here for full details.

This signature scheme makes use of the Weil pairing on elliptic curves, essentially a bilinear form (written multiplicatively) on points of order dividing $n$ on a curve, taking values in the $n^{th}$ roots of unity.

So assume you have an elliptic curve $E/\mathbb{F}_{3^l}$, following the notation in the original paper. The scheme is as follows:

KeyGen: Let $E/\mathbb{F}_{3^l}$ be an elliptic curve and $q$ be the largest prime factor of the order of the curve. Let $P$ be a point on it of order $q$ and $x \in \mathbb{Z}_q^*$ be selected at random. Finally let $R = xP$. Then output $(l, q, P, R)$ as the public key and $x$ as the secret key.

Sign: To sign a message $M \in \{0,1 \}^*$, we map $M$ to a point $P_M \in \langle P \rangle$. This is done via a hash algorithm $h$ described in Section 3.3 of the paper. Then let $S_M = xP_M$. The signature $\sigma$ is the $x$-coordinate of the point $S_M$, so $\sigma \in \mathbb{F}_{3^l}$.

Verify: Given a public key $(l, q, P, R)$, a message $M$ and a signature $\sigma$, do:
  • find a point $S$ on the curve of order $q$ whose $x$-coordinate is $\sigma$ and whose $y$-coordinate belongs to $\mathbb{F}_{3^l}$. If no such point exists, reject the signature as invalid.
  • set $u = e(P, \phi(S))$ and $v = e(R, \phi(h(M)))$, where $e$ is the Weil pairing on the curve and $\phi$ is an automorphism $E \to E$. $h$ is the same hash function mentioned earlier.
  • if either $u = v$ or $u^{-1} = v$, accept the signature as valid; reject otherwise.
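To make the verification algebra concrete, here is a toy numerical sketch in which points are represented by their discrete logarithms with respect to $P$, so the 'pairing' $e(aP, bP) = g^{ab}$ can be computed directly. It is purely illustrative (and trivially insecure, since the discrete logs are exposed), uses made-up parameters, and skips both the characteristic-3 curve arithmetic and the $\pm S$ ambiguity that forces the real scheme to accept $u = v$ or $u^{-1} = v$.

```python
# Toy model of BLS: points of order q are stored as their discrete logs
# w.r.t. P, and e(aP, bP) = g^(a*b) mod p, where g has order q modulo p.
q, p, g = 11, 23, 2               # q divides p - 1; g = 2 has order 11 mod 23

def pair(A, B):                   # bilinear 'pairing' on exponent form
    return pow(g, A * B, p)

def hash_to_point(M: str) -> int: # stand-in for the paper's hash h into <P>
    return (sum(M.encode()) % (q - 1)) + 1

x = 7                             # secret key
P = 1                             # the generator, as its own discrete log
R = (x * P) % q                   # public key R = xP

M = "hello"
S = (x * hash_to_point(M)) % q    # signature point S_M = x * P_M

u = pair(P, S)                    # e(P, S_M) = g^(x * h(M))
v = pair(R, hash_to_point(M))     # e(R, P_M) = g^(x * h(M))
assert u == v                     # the signature verifies
```

Bilinearity is doing all the work: both pairings move the secret $x$ into the exponent, so the verifier can compare them without ever learning $x$.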

Monday, October 19, 2015

52 Things: Number 52: Pick an advanced application concept such as e-Voting, Auctions or Multi-Party Computation. What are the rough security requirements of such a system?

This is the last of our 52 Things which we think every first-year PhD student in Cryptography should know. And as you may have gathered over the last 52 posts, we expect students to know all sorts of aspects, ranging from theory to practice. But the key thing is that in cryptography you need to consider not only security against players who play by the rules, but also security against players who do not. Let's examine this in the context of voting, auctions and multi-party computation.

Let us first discuss what we mean by these three applications. In voting we have voters who select candidates according to some voting scheme (first-past-the-post, alternative vote, approval voting or whatever). The vote should remain secret; only valid voters can vote; only one vote per voter is allowed; votes must be valid (for a real candidate, for example); the final result must be correct; voters must not be able to be coerced; the list of security requirements is quite long.

For auctions we may want bids to be private; we may not trust the auctioneer; there may be multiple items, with multiple possible final prices; the selection of the winning bid/price will be made by some algorithm; and the final output may need to be auditable.

For multi-party computation (by which we mean the computation of a function on the private inputs of a set of parties) the security requirements are simpler, in that we want only the output of the function to be revealed, and nothing about the inputs (bar what can be computed from the output). However, whilst this is a simpler goal, the functionality is wider than for auctions and voting, in that we require that ANY function can be computed.

What makes these operations interesting is that we EXPECT the bad guy to be part of the protocol. Compare this with simple encryption, where a message is sent from Alice to Bob. In encryption we expect Alice and Bob to be trustworthy, and the bad guy is the person on the outside looking in. For voting, auctions and MPC we do not trust anybody: the bad guy could be a voter trying to cast multiple votes, a vote tallier trying to count the votes incorrectly, a bidder trying to make a winning bid which is not the highest, or an auctioneer trying to work out the value of a non-winning bid, etc.

The parties in the protocol do not even need to behave by the rules, i.e. follow the protocol. They could send messages which are produced incorrectly but which "look like" valid ones, and which later produce incorrect results for the protocol. We need to protect against this so-called "malicious" behaviour.

There might be a group of bad guys working together to defeat the system, so we need to determine how big a coalition of bad guys our protocol can tolerate. In MPC there is a big difference between the case of an honest majority and that of a dishonest majority. For honest-majority protocols we can ensure that the honest parties always end up with a valid function output; for dishonest-majority protocols we cannot stop a dishonest party from terminating the protocol for everyone.

We also need to deal with the problem of who goes first or last, a property known in the literature as "fairness". For example, suppose we have an election with three voters: A, B and C. If the votes are encrypted, player C can ensure that the candidate voted for by A wins (and hence find out who A voted for) by replicating A's vote. This should be protected against.

The adversary may have a set of parties that it controls from the start of the protocol, a so-called static adversary. Or perhaps, as the protocol proceeds, the adversary decides which parties it wants to corrupt, a so-called adaptive adversary.

As one can see, there are a multitude of security concerns one may have in such advanced protocols, and indeed a multitude of security results. Indeed, each application domain gives rise to different security properties that one may require. Given the wide range of possible application protocols, this means a never-ending series of problems for cryptography to solve; and thus a never-ending supply of problems for cryptography PhD students to solve.

Thursday, October 15, 2015

Inference Attacks on Property Preserving Encrypted Databases

This post will focus on Inference Attacks on Property-Preserving Encrypted Databases by Naveed, Kamara and Wright, presented this week at CCS in Denver. As the name suggests, the idea of property-preserving encryption (PPE) is to encrypt data in such a way that some property is preserved: order-preserving encryption (OPE) guarantees that if $a>b$ then $\mathsf{Enc}(a)>\mathsf{Enc}(b)$, while deterministic encryption (DTE) preserves equality, meaning that if $a=b$ then $\mathsf{Enc}(a)=\mathsf{Enc}(b)$. PPE is of course inherently less secure than semantically-secure encryption, but it opens the door to a number of applications, in particular in the context of healthcare. As a straightforward example, suppose a large database holds patient records and each column is encrypted using some method of PPE: gender and disease using DTE, age and admission date using OPE (fields such as patient identifiers may not require any functionality). If a medical professional wishes to analyse the data of all patients between the ages of 18 and 25 who have suffered from altitude sickness, then instead of downloading the entire database (which she would have to do if it were encrypted using standard, probabilistic encryption), she can calculate $\mathsf{Enc.OPE.age}(18)=c_1$, $\mathsf{Enc.OPE.age}(25)=c_2$ and $\mathsf{Enc.DTE.disease}(\textrm{altitude sickness})=c_3$, and download only the database entries with an age ciphertext $c$ in the range $c_1 \leq c \leq c_2$ and disease ciphertext equal to $c_3$.
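A toy sketch of this querying pattern is below; the 'encryption' functions are insecure stand-ins chosen only to exhibit the preserved properties (any strictly increasing map is order-preserving, and any deterministic keyed hash is equality-preserving).

```python
# Toy PPE query: the server filters encrypted records without decrypting.
import hashlib

def ope_age(age: int) -> int:            # toy OPE: strictly increasing map
    return 3 * age + 5

def dte(value: str) -> str:              # toy DTE: deterministic keyed hash
    return hashlib.sha256(b"key|" + value.encode()).hexdigest()[:8]

records = [(ope_age(a), dte(d)) for a, d in [
    (17, "flu"), (22, "altitude sickness"), (24, "flu"),
    (19, "altitude sickness"), (40, "altitude sickness")]]

c1, c2 = ope_age(18), ope_age(25)        # encrypted range endpoints
c3 = dte("altitude sickness")            # encrypted equality target
hits = [r for r in records if c1 <= r[0] <= c2 and r[1] == c3]
print(len(hits))                         # 2 matching encrypted records
```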

This saves bandwidth, and consequently time and money, but we really don't want the database (or queries to it) to leak anything more than the properties claimed by the encryption mechanisms, namely equality for DTE, and order+equality for OPE. Unfortunately, human characteristics and medical diagnoses have low entropy: if an adversary knows that a medical database holds records for patients from one geographic area, then the distributions of age and race are going to be very similar to those of the general population in that area, and, even worse, most of the data fields will be highly correlated with data from another nearby hospital. This opens the door to two attack vectors: inference attacks, which use publicly available data to gain information about queries (the focus of the paper), and de-anonymisation attacks (which can be built from inference attacks), which aim to identify entries of the encrypted database (EDB). Of course we can use semantically-secure encryption for sensitive attributes (requiring download of the whole column before analysis can occur) to try to thwart these attacks, but which attributes are sensitive? The answer to that question is beyond the scope of this post, but it is, in general, as many as possible. Note that attributes may be sensitive not only for patients but also for hospitals, which may want to keep secret how many patients die of certain diseases.

The authors show that it is possible to mount inference attacks on encrypted databases (built on tools such as CryptDB and Cipherbase) storing real patient data from 200 US hospitals, using only the ciphertexts and public auxiliary information. The public data is obtained from the Healthcare Cost and Utilization Project (HCUP) National Inpatient Sample (NIS) database. The talk provided a jarring 'context is everything' moment by including a slide that displayed only the words "We attack each hospital individually" in large font. This means that the authors look at each hospital database individually rather than grouping them, and use the auxiliary information to mount their attacks. A number of tools are used and can be found in the paper. The easiest to explain is frequency analysis on a column encrypted using DTE (the authors focus on mortality risk): sort the histogram of ciphertexts, compare it to the auxiliary histogram, and map same-ranked elements to infer which entries correspond to which attribute. The authors' novel $l_p$-optimisation attack recovered mortality risk and patient-death data for at least 99% of the 200 largest hospitals, and recovered disease severity for 100% of patients in at least 51% of those hospitals. Sorting attacks on data encrypted using OPE take columns that are 'dense' (meaning that every possible value occurs), and allow the attacker to decrypt all values with no knowledge of the key: the authors recovered the admission month and mortality risk of 100% of patients for at least 90% of the 200 largest hospitals. Another novel attack, called the cumulative attack, recovered disease severity, mortality risk, age, length of stay, admission month, and admission type of at least 80% of patients for at least 95% of the largest 200 hospitals. These results exploit only the databases themselves and not leakage from queries ('ciphertext only'), suggesting that they are a lower bound on the information that could be gleaned from these encrypted databases.
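As a minimal sketch of that frequency-analysis step (with made-up data, and ignoring ties and the paper's more robust $l_p$-optimisation machinery):

```python
# Frequency analysis on a DTE-encrypted column: match ciphertexts to
# plaintext values by their rank in the two frequency distributions.
from collections import Counter

# The encrypted column as stored in the EDB (opaque ciphertext labels)
encrypted_column = ["x7f", "x7f", "a01", "x7f", "9bc", "a01", "x7f"]

# Publicly known auxiliary distribution, e.g. from HCUP (values made up)
aux_distribution = {"minor": 0.55, "moderate": 0.30, "major": 0.15}

ct_by_rank = [c for c, _ in Counter(encrypted_column).most_common()]
pt_by_rank = sorted(aux_distribution, key=aux_distribution.get, reverse=True)

guess = dict(zip(ct_by_rank, pt_by_rank))
print(guess)   # {'x7f': 'minor', 'a01': 'moderate', '9bc': 'major'}
```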

Interestingly, the following talk in the session, given by Florian Kerschbaum, defined a new security model for order-preserving encryption, potentially giving a countermeasure against some of the attacks on OPE described above.