Monday, June 10, 2013

Preliminary results on a paper from WISTP 2013

Unfortunately, I missed most of this year's WISTP due to a rerouted plane (I was lucky to arrive in time for my own talk) but one of the few talks that I was able to attend was Pierre Dusart's talk on his paper "Lightweight Authentication Protocol for Low-Cost RFID Tags" (co-authored with S. Traoré). In the paper, the authors propose a new lightweight primitive to be used in combination with authentication protocols on RFID tags. The design of the Dusart-Traoré primitive is best described as a 4-round substitution-permutation network without key scheduling operating on a 128-bit state (in analogy to AES terminology) organized as 16-element array of 8-bit words. It uses a 128-bit key. It's structure is shown in the following image:
(Some of the permutation arrows have been omitted to improve readability.) For the substitution layer the primitive uses the AES S-box and the permutation layer in round j is defined as
Fj(Mj) = f(M0j,M0+2j-1mod16j) || ... || f(M15j,M15+2j-1mod16j)
where the f function maps two input bytes onto a single output byte.

We were intrigued by the idea of a potentially secure symmetric lightweight primitive with low round complexity as well as the lack of a hard security estimate (i.e something like "at least 2128 operations are needed to break our primitive"). So we decided to take a closer look at the security of this primitive and discovered a most interesting differential property: Assume you have message inputs where all input bytes except one (w.l.o.g let this be M8) are constant, then we get a differential trail similar to the one shown below:

This implies that for the permutation layer of the last round we have
with constant M04 and variable M84 as shown in this zoomed excerpt of the previous image:

Based on this observation we have been able to implement a chosen plaintext attack that recovers the entire key of the primitive on average using O(221) time (i.e. less than a second on my laptop) and a very reasonable number of (M,C) pairs. Thus the primitive is effectively broken. More details of the attack will follow soon in a preliminary paper on eprint.

Wednesday, June 5, 2013

Eurocrypt 2013, Day 3, Session 'Secure Computation': How to Hide Circuits in MPC: An Efficient Framework for Private Function Evaluation

The second talk in the secure computation session on the third day of Eurocrypt was on the paper   "How to Hide Circuits in MPC: An Efficient Framework for Private Function Evaluation", authored by Payman Mohassel and Saeed Sadeghian from University of Calgary. Saeed gave the talk. This post also covers the more elaborate talk given by Payman after Eurocrypt at University of Bristol.

The paper revisited the problem of Private function evaluation (PFE), a relatively less explored problem compared to the well-known problem of secure function evaluation (SFE)/multi-party computation (MPC).  In a PFE protocol among n parties, say P1,....,Pn, the goal is to compute a function f represented by a circuit C, held privately by the first party P1 on the private inputs of the n parties. Unlike secure function evaluation (SFE), the circuit is not public, rather it is the private input of the first party. In particular, nothing else other than the size (number of gates) is leaked about the circuit. PFE is particularly useful in scenarios where learning the function compromises privacy, reveals security vulnerabilities, or when service providers need to hide the function or specific implementation of it to protect their Intellectual Property. Another important motivation is that PFE yields efficient program obfuscation (if interaction is allowed).  The paper investigates PFE in the most basic setting of semi-honest adversaries and report the most efficient constructions to date.  The main contribution is a general framework that can be instantiated to either two party or multi party PFE considering either Boolean or  arithmetic circuits.  

History. A bit of history of PFE is given below. The problem of PFE can be solved by reducing it to SFE of a universal circuit Ug that takes as input the  circuit C (containing g gates and representing function f) and the parties' private inputs x1,....,xn and outputs f(x1,....,xn).  The main objective is then to design smaller size universal circuits. Two approaches are known so far for boolean circuits. Valiant showed a construction of a Boolean universal circuit achieving (asymptotically) optimal circuit size of 19glog(g) for any given Boolean circuit of size g. An alternative construction is due to Kolesnikov and Schneider. They obtain a worse bound of 1.5 g log2 g. Nonetheless, the construction of Kolesnikov and Schneider seems to yield smaller universal circuits than Valiant's construction for circuit size less than 5000. Unfortunately,  the universal circuit approach does not lead to satisfactory solutions for arithmetic circuits due to the fact that the universal circuits are too large for any practical purpose (state of the art: size of  Ug becomes as high as O(g5)). Noting the bounds on the size of universal circuits and current state of the art SFE constructions (e.g. Yao for two party and GMW for multi party), it can be concluded that PFE constructions based on the Universal circuit approach do not achieve complexity that is linear in the circuit size g.  The second, and perhaps asymptotically optimal approach of designing PFE is via FHE (where even the circuit size could be kept private). Due to the high computational overhead, this solution can not be considered practical given the current state of the art of FHE. In a novel attempt, Katz and Malka designed two party PFE based on singly homomorphic encryption that attains linear complexity. Specifically, their constructions require number of public key operations that is linear in the number of gates in the circuit. The construction of Katz and Malka does not extend to the multi-party settings for n>2. In light of the above history, it can be concluded that the existing solutions are more expensive compared to their SFE counterparts (the current state of the art SFE constructions require linear number of symmetric key operations) and no multi party construction or arithmetic circuit based PFE with linear complexity (even asymptotically) is known thus far. This leads to the interesting question:

Can we design PFE with linear complexity in all standard settings (without resorting to FHE based trivial constructions), namely for any settings that is a cross product of (two party, multi party) and (Boolean circuit, arithmetic circuit)?

Contribution of the paper. This paper  answers the above question in affirmative when semi-honest security is concerned with and improves the state of the art of PFE notably by presenting a general framework that can be instantiated in any of the above mentioned settings. To fully hide a circuit C in PFE one needs to hide (i) the circuit topology and (ii) the function of the gates in the circuit (AND, OR, XOR etc.). The framework addresses the above two important issues independently. They introduce two functionalities:  circuit topology hiding (CTH) functionality and  private gate evaluation (PGE) functionality. Then they show how these two functionalities can be composed to realize PFE. And lastly, the paper presents a number of ways to realize the functionalities.  In what follows, I present intuitive treatment of these functionalities and try to describe how they are composed in a PFE. I also briefly mention how the functionalities are realized in the paper:

PGE functionality. The problem of PGE is as follows: let G be a gate that is to be evaluated. It's type is known only to P1. The parties wish to evaluate G on inputs a and b that are secret shared among the parties (according to some secret sharing scheme) so that the output remains shared and the type of G remains hidden from everyone except P1. The ideal functionality PGE is described as follows: it takes the gate G from P1, the shares of a and b from all the parties, computes G(a,b), finds secret sharing of G(a,b) and finally passes the ith share to the ith party.
Note that PGE can be seen as a PFE protocol where the circuit to be evaluated is a single gate. Very efficient solutions for implementing PGE is shown in the paper based on 4-out-of-1 OT  and singly homomorphic encryptions.

CTH functionality. A bit of background is needed first. To hide the circuit topology, the first task is to conceive it as a mathematical object. The authors note that a circuit topology can be described by a mapping. Let us describe the mapping. First find a topological ordering of the circuit C consisting of g gates. Label the input wires to the jth gate as iw2j-1 and iw2j (assume for simplicity that the circuit consists of only two input gates).  The set of incoming wires, say IW are the set of input wires to all the gates in the circuit. IW contains 2g elements. Next define another set OW, the set of outgoing wires. OW is defined as the union of the set of circuit input wires and the output wires for each non-output gate in the circuit. OW contains n + g - o wires where n defines the number of input wires and o defines the number of the output wires in the circuit. The outgoing wires are labeled as follows: The n input wires are labeled as ow1,....,own. Next output wire of the jth non-output gate (i.e. the output of this gate is not a part of the circuit output) is labeled as own+j. Now the topology of a circuit C can be fully described using the following mapping ΠC: {1,...,|OW|}--> {1,...,|IW|} from OW to IW. The mapping ΠC maps i to j if the ith outgoing wire owi is connected to jth incoming wire iwj. To be more clear, have a look at the following example circuit and its corresponding mapping shown using arrows.      

While ΠC is not a function, its inverse is. Given C, the mapping can be computed efficiently. Further the mapping defined above is an extended permutation since it not only permutes the elements in {1,...,OW} but also can replicate them as many times as needed. More formally, a mapping is called an extended permutation if every element in the co-domain (which is IW for ΠC) has been mapped from a unique element in the domain (which is OW for ΠC).  But a single element from the domain can be mapped to multiple entries in the co-domian. Unlike a permutation, an extended permutation can have domain and co-domain of different sizes.

The goal of CTH functionality is to enable oblivious evaluation of the mapping in an on-demand fashion as and when a gate is computed using PGE and the gate output wire needs to be mapped to the gate input wires it is connected to, to continue further computation. To understand what CTH functionality is expected to do for us, let me detail below how PGE and CTH functionalities are composed  to build PFE. The PFE protocol starts with an initialization phase. In this phase P1, knowing the circuit C, sorts the gates topologically and computes ΠC. Next every party shares its input (according to some secret sharing scheme) among all. The shared inputs are labeled as the first n wires in the set OW. From now on, the idea is to map an outgoing wire to incoming wire(s) (according to ΠC) as soon as it is ready (meaning the value associated with the wire is available in the shared format among the parties). Once all the parties share their inputs in the initialization phase, the parties perform the mapping via CTH functionality obliviously so that ΠC is not leaked to anyone other than P1. The CTH functionality enables this via two types of queries (which the parties are allowed to issue to CTH). By issuing a mapping query OMAP([x],j), the parties wish to map the jth outgoing wire where [x] is the associated sharing for the outgoing wire ([] is used to denote sharing). On receiving such a query, CTH takes ΠC from P1 and the shares of x from the parties. It computes ΠC(j) and for each l where j is mapped to lth incoming wire, CTH internally associate and store a re-randomised sharing of x. Note that neither the indices to which j is mapped to and nor the rerandomized sharings are revealed to the parties at this stage (information on circuit topology leaks otherwise). Rather the re-randomized sharings corresponding to the mapped input wires will be distributed on-demand; meaning as and when the parties want to evaluate a gate with those mapped input wires. So once OMAP([*],j) is queried for all j corresponding to n input wires, the parties proceed to evaluate the first gate. The sharings corresponding to the input wires of the first gate are stored in the CTH functionality and is not yet available to the parties. The parties issue REVEAL query (the second allowed query) to CTH to get the sharings. A reveal query REVEAL(j) implies the parties wish to know the sharing associated with jth incoming wire. CTH distributes the stored re-randomized sharing corresponding to jth incoming wire; so Pi receives the ith share from CTH. In general for evaluating jth gate, the parties send REVEAL(2j-1) and REVEAL(2j) and on receiving the corresponding shares they invoke PGE functionality to compute the sharing of the gate output.

To get a more clear picture, see the figure below that explains how party Pi interacts with CTH and PGE functionalities to evaluate jth gate. The numbers in the blue colored square boxes indicate the orders in which Pi proceeds. First Pi issues REVEAL(2j-1) and REVEAL(2j) to CTH and in return receives re-redomized shares of values a and b that are associated with (2j-1) and 2j th incoming wires. Then it invokes PGE and once it receives the share of the output of the jth gate, it makes an OMAP query to CTH to map the outgoing wire of jth gate.

One of the major contributions of this paper is to instantiate  CTH.  They show how to evaluate an extended permutation obliviously i.e. without leaking the permutation. There are two known solutions for the problem in the literature; one based on general MPC (e.g Yao for two party) which are inefficient and other based on homomorphic encryption. The authors present an alternative that takes the approach of efficiently implementing extended permutation using a switching network and then showing how to efficiently implement the switches using OT. They show how to construct a switching network consisting of O(g log g) switches for an extended permutation corresponding to a circuit of size g. Due to OT extension, evaluations of the switching network turns out to require O(g log g) symmetric key operations. In contrast, the homomorphic encryption based solutions from the existing works require O(g) public key operations. Though asymptotically worse, the approach based on switching network shows better concrete complexity over the other two alternatives. I skip the details on how to construct the switching network for an extended permutation and how to securely evaluate the switches. The major building block here is an efficient protocol for oblivious shuffling. Please refer to the paper for the technical details.

Open problems. Loads of open questions are posed at the end. One interesting direction is to investigate PFE with other security notions. This paper considers semi-honest setting.  Designing PFE with malicious and covert security  is a good open question in itself. It can be further investigated if the above notions of security can be achieved while maintaining linear complexity. PFE in information theoretic settings with linear complexity is another open question.  It is also important to find PFE that hides even the number of the gates in the circuit (known solution is via using FHE).

Studying the round complexity of PFE is another important direction (even considering the most basic setting of semi-honest adversary).  Known SFE construction with linear complexity for two party requires 2 rounds (Yao). PFEs with linear complexity on the other hand require at least 3 rounds. Can the round complexity of PFE with linear complexity be brought down to 2? Note that universal  circuit based approach gives 2 round PFE, but the PFE does not provide linear complexity. BMR90 reports SFE with constant round in multi-party settings. Can we achieve  constant round PFE in multi-party setting?

Study of concrete efficiency of PFE is another interesting direction. Asymptotically, the best solution for PFE requires linear in g public key operations.  The solution in this paper (based on switching network) requires g log g symmetric key operations. When concrete efficiency is concerned with, the latter solution may be more useful due to the gap between the efficiency of public and symmetric key operations. In the SFE world, the current state of the art achieves g symmetric key operations. It is interesting to find PFE that matches the state of the art of SFE.  

Sunday, June 2, 2013

Eurocrypt 2013, Day 4: How to Garble RAM Programs

The last talk  of Eurocrypt 2013 was given by  Steve Lu on  secure two-party computation via garbling RAM programs (joint work with Rafail Ostrovsky).  He started with  the  motivating example of secure cloud computing, in which there is a client that wants to store some data remotely and then repeatedly a remote server have to perform computations on that data, without knowing the data or the nature of the computation, and the result of the computation. 
The goal of their work is to obtain an analogue Yao's garbled circuits construction for RAM programs, more specifically they show how to garble any RAM program π that runs in t time and has size n in O( poly(k,log n)) time, keeping all the non-interactive properties of the Yao’s Garbled circuits and assuming the existence of one-way functions.  This problem  can be trivially solved by representing the program π  as a circuit and then garble it, but unfortunately in some cases this approach can lead to a big blow-up in program size.

The main two ingredients of their solution are Obliviuos RAMs, firstly introduced by  Ostrovsky  and Yao’s Garbled Circuits. The former   enables a client storing only a small amount of data locally to remotely store a large amount of data  and access them while hiding the identities of the items being accessed. The RAM model of computation is used with stored programs and a small stateful CPU that executes a sequence of reads and writes to locations stored on a large memory. Given the CPU state ∑ and the most recently read element x, CPU(∑, x) performs add/mult, updates program counter, executes PRF, outputs the next read/write command and updates the new state ∑’. As for the Yao’s garbled circuit , the construction  is based on three PT algorithms (G,GI,GE), where G is the program garbling algorithm, GI is the input garbling algorithm and GE is the evaluation algorithm.
Steve Lu described these algorithms. In order to garble a RAM program πt that runs in time t, G generates the garbled circuits  necessary to execute t steps of πt. Each step consists of a garbled ORAM query followed by a garbled CPU computation in such a way that  the garbling is independent of the actual program path. The garbled program is then sent to the server that can evaluate the program. The GI  algorithm for garbling input is just a time labeled encoding and the GE algorithm performs evaluation of the garbled program executing the same steps as the program garbling algorithm G but evaluating garbled circuits instead of generating them.
The talk ended with the interesting open problem of how to achieve reusable   garbled RAM programs, as has been done in (GKPVZ12) for garbled circuits.