Bristol Cryptography Blog: 2013

Tuesday, December 24, 2013

FPDetective: Dusting the Web for Fingerprints

'Twas the study group before Christmas, and all through the group,

All papers were read, by Damgard and Shoup.

The stockings were hung, the schemes were secured,

With hopes that the crypto, security ensured.

With billions of us browsing the web everyday the web has become a major source of revenue for advertising companies. By exploiting characteristics of web browsers, companies can fingerprint users and perform more targeted advertising. Most (tech savy) people are nowadays familiar with the concept of a cookie (and not the type left out for Santa on Christmas Eve). In short, this is a file left behind on your computer when you visit a particular website and is used to recognise you the next time you re-visit the website. Recent legislation has meant that websites must make clear that a cookie will be stored on your machine (see the EU Directive on Privacy and Electronic Communications). A cookie, however, is not the only way to fingerprint a user. Mayer in 2009 [pdf] and Eckersley in 2010 [pdf] showed that it was possible to exploit characteristics such as screen dimension and the list of installed fonts, in order to invisibly fingerprint a user (see below for more details). In total Eckersley was able to uniquely fingerprint 94.2% of half a million users. In last week's study group Luke discussed the paper "FPDetective: dusting the Web for Fingerprints" by Acar et al. [pdf]. This paper presents a framework which can be used to detect when websites are using these (alternative) fingerprinting techniques and presents results which show that such techniques are used in the wild.

How does fingerprinting work?

There are several ways that device fingerprinting could be performed, the most common being based on either JavaScript or Flash. In the case of JavaScript there are two objects which are most commonly accessed to perform fingerprinting:

navigator - This will contain information about the browser vendor and version, plugins supported, MIME types and some information about the OS and architecture of the machine running the browser.
screen - Information about the resolution of the monitor, and number of colours and pixel depth.

By hashing the concatenation of the content of these two objects (together with the list of plugins and MIMEtypes), Mayer was able to uniquely identify 96% of a total of 1328 clients. Eckersley extended this in the Panopticlick project by including further contents, one of which was the list of fonts. Surprisingly the list of fonts plays a big role in the uniqueness of fingerprints. As mentioned earlier Eckersley uniquely fingerprinted 94.2% of half a million users. For readers interested to see if their own browser fingerprint is unique they can visit the Panopticlick website to test it out.

FPDetective

Acar et al. created the FPDetective framework to discover when websites are performing fingerprinting. It consists of a web-crawler using the PhantomJS browser to collect data related to JavaScript fingerprinting and Chromium for data related to flash. Next their parser was used to extract data from the obtained log files. Following this the extracted data was stored and combined on a central server. Source code and further details are available here.

Using FPDetective Acar et al. analysed the top one million website (as listed by Alexa). They found that 404 of these were running JavaScript-based fingerprinting scripts, with a total of 13 unique scripts being used. One of the scripts even went a step further and tried to remove itself after running to hide any traces. Following this Acar et al. ran the experiments to look for flash-based fingerprinting and found that 95 out of a total of 10000 websites were active in doing so. Interestingly one of these was the Turkish Ministry of Education.

Mitigation techniques

The Tor Browser lists cross-origin fingerprinting unlinkability as a security goal, so unsurprisingly it has strong defences against fingerprinting. It does this by trying to make all browser sessions look the same. It also limits font-based fingerprinting by bounding the number of fonts that a page can requests (Acar et al. found a vulnerability against this but it has since been fixed in recent versions of Tor).

Firegloves was a proof of concept Firefox extension created for research purposes which tries to randomise the fingerprint. Acar et al. found that it was quite ineffective against preventing fingerprinting but since it is no longer maintained this could be because of changes to the Firefox extension system. Ironically since Firegloves itself is detectable this could actually become part of the data used to fingerprint a user due to its limited user base (1750 users).

"Do Not Track" (DNT) is an HTTP header field which allows users to request that websites do not perform any tracking and is implemented in many modern web browsers. Acar et al. ran their experiments again, this time with the DNT field set to 1, but found that this yielded exactly the same results. This shows that websites are simply ignoring user's tracking preferences.

Thursday, December 5, 2013

Factoring RSA keys---in practice!

I've been at Asiacrypt 2013 in Bangalore this week, and taking a bit of time out from going to the talks and checking out India I thought I'd write about Nadia Heninger (@nadiaheninger) and Tanja Lange's (@hyperelliptic) presentation on their research into factoring 1024-bit RSA keys. The paper can be found here, and there's a set of slides here. This discovery was presented first at the Crypto 2013 rump session.

The one sentence summary is that they were able to recover at least 184 of the 1024-bit RSA keys generated by a certified smartcard and used in a Taiwanese national database of digital certificates for citizens.

Other that the fact that they were able to this is quite alarming, the most important thing to note here is that they didn't need anything like the computing power of a special purpose cluster or a large botnet, or a highly optimised version of an implementation of the general number field sieve to factor those 1024-bit keys; the trick here is to exploit the randomness used to generate the keys themselves.

What's the problem with randomness?

To explain this it's necessary to go back to some research by Lenstra et al. (Crypto 2012) and Heninger et al. (Usenix 2012). An RSA moduli is a large integer that is the product of two large prime numbers, so if you're given a public RSA moduli and are able to find one of its prime factors, then you can get the corresponding private key. Obviously the size of the RSA modulus is chosen to be large enough such that this should be impossible to do (note: there seems to be a general recommendation at the moment to move away from 1024-bit RSA keys up to 2048-bit keys).

The critical observation made in the referenced research was that it's easy to check whether two RSA moduli share the same prime factor (and to find that factor). So if you have a set of RSA public keys, it turns out that using the greatest common divisor (GCD) algorithm it is relatively cheap to check for the presence of any shared factors and if found, recover the private keys.

On paper this shouldn't ever be a concern, as all our prime factors should be chosen uniformly at random, and so the chance of any two being the same should be vanishingly small. However, this is not always the case---some devices appear to have entropy problems with their random number generation that result in some factors being generated more frequently. Heninger et al. were able to retrieve private RSA keys for 0.5% of TLS hosts and 0.03% of SSH hosts on the Internet, and Lenstra et al. managed to recover 12720 keys from 4.7 million moduli.

The underlying problem appears to stem from entropy shortfalls in the random number generation methods used in devices that are typically either headless or embedded. I'd highly recommend checking out both papers for more detail.

Going further than finding shared primes

The research team here initially tried to find shared factors within the smartcard signature dataset, and were able to recover 103 private keys. What differentiates this particular attack is that Bernstein et al. were able to factor a further 81 keys by looking in more detail at the particular properties of the flawed random number generation.

The authors hypothesise that in this instance it appears that the shared factors are likely to arise because of a glitchy hardware random number generator. By inspecting the shared primes, the authors observed that the rng seems to create primes that have a particular structure, for instance containing repeated binary patterns (e.g. "001001001001" or simply "00000000").

The trick used here was to use the observed patterns as a profile for trying to guess additional primes that are not shared with any other moduli but do closely match one of the observed patterns, and then to simply use trial division across the remaining unattacked moduli in the dataset to see if the guessed prime factor actually is a match. There's not really space to go into detail here, but the authors then go a step further and use Coppersmith's method to find more prime factors of a particular form, getting us up to the final value of 184 recovered private keys.

One interesting aspect to this is that these smartcards passed both FIPS and Common Criteria certification, and so one might wonder why and how this randomness issue managed to sneak by the evaluation processes. According to the authors the Taiwan government department responsible has initiated the task of issuing replacement cards.

Finally, this is another example of how the lack of quality randomness can spectacularly screw otherwise perfectly secure crypto up. Random number generation is a key element of a large number of currently used protocols, and so it remains to be seen how large this particular attack surface actually is (for instance, see Nadia's Asiacrypt rump session talk on ECC), especially given the certification issue here and the ongoing discussion about Dual_EC_DRBG.

Much more detailed info on this particular attack can be found here and for a look at the problem of bad randomness producing weak keys in general, check out factorable.net---both sites are well worth a read.

Thursday, November 28, 2013

Sleuth: Automated Verification of Software Power Analysis Countermeasures

Yesterday's study group was led by Elisabeth and was about the work contributed by Bayrak et al. to CHES 2013:

Ali Galip Bayrak, Francesco Regazzoni, David Novo, Paolo Ienne, "Sleuth: Automated Verification of Software Power Analysis Countermeasures", Cryptographic Hardware and Embedded Systems - CHES 2013, Lecture Notes in Computer Science 8086, 2013, pp 293-310 (http://dx.doi.org/10.1007/978-3-642-40349-1_17)

In a nutshell, if one tries to implement a block cipher in a straightforward manner, the implementation may be leaky and susceptible to side-channel attacks. To increase the system's security against attacks of this nature, hardware and software countermeasures can be employed. In the aforementioned work, the authors present a method for automated verification of the security against side-channel attacks of a protected software implementation.

The authors discuss the notions of sensitivity and insensitivity of security-critical applications against power analysis. An operation or group of operations is defined as sensitive, if it depends on secret data but not on any random data. As an example, the authors mention this operation:

st = pt xor key

This operation is vulnerable to power analysis when executed on embedded devices, because power consumption characteristics depend on the secret key, which can therefore be recovered through Differential Power Analysis. This vulnerability can be avoided by applying a random mask to the secret variable. Intermediate calculations are randomized and the mask is removed at the end of the calculation before outputting the ciphertext. However, a secret intermediate value may be leaked if a mask is removed too early and problems of this nature are difficult to detect manually in long and complex implementations.

For their "Power Analysis Sensitivity" method, the authors define four elements:

A Straight Line Program, which is a sequence of branch-free operations.
Additions to the Type System, whereby the developer tags each variable as public, secret, or random.
A Leakage Model.
Sensitivity of a vector of operations and the leakage related to it.

In this context, a set of operations is sensitive if it has both of the following characteristics:

It statistically depends on at least one secret input variable
It does not depend statistically on any random input variables

Verification is a two-step process. Firstly, the authors convert a program to a Data Flow Graph (DFG). Subsequently, the DFG is analysed for the detection of sensitive operations.

Each DFG node represents either an arithmetic/logic operation (x = y op z) or an array handing operation (x = y[z] or y[z] = x). The former node type will have two incoming edges (y and z) and one outgoing edge (for x).

Nodes handing array operations are somewhat more complex, because the accessed address may or may not depend on an input variable. The address is statically-determinable if it depends on a constant (e.g. y[0] = x), since the program will always access element 0 regardless of inputs. Conversely, x = y[z] is not statically-determinable if z is a node input. If all accesses to an array are statically-determinable the array is called directly accessed and all its accesses are treated as if they were ordinary assignment operations (as above). Otherwise, it is called indirectly accessed and its behaviour is represented in the DFG through a Look-Up-Table (LUT). To achieve perfect accuracy, the LUT is formed by all inputs influencing either data in the array or the array element being accessed. Therefore, the LUT is a 2^m : 1 multiplexer, whereby m is the total width in bits of the aforementioned inputs.

A DFG can compute the univariate leakage of an operation as, for example, the Hamming Weight (HW) of the result of the lookup operation. Similarly, bivariate leakage caused by the sequential execution of two operations can be calculated as a Hamming Distance (HD). HW and HD are only examples of leakage models.

As discussed above, sensitivity has been defined in terms of statistical dependence between variables. To test these conditions, the authors formulate the problem as a satisfiability query (SAT) and argue that the statistical dependence condition is analogous to don't care analysis of logic synthesis. Their solution first checks the second sensitivity condition (dependence on random variables) before testing dependence on a secret variable (first condition).

Sleuth has been implemented in the back-end of the LLVM open source compiler. Sleuth uses LLVM assembly as its input. The authors discuss that machine-specific assembly can be converted to LLVM assembly in a straightforward fashion and mention as an example a script developed by themselves to perform such conversion from AVR assembly. High-level languages (e.g. C) can be analysed by first compiling them into LLVM assembly using off-the-shelf compilers. As discussed above, the verification algorithm works on straight line programs. To this end, Sleuth converts LLVM assmebly to a straight line program by unrolling loops and inlining function calls, without applying any optimizations. The output of this conversion is used to generate the DFG and operations are then queried for sensitivity based on security types and leakage models provided by the user.

Sunday, November 24, 2013

Active two-party computation

A large number of protocols have been proposed in the last few years for active secure two-party computation. One significant approach is based on Yao’s garbled circuit. We have already discussed garbled circuit technique previously in this blog. Informally speaking, in the standard protocol the two parties, Alice and Bob, work asymmetrically: one party (say Alice), also called the generator, constructs a garbled circuit computing the original function f, and sends it to Bob, the evaluator. He evaluates the garbled circuit producing the output f(x,y), where x and y in {0,1}ⁿ are the private inputs of Alice and Bob, respectively. A garbled circuit (GC) is nothing but an “encrypted” circuit: along with the GC the evaluator receives “random” keys corresponding to his input bits y_i; then he computes their encryption obliviously via 1-out-of-2 OTs and evaluates the circuit.

To achieve active security the “cut-and-choose” paradigm is often used. Alice produces many garbled circuits, then Bob asks Alice to open a fixed fraction of them (typically an half). He aborts if any of these check-circuits are inconsistent; otherwise he evaluates the remaining ones (the evaluation-circuits) and chooses the majority of their results as final output. While cut-and-choose solves the problem of corrupted Alice constructing the garbled circuits incorrectly, it produces some well-known byproducts. Mainly we need to ensure that the parties use the same inputs in all the evaluations. Many different approaches have been proposed to address the problems related with the cut-and-choose paradigm and malicious adversary.

In our last study group, led by Enrique, we consider three different papers LP07, LP11 and L13. In LP07 the authors firstly provide an implementation and a full proof of security for active secure Yao’s protocol based on cut-and-choose. To ensure Alice’s input consistency they proposed an approach requiring O(s²n) commitments, where s is the statistical parameter of cut-and-choose and n is the input size. Another problem is the so called selective failure attack: a corrupted Alice could provide inconsistent random key to the Bob's input wire and its associated OT, gaining information about Bob's input. The suggested solution in LP07 requires a larger circuit and an encoding of the input bits.

We recall that Bob cannot abort even if he knows that Alice is cheating, i.e. in the case that the evaluation-circuits do not all provide the same result, and that this is the reason he takes the majority output. Now suppose that the total number of garbled circuits is s and that an half of these are opened. An adversary succeeds if s/4 circuits are incorrect and none of them is checked. So Alice can cheat with probability at most 2^-s/4. In LP07 the authors prove a bound for the error probability of 2^-s/17. In practice this means that to achieve an error of 2^-40 we need to construct and exchange 680 circuits.

In LP11 this bound is improved to 2^-0.311s. In this work the authors not only decrease the number of circuits (132 to have an error of 2^-40), but they also remove the commitments and the necessity to encode the bit's input, incorporating the checking on the circuits and the OT. In order to obtain this result they define a new functionality, the cut-and-choose OT functionality. Their implementation is based on an instantiation of the OT protocol from PVW08 based on the DDH assumption; however while the protocol in [PVW08] is in the common reference string model, the protocol in LP11 is in the plain model.

The major drawback of this approach still remains the large number of OT needed. In LP13 the author specifically handles this issue. The main idea here is that Alice can only act incorrectly by making all of the check-circuits correct and all of the evaluation-circuits incorrect (not just the majority). If this holds, the probability that each circuit is independently chosen to be an evaluation or a check circuit is 1/2, and the probability of Alice to cheat is exactly 2^-s. This implies that to obtain an error probability of 2^-40we need 40 circuits! This result is obtained using a "proof of cheating": if Bob obtains two different values in the evaluation phase, he is able to produce a proof that he received inconsistent outputs. Then if this proof is effectively valid, he receives Alice's input and can evaluate the circuit correctly.

Tuesday, November 19, 2013

How to stop the NSA from tampering with your vote

I'm just back from a two-day PhD Voting Workshop in Switzerland. A great event in many ways, from the location (all security events should be held in castles from now on!) to the food. Oh, and we talked about voting too.

Few people understand what cryptographic voting is really about but a lot of organisations and even nations are waking up to the idea of voting online. The two are of course not the same thing: by cryptographic voting we mean a lot more than SSL/TLS.

Some examples: Switzerland at the moment has neither online nor cryptographic voting at a national level, Estonia has online voting but it's not cryptographic in the way we use the word, Israel uses cryptographic but not online voting for primaries (wombat-voting.com) and Norway has an online, cryptographic (in our sense of the word) system.

The moment you decide to implement online voting, you run into the whole area of computer and internet security issues: malware, password problems, man-in-the-middle attacks, the NSA and so on. Cryptography can't solve these problems entirely but it can add an extra safeguard: verifiability.

A verifiable election typically uses a "bulletin board" on which everyone posts their ballot, usually an encrypted form of their vote. Any voter can save a copy of their ballot and check later on that it's still on the board and hasn't been tampered with. Better still, any member of the public can take a look at the board and check that each ballot there is valid and that the announced election results match the ballots on the board. This is like putting a camera in the vote-counting centre and letting the whole world watch the counting process, with the difference that everyone's vote is kept secret even though the count is public.

At the workshop this year we had many interesting talks. Some were on existing systems like the Estonian scheme, the UniVote system used at Swiss universities or the Primesvote system used by shareholders of the Swiss bank UBS. Others talked about new systems under devlopment, with two separate projects looking at bringing cryptographic voting to smartphones.

What impressed me most as opposed to some other conferences I've been at was the very practical aspect, every talk either about a system that exists for real or one the speakers are trying to build in the coming years. Cryptographic voting as an idea has been around for over 30 years now but there's still many challenges in bringing it to the market. Meanwhile, the market is definitely there if we can build systems that are usable in practice.

So what about the NSA? Well, anyone deploying online voting secured by nothing more than a padlock icon is just opening the door to them - we know that they can get around that kind of cryptography if they really want to. But Snowden also said that cryptography works "if done properly" - and I consider cryptographically verifiable voting to be the only proper way to hold elections online.

Friday, November 15, 2013

How to Search on Encrypted Data, in Practice

With the rise of cloud computing and storage in the last decade or so, many people have raised concerns about the security issues of outsourced data. And to match this, cryptographers have come up with many proposals aiming to solve all of these problems. The most well-known of these methods, Fully Homomorphic Encryption, can in theory offer a perfect solution to this, but is extremely inefficient and not currently practical for most real-world situations. Other solutions include Oblivious RAM and special forms of identity-based encryption, but these are both still pretty expensive.

Symmetric Searchable Encryption (SSE), which we looked at in this week's study group, stands at the other end of the scale. It is extremely practical, capable of handling extremely large (hundreds of GB) databases, and even has performance comparable to MySQL. However it does compromise somewhat on leakage. It is similar in some ways to deterministic encryption, but ensures that leakage only occurs when a query is made (rather than as soon as the database is uploaded), and has some clever techniques to minimise the quantity of this leakage.

Our study group focussed on the Crypto 2013 paper by Cash et al., which also describes their state-of-the-art implementation. In this post I'll cover the basic concepts and mechanisms, and explain exactly what leaks and when.

For a basic overview of the many ways in which encrypted search can be done, I recommend checking out Seny Kamara's blog, which currently has a series of articles running on this topic.

Scenario

A client holds a database (or collection of documents) and wishes to store this, encrypted, on a third party server. The client later wants to search the database for certain keywords and be sent an encrypted list of documents that contain these keywords. Note that we're not searching the entire contents of the database, only a set of keywords that have been indexed in advance (just as Google doesn't look in every HTML file on the web for your search terms, instead using precomputed data structures based on keywords). Also note that the server only returns a set of encrypted identifiers, rather than the documents themselves. In the real world the client would then request the documents from the server.

The remarkable aspect about the Crypto paper is that it aims to do the above in time proportional to the number of documents matching the least frequent keyword in the search query -- independent of the database size! Efficiency-wise, this seems optimal -- you can't do much better this, even on plaintext data. The compromise, of course, comes with leakage, which we'll look at shortly.

Warm-up: Single Keyword Search

Let's start with the simple scenario of searching for a single keyword.
To enable search time proportional to the number of documents matching the keyword, the database first needs to be structured in a special way. It is arranged into a table T, so that for every keyword w, T[w] contains a list of indices of all the documents containing w, encrypted under the key Kw, obtained by applying a PRF to w. That is:

T[w] = { Enc Kw (ind) : ind ∈ DB(w) }

where DB(w) is the set of indices of documents containing w.

Now this is not yet secure. Firstly, the keywords will be leaked to the server as they are not encrypted. Secondly, even if the keywords were somehow hidden, the structure of T will reveal the frequencies of all keywords in the database, which the server could use to perform frequency analysis. The way around these problems is to store T in a special data structure that hides the association between keywords and documents.

The data structure, called a TSet, consists of a number of fixed size buckets. Each document index in T[w] is randomly assigned to a bucket, by using a hash function H and PRFs F and F applied to w and the index of T[w], as illustrated below. A bit β is concatenated with the entry of T[w] to signal when the last element of T[w] is reached. This is all then XORed with a key (also derived from H), and added to a random empty position in the bucket, alongside a label L. L is also obtained by hashing, and is needed to identify the correct entry within a bucket. Modelling H as a random oracle ensures that the bucket and key are truly random, and using the key as a one-time pad hides all information about T[w] from the server.

The bucket sizes need to be chosen to ensure that the probability of overflow isn't too high - as long as this probability is reasonably small, the setup can just be repeated with different PRF keys until everything works. This needs a storage overhead of around 2-3x.

Searching

After this setup is done by the client, they upload the TSet data structure to the server. Searching for a keyword w is done by:

1. Client sends the search tag stag = FKw(w) to Server

2. Server sets β = 1, i = 0. While β = 1, do

Derive a bucket, label and key by hashing Fstag(i)
Search the bucket for an entry containing the label
XOR with the key to obtain β, si
Send si to the client, increment i

3. Client decrypts the indices si

Leakage

Let's stop a minute and think about how much leakage is going on here. The server can learn:

The indices of all documents containing w (the access pattern)
When repeated queries have been made (the search pattern)

Compare this with a simple solution using deterministic encryption, where the table T is stored with each w encrypted deterministically. In this case the server can straight away learn the keyword frequencies, before any queries have been made, whereas using the TSet as described above, this information is only gradually revealed with each query.

Conjuctive queries - naively

For the case of conjunctive queries, i.e. searches of the form w1 AND w2 AND w3, things are more complex. Naively, the client could perform each search separately, then compute the intersection themselves. However, this has two main drawbacks. For queries where one of the keywords is very common, e.g. "gender=male", this would require a large proportion of the database indices to be sent to the client. Moreover, the server still learns the encrypted indices of every document matching one of the keywords, which is more leakage than just the access pattern of the result.

Alternatively, you might try and have the server perform the intersection. The client could send the server the secret key Kw (derived from a PRF) for each keyword, but this would still leak all indices matching at least one keyword.

Conjunctive queries - less leakage

To get around these issues, they propose a modified protocol called Oblivious Cross-Tags (OXT), which allows the server to compute the intersection without learning (much) more information than the indices of documents matching the entire query. Impressively, they manage to do this in time proportional to the least frequent keyword in the query.

For this to work, another data structure, called an XSet, is needed. The XSet is essentially a set of all 'encrypted' pairs (w, ind), where w is a keyword appearing in document ind. The TSet from the single-query search is also augmented with an 'encrypted' index for each entry. During search, the TSet is used for the first keyword, which gives the server the encrypted indices matching this keyword. The client also sends some tokens allowing the server to compute encryptions of (w, ind) for the rest of the keywords and the encrypted indices already recovered. These pairs are then searched for in the XSet (implemented as a bloom filter) and the resulting matches sent to the client.

The cleverness lies in the special way in which the XSet pairs and the extra indices in the TSet are encrypted. Their initial idea was to use an interactive protocol based on DDH to give the search tags for the correct terms and indices to the server, but this turns out to be insecure, and also needs an extra round of computation. By precomputing the first protocol message and making some other neat changes, they get around both of these issues. I'll leave you to figure out the details from the paper (pages 12-15).

To make the running time only dependent on the least frequent keyword, you have to ensure that this keyword is always searched with the TSet, and the XSet is used for all other keywords. By storing a small state with some basic keyword statistics, it's straightforward for the client to do this every time.

Leakage

Leakage-wise, this improves a lot on the naive methods. The only situation where the server might be able to learn more than desired is when the client sends two queries with different least-frequent keywords, but the same most frequent keywords. In this case, say when (w1, w2) and then (w1', w2) are queried, the server can compute the indices matching (w1, w1'). Also, note that in a conjunctive query of several terms, e.g. (w1, w2, w3), the server can learn the number of documents matching (w1, w2) and (w1, w3) in addition to the number of matches for the entire query (this leakage doesn't seem to be mentioned in the paper, but came up in our discussion).

Conclusion

I think this technology is a really cool way of increasing the security of deterministic encryption, whilst maintaining plaintext-like search speeds, and it seems like a good alternative to things like CryptDB, which leaves much to be desired regarding security. Even so, I would advise caution to anyone considering this kind of technology. We've recently been reminded of the perils of deterministic encryption (and several other bad security practices) with the leakage of Adobe's password database, so a careful assessment of the scenario is essential.

It's worth noting that in many cases the adversary is not the storage provider, but a 3rd party who may have obtained access to their servers. Using SSE ensures that nothing can be learned when seeing a snapshot of the database, and it would typically be much more difficult for an attacker to also gain access to the individual queries. For these scenarios it could really make a lot of sense to deploy SSE.

Clearly this is an area that needs more research to assess the suitability of specific applications. Also it would be really nice to see if other technologies like FHE and ORAM can be used to reduce the leakage, whilst still maintaining a reasonable level of efficiency.

Monday, November 11, 2013

These are not the packets you are looking for.

Deep Packet Inspection (DPI) is now a common tool in the management of computer networks. On the positive side it can be used to help improve network performance but more negatively, it can aid businesses or nation states in data mining, eavesdropping and Internet Censorship. So how does it work? Most modern DPI systems use regular expressions as a means to define a fingerprint for a particular protocol. Packets can then be examined to see whether they match with this expression/fingerprint.

On the Tuesday morning of CCS Kevin Dyer explained how to fool a regular-expression based DPI system to incorrectly classify a connection as that of the attacker's choice, for example make Tor traffic look like regular HTTP traffic. The paper he presented is entitled "Protocol Misidentification Made Easy with Format-Transforming Encryption" and is joint work with Scott Coull, Tom Ristenpart and Tom Shrimpton [pdf]. To perform such misidentification attacks they introduce a new cryptographic primitive called Format-Transforming Encryption (FTE).

An FTE scheme, just like any other encryption scheme, is defined by three algorithms: key generation, encryption and decryption. The difference arises in the fact that in FTE the encryption scheme takes as input the key k, message m and in addition a message format F. The ciphertext that is output by the encryption algorithm shall be contained in the language L(F) defined by F. Similarly, for decryption the message format must be given as input. It is easy to see that this is in fact a generalisation of Format-Preserving Encryption introduced by Bellare et al. [pdf], where the message format of both the plaintext and the ciphertext are the same.

An FTE scheme can be created by using an "Encrypt-then-Unrank" construction. More specifically, first a message is encrypted using a secure authenticated encryption scheme (the implementation given in the paper uses an Encrypt-then-MAC construction with Counter mode and HMAC). Next the ciphertext is treated as an integer in {0, 1, ..., |L(F)|-1} and is encoded using a function unrank which maps the inputs to an element in the language L(F). (In order to decrypt, unrank must be a bijection with an efficiently computable inverse rank). The paper builds on this scheme by describing how to construct a full FTE record layer which can be used to transform the format of arbitrary streams of TCP traffic.

The paper gives full details of the experiments they carried showing their (near perfect) success in fooling almost all of the studied DPI systems into continually misclassifying packets (the only blip was the SSH classifier of the nProbe DPI system). Further to this they also studied the efficiency of FTE where the bandwidth overhead can be as small as 16%. Most interesting of all, in addition to the experimental results, Kevin described how they were able to successfully use the system to prevent Internet censorship. The Great Firewall of China is known to block a number of websites including Facebook and YouTube. By setting up an FTE tunnel between the USA and China, Kevin and his co-authors were able to successfully view censored websites within China. Building on this they were also able to successfully use the FTE tunnel to route Tor traffic between the China and the US (Winter and Lindskog recently showed Tor was also being actively blocked [pdf]).

For anyone interesting in using their FTE implementation it is freely available here.

Friday, November 8, 2013

The Bitcoin Improvement That Really Protects Your Privacy

At PETshop on Monday before CCS, Cédric Fournet gave a talk on how extend Bitcoin to be provably anonymous.

Recent papers show that Bitcoin only offers limited anonymity because transactions are linked by the so-called addresses in the public ledger. This allows to run graph algorithms to find clusters and thus link addresses together and possibly to people if an address was to used to buy or sell anything. There are numerous anonymizer services, some of which simply steal the coins. However, even services with better intentions cannot completely guarantee anonymity.

Zerocoin addresses the issue by having transaction only linked through secret information. More concretely, every transaction features a commitment to a random serial number, the serial number of a previous transaction, and a zero-knowledge proof that the serial number is actually hidden in one of the previous transactions. The authenticity of a transaction is given by the proof, the uniqueness of the serial number prevents double spending, and the zero-knowledge property breaks the public link between transactions. However, every proof depends on the whole history, which implies that the verification complexity grows with the ledger, whereas in Bitcoin this only depends of the last usage of the coin.

It remains to choose a commitment scheme. Zerocoin uses Pedersen commitments and double-discrete logarithm proofs, to prove knowledge of an opening of an undisclosed commitment in a given list. It uses about one minute to both generate and verify a proof. To speed this up, Cédric proposed a system called Pinocchio, which is a pairing-based non-interactive zero-knowledge proof system. This allows information-theoretic succinct proofs with eight group elements in an extension field of a 254-bit prime order field. Another possibility would be to use SHA1-based commitments, but this would make verifying inefficient.

Cédric also highlighted that verifiable computation could become the norm with such a practical system that implements succinct non-interactive zero-knowledge proofs for an NP-complete language.

Thursday, November 7, 2013

ACM CCS 2013 - Day Two

The section on Secure Multiparty Computation started with the very good presentation given by abhi shelat on “Fast two-party Secure Computation with Minimal Assumptions” (joint work with Chih-Hao Shen). He started the talk pointing out how in recent times multiparty computation have become more and more practical: since the Yao's seminal work in 1982, many different protocols have been proposed providing, in the last few years, drastic improvements and efficient implementations relevant for real-world applications.

The approach presented in the talk for secure two-party computation is essentially based on Yao's construction and on cut-and-choose technique to achieve security in the active model.

Informally speaking the protocol works as follows: we have two parties, Alice and Bob, that want to securely and privately evaluate an arbitrary function represented by a Boolean circuit C. Then one party, say Alice, prepares a “garbled” (or “encrypted”) version of the circuit C and gives it to Bob together with her appropriate input keys, in such a way that he can evaluate the circuit without learning anything about Alice's inputs and intermediate values. To achieve security against active adversary Alice creates many copies of the garbled circuit (with different randomness) and sends them to Bob. Then he asks Alice to open a fixed fraction of these circuits checking that they are correctly encoded.

Unfortunately using this technique we have three main issues to resolve: 1) Input inconsistency: we need that Alice use the same inputs in each evaluated circuits 2) Output authentication and privacy: a malicious Bob could learn Alice's output or outputs an incorrect value 3) Selective failure attack: a malicious Alice can use inconsistent inputs in the OTs and in creation of the garbled circuits to learn Bob's inputs. There are many different approaches to address these issues.

Abhi started focusing on the output authentication and privacy issue. The problem has been studied in many prior works: in [LP07] a very elegant solution has been proposed, but it requires O(k^2n ) symmetric operations (where k is the security parameter and n is the inputs size); the subsequent protocols in [LP11, Kir08, SS11] require only O(kn) symmetric operations but they also need some extra expansive operations. The new approach they proposed consists essentially in a new witness-indistinguishable proof and it does not need any extra computations or any number-theoretic assumptions. To ensure the input consistency they use an auxiliary 2-universal hash circuit; and for the selective failure attack they propose the same solution as in [LP07], but with a different and more efficient implementation. All in all they manage to obtain a two-party computation protocol in the malicious setting that is faster than prior protocols, which use the same techniques, and with fewer assumptions.

The talk concluded with a brief comparison with other works on secure computation in the malicious setting, and in particular with [NNOB12]. More specifically the protocol in [NNOB12] turns out to be more efficient in terms of computation complexity (it requires O(k/log(n) C) symmetric operations in the RA model, while the one presented in the talk requires O(kC) operations in the standard model), but it is worst in terms of communication complexity.

Finally Abhi discussed about the advantage of using garbled circuits for secure two-party computation, in particular because, using this technique, we do not have any asymptotic overhead in passing from passive to active security.

In the second talk of the section Michael Zohner presented a joint work with Gilad Asharov, Yehuda Lindell and Thomas Schnesider on More Efficient Oblivious Transfer and Extensions for Faster Secure Computation. In this work the authors describe a more efficient protocol for OT extensions in the passive model. In addition they provide specific functionalities for OT extensions especially designed for secure computation protocols and an open source optimized OT implementation.

The last talk of the section was given by Marcel. He very well presented the Bristolian work An Architecture for Practical Actively Secure MPC with Dishonest Majority (joint work with Peter and Nigel).

Tuesday, November 5, 2013

Hiding Where You Put That Data

A key issue in securing data is that encryption is often not enough. Consider a situation where you encrypt data in a database, it can turn out that your access patterns to the database can reveal a lot of information about your queries. For example suppose you look up the IBM stock price and then buy IBM stock. An attacker who does not see what you access can corelate the access location with you buying IBM stock. This leads to the question as to whether one can hide the access pattern.

The solution to this problem is to use ORAM, and that was the topic of two papers in the Tuesday afternoon of CCS. In the first the authors presented the path-ORAM algorithm. This is a method to hide access patterns. Data is held in a tree structure, with multiple blocks of data held at each node. The tree is assumed to be held on a remote server. To access a specific block the user requests one path in the tree, which contains their desired block. The server only learns the path, and not which specific block is accessed.

This deals with reading data, but we also need to write the data back. Here we re-randomize the locations of all the blocks on the retrieved path and then we re-write the path back. By clever design of this re-randomization step the method can be made incredibly efficient and practical.

In the second talk the path ORAM algorithm was implemented to create a secure processor, namely a processor which operates on secure data. The resulting processor, called PHANTOM, is novel in that it also hides the memory access patterns. This is done by using a path-ORAM implementation in hardware. That this can even be considered is an indication that path-ORAM is an efficient methodology for obtaining ORAM in practice.

The talk on PHANTOM explained some of the engineering challenges in implementing ORAM in hardware. Explaining some of the changes needed to the basic algorithm so as to efficiently realise it, given the underlying hardware constraints. A lot of use is made of pipelining, so as to ensure that operations are performed in parallel. Care is also taken to avoid timing side-channels, by creating a DRAM buffer to mitigate delays between the RAM and the secure processor. Concrete timings obtained were about 25-35 micro seconds per ORAM access, this is a slow down of around 32 to 44 times over standard (non-secured) RAM access.

Honey I Blew Up The Password File

Ari Juels today at CCS gave a presentation of his Honeywords idea (joint work with Ron Rivest) which aims to proect against server breaches of password files.

Assume an adversary determines a password file, which we can assume (for most users) allows the adversary to obtain the clear text password (since most users choose easily guessed passwords). This is a real attack which has been mounted against various companies in the past couple of years, often resulting in high profile media coverage which the companies would have rather avoided.

Ari suggested adding to the password file a set of ``fake'' passwords, called Honeywords. Suppose there are 19 such Honeywords and one real one. The adversary on performing a server breach obtains twenty possible passwords, and so if he submits one of these at random he will be detected with probability 19/20, i.e. 95 percent. Such a detection also reveals to the server that they have been compromised.

This simple idea raises two key issues. Firstly how to detect the correct password out of a set of passwords, and how to choose the Honeywords. For the first problem the authors suggest a simple form of distributed cryptography. The first server takes the input password and checks it against the set of twenty words. The index of the input password within this set is then passed to a second server, which simply returns whether the index is the index of a Honeyword or the correct password. As described the system is clearly secure against passive corruption of one of the two servers.

As for the second problem of how to choose Honeywords, the authors present a number of techniques. The main issue being that simply choosing random words as the Honeyword for some users will easily lead to the detection of the correct password.

Combined with standard hashing techniques for password storage the proposed Honeyword system seems to work well. However, an issue not discussed by the talk is what happens if an adversary actively compromises one of the two servers.

Friday, November 1, 2013

Does novel hardware require novel cryptography?

Novel hardware technology usually is introduced to overcome shortcomings of established technology but nothing is free in live so the novel technology usually has shortcomings of it's own which need to be accommodated in system design. For example, we're looking at a wave of Non-Volatile RAM (NVRAM) technologies - Flash memory being the best known but others such as Phase-Change Memory (PCM), Ferroelectric RAM (F-RAM) or Magnetoresistive RAM (MRAM) - sweeping in to replace established DRAM in embedded devices. This happens due to the relatively high power consumption of DRAM which needs to be regularly refreshed if no write operation happened within the last time frame. Contrary to that, NVRAM - as the name suggests - retains data even after power loss, removing the need for power consuming refresh operations.

However, these novel NVRAM technologies all have one major draw-back: after a certain number of write operations, they will invariably fail. Due to this, wear-levelling techniques that map logical addresses to pseudo-random physical addresses in order to spread write operations over the entire physical address space, ensuring that all memory areas have approximately the same life time. Of course, would cryptographic memory protection interfere with wear-levelling or incur additional write operations, that would be a bad thing. But do we need cryptographic protection for RAM memory? After all, we have lived happily for decades with unencrypted and unauthenticated DRAM. (Except maybe a few special cases where encryption and authentication was necessary.)

The authors of " An Efficient Run-time Encryption Scheme for Non-volatile Main Memory", Xian Zhang, Chao Zhang, Guangyu Sun, Tao Zhang and Jia Di raise an interesting issue: Cold boot attacks on unencrypted RAM become a lot easier if the RAM is non-volatile; NVRAM does not loose it's information and hence the "cold" part of cold boot attacks is not required any more. Basically NVRAM behaves similar to hard disks and several information leaks due to lost or stolen laptops have highlighted that hard disks in mobile devices should be encrypted.

Thursday, October 31, 2013

How Effective is your Side-Channel Attack?

The guiding principle of this week's study group, led by Carolyn, is the quest for adversarial confidence in the key guesses output by side-channel analysis. A common feature of differential side-channel analysis (SCA) is the divide-and-conquer strategy, in which small parts (subkeys) of the global (full) key are targeted separately, and concatenated to form a key guess. This strategy is possible since cryptographic algorithms (like block ciphers) usually, at some stage, process intermediate values that depend on just a few key bits, so that the space of possible values is a searchable size. As an example, consider an S-box in the first round of AES as the adversary's target. The S-box applies a highly nonlinear transformation to the XOR of a plaintext byte and a key byte. In this scenario the attacker knows the plaintext, so can compute the S-box output for each of the 256 possible values of the subkey. If we assume the attacker has some knowledge of the functional form of the side-channel leakage (e.g. that it’s proportional to the Hamming weight of the processed data), then he can use this to predict the leakage under each of the 256 hypotheses. Applying this technique to several collected traces (relating to different plaintexts), the attacker can compute the correlation between the 256 leakage prediction vectors and the N vectors of trace measurements, aligned at each point in time. Consequently, the largest correlation should indicate the true value of the subkey and the time point at which the S-box output is processed by the device. The attacker repeats this process for each key byte, and concatenates his 'best guesses' for each byte to form his best guess for the global key, which he then tests, for example by attempting to decrypt a known ciphertext. This will fail even if just one subkey guess is wrong, and the attacker won't know where the problem is. Thus the attacker seeks some extra information from the 'divide' stage, for example a ranking (or even a probability distribution) on the subkeys, and some measure of confidence associated with the guessed subkeys to allow him to target the weakest parts of the key in a post-SCA brute-force search.

The focal point of the study group was 'Success through confidence: Evaluating the effectiveness of a side-channel attack' [paper,slides] by Thillard et al., presented at CHES this year. In a nutshell the paper suggests an approach to computing a confidence measure for each subkey (so he knows which bytes need brute-forcing), and shows how picking robust strategies increases confidence. For the proof of concept, the authors consider an idealised attack scenario in which the attacker knows the true leakage function φ so can perfectly predict the data-dependent part of the power consumption, and the remaining variance is Gaussian noise. Assume also that there are N traces {l_i}_i=1^N corresponding to N plaintext inputs, and that it's possible to compute the target function under each subkey hypothesis and to map these to columns of power consumption predictions {φ_k,i}_i=1^N.

At SAC 2008, Rivain [paper] showed how to compute precise success rates in this setting, beginning with the observation that rankings in the correlation vector are equivalent to rankings in the alternative distinguishing vector 1/N Σ_i=1^N φ_k,i l_i. This vector is more convenient to work with since the probability distribution can be nicely characterised as a multivariate Gaussian. It's hence very easy to calculate the success rate, by computing (directly from the multivariate Gaussian cumulative distribution function) the joint probability that the distinguisher score associated with the correct key hypothesis is greater than the scores associated with each of the incorrect hypotheses. The authors also refer to the work of Fei et al. [paper] presented at CHES 2012, which showed that the distribution of distinguishing vectors can be expressed entirely as a function of the algorithmic properties of the target function (scaled by a factor depending inversely on the noise size). These properties were given the description 'confusion coefficients', and are useful because they link side-channel vulnerability to algorithmic properties of a target function.

Thillard et al. extend the notion of confusion coefficients to the setting of correlation differential power analysis (DPA), in which multiple bits are targeted by combining them in a power model leakage function (e.g. the Hamming weight or Hamming distance of intermediate values). Their proposal necessarily depends on the form of leakage and the strong assumption that the attacker knows this form of the leakage. This means that they don’t quite achieve the goal of Fei et al.’s original notion of confusion coefficients, which was to capture the impact of algebraic properties on DPA, independent of the physical implementation. Having defined correlation confusion coefficients, they combine the notion with Rivain's work on exact success rates – that is, they express the success rate as a function of the correlation confusion coefficients. They wish to investigate the stability of a key guess (to encourage confidence) - intuitively if a subkey is ranked top after n trace measurements and also after n+1, n+2, n+3 traces and so on, it's more likely to be correct than if the ranking fluctuates. Helpfully, the authors consider the evolution of the attack giving a matrix of outcomes at selected intervals as the number of traces increases, and they allow for strategies which need not output a subkey guess at all (e.g., in the case that no stable ‘winner’ emerges). They advocate selection strategies which trade off better quality guesses for more missing values (implying more brute-force searching after the side-channel phase of the attack). The confidence metric they propose for a selection strategy is the probability that it returns the correct key, divided by the total probability of it returning any guess. It can be computed by concatenating distinguishing vectors at different points in time (the resulting vector is again treated as a multivariate normal distribution). Taking more of the attack evolution into account increases the confidence of a subkey guess but also the frequency with which the attack fails to return a subkey. The authors also consider different types of selection rules, and how to deal with the inevitable false positives.

Another recent paper by Veyrat-Charvillon et al. at SAC2012 [paper] provides a method of assigning probabilities to subkeys and enumerating over the most likely global keys until the correct key is found, or the attack reaches the computational limits of the attacker. This is clearly related in its aims, but it seems that there is work remaining to be done to bridge the gap between the two papers. This continued work on SCA is taking into account the purpose of device evaluation by trying to represent true capabilities of realistic adversaries; however when this is not possible it is still useful to start at some idealised scenario, and work from there.

Saturday, October 26, 2013

The (B)ASICs of Bitcoin

"This planet has -- or rather had -- a problem, which was this: most of the people living on it were unhappy for pretty much of the time. Many solutions were suggested for this problem, but most of these were largely concerned with the movement of small green pieces of paper, which was odd because on the whole it wasn't the small green pieces of paper that were unhappy." (Douglas Adams, The Hitchhiker's Guide to the Galaxy (1979))

30-odd years later, the planet and unhappiness endure but the small green pieces of paper are teetering on the edge of archaism as much of life -- economy and trade included -- plays out increasingly in the 'digital realm'. Bitcoin -- the topic of our latest study group, led by Jake -- is a popular recent solution to the problem of making payments without recourse to anything so vulgar as physical matter; the upshot being, in place of notes and coins transferred from hand to hand one now exchanges solutions to hard mathematical problems electronically across a network.

The primary focus of the session was the technical workings of the system, but we were inevitably drawn into a variety of intriguing digressions on the 'value' of bitcoins (and the nature of value generally), the dependence of the system on majority consensus, and the difficulty of genuinely achieving the idealised goal of decentralisation with the introduction of powerful (and energy efficient) custom-built hardware miners. In what follows, I will give my best attempt at a technical overview, and then touch on some of the ideas and questions which came up in our discussions.

Bitcoin is a "peer-to-peer electronic cash system" proposed by the (presumed pseudonymous) developer Satoshi Nakamoto in 2008 [PDF] and released into the wild in early 2009. It aims to provide a network-wide-agreed payment mechanism which doesn't rely on any central authority or (as long as there's an honest majority) on mutual trust between the nodes in the network.

Jake began his explanation of the Bitcoin system by describing a transaction: A bitcoin owner possesses one or several 'addresses', each associated with a bitcoin balance and a public/private key-pair (P_k,S_k). To make a payment of (say) 8 BTC from an address with a balance of 10 BTC the owner broadcasts two transactions: one with a value of 8 BTC to the payee, and one with a value of 2 BTC back to himself. A transaction is a hash of the previous transaction, the value to be transferred, and the public key of the recipient, signed under the private key of the sender. It is recommended (but not mandated) that the sender generates a new key pair and address for the portion of the value that he 'keeps', in order to promote a degree of anonymity in the system (more about that later).

New transactions are gathered into blocks by miner nodes, at which point it becomes a race to find a difficult 'proof-of-work' for the block (see below) and broadcast it to the rest of the network. The network nodes check the proof-of-work, and the validity of the transactions in the block; if satisfied, a node drops the block it has been working on, adds the accepted block to the chain, and commences working on the next block. Because the new block contains the hash of the previous block, nodes are able to detect if they have missed a block and request it to keep their version of the chain complete. Likewise, if the chain forks (i.e., if different nodes receive different versions of the same block simultaneously), the fastest-growing branch is the one which is preserved (that is, majority consensus decides), and any nodes working on the discarded chain are able to self-correct according to the information in future blocks. (For this reason, verification checks must go back further than one step, and payees should ideally wait for 'confirmation' -- which happens when a block is sufficiently buried under new blocks -- before spending received bitcoins).

The 'proof-of-work' in this case is to find a nonce which, when hashed (twice with SHA-256) along with the rest of the block -- all new transactions, a reward transaction to the miner, a time-stamp, and some meta-data -- produces a value with a certain number of leading zero bits. The number required is specified according to the current difficulty level and evolves according to increasing computer power and varying network participation, the intention being to control the rate of block creation. Over time, new blocks become harder to mine, and the reward decreases (it began at 50 BTC, is currently at 25 BTC, and will be halved every 4 years), until the number of bitcoins reaches 21 million (estimated to happen around 2140) and thereafter remains constant.

Incentives to mine arise from the reward paid to the node which finds the proof-of-work, and an optional transaction fee which right now most nodes are willing to waive but which will play a more important role as the reward, and mining rate, decrease. Many miners cooperate in pools, which run from central servers and pay out mined bitcoins to all individuals in proportion to the computing power contributed. Incentives to obey protocol arise from the fact that only the longest chain (i.e. the one agreed on by the majority) will survive, so it is a waste of time trying to hash 'rubbish' or introduce invalid transactions: only with a colluding majority could valid transactions be rejected or invalid ones accepted.

We talked a little about the 'success' of Bitcoin as a currency. Ultimately, money only has value to the extent that it can be exchanged for 'stuff' -- "Surely use alone // Makes money not a contemptible stone" (George Herbert) -- and this extent is entirely dependent on the confidence of a community in money as a store of value and their subsequent willingness to accept it in exchange for said 'stuff'. Unsettling circular, no? The variety and quantity of goods and services available for BTC is still very limited, and prices tend to be fairly independent of the dramatically fluctuating Bitcoin-to-dollar/sterling/etc exchange rates. This reality, coupled with the deflationary bias of Bitcoin value (which national currencies work hard to avoid), provide strong incentives to hoard rather than to spend Bitcoin, so that it functions more like an investment than a currency -- potentially problematic given its lack of intrinsic value (if its value depends entirely on a community's willingness to accept it as payment, what becomes of it if nobody trades in it?)

And then there is the question of anonymity -- a motivating factor in the development of many electronic cash systems, Bitcoin included (the idea being that physical coins cannot be traced to previous owners, and one might naturally wish to emulate this functionality in an online setting). Anonymity in the Bitcoin system is achieved to a superficial degree, because the 'addresses' with which bitcoin balances are associated need not be publicly tied to an identity; however, the ability to link transactions to individual pseudonyms, coupled with the opportunity to exploit public information linked from outside sources, means that there is substantial scope for de-anonymisation (a problem which is by no means unique to Bitcoin).

We also talked about the rise of the Bitcoin ASIC miners, which are quickly concentrating network influence into the hands of a small number with exceptional computing capabilities. A participant's ability to mine essentially depends on their computing power relative to other participants. With the introduction of ASICs to the network, mining difficulty rapidly increased (to keep pace with mining speed) to the point where it is no longer cost effective for 'normal' users (with only CPU or even GPU capabilities) to mine -- the energy costs outweigh the reward. This is problematic because it undermines the Bitcoin 'philosophy' of decentralised power, by creating an elite core of uber-nodes with potential opportunity to over-rule the 'majority' because they hold the larger part of the computing power in the network. Various 'solutions' to the ASIC problem have been touted -- for example, switching to SHA-3 for future transactions. Of course, that would put the current ASICs out of useful action, but it would only be a matter of time before new SHA-3 ASICs were developed. Likewise, a dynamic system where the mandated inputs to the hash were periodically changed would only encourage the development of hardware miners able to operate with variable inputs.

Perhaps Bitcoin can never live up to the ideological hype surrounding it. But I think many of us came away from the session reinforced in our impression of it as an elegant piece of cryptography and a fascinating high-impact experiment with much to teach us about the challenges and possible solutions associated with the increasing dominance of the Internet in many aspects of personal and societal life, including the economy. (Of course, it goes without saying that the above opinions and ponderings are only my interpretation of a conversation which was not in any case intended to be authoritative or conclusive on what is a highly complex and potentially divisive topic -- I hope they will be taken in that light!)

Wednesday, October 23, 2013

Indistinguishability Obfuscation

Last week's study group was given by Joop. The theme was Indistinguishability Obfuscation and the talk centered around the recent results in [PDF].

Informally, obfuscating a program is the process of concealing what the program is doing while at the same time preserving its functionality. Among other things this serves to protect against code tampering and reverse engineering. Barak et al. [PDF] showed a number of interesting cryptographic applications of such a paradigm including: constructing homomorphic encryption and removing random oracles [PDF].

In a bit more details, an obfuscator O is an efficient (probabilistic) compiler that takes a program (or a circuit) P and produces an intelligible version O(P) of P with the same functionality as that of P. This is formalized by viewing O(P) as a black-box in the sense that whatever O(P) can compute, can also be computed given an oracle access to the original program P.

Barak et al. [PDF] showed that the above formalism of obfuscation is problematic and is, in fact, infeasible. They showed that for some families of functions with a property Π : ℱ→ {0,1}, there is f ∈ ℱ where given any program that computes f, Π(f) can be efficiently computed. However, Π(f) cannot be efficiently computed given oracle access to f other than by random guessing. Thus, they suggested the following weaker definition termed indistinguishability obfuscation to replace the above formalism:

An indistinguishability obfuscator iO for a class of circuits C ensures that given any two equivalent circuits C₁∈C and C₂∈C, the two distributions iO(C₁) and iO(C₂) are indistinguishable.

The covered paper provides an indistinguishability obfuscator constructions for NC¹ (i.e. all polynomial-size, log-depth circuits) using Multilinear Jigsaw Puzzles. It exploits the theorem proven by Barrington [PDF] that proves any circuit in NC¹ can be transformed into an oblivious branching program using 2k square matrices M⁰₁,…, M⁰_k and M¹₁,…, M¹_k. Therefore, evaluating C on input x where |x|=l can be done via the following matrix product equation C(x) =0 iff ∏_i=1^kM_i^x_f(i)= I.
This observation forms the basis of transforming any NC¹ circuit into a branching program.
Barrington also showed how to transform any circuit {0,1}^m → {0,1} of depth d into a width-5 permutation branching program of length n≤4^d. The program is defined by the tuples (inp(i), A_i,1, A_i,0) for 1≤i≤n, where inp(i) is the function inp: [n]→[m] describing the bit of the input examined at step i, and the matrices A_i,b are 5x5 permutation matrices. For any input (b_l...b_m) computing the program on the input corresponds to computing P=∏_i=1ⁿA_{i,b_inp(i)} an returning 1 if P=I (i.e. the identity matrix) and 0 otherwise.

Now, suppose that two parties Alice with input x and Bob with input y want to evaluate a circuit C∈NC¹ on their inputs, i.e. jointly compute C(x,y).

The construction is based on an earlier scheme by Kilian [PDF] which is as follows: Alice computes a randomized version of the branching program for the circuit by choosing n invertible matrices R₁,...,R_n and computes their inverses. Then she sets Ã_i,b=R_i-1 A_i,b R_i^-1 for i=1,...,n and b ∈ {0,1}. Alice now sends to Bob the matrices corresponding to her input x. Also, the two parties run an oblivious transfer protocol so that Bob obtains the matrices corresponding to his own input y. By putting the matrices in the right order, Bob can now compute C(x,y).

The OT protocol ensures that Alice does not learn Bob's input. On the other hand, the randomization step ensures that Bob also does not learn Alice's input.

Here Alice is regarded as the obfuscator, whereas Bob is the evaluator.

The authors classify possible attacks on Kilian's scheme into 3 categories as follows:

Partial Evaluation Attacks: The adversary computes and compares partial products corresponding to the same input x but different inputs y and y'. Thus, learning some information about x.

Mixed Input Attacks: The adversary performs the matrix product correctly but does not respect the input function.

Non-Multilinear Attacks: The attacker either computes non-multilinear functions over the matrices or does not conform to the algebraic structure of the matrices.

To apply this to Poly-sized circuits, the authors combine the above obfuscator for NC¹ with a fully homomorphic encryption scheme (FHE). The obfuscator generates to pairs (SK₁,PK₁) and (SK₂,PK₂) of secret/public keys for the FHE scheme. The circuit C is then encrypted under both public keys. Both public keys, both ciphertexts and an NC¹ obfuscator for decrypting using SK₁ are published. To evaluate the program on an input x, the evaluator uses the Eval algorithm of the FHE scheme on x and both ciphertexts to get the ciphertexts e₁ and e₂ of C(x) under both public keys. The evaluator keeps a track of all the intermediate bits in the evaluation phase which acts as a proof. The evaluator then feeds the proof along with x, e₁ and e₂ to the NC¹ obfuscator. If ciphertexts e₁ and e₂ are consistent, the obfuscator decrypt e₁ using SK₁ to reveal C(x).

Since the proof ensures that e₁ and e₂ are always consistent, we can switch back and forth to decrypting using the other secret key and thus realize the required indistinguishability.

Sunday, October 20, 2013

Secure End-to-End Communication with Optimal Throughput and Resilient Against Malicious Adversary

On the third day of DISC'13, the second session was on "Crypto, Thrust and Influence". The session started with my talk on the paper titled "Asynchronous Multiparty computation with Linear Communication Complexity". The next talk was by Paul Bunn on his work "Secure End-to-End Communication with Optimal Throughput and Resilient Against Malicious Adversary" with Rafi Ostrovsky. This post briefly covers the paper by Paul Bunn and Rafial Ostrovsky.

In the paper, the authors investigate the feasibility of routing or end-to-end communication in a network in which neither the nodes nor the links are reliable. Namely, for the network links, they do not assume any form of stability: the topology of the network is dynamic (links may spontaneously fail or come back to life at any time), transmission time across each link may vary from link to link as well as across the same link from one transmission to the next (i.e. asynchronous edges), and there is no guarantee enough links are available (even over time) for communication to even be possible. Unreliability of network nodes means that they may actively and maliciously deviate from protocol specifications, attempting to disrupt communication as much as possible. In particular, a malicious adversary may corrupt an arbitrary subset of nodes, taking complete control over them and coordinate attacks to interfere with communication between the uncorrupt nodes. Finally, the nodes are corrupted by polynomially bounded adversaries.

It is not hard to see that very few guarantees can be achieved by any protocol that is forced to operate in networks with so few assumptions. So in many situations, it may be possible that there is no robust protocol at all. Therefore, instead of measuring the efficacy of a given protocol in terms of its absolute performance for robustness, the authors employ competitive analysis to evaluate protocols: the throughput-performance of a given protocol with respect to the network conditions encountered is compared to the performance of an ideal protocol (one that has perfect information regarding the schedule of active/inactive links and corrupt nodes, and makes perfect routing decisions based on this information). Indeed, the combination of the strong notion of unreliability (of both edges and nodes) together with the use of competitive analysis provides a meaningful mechanism to evaluate routing protocols in networks that demonstrate unreliability in unknown ways. Competitive analysis is an existing and useful tool for evaluating protocols in unreliable networks and distributed algorithms with shared memory computation.

They present a protocol that routes effectively in the above network setting, utilizing standard cryptographic tools to guarantee correctness (i.e. Receiver gets all of the messages from Sender, in-order and without modification) with low memory burden per node. They show that their protocol achieves optimal throughput while evaluating the throughput-efficiency of their protocol using competitive-analysis. Specifically, they prove the following theorem.

Theorem. Assuming Public-Key Infrastructure and the existence of a group homomorphic encryption scheme, there exists a routing protocol that achieves correctness and optimal competitive-ratio in a distributed asynchronous network with bounded memory Θ(n²) and dynamic topology (and no connectivity assumptions), even if an arbitrary subset of malicious nodes deliberately disobey the protocol specifications in order to disrupt communication as much as possible.

The memory usage of their protocol matches the memory requirements of the protocols that do not provide security against corrupt nodes. Overall, the talk as well as the work appear pretty interesting.

Monday, September 30, 2013

Formal Verification of Security Properties in RTL Hardware Designs

Once more, the entire topic of embedded systems security is awfully under-represented at the otherwise excellent¹ Embedded Systems Week. Only the fairly small but highly interesting WESS Workshop addressed the topic properly; one of the most interesting papers presented at WESS is "Automatized High-Level Evaluation of Security Properties for RTL Hardware Designs" by Andrea Höller (who also gave the talk), Christopher Preschern, Christian Steger, Christian Kreiner, Armin Krieg, Holger Bock and Josef Haid.

Let me start by quickly explaining what "RTL Hardware Designs" are. "RTL" stands for "Register-Transfer Level" and basically defines the logic design phase within the circuit (or system) design process. At the register-transfer level, hardware description languages such as Verilog or VHDL are used to describe the logic circuits that operate on data flows between registers. That means, the RTL phase comes after the functional description of a circuit/system has been developed and after the system components and their interfaces have been specified; at the RTL level, the system components get "implemented" and the finished RTL design gets then translated into a netlist using a specific hardware library, placed and routed.

Formal Verification is in some sense the holy grail of security researchers and engineers: Ideally, we would like to have formal proofs and assurances all the way from the theoretic foundations down to the most obscure implementation detail of security-relevant products. In other words, we would like to have an exhaustive model of the product based on which we can check that all required security properties are provided by the product at all times and under all circumstances. This requires two steps:

Extraction of an accurate model from the implementation.
Model checking, i.e. verification whether the model maintains the specified security properties.

In one way this is nice because it can be automated to a large degree and thus removes the human error factor from the security equation but: Exhaustive formal verification is extremely difficult at best and, quite often, absolutely infeasible. So it will never replace implementation-specific and experimental security verification. But we can use formal verification to cut down on development costs and time-to-market; the sooner security problems are identified during the design stage the easier (and cheaper) they are to fix. With each fixed security problem the chance that your prototype manages to pass (e.g. Common Criteria) security evaluation increases.

In some points the Common Criteria security evaluation process already requires formal verification techniques to be employed. For example, each product needs to come with a model extracted from its functional specification and a proof that the model (and therefore the functional specification) indeed maintains all required security properties. But as far as formal verification of hardware design is concerned, this is all that is required by Common Criteria, leaving a large gap towards exhaustive formal verification of the finished product. This is where Höller et al.'s work comes into play. They show a possible extension to the Common Criteria verification process by extracting a verifiable model from the RTL specification and using conventional model checkers such as NuSMV (http://nusmv.fbk.eu/) together with the LTL (linear temporal logic) property specifications to verify whether the RTL implementation still meets the security requirements. It should be noted that the LTL property specifications are already required within the existing Common Criteria process and can be reused for Höller et al.'s additional verification step. In the second part of their paper Höller et al. show how to use the RTL model checking to check whether an implementation is resistant against certain types of faults.

My personal best paper award for WESS definitely goes to Höller et al.

---
¹) Disclaimer: The mainline conferences haven't actually started yet. My assessment of ESWeek's excellency is based on past experience.