## Tuesday, April 28, 2015

### Eurocrypt 2015: The Sum Can Be Weaker Than Each Part

Tuesday's Real track began with a session on hash functions. The second of the two talks was about the SPHINCS signature scheme, a paper Ryan recently blogged about [A]. The first was presented by Gaetan Leurent about work coauthored with Lei Wang [1], and is the subject of this blog. It continues the the study of how you can combine two hash functions to make something that protects against a weakness in one of the hash functions...
The Problem
There are countless times when schemes assume they have access to a secure hash function, and may fail catastrophically if this is not met (a secure hash function should be collision and preimage resistant [B]). However, the hash functions we actually use are always being attacked, and as these attacks get better the security provided decreases. So, designers often choose to combine the output of two different hash functions $H_1,H_2$, hoping that to break the combination the attacker will have to break both of them. The two most intuitive methods for doing this are concatenation and xor: $C(m)=H_1(m)||H_2(m)$ and $X(m)=H_1(m)\oplus H_2(m)$. Generally (as will be assumed for the rest of this blog) both $H_1,H_2$ are "narrow pipe" (their internal state is the same size as their output) MD [C] hash functions (eg MD5,SHA1,SHA2 etc). However, it is important to keep in mind that this is not the only way of building a hash function.

Past work
In his multicollisions paper[2], Joux showed that $C$ was no stronger than an ideal hash function (despite having double the output size). Broadly speaking, this proved that the concatenation combiner was, in some sense, no worse than using either function separately. Hoch & Shamir [3] extended this result, showing that even if the hash functions were in some sense "partially broken", reasonable security could be attained.
In terms of the Xor combiner, the Hoch&Shamir result implies security is up to $2^{n/2}$ for preimage resistance, but no attack has been found close to this bound.

New Contribution
Leurent & Wang provide a preimage attack of complexity roughly $2^{5n/6}$. They do this by creating a collection of messages $M_{a,b}$ such that $H_1(M_{a,b})=a$ and $H_2(M_{a,b})=B$. This is done using a novel techinque of "switching" and "interchanges". Having done this, they choose $A,B$ and a message $m'$ such that $X(M_{a,b}||m)=h_1(A,m) \oplus h_2(B,m)$ is the target digest (where $h_i(x,y)$ is the internal routine of $H$, initialised with state $x$ processing message $y$). This choice of $A,B$ can then be made using a birthday-like attack, leading to the sub-exhaustive complexity.

An interesting consequence of this is that the xor of two secure hash functions (who would have preimage security up to $2^n$) is less secure than either one individually.In the paper, they also demonstrate that if the hash functions are not completely secure, this attack can be improved to roughly $2^{n/2}$, demonstrating Hoch & Shamir's results are in fact tight. So, when the functions are secure, the xor combiner reduces the security provided, but once they begin to break down it is likely it will start to assist, keeping security at the 2^{n/2} level.

PS: A bit more detail on "switching"
Let me try to briefly explain how the interchange system is created. Consider the internal states whilst calculating $H_1(M)$. These form an obvious chain, and we can think of this as our base line $L_1[0]$. Applying an arbetrary change to the first block of $M$, we will get a second chain that is in some sense parallel to the original one, but offset by this first change (think of this as line $L_1[1]$). Doing the same for $H_2(M)$, we get another collection of lines $L_2$.

Now, using multicollisions, the authors demonstrate it is possible to find a sequence of message blocks that when appended to the inputs of both hash functions increase your line number in one of the collections without changing it in the other. So, for example, we find a multicollision that lets us transition from $(L_1[0],L_2[0])$ to $(L_1[1],L_2[0])$. Having done this for enough pairs, it becomes possible to get from the initial states to any one of $2^{t^2}$ possible states (where $t$ is the number of lines) after just $2^{2t}$ multicollisions. This is how we create the messages $M_{A,B}$ efficiently enough that it does not dominate the attack.

Errata:
A previous version incorrectly stated that the xor combination of two hash functions is robust under collision resistance. The combiner is robust for combining two PRFs[4], but this is a very different result.