## Wednesday, June 10, 2015

### 52 Things: Number 36: Index Calculus Algorithm

This is the latest in a series of blog posts to address the list of '52 Things Every PhD Student Should Know To Do Cryptography': a set of questions compiled to give PhD candidates a sense of what they should know by the end of their first year. We continue the mathematical attacks with a description of an index calculus attack...

What is the objective?
An index calculus attack is a method for trying to solve the discrete logarithm problem (DLP).  Very briefly, it works by writing the target value as the product of powers of elements in a factor base, elements whose logarithm is already known, then extract the target value through laws of logarithms. We now proceed to explain what that means in a bit more detail.

How does it work?
The algorithm can be applied to calculating the discrete logarithm for an arbitrary element $h$ any group $G=\langle g \rangle$. We will rely on the fact that if $x^ay^bz^c=1$, then $a*\log_g(x)+b*\log_g(y)+c*\log_g(z)=\log_g(1)=0$. So, if we can find some collection of $x_i$ who's logarithms are all known values $L_i=\log_g(x_i)$ and somehow manage to write $h=x_1^{a_1}\dots x_r^{a_r}$, then we know that $\log_g(h)=a_1*L_1+\dots+a_r*L_r$. The index calculus attack exploits this, and the efficiency (or inefficiency) of the attack comes down to how fast the various stages of this can be done. For context, alongside the generic technique, we will follow an example in terms of the discrete logarithm over the group $\mathbb{Z}/p\mathbb{Z}$ with generator $g$, the most common application. Being a little lazy, we will use the terms "offline computation" and "precomputation" interchangeably to refer to work that need only be done once per group.  Similarly "online" and "everytime" work corresponds to work that must be done for every DLP required.

(Precomputation, basically free) Choose a Factor Base.
The factor base is a collection of elements ${b_0=g,b_1,\dots,b_r}\in G$. How to pick them, and how many to pick, are dependant on the group we're working over and the running times of the later stages. Indeed, simply choice of $r$ generally leads to a trade-off between expensive online (small $r$) and offline (large $r$) computation. Working within our example, one would generally pick $-1$ and the first $r$ primes, since these tend to make the online calculations more efficient (see below).

(Precomputation, expensive but very parallel) Find relations between the DLPs of the Factor Base elements.
Using whatever techniques we can (generally just taking arbitrary products and hoping to get lucky!) we find equations in terms of the different factor base elements relating them to both each other. By taking logs, these translate into linear relations between their discrete logarithms. We continue searching for these until we have found $r$ independent relations, which clearly takes longer the bigger we make $r$. That said, this can easily be done in parallel by simply asking each process to search independently and then merging the result sets. Our example works in exactly this way.

(Precomputation, relatively cheap) Solve the Factor Base DLPs
From the previous step, we have a number of linear relationships between the DLs of the factor base elements. In particular, we have $r+1$ equations in $r+1$ variables (since $\log_g(g)=1$ is known a priori), and so can solve to find all their logarithms. Whilst this requires using a large matrix solver, it tends to be basically free compared to the previous and next steps, since solving linear equations is much more efficient than the almost exhaustive nature of searching for relations.

(Online, expensive but very parallel) Write $h$ as a product of factor base elements
We now try and find a value $y$ and a list $a_i$ such that $h g^y = b_1^{a_1} \dots b_r^{a_r}$. This can easily be done in parallel, since each process tries a different collection of $y$ values,   stopping as soon as one of them. Once that's done, we simply take logs across each value, meaning:
$$\log_g(h) = -y + L_1a_1 + \dots + L_r a_r$$
Now, I've skimmed over a big issue in that previous paragraph: how do we find this $y$? Well, in the case of our example its not too bad. Because the factor base were all small primes, we simply try and factor $hg^y$ using traditional division-like techniques. However, in other groups this can be very difficult indeed, and computationally impractical.

A very brief conclusion
So, the Index Calculus algorithm uses the fact that taking logarithms transforms multiplications into sums to try and find the discrete logarithm of a particular point. It does this by building up a table of known results (the factor base), then finding an element related to the target that can be easily written in terms of these. As such, the algorithm is very generic, and by changing the size of the factor base $r$ one recovers a number of obvious classical attacks. However, picking a value of $r$ such that every stage of the computation can be done efficiently is generally not possible, since either the precomputation or online computation (or often both!) will be prohibitively expensive.