*This is the latest in a series of blog posts to address the list of '52 Things Every PhD Student Should Know To Do Cryptography': a set of questions compiled to give PhD candidates a sense of what they should know by the end of their first year. We continue the mathematical attacks with a description of an index calculus attack...*

**What is the objective?**

An index calculus attack is a method for trying to solve the discrete logarithm problem (DLP).
Very briefly, it works by writing the target value as the product of
powers of elements in a factor base, elements whose logarithm is already
known, then extract the target value through laws of logarithms. We now
proceed to explain what that means in a bit more detail.

**How does it work?**

The algorithm can be applied to calculating the discrete logarithm for an arbitrary element $h$ any group $G=\langle g \rangle$. We will rely on the fact that if $x^ay^bz^c=1$, then $a*\log_g(x)+b*\log_g(y)+c*\log_g(z)=\log_g(1)=0$. So,
if we can find some collection of $x_i$ who's logarithms are all known
values $L_i=\log_g(x_i)$ and somehow manage to write $h=x_1^{a_1}\dots
x_r^{a_r}$, then we know that $\log_g(h)=a_1*L_1+\dots+a_r*L_r$. The
index calculus attack exploits this, and the efficiency (or
inefficiency) of the attack comes down to how fast the various stages of
this can be done. For
context, alongside the generic technique, we will follow an example in
terms of the discrete logarithm over the group
$\mathbb{Z}/p\mathbb{Z}$ with generator $g$, the most common
application. Being a little lazy, we will use the terms "offline
computation" and "precomputation" interchangeably to refer to work that
need only be done once per group. Similarly "online" and "everytime"
work corresponds to work that must be done for every DLP required.

*(Precomputation, basically free) Choose a Factor Base*.

The
factor base is a collection of elements ${b_0=g,b_1,\dots,b_r}\in G$.
How to pick them, and how many to pick, are dependant on the group we're
working over and the running times of the later stages.
Indeed, simply choice of $r$ generally leads to a trade-off between
expensive online (small $r$) and offline (large $r$) computation. Working
within our example, one would generally pick $-1$ and the first $r$
primes, since these tend to make the online calculations more efficient
(see below).

*(Precomputation, expensive but very parallel) Find relations between the DLPs of the Factor Base*

*elements.*

Using whatever techniques we can (generally just taking arbitrary products and hoping to get lucky!)
we find equations in terms of the different factor base elements
relating them to both each other. By taking logs, these translate into
linear

*relations*between their discrete logarithms. We continue searching for these until we have found $r$ independent relations, which clearly takes longer the bigger we make $r$. That said, this can easily be done in parallel by simply asking each process to search independently and then merging the result sets. Our example works in exactly this way.*(Precomputation, relatively cheap) Solve the Factor Base DLPs*

From the previous step, we have a number of linear relationships between the DLs of the factor base elements. In
particular, we have $r+1$ equations in $r+1$ variables (since
$\log_g(g)=1$ is known a priori), and so can solve to find all their
logarithms. Whilst this requires using a large matrix solver, it tends
to be basically free compared to the previous and next steps, since
solving linear equations is much more efficient than the almost
exhaustive nature of searching for relations.

*(Online*

*, expensive but very parallel*) Write $h$ as a product of factor base elements
We
now try and find a value $y$ and a list $a_i$ such that $h g^y =
b_1^{a_1} \dots b_r^{a_r}$. This can easily be done in parallel, since
each process tries a different collection of $y$ values, stopping as
soon as one of them. Once that's done, we simply take logs across each
value, meaning:

$$\log_g(h) = -y + L_1a_1 + \dots + L_r a_r$$

Now,
I've skimmed over a big issue in that previous paragraph: how do we
find this $y$? Well, in the case of our example its not too bad. Because
the factor base were all small primes, we simply try and factor $hg^y$
using traditional division-like techniques. However, in other groups
this can be very difficult indeed, and computationally impractical.

**A very brief conclusion**

So, the Index Calculus algorithm uses the fact that taking logarithms transforms multiplications into sums to try and find the discrete logarithm of a particular point. It does this by building up a table of known results (the factor base), then finding an element related to the target that can be easily written in terms of these. As such, the algorithm is very generic, and by changing the size of the factor base $r$ one recovers a number of obvious classical attacks. However, picking a value of $r$ such that every stage of the computation can be done efficiently is generally not possible, since either the precomputation or online computation (or often both!) will be prohibitively expensive.

## No comments:

## Post a Comment