This is the latest in a series of blog posts to address the list of '52 Things Every PhD Student Should Know' to do Cryptography: a set of questions compiled to give PhD candidates a sense of what they should know by the end of their first year. This next blog continues the topic of 'Cryptographic Implementation Details'.
The first sub-stage to of the function is to calculate t = abar*bbar. This is done by multiplying abar and bbar together to give a 128 bit product. An additional function could be written to do this which takes abar and bbar and returns two 64bit values, one with the high 64 bits (thi) and one with the low 64 bits (tlow).
The next stage is the computation of u = (t + (tm’ mod r)m)/r. Here t is a 128 bit integer and m’ is a 64 bit integer. As we mod by r, only the lower 64 bits of tm’ are required, meaning that we can disregard the top 64 bits of t.
if (ulo < tlo) uhi = uhi +1; // test for overflow from ulo and add if necessary to uhi
ov = (uhi < thi) | ((uhi == thi) & (ulo < tlo)); // check for carry
if(ov > 0 || ulo >= m) // test if there was overflow or ulo is higher that
ulo = ulo – m;
return ulo;
mulul64(p, rinv, &phi, &plo); // performs multiplication and returns two 64 bit values phi and plo
p = modul64(phi, plo, m); // returns value of 128bit input mod m
[1] http://www.hackersdelight.org/MontgomeryMultiplication.pdf
[2] http://www.ucl.ac.uk/~ucahcjm/combopt/ext_gcd_python_programs.pdf
In this post I will aim to compliment
what we saw last week regarding the more theoretical aspects of Montgomery
arithmetic with a practical implementation of it. The implementation is written
in C and written for a computer with a word size of 64 bits. The moduls m can therefore be as large as 264 –
1 and a and b can be as large as m – 1. We will take r = 264. As in
the previous post, most of what is given here is derived from [1] so please
refer to this for more information.
You will remember from the last
blog post, that four steps were given to the algorithm (please see post for a
full description of the algorithm if this is hazy). For the purposes of our implementation,
we will eamine each of these stages separately.
1. The
GCD Operation
This function uses the binary
extended gcd operation to find the integers r-1 and m’ such that rr-1
= 1 + mm’. These integers are required for the subsequent steps of the Montgomery
reduction. The algorithm takes r and m and pointers to values r-1
and m’ which the algorithm derives. This is done using the extended gcd
algorithm which could be implemented in a number of ways (the purpose of which
this blog is not about) and I refer the reader to [1] or [2] for detailed
descriptions of it.
2. Transform
the Multipliers
The second stage aims to compute
abar = ar mod m and bbar = br mod m. As r = 264, no multiplication is
required but only a shifting by 64 bits (due to our selection of r = 264),
giving a 128 bit output with 64 0s tailing the value of a or b. The remainder
of the division by m is then given as abar or bbar. A function which takes the high 64 bits (x) and
the low 64 bits (y) and the value for m (z) and returns the 64 bit result could
be written to do this. Such a function could be defined as follows:
uint64 modul64(uint64 x,
uint64 y, uint64 z);
Where uint64 is a type defined as follows:
typedef
unsigned long long uint64;
3. Montgomery
Multiplication
The function which carries out
the Montgomery multiplication can be defined as a function which takes the 64
bit values abar, bbar, m and mprine and returns a 64 bit value for the output
of the Montgomery multiplication step.The first sub-stage to of the function is to calculate t = abar*bbar. This is done by multiplying abar and bbar together to give a 128 bit product. An additional function could be written to do this which takes abar and bbar and returns two 64bit values, one with the high 64 bits (thi) and one with the low 64 bits (tlow).
The next stage is the computation of u = (t + (tm’ mod r)m)/r. Here t is a 128 bit integer and m’ is a 64 bit integer. As we mod by r, only the lower 64 bits of tm’ are required, meaning that we can disregard the top 64 bits of t.
tm = tlo*mprime; // Disregard thi
mulul64(tm, m, &tmmhi, &tmmlo); // Function which
performs 128 bit multiplication
The subsequent multiplication by m gives an output of 128 bits and the
addition of t an output of 129 bits, which can be carried out as 128bit + 128bit
= 128bit output and compute the carry separately. In C:
ulo = tlo + tmmlo;
uhi = thi + tmmhi;if (ulo < tlo) uhi = uhi +1; // test for overflow from ulo and add if necessary to uhi
ov = (uhi < thi) | ((uhi == thi) & (ulo < tlo)); // check for carry
The last step is the calculation
of if(u >=m) then return u – m; else return u. This is shown below:
ulo = uhi;
uhi = 0;if(ov > 0 || ulo >= m) // test if there was overflow or ulo is higher that
ulo = ulo – m;
return ulo;
4. The
Inverse Transformation
In the final stage we compute ur-1
mod m which is the product of a and b modulo m. This could be done by calling
the following functions:mulul64(p, rinv, &phi, &plo); // performs multiplication and returns two 64 bit values phi and plo
p = modul64(phi, plo, m); // returns value of 128bit input mod m
This gives the output p which is
the 64 bit output equal to ab mod m and the Montgomery reduction is complete.
[1] http://www.hackersdelight.org/MontgomeryMultiplication.pdf
[2] http://www.ucl.ac.uk/~ucahcjm/combopt/ext_gcd_python_programs.pdf
No comments:
Post a Comment