I'm confused about how one might supposedly bypass the final subtraction of the modulus in radix-2 montgomery modular multiplication, when used in a modular exponentiation algorithm. The following two papers put forward the conditions for bypassing the subtraction.
- Montgomery Exponentiation with no Final Subtractions: Improved Results
- Montgomery Multiplication Needs no Final Subtractions
I don't understand exactly what is required in terms of the "preprocessing and postprocessing" to eliminate the need for the repetitive subtraction of the modulus at the end of the montgomery multiplication.
My understanding after reading the above papers is that, to eliminate the final subtractions, you must:
Zero-extend each input operand to the modular exponentiation by two
e.g. new[2049 downto 0] = (0, 0, old[2047 downto 0])
- Increase the loop bound inside the Montgomery multiplications by two, such that two more iterations of the loop are executed
I've made these modifications to a working algorithm, however the results are not as I expect and I do not understand why. Therefore, I assume I am misinterpreting something in these papers, or leaving out a critical step.
Let us refer to my (working) radix-2 montgomery modular exponentiation function in (C-like pseudocode). Note that I have extended the operand width by two most-significant zero digits (just to make sure I'm not overflowing). They used to only be 2048 bits.
let NUM_BITS = 2048
let rsaSize_t be a 2050-bit vector type
// Montgomery multiplication: outData = XYr^(-1) modulo M,
// where the radix r=2^n (n=NUM_BITS)
function montMult( rsaSize_t X, // Multiplier
rsaSize_t Y, // Multiplicand
rsaSize_t M, // Modulus
rsaSize_t outData) // Result
{
rsaSize_t S = 0; // Running sum
for (i=0; i<NUM_BITS; i++)
{
if (X.bit(i)==1) // Check ith bit of X
S += Y;
if (S.bit(0)==1) // check LSB of S
S += M;
S = S >> 1; // Rightshift 1 bit
}
// HERE IS THE FINAL SUBTRACTION I WANT (NEED) TO AVOID
if (S >= M)
{
S -= M;
}
outData = S.range(NUM_BITS-1,0);
}
// montgomery modular exponentiation using square and multiply algorithm
// computes M^e modulo n, where we precompute the transformation of the
// base and running-partial sum into the montgomery domain
function rsaModExp( rsaSize_t e, // exponent
rsaSize_t n, // modulus
rsaSize_t Mbar, // precomputed: montgomery residue of the base w.r.t. the radix--> (2^2048)*base mod n
rsaSize_t xbar, // precomputed: montgomery residue of 1 w.r.t. the radix--> 2^2048 mod n
rsaSize_t *out) // result
{
for (i=NUM_BITS-1; i>=0; i--)
{
montMult(xbar, xbar, n, xbar); // square
if (e.bit(i)==1)
montMult(Mbar, xbar, n, xbar); // multiply
}
// undo montgomery transform
montMult(xbar, 1, n, out);
}
Am I missing something in the papers? I do not believe this is an implementation error, as my code matches exactly what is put forth in the papers. I believe that I might be a conceptual error. Any and all help appreciated.
Thanks!