I know that the modulus by a power of 2 can be calculated with a bitwise operator:
x % 2^n == x & (2^n - 1).
But I am wondering whether there is a generalized bitwise algorithm to find the modulus by a number that is not a power of 2. For example,
7%5
Thank you in advance.
There are a couple, for special cases, including 5.
Since 16 ≡ 1 (mod 5), a trick you could do is split your variable into 4-bit nibbles, look up the modulus of each nibble in a table, and add the values together to get the modulus of the original number.
This program uses bitfields, table lookups, and addition. It would also work for modulo 3 or 15 and could be extended to larger chunks with a bigger lookup table.
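The nibble-and-table idea can be sketched in C as follows. This is a minimal illustration, not the program the answer refers to: the names NIBBLE_MOD5 and mod5_nibbles are made up here, and it uses shifts and masks rather than bitfields.

```c
#include <stdint.h>

/* Remainders of 0..15 modulo 5, indexed by nibble value. */
static const uint8_t NIBBLE_MOD5[16] = {
    0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0
};

/* Hypothetical helper: x mod 5 without using / or %. */
unsigned mod5_nibbles(uint32_t x)
{
    unsigned sum = 0;

    /* 16 is congruent to 1 mod 5, so each nibble contributes its own remainder. */
    for (int i = 0; i < 8; i++) {
        sum += NIBBLE_MOD5[x & 0xF];
        x >>= 4;
    }

    /* sum is at most 8 * 4 = 32; fold its two nibbles the same way. */
    sum = NIBBLE_MOD5[sum & 0xF] + NIBBLE_MOD5[sum >> 4];

    /* Now sum is at most 5, so one more lookup finishes the reduction. */
    return NIBBLE_MOD5[sum];
}
```

Calling mod5_nibbles(7) walks the single nonzero nibble 7, looks up its remainder, and returns it after the two folding steps.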
For finding the modulus of a bignum, you can take advantage of the fact that any power of 16 is congruent to 1 modulo 5. Therefore, whether your word size w is 2⁸, 2¹⁶, 2³² or 2⁶⁴, you can write your bignum as a₀w⁰ + a₁w¹ + a₂w² + ... ≡ a₀1⁰ + a₁1¹ + a₂1² + ... ≡ a₀ + a₁ + a₂ + ... (mod 5). This is also why the sum of the decimal digits of any number is congruent to the original number modulo 3 or 9: 10 ≡ 1 (mod 3) and 10 ≡ 1 (mod 9).
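As a rough sketch of the bignum case, assume the number is stored as a little-endian array of 32-bit limbs (an assumption made for this example; the name bignum_mod5 is hypothetical too). Because 2³² ≡ 1 (mod 5), the limbs can simply be summed and the sum folded down:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: remainder mod 5 of a number stored as 32-bit limbs.
   Assumes n < 2^32 so the 64-bit accumulator cannot overflow. */
unsigned bignum_mod5(const uint32_t *limbs, size_t n)
{
    uint64_t s = 0;

    /* 2^32 is congruent to 1 mod 5, so every limb has weight 1. */
    for (size_t i = 0; i < n; i++)
        s += limbs[i];

    /* 16 is congruent to 1 mod 5: fold the top bits onto the low nibble.
       Each pass strictly shrinks s, so the loop terminates. */
    while (s > 15)
        s = (s & 0xF) + (s >> 4);

    /* s is now in 0..15; at most three subtractions remain. */
    while (s >= 5)
        s -= 5;

    return (unsigned)s;
}
```

The order of the limbs does not matter for this modulus, since every limb carries the same weight.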
This also works for 3, 5, 15 and 17 on bytes, factors of 255 and 257 on 16-bit words and factors of 65,535 and 65,537 on 32-bit words. If you notice the pattern, it's because b²ⁿ = (bⁿ+1)(bⁿ-1) + 1, where b = 2 and n = 2, 4, 8 or 16.
You can apply a variant of this method to any n such that your chunk size is congruent to -1 (mod n): alternate addition and subtraction. It works because a₀w⁰ + a₁w¹ + a₂w² + ... ≡ a₀(-1)⁰ + a₁(-1)¹ + a₂(-1)² + ... ≡ a₀ - a₁ + a₂ - ... (mod n), but it is less useful in practice: since w² ≡ 1 (mod n), chunks twice as wide already work with plain addition. It’s similar to how you can take mod 11 of any decimal by going right to left and adding, subtracting, adding and subtracting the digits, e.g. 144 ≡ 4 - 4 + 1 ≡ 1 (mod 11). Just like with digits, you could do the same trick with five-bit chunks, since 32, like 10, is also congruent to -1 modulo 11.
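The five-bit-chunk version for mod 11 might look like the sketch below. The function name mod11_alternating is invented for this example, and the final reduction uses repeated addition and subtraction purely to keep the code free of / and %.

```c
#include <stdint.h>

/* Hypothetical helper: x mod 11 using alternating sums of 5-bit chunks. */
unsigned mod11_alternating(uint32_t x)
{
    int acc = 0;
    int add = 1;   /* 1: add the next chunk, 0: subtract it */

    /* 32 is congruent to -1 mod 11, so chunk k has weight (-1)^k. */
    while (x != 0) {
        if (add)
            acc += (int)(x & 0x1F);
        else
            acc -= (int)(x & 0x1F);
        add = !add;
        x >>= 5;
    }

    /* For 32-bit input acc lies in [-93, 96]; bring it into 0..10. */
    while (acc < 0)
        acc += 11;
    while (acc >= 11)
        acc -= 11;

    return (unsigned)acc;
}
```

For 144 the chunks are 16 and 4, so the alternating sum is 16 - 4 = 12, which reduces to 1, matching the decimal-digit example.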
Another useful special case occurs when w ≡ w² ≡ c (mod b). Then you have a₀w⁰ + a₁w¹ + a₂w² + ... ≡ a₀·1 + a₁c + a₂c + ... ≡ a₀ + c(a₁ + a₂ + ...) (mod b). This is analogous to how 10 ≡ 100 ≡ 1000 ≡ ... ≡ 4 (mod 6), so any number is congruent to its last digit plus four times the sum of its remaining digits, modulo 6. The computation can be a lookup and an addition per byte, and one multiplication by a small constant that you can do with a bit shift or two. For example, to take mod 20, you can add all but the lowest-order byte mod 20, multiply the sum by 256 mod 20 = 16, which is just a left shift of 4, then add the final byte. This can be very convenient: not counting numbers that give remainders of 1 or 0, this works with nibbles modulo 6, 10 and 12, and with bytes modulo those values and 20, 24, 30, 34, 40, 48, 60, 68, 80, 96, 102, 120, 136, 160, 170, 192, 204 and 240.
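Here is a minimal sketch of the mod 20 example for a 32-bit value. The name mod20_bytes is hypothetical, and instead of a per-byte lookup table it reduces the small intermediate result by re-applying the same identity and then subtracting, which is an assumption made to keep the example short.

```c
#include <stdint.h>

/* Hypothetical helper: x mod 20, using 256 == 256^2 == 16 (mod 20). */
unsigned mod20_bytes(uint32_t x)
{
    uint32_t low  = x & 0xFF;
    uint32_t high = ((x >> 8) & 0xFF) + ((x >> 16) & 0xFF) + (x >> 24);

    /* x is congruent to low + 16 * (sum of the other bytes). */
    uint32_t t = low + (high << 4);

    /* t may still exceed one byte; 256 is congruent to 16, so fold again.
       Each pass strictly shrinks t. */
    while (t > 0xFF)
        t = (t & 0xFF) + ((t >> 8) << 4);

    /* t is now at most 255: at most twelve subtractions remain. */
    while (t >= 20)
        t -= 20;

    return (unsigned)t;
}
```

A 256-entry table of remainders could replace the final subtraction loop if the lookup is cheaper on the target machine.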
If a number can be expressed as the product of special cases, you can solve for it using the Chinese Remainder Theorem. For example, 77 = 11×7, 32 ≡ -1 mod 11, and 8 ≡ 1 mod 7, so you could find the remainders divided by 11 and 7, which determine the remainder divided by 77. Most small prime numbers fall into one of the special cases previously discussed.
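For the 77 = 11×7 example, the two remainders can be recombined with fixed coefficients: 22 ≡ 1 (mod 7) and 22 ≡ 0 (mod 11), while 56 ≡ 0 (mod 7) and 56 ≡ 1 (mod 11), so 22·r₇ + 56·r₁₁ has the right remainder under both moduli. The sketch below assumes a 32-bit input; all function names are made up for illustration.

```c
#include <stdint.h>

/* 8 is congruent to 1 mod 7: sum the 3-bit chunks. */
static unsigned mod7_chunks(uint32_t x)
{
    unsigned s = 0;
    while (x != 0) {
        s += x & 7;
        x >>= 3;
    }
    while (s >= 7)          /* s is at most 73 for 32-bit input */
        s -= 7;
    return s;
}

/* 32 is congruent to -1 mod 11: alternate adding and subtracting 5-bit chunks. */
static unsigned mod11_chunks(uint32_t x)
{
    int acc = 0;
    int add = 1;
    while (x != 0) {
        if (add)
            acc += (int)(x & 0x1F);
        else
            acc -= (int)(x & 0x1F);
        add = !add;
        x >>= 5;
    }
    while (acc < 0)
        acc += 11;
    while (acc >= 11)
        acc -= 11;
    return (unsigned)acc;
}

/* Hypothetical helper: x mod 77 via the Chinese Remainder Theorem. */
unsigned mod77_crt(uint32_t x)
{
    /* At most 22*6 + 56*10 = 692, so at most eight subtractions follow. */
    unsigned r = 22u * mod7_chunks(x) + 56u * mod11_chunks(x);
    while (r >= 77)
        r -= 77;
    return r;
}
```

The two small multiplications could be replaced by lookup tables of 7 and 11 entries if multiplication is also to be avoided.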
Many later RISC architectures had hardware divide but not modulus, and told programmers to compute a%b by computing a-(a/b)*b. ARM A64 is the one most in use today. If you don’t have hardware division either, check out this answer. An example of another approach when the base is a small constant is given here, and is widely used on CISC architectures.
There is also an algorithm written by Sean Anderson in 2001, but probably discovered earlier, to compute the modulus by a number one less than a power of 2. It’s similar to the technique I used above, but relies on bit shifts and could be extended to any factor of (1<<s)-1. That’s almost what you’re looking for!
Generally, your optimizing compiler should be using the most efficient method to implement % on your hardware already. In your example, any decent compiler will just fold the constants and optimize 7%5 to 2.
No, there is no generalized approach for finding division remainders without actually doing division.
Powers of two are an exception because of binary representation, which lets you divide by two using shifts. The same principle is in play as the one that lets you divide decimal numbers by powers of ten simply by dropping digits off the end.
Obviously, nothing stops you from coding up division using bit operations. You would need to code subtraction, too, because the algorithm requires it as a "primitive operation". As you can imagine, this is going to be very slow.
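To make that concrete, here is a minimal sketch of binary long division that keeps only the remainder. It uses shifts, comparisons, and the built-in subtraction operator; a version restricted to pure bit operations would also have to implement subtraction by hand, as noted above. The name mod_shift_subtract is hypothetical, and b is assumed to be nonzero.

```c
#include <stdint.h>

/* Hypothetical helper: a mod b by shift-and-subtract; b must be nonzero. */
uint32_t mod_shift_subtract(uint32_t a, uint32_t b)
{
    /* 64-bit remainder so the shift cannot overflow when b is large. */
    uint64_t r = 0;

    for (int i = 31; i >= 0; i--) {
        /* Bring down the next bit of a, most significant first. */
        r = (r << 1) | ((a >> i) & 1u);

        /* If the divisor fits, subtract it (the quotient bit would be 1). */
        if (r >= b)
            r -= b;
    }
    return (uint32_t)r;
}
```

This always runs 32 iterations for a 32-bit dividend, which is why it is much slower than a hardware divide or the special-case tricks.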