Performance implications of using Java BigInteger

2019-07-06 22:18发布

问题:

We have an interesting challenge. We have to control access to data that reside in "bins". There will be, potentially, hundreds of thousands of "bins". Access to each bin is controlled individually but the restrictions can, and probably will, overlap. We are thinking of assigning each bin a position in a bitmask (1,2,3,4, etc..).

Then when a user logs into the system, we look at his security attributes and determine which bins he's allowed to see. With that info we construct a bitmask for this user where the "set" bits correspond to the identifier of the bins he's allowed to see. So if he can see bins 1, 3 and 4, his bit mask would be 1101.

So when a user searches the data, we can look at the bin index of the returned row and see if that bit is set on his bitmask. If his bitmask has that bit set we let him see that row. We are planning for the bitmask to be stored as a BigInteger in Java.

My question is: Assuming the index number doesn't get bigger that Integer.MAX_INT, is a BigInteger bitmask going to scale for hundreds of thousands of bit positions? Would it take forever to run BigInteger.isBitSet(n) where n could be huge (e.g. 874,837)? Would it take forever to create such a BigInteger?

And secondly: If you have an alternative approach, I'd love to hear it.

回答1:

BigInteger should be fast if you don't change it often.

A more obvious choice would be BitSet which is designed for this sort of thing. For looking up bits, I suspect the performance is similar. For creating/modifying it would be more efficient to use a BitSet.

Note: PaulG has commented the difference is "impressive" and BitSet is faster.



回答2:

Java has a more convenient class for this, called BitSet.

You do not need to check if the bit is set in a loop: you can make a mask, use a bitwise and, and see if the result is non-empty to decide on whether to grant or deny the access:

BitSet resourceAccessMask = ...
BitSet userAllowedAccessMask = ...
BitSet test = (BitSet)resourceAccessMask.clone();
test.and(userAllowedAccessMask);
if (!test.isEmpty()) {
    System.out.println("access granted");
} else {
    System.out.println("access denied");
}

We used this class in a similar situation in my prior company, and the performance was acceptable for our purposes.



回答3:

You could define your own Java interface for this, initially using a Java BitSet to implement that interface.

If you run into performance issues, or if you require the use of long later on, you may always provide a different implementation (e.g. one that uses caching or similar improvements) without changing the rest of the code. Think well about the interface you require, and choose a long index just to be sure, you can always check if it is out of bounds in the implementation later on (or simply return "no access" initially) for anything index > Integer.MAX_VALUE.

Using BigInteger is not such a good idea, as the class was not written for that particular purpose, and the only way of changing it is to create a fully new copy. It is efficient regarding memory use; it uses an array consisting 64 bit longs internally (at the moment, this could of course change).



回答4:

One thing that should be worth considering (beside using BitSet) is using different granularity. Therefore you use a shorter bit set where each bit 'guards' multiple real bits. This way you would not need to have millions of bits per user in ram.

A simple way to achieve this is having a smaller bit set like n/32 and do something like this:

boolean isSet(int n) {
    return guardingBits.isSet(n / 32) && realBits.isSet(n);
}

This gives you a good chance to avoid loading the real bits if those bits are mostly zero. You can modify this approach to match the expected bit-set. If you expect almost all bits are set you can use this guarding bits for storing a one if all bits it guards are set. So you only need to check for bits that might be zero.

Also this might be even the beginning. Depending on the usage and requirements you might want to use a B-tree or a paginated version where you only held a fraction of the big bit field in memory.