Developing in Java, I need a data structure to select N distinct random numbers between 0 and 999999 ?
I want to be able to quickly allocate N numbers and make sure they don't repeat themselves.
Main goal is not to use too much memory and still keep performance reasonable.
I am considering using a BitSet But I am not sure if the memory implications. Can someone tell me if the memory requirements of this class are related to the number of bits or to the number of set bits? and what is the complexity to setting/testing a bit ?
UPDATE: Thanks for all the replies so far.
I Think I had this in my initial wording of this Q but removed it when I first saw the BitSet Class. Anyway I wanted to add the following info: Currently I am looking at N of a few thousands at most (most likely around 1000-2000) and a number range of 0 to 999999. But I would like my choice to take into consideration the option of increasing the range to 8 digits (i.e. 0 to 99 999 999) while keeping N at roughly the same ranges (maybe increase it to 5K or 10K). So the "used values" are quite sparse.
A
BitSet
will take up as much space as 1,000,000 booleans, which is 125,000 bytes or roughly 122kB, plus some minor overhead and space to grow. An array of the actual numbers, i.e. anint[]
will take N × 4B of space plus some overhead. The break-even point isI'm not intimately familiar with Java internals, but I suspect it won't allocate more than twice the actual space used, so you're using less then 250kB of memory with a bitset. Also, an array makes it harder to find the duplicates when you need unique integers, so I'd use the bitset either way and perhaps convert it to an array at the end, if that's more convenient for further processing.
Setting/getting a bit in a
BitSet
will have constant complexity, although it takes a few more operations than getting one out of aboolean[]
.It depends on how large
N
is.For small values of
N
, you could use aHashSet<Integer>
to hold the numbers you have already issued. This gives youO(1)
lookup andO(N)
space usage.A
BitSet
for the range 0-999999 is going to use roughly 125Kb, regardless of the value ofN
. For large enough values ofN
, this will be more space efficient than aHashSet
. I'm not sure exactly what the value ofN
is where aBitSet
will use less space, but my guestimate would be 10,000 to 20,000.The size is determined either by the largest bit that has ever been set, or the
nBits
parameter if you use theBitSet(int nBits)
constructor.Testing bit
B
isO(1)
.Setting bit
B
isO(1)
best case, andO(B)
if you need to expand the bitset backing array. However, since the size of the backing array is the next largest power of 2, the cost of expansion can typically be amortized over multipleBitSet
operations.