Data structure recommendation

2019-09-05 07:37发布

Developing in Java, I need a data structure to select N distinct random numbers between 0 and 999999 ?

I want to be able to quickly allocate N numbers and make sure they don't repeat themselves.

Main goal is not to use too much memory and still keep performance reasonable.

I am considering using a BitSet But I am not sure if the memory implications. Can someone tell me if the memory requirements of this class are related to the number of bits or to the number of set bits? and what is the complexity to setting/testing a bit ?

UPDATE: Thanks for all the replies so far.

I Think I had this in my initial wording of this Q but removed it when I first saw the BitSet Class. Anyway I wanted to add the following info: Currently I am looking at N of a few thousands at most (most likely around 1000-2000) and a number range of 0 to 999999. But I would like my choice to take into consideration the option of increasing the range to 8 digits (i.e. 0 to 99 999 999) while keeping N at roughly the same ranges (maybe increase it to 5K or 10K). So the "used values" are quite sparse.

2条回答
Root(大扎)
2楼-- · 2019-09-05 07:46

A BitSet will take up as much space as 1,000,000 booleans, which is 125,000 bytes or roughly 122kB, plus some minor overhead and space to grow. An array of the actual numbers, i.e. an int[] will take N × 4B of space plus some overhead. The break-even point is

4 × N = 125,000
N = 31250

I'm not intimately familiar with Java internals, but I suspect it won't allocate more than twice the actual space used, so you're using less then 250kB of memory with a bitset. Also, an array makes it harder to find the duplicates when you need unique integers, so I'd use the bitset either way and perhaps convert it to an array at the end, if that's more convenient for further processing.

Setting/getting a bit in a BitSet will have constant complexity, although it takes a few more operations than getting one out of a boolean[].

查看更多
等我变得足够好
3楼-- · 2019-09-05 07:56

It depends on how large N is.

For small values of N, you could use a HashSet<Integer> to hold the numbers you have already issued. This gives you O(1) lookup and O(N) space usage.

A BitSet for the range 0-999999 is going to use roughly 125Kb, regardless of the value of N. For large enough values of N, this will be more space efficient than a HashSet. I'm not sure exactly what the value of N is where a BitSet will use less space, but my guestimate would be 10,000 to 20,000.


Can someone tell me if the memory requirements of BitSet are related to the number of bits or to the number of set bits?

The size is determined either by the largest bit that has ever been set, or the nBits parameter if you use the BitSet(int nBits) constructor.

and what is the complexity to setting/testing a bit ?

Testing bit B is O(1).

Setting bit B is O(1) best case, and O(B) if you need to expand the bitset backing array. However, since the size of the backing array is the next largest power of 2, the cost of expansion can typically be amortized over multiple BitSet operations.

查看更多
登录 后发表回答