boolean[] vs. BitSet: Which is more efficient?

2019-01-02 20:49发布

What is more efficient in terms of memory and CPU usage — an array of booleans or a BitSet? Specific BitSet methods are not used, only get/set/clear (==, =, Arrays.fill respectively for an array).

8条回答
零度萤火
2楼-- · 2019-01-02 21:13

As for memory, the documentation for a BitSet has pretty clear implications. In particular:

Every bit set has a current size, which is the number of bits of space currently in use by the bit set. Note that the size is related to the implementation of a bit set, so it may change with implementation. The length of a bit set relates to logical length of a bit set and is defined independently of implementation.

The source for Java library classes is openly available and one can easily check this for themselves. In particular:

The internal field corresponding to the serialField "bits".
89 
90     private long[] words;

As for speed; it depends on what one is doing. In general, don't think about speed ahead of time; use whichever tool makes the most sense semantically and leads to the clearest code. Optimize only after observing that performance requirements aren't met and identifying bottlenecks.

Coming to SO and asking if A is faster than B is silly for many reasons, including but certainly not limited to:

  1. It depends on the application, which nobody responding generally has access to. Analyze and profile it in the context it is being used in. Be sure that it's a bottleneck that's actually worth optimizing.
  2. Questions like this that ask about speed generally show that the OP thinks they care about efficiency but wasn't willing to profile and didn't define performance requirements. Under the surface, that's usually a red flag that the OP is headed down the wrong path.

I know this is an old question but it came up recently; and I believe this is worth adding.

查看更多
皆成旧梦
3楼-- · 2019-01-02 21:14

From some benchmarks with Sun JDK 1.6 computing primes with a sieve (best of 10 iterations to warm up, give the JIT compiler a chance, and exclude random scheduling delays, Core 2 Duo T5600 1.83GHz):

BitSet is more memory efficient than boolean[] except for very small sizes. Each boolean in the array takes a byte. The numbers from runtime.freeMemory() are a bit muddled for BitSet, but less.

boolean[] is more CPU efficient except for very large sizes, where they are about even. E.g., for size 1 million boolean[] is about four times faster (e.g. 6ms vs 27ms), for ten and a hundred million they are about even.

查看更多
公子世无双
4楼-- · 2019-01-02 21:16

Going from Java to CPU is totally VM specific. For instance, it used to be that a boolean was actually implemented as a 32-bit value (quite probably is true to this day).

Unless you know it is going to matter you are better off writing the code to be clear, profile it, and then fix the parts that are slow or consuming a lot of memory.

You can do this as you go. For instance I once decided to not call .intern() on Strings because when I ran the code in the profiler it slowed it down too much (despite using less memory).

查看更多
浮光初槿花落
5楼-- · 2019-01-02 21:18
  • Boolean[] uses about 4-20 bytes per boolean value.
  • boolean[] uses about 1 byte per boolean value.
  • BitSet uses about 1 bit per boolean value.

Memory size might not be an issue for you in which case boolean[] might be simpler to code.

查看更多
素衣白纱
6楼-- · 2019-01-02 21:19

I believe that a BitSet is more memory- and CPU-efficient, is it can internally pack the bits into int, longs, or native data types, whereas a boolean[] requires a byte for each bit of data. Additionally, if you were to use the other methods (and, or, etc), you would find that the BitSet is more efficient, as there is no need to iterate through every element of an array; bitwise math is used instead.

查看更多
不流泪的眼
7楼-- · 2019-01-02 21:23

A bit left-field of your question, but if storage is a concern you may want to look into Huffman compression. For example, 00000001 could be squeezed down by frequency to something equivalent to {(7)0, (1)1}. A more "randomized" string 00111010 would require a more complex representation, e.g. {(2)0, (3)1, (1)0, (1)1, (1)0}, and take up more space. Depending on the structure of your bit data, you may get some storage benefit from its use, beyond BitSet.

查看更多
登录 后发表回答