I want to store System.currentTimeInMillis
in memory with minimum space possible. because I have to store millions of them in memory.
I converted it to binaryString
which gave me 41 bits
Here is my program
public class BitSetSize {
public static void main(final String[] args) {
final long currentTimeMillis = System.currentTimeMillis();
final String currentTimeToBinaryString = Long.toBinaryString(currentTimeMillis);
System.out.println("Size in bits: " + currentTimeToBinaryString.length());
final BitSet bitSet = BitSet.valueOf(new long[]{currentTimeMillis});
System.out.println("Bitset length: " + bitSet.length());
System.out.println("Bitset size: " + bitSet.size());
System.out.println("Size of biset object(bytes): " + MemoryMeasurer.measureBytes(bitSet));
}
}
But when I run it I get
Size in bits: 41
Bitset length: 41
Bitset size: 64
Size of biset object(bytes): 48
Question
- Why does bitSet.length()
and bitSet.size()
differ? I assume length()
is correct?
- I am using memory-measurer to learn about the size of bitSet
, but it tell me 48 bytes
, why is it not (41/8) byte
?
I am confused
First of all I want to advise the right tool to analyze object layout schemes in JVMs - JOL. In your case(java -jar jol-cli/target/jol-cli.jar internals java.util.BitSet
) JOL produces the following result:
Running 64-bit HotSpot VM.
Using compressed references with 3-bit shift.
Objects are 8 bytes aligned.
Field sizes by type: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]
Array element sizes: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]
java.util.BitSet object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 01 00 00 00 (00000001 00000000 00000000 00000000) (1)
4 4 (object header) 00 00 00 00 (00000000 00000000 00000000 00000000) (0)
8 4 (object header) f4 df 9f e0 (11110100 11011111 10011111 11100000) (-526393356)
12 4 int BitSet.wordsInUse 0
16 1 boolean BitSet.sizeIsSticky false
17 3 (alignment/padding gap) N/A
20 4 long[] BitSet.words [0]
Instance size: 24 bytes (reported by Instrumentation API)
Space losses: 3 bytes internal + 0 bytes external = 3 bytes total
Your calculations were not correct because of static fields, thus an empty BitSet
itself reserves 24 bytes. Please note that these calculations are not 100% exact because it was not taken into account size of long[]
object. So the right results are java -jar jol-cli/target/jol-cli.jar externals java.util.BitSet
:
Running 64-bit HotSpot VM.
Using compressed references with 3-bit shift.
Objects are 8 bytes aligned.
Field sizes by type: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]
Array element sizes: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]
java.util.BitSet@6b25f76bd object externals:
ADDRESS SIZE TYPE PATH VALUE
7ae321a48 24 java.util.BitSet (object)
7ae321a60 24 [J .words [0]
This means an empty BitSet
itself uses 48 bytes including long array. Also you can get the estimated object layout in different VM modes java -jar jol-cli/target/jol-cli.jar estimates java.util.BitSet
Why does bitSet.length() and bitSet.size() differ? I assume length() is correct?
The BitSet.size()
is the size of the internal data structure it uses to store bit values. Since the BitSet
internally uses a long[]
array the size is always a multiple of 64 bits. E.g. if you set the 64th bit in a BitSet
the BitSet
must increase the capacity of the long[]
array in order to store that value, because each long can "only" store 64 bits. E.g.
BitSet bitSet = new BitSet();
for (int i = 0; i <= 64; i++) {
bitSet.set(i, true);
System.out.println(bitSet.size());
}
BitSet.length()
returns the actual occupied bits in the BitSet
. So if you create a new BitSet
it's length is 0. If you then set the 4th bit the length will be 5. The size
will remain 64, because only one long is needed to store 5 bits.
BitSet bitSet = new BitSet();
System.out.println(bitSet.length()); // 0
bitSet.set(4, true);
System.out.println(bitSet.size()); // 64
System.out.println(bitSet.length()); // 5
I am using memory-measurer to learn about the size of bitSet, but it tell me 48 bytes, why is it not (41/8) byte?
Because of memory padding. Also known as data structure alignment.
The BitSet
object needs mathematical 41 bytes in memory.
- 8 bytes for the object header
- 20 bytes for the
long[]
- 8 bytes for the
long
in the array
- 4 bytes for the
wordsInUse
int
variable
- 1 bytes for the
sizeIsSticky
boolean
But the jvm can't allocate 41 bits so it rounds it to the next multiple of 8. That is 48.
This size may vary, because the object header size might vary from one JVM implementation to another. So if the object header is 16 bytes. The total will be 49 and the jvm rounds it to the next multiple of 8. In this case 56.
Your current code can't be storage for millions of long
(System.currentTimeInMillis
). You can use trove TLongHashSet or you should look at sparse bitset. But BitSet has int index, so you should compress long from currentTimeInMillis to int. E.g. bitSetIndex = (int)(currentTimeInMillis - initialTime). It gives you 2^32 millisec (~ 50 days) interval starting from initialTime.
//store sample for bitset:
bitSet.set(System.currentTimeInMillis());
EDIT
One BitSet object allocates more than 100 bytes on the heap. So you should reuse one BitSet object for a lot of long values. The simplest way is to use long value as index inside BitSet and set value as true at this index. But there are several problems (I described them above):
- BitSet has int index not long
- java.util.BitSet is not memory effecient.
See java doc of BitSet.
Every bit set has a current size, which is the number of bits of space
currently in use by the bit set. Note that the size is related to the
implementation of a bit set, so it may change with implementation. The
length of a bit set relates to logical length of a bit set and is
defined independently of implementation.
As BetaRide mentioned, the actual size that the BitSet takes up is implementation-specific. That said, in the Oracle/OpenJDK implementations (at least, in 6, 7, and 8), the basic element of state is a long[]
of words. That means that the size is always a multiple of 64.
As for the 48 bytes, I count in the code:
- 16 bytes for the BitSet object itself
- 20 bytes for the
long[]
object (16 for the object, 4 for the length)
- 8 bytes for the array's content (each element is 8 bytes, but you only have one)
- 4 bytes for
int wordsInUse
- 1 bytes for
boolean sizeIsSticky
Which yields 49 -- not far off the 48 you're seeing. If those object headers are compressed, but padding is also introduced, then that's probably where the 48 is coming from.