We've had success so far using ChronicleMap for most things we wanted to use it for, and most data sets have worked just fine. One use case we have is using it as a multimap, covering most of the concerns with doing so. We're using it as a Map<String,Set<Integer>> specifically in this case. However, we've run into some interesting JVM crashes and are having trouble finding a deterministic pattern so we can avoid them.
So, before we put all the Set<Integer> values into ChronicleMap, we hold the data entirely in the JVM so that we can write it all at once and reduce fragmentation. Since we have it entirely in memory, we can determine the max and average Set<Integer> sizes, and can easily size the ChronicleMap appropriately using ChronicleMapBuilder.averageValueSize. In most cases, this works just fine.
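As an illustration of the sizing step, here is a minimal sketch of how the average and max set sizes might be computed from the in-memory data. SetSizeStats and its methods are hypothetical names, and the 4 bytes per int is an assumption about the serialized form; a real serializer may add per-set or per-element overhead.

```java
import java.util.IntSummaryStatistics;
import java.util.Map;
import java.util.Set;

final class SetSizeStats {
    // Assumption: each int takes 4 bytes once serialized (no per-element overhead).
    static final int BYTES_PER_INT = 4;

    /** Returns count, min, average and max of the set sizes in the in-memory multimap. */
    static IntSummaryStatistics sizeStats(Map<String, Set<Integer>> multimap) {
        return multimap.values().stream().mapToInt(Set::size).summaryStatistics();
    }

    /** Rough average value size in bytes, under the 4-bytes-per-int assumption. */
    static double estimatedAverageValueBytes(Map<String, Set<Integer>> multimap) {
        return sizeStats(multimap).getAverage() * BYTES_PER_INT;
    }
}
```

The estimated byte figure is the kind of number that would be handed to ChronicleMapBuilder.averageValueSize, and comparing getMax() with getAverage() shows how far the outliers deviate.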
In some cases, however, the JVM crashes when the size of a Set<Integer> deviates too far from the average. For example, the average size might be 400, but we could have outlier sets with 20,000 integers in them. We can still size the map using the average serialized size of a set of 400 integers, and it starts populating ChronicleMap just fine until it reaches a set of a very large size.
So the question is: how do I figure out how far I can deviate from the average? I was hoping the average was indeed just an average, but there appears to be some maximum above which the JVM dies.
We devised an algorithm to split the large sets into smaller sets (e.g. if the key was AAA, then now there are keys AAA:1, AAA:2, ... AAA:n). The size of each split set was capped at 10 times the average size. In other words, if the average size was 500, but we had a set of 20,000 elements, we'd split it into four 5,000-element (500 * 10) sets.
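For illustration, the splitting scheme described above could be sketched roughly as follows. SetSplitter and split are hypothetical names, and the sketch assumes that sets at or below the cap keep their original key while larger ones are spread over key:1, key:2, and so on.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

final class SetSplitter {
    /**
     * Splits one set into parts of at most maxPartSize elements.
     * A set that already fits is returned under its original key;
     * otherwise the parts are keyed key:1, key:2, ... key:n.
     */
    static Map<String, Set<Integer>> split(String key, Set<Integer> values, int maxPartSize) {
        if (maxPartSize <= 0) {
            throw new IllegalArgumentException("maxPartSize must be positive");
        }
        Map<String, Set<Integer>> parts = new LinkedHashMap<>();
        if (values.size() <= maxPartSize) {
            parts.put(key, values);
            return parts;
        }
        int partNo = 1;
        Set<Integer> current = new LinkedHashSet<>();
        for (Integer value : values) {
            current.add(value);
            if (current.size() == maxPartSize) {
                // Part is full: store it and start the next one.
                parts.put(key + ":" + partNo++, current);
                current = new LinkedHashSet<>();
            }
        }
        if (!current.isEmpty()) {
            // Store the final, partially filled part.
            parts.put(key + ":" + partNo, current);
        }
        return parts;
    }
}
```

Calling split("AAA", set, 10 * averageSize) with a 20,000-element set and an average of 500 yields the four 5,000-element parts AAA:1 through AAA:4.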
This worked in most cases, but then we ran into another curious case where even this splitting wasn't sufficient. I reduced the factor to 5 times the average size and now it works again... but how do I know that's small enough? I think knowing the root cause, or how to determine exactly what triggers it, is the best way forward, but alas, I have no idea why ChronicleMap is struggling here.
Also, FWIW, I'm using an older version, 2.1.17. If this is a bug that was fixed in a newer version, I'd like to know a little detail about the bug and whether we can avoid it through our own means (like splitting the sets) while still using 2.1.17 (we'll upgrade later; we just don't want to rock the boat too much more).
I cannot be 100% sure without reproducing the bug, but I have an idea why the JVM crashes occur in this case. If I am right, it happens when your entry size exceeds 64 * chunkSize of the ChronicleMap. The chunk size can be configured directly, but if you configure just the average key and value sizes, it defaults to a power of 2 between averageEntrySize/8 and averageEntrySize/4, where the average entry size is the sum of your averageKeySize and averageValueSize, plus some internal overhead. So in your case, with average values being sets of 400 or 500 ints (4 bytes each) plus small keys, I suppose chunkSize is computed as 256 bytes, so your entries should be smaller than 256 * 64 = 16384 bytes.
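To make the arithmetic concrete, here is a rough sketch of the estimate above. It is not Chronicle Map code: ChunkSizeEstimate and its methods are hypothetical names, and both the power-of-2 rule and the 64-chunk limit are only the hypothesis stated above, not confirmed behaviour.

```java
final class ChunkSizeEstimate {
    // Hypothesis from the explanation above: an entry may span at most 64 chunks.
    static final int MAX_CHUNKS_PER_ENTRY = 64;

    /**
     * Estimated default chunk size: the power of 2 lying between
     * averageEntryBytes/8 and averageEntryBytes/4, computed as the
     * largest power of 2 not exceeding averageEntryBytes/4.
     */
    static long estimatedChunkSize(long averageEntryBytes) {
        if (averageEntryBytes < 4) {
            throw new IllegalArgumentException("averageEntryBytes must be at least 4");
        }
        return Long.highestOneBit(averageEntryBytes / 4);
    }

    /** Estimated largest entry, in bytes, that stays within 64 chunks. */
    static long estimatedMaxEntryBytes(long averageEntryBytes) {
        return MAX_CHUNKS_PER_ENTRY * estimatedChunkSize(averageEntryBytes);
    }
}
```

For 400 ints of 4 bytes each plus a guessed 20 bytes of key and overhead, estimatedChunkSize(1620) returns 256 and estimatedMaxEntryBytes(1620) returns 16384. If each int really serializes to 4 bytes, that is roughly 4,096 ints per set, which would explain why 5,000-element parts could still fail while 2,500-element parts work.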
Again, if my hypothesis about where this bug comes from is right, Chronicle Map 3 shouldn't have this bug and should allow entries arbitrarily larger than the average size or chunk size.