Memory-mapped collections in Java

Published 2019-07-28 04:56

Question:

I'm filling up the JVM Heap Space.

The two most commonly recommended options are to give the JVM more heap space via its startup parameters, or to change the algorithm in my code so that it uses less space.

But if both of those have already been tried and I still get out-of-memory exceptions, I'd like to see what the other options are.

I found this example of "using a memory mapped file for a huge matrix" and a library called HugeCollections, which look like an interesting way to solve my problem. Unfortunately, the library hasn't seen an update in over a year, and it's not in any Maven repo, so to me it's not really reliable.

My question is: is there any other library that does this, or a good way of achieving it, i.e. having collection objects (lists and sets) memory-mapped?
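
For reference, the huge-matrix example I found boils down to something like the sketch below (the file name and dimensions are placeholders, and since a single mapping is capped at 2 GB, a real version would map the file in chunks):

    import java.io.RandomAccessFile;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;

    // Sketch of a huge matrix of doubles backed by a file instead of the heap.
    public class MappedMatrix {
        private final MappedByteBuffer buffer;
        private final int cols;

        public MappedMatrix(String file, int rows, int cols) throws Exception {
            this.cols = cols;
            long bytes = (long) rows * cols * Double.BYTES;
            try (RandomAccessFile raf = new RandomAccessFile(file, "rw");
                 FileChannel ch = raf.getChannel()) {
                // A single mapping is capped at 2 GB; a real implementation
                // would map the file in several chunks.
                buffer = ch.map(FileChannel.MapMode.READ_WRITE, 0, bytes);
            }
        }

        public double get(int row, int col) {
            return buffer.getDouble((row * cols + col) * Double.BYTES);
        }

        public void set(int row, int col, double value) {
            buffer.putDouble((row * cols + col) * Double.BYTES, value);
        }
    }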

Answer 1:

"Unfortunately, the library hasn't seen an update in over a year, and it's not in any Maven repo, so to me it's not really reliable."

I agree, and I wrote it. ;)

I suggest you look at https://github.com/peter-lawrey/Java-Chronicle, which has higher performance and has seen some real use. It is really designed for List and Queue, but you could use it for a Map or Set with some additional data structures.
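
To give an idea of the "additional data structures" approach, here is a sketch only (not Chronicle's actual API; no deletes, resizing, or thread safety): keep a small on-heap index from key to offset, and append the values themselves to a memory-mapped log, i.e. a Map built on top of a queue-like structure.

    import java.io.RandomAccessFile;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.charset.StandardCharsets;
    import java.util.HashMap;
    import java.util.Map;

    // Sketch: a String -> String map whose values live in a mapped file.
    // Only the small offset index stays on the Java heap.
    public class MappedLogMap {
        private final MappedByteBuffer log;
        private final Map<String, int[]> index = new HashMap<>(); // key -> {offset, length}
        private int writePos = 0;

        public MappedLogMap(String file, int capacityBytes) throws Exception {
            try (RandomAccessFile raf = new RandomAccessFile(file, "rw");
                 FileChannel ch = raf.getChannel()) {
                log = ch.map(FileChannel.MapMode.READ_WRITE, 0, capacityBytes);
            }
        }

        public void put(String key, String value) {
            byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
            log.position(writePos);
            log.put(bytes); // append the value to the log
            index.put(key, new int[] { writePos, bytes.length });
            writePos += bytes.length;
        }

        public String get(String key) {
            int[] loc = index.get(key);
            if (loc == null) return null;
            byte[] bytes = new byte[loc[1]];
            // Absolute gets: read without disturbing the write position.
            for (int i = 0; i < loc[1]; i++) bytes[i] = log.get(loc[0] + i);
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }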

Depending on your requirements, you could write your own library. For example, for time series data I wrote a different library which, unfortunately, is not open source, but which can load tables of 500+ GB fairly cleanly.

"it's not in any Maven repo"

Neither is this one, but I would be happy for someone to add it.



Answer 2:

You don't say what sort of collections you're using, or how you're using them, so it's hard to give recommendations. However, here are a few things to keep in mind:

  • Keeping the objects on the Java heap will always be the simplest option, and RAM is relatively cheap.
  • Blindly moving to memory-mapped data is very likely to give horrendous performance, especially if you're moving around in the file and/or making lots of changes. Hash-based collection types are the worst, as they work by distributing data. Tree-based collection types are generally a better choice, and linear collections can go both ways.
  • Once you move off-heap, you need a way to translate your objects to/from Java. Object serialization is the easiest, but adds lots of overhead. Binary objects accessed via byte buffers are usually a better choice, but you need to be thread-conscious (see the sketch after this list).
  • You also have to manage your own garbage collection for off-heap objects. Not a problem if all you're doing is creating/updating, but quickly becomes a pain if you're deleting.
  • If you have a lot of data, and need to access that data in varied ways, a database is probably your best bet.
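
To make the byte-buffer point concrete, here is a minimal sketch (the record layout and names are made up): fixed-width binary records in a direct buffer, accessed with absolute reads/writes, with duplicate() for per-thread views.

    import java.nio.ByteBuffer;

    // A fixed-width record (long id + double value = 16 bytes) addressed by
    // slot number, with no per-object serialization. The buffer could equally
    // be a MappedByteBuffer over a file.
    public class RecordBuffer {
        private static final int RECORD_SIZE = Long.BYTES + Double.BYTES;
        private final ByteBuffer buffer;

        public RecordBuffer(int records) {
            // allocateDirect keeps the records off the Java heap.
            buffer = ByteBuffer.allocateDirect(records * RECORD_SIZE);
        }

        public void write(int slot, long id, double value) {
            int base = slot * RECORD_SIZE;
            buffer.putLong(base, id);           // absolute puts avoid the shared
            buffer.putDouble(base + 8, value);  // position, which helps threading
        }

        public long readId(int slot)      { return buffer.getLong(slot * RECORD_SIZE); }
        public double readValue(int slot) { return buffer.getDouble(slot * RECORD_SIZE + 8); }

        // Threads that need relative (positional) access should work on their
        // own view: duplicate() shares the bytes but not position/limit.
        public ByteBuffer viewForThread() { return buffer.duplicate(); }
    }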


Answer 3:

Sounds like you're either dealing with a memory leak, or trying to load too large an object into memory.

Have you tried making a rough estimate of the amount of memory needed to load your data?
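
For instance (a rough sketch; the element type and sample size stand in for your real data), you can snapshot used heap before and after loading a sample, then scale up:

    import java.util.ArrayList;
    import java.util.List;

    // Rough heap estimate: measure used memory, load a sample, measure again.
    public class HeapEstimate {
        public static void main(String[] args) {
            Runtime rt = Runtime.getRuntime();
            System.gc(); // advisory only, but good enough for a rough number
            long before = rt.totalMemory() - rt.freeMemory();

            List<String> sample = new ArrayList<>();
            for (int i = 0; i < 1_000_000; i++) {
                sample.add("record-" + i); // stand-in for your real objects
            }

            System.gc();
            long after = rt.totalMemory() - rt.freeMemory();
            // Scale the per-element cost up to the full data set size.
            System.out.printf("~%d bytes per element%n",
                              (after - before) / sample.size());
        }
    }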



Answer 4:

Assuming you have no memory leaks or other issues, and you really need so much storage that you can't fit it in the heap (which I find unlikely), you basically have only one option:

Don't put your data on the heap. Simple as that. Which method you use to move your data out depends heavily on your requirements (what kind of data is it, how frequently is it updated, and how much of it is there, really?).

Note: You can use very large heaps with a 64-bit VM and, if necessary, enlarge the swap space of the OS. It may be simplest to just brutally increase the maximum heap size (even if it means lots of swapping). I would certainly try that first in the situation you outline.
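
For example (the heap sizes and jar name here are purely illustrative; -Xms and -Xmx set the initial and maximum heap, and setting them equal avoids heap resizing):

    java -Xms24g -Xmx24g -jar my-app.jar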