Java heap space: HashMap, ArrayList

Posted 2019-08-29 10:57

Question:

I would like to process a text file (about 400 MB) in order to create a recursive parent-child structure from the data given in each line. The data has to be prepared for top-down navigation (input: a parent, output: all its children and sub-children). Example lines to be read (format: child;id1;id2;parent;id3):

132142086;1;2;132528589;132528599
132142087;1;3;132528589;132528599
132142088;1;0;132528589;132528599
323442444;1;0;132142088;132528599
454345434;1;0;323442444;132528599

132528589: is the parent of 132142086, 132142087, 132142088
132142088: is the parent of 323442444
323442444: is the parent of 454345434

Given: Windows XP, 32-bit, 2 GB of available memory, and -Xmx1024m. Here is how I prepare the data:

HashMap<String, ArrayList<String>> hMap = new HashMap<String, ArrayList<String>>();
while ((myReader = bReader.readLine()) != null) {
    String[] tmpObj = myReader.split(delimiter);
    String valuesArrayS = tmpObj[0] + ";" + tmpObj[1] + ";" + tmpObj[2] + ";" + tmpObj[3] + ";" + tmpObj[4];
    ArrayList<String> valuesArray = new ArrayList<String>();
    // case of same key
    if (hMap.containsKey(tmpObj[3])) {
        valuesArray = (ArrayList<String>) (hMap.get(tmpObj[3])).clone();
    }

    valuesArray.add(valuesArrayS);
    hMap.put(tmpObj[3], valuesArray);
    tmpObj = null;
    valuesArray = null;
}

return hMap;

Then I use a recursive function:

HashMap<String,ArrayList<String>> getChildren(input parent)

to create the data structure needed. The plan is to make hMap available (read-only) to more than one thread through the function getChildren.
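Roughly, the recursive lookup is meant to work like this (just a sketch, not the real implementation; it assumes hMap is the map built above and returns every (sub-)parent mapped to its direct children):

HashMap<String, ArrayList<String>> getChildren(String parent) {
    HashMap<String, ArrayList<String>> result = new HashMap<String, ArrayList<String>>();
    ArrayList<String> children = hMap.get(parent);
    if (children == null) {
        return result; // no children recorded for this parent
    }
    result.put(parent, children);
    for (String line : children) {
        String childId = line.split(";")[0];  // first field of each stored line is the child id
        result.putAll(getChildren(childId));  // recurse into sub-children
    }
    return result;
}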
I tested this program with an input file of 90 MB and it seemed to work properly. However, running it with the real file of more than 380 MB led to:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
I need some help with memory resource management.

Answer 1:

From the "dirt-simple approach" side of things: Based on your problem statement, you don't need to keep id1, id2, or id3 around. Assuming that's the case, how about replacing your HashMap<String, ArrayList<String>> with a HashMap<Integer, ArrayList<Integer>>? You can use Integer.parseInt() to do the string-to-int conversion, and an Integer should always be smaller than the corresponding String.

Other suggestions: replace your ArrayList with a HashSet if you don't care about duplicates.

Per outofBounds' answer, you don't need to clone an ArrayList every time you want to add an item to it.
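A rough sketch of that slimmed-down loop, reusing bReader and delimiter from your code (untested, just to show the idea of Integer keys, a HashSet per parent, and no cloning):

HashMap<Integer, HashSet<Integer>> hMap = new HashMap<Integer, HashSet<Integer>>();
String line;
while ((line = bReader.readLine()) != null) {
    String[] tmpObj = line.split(delimiter);
    Integer child  = Integer.parseInt(tmpObj[0]);
    Integer parent = Integer.parseInt(tmpObj[3]);
    HashSet<Integer> children = hMap.get(parent);
    if (children == null) {
        children = new HashSet<Integer>();
        hMap.put(parent, children);
    }
    children.add(child); // a boxed Integer is far smaller than the joined String per line
}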



Answer 2:

Do check out increasing your memory, as suggested by others. Also, you can store your data in the table more compactly, as suggested by Sbodd and others.

However, you may be running afoul of memory fragmentation. Hash maps use arrays. Big hash maps use big arrays. You are not specifying the size of your hashmap, so every time it decides it needs to be bigger, it discards its old array and allocates a new one. After a while, your memory will fill up with discarded hash table arrays and you get an OutOfMemoryException even though you technically have plenty of free memory. (90% of your memory could be available, but in pieces too small to use.)

The garbage collector (GC) will work continuously to combine all these free bits into blocks big enough to use. If your program ran slowly enough, you would not have a problem, but your program is running full tilt and the GC is going to get behind. The GC will throw the exception if it cannot assemble a free block big enough fast enough; the mere fact that the memory exists will not stop it. (This means that a program that could run won't, but it keeps the JVM from running real slow and looking real bad to users.)

Given that you know how big your hash map has to be, I'd set the size up front. Even if the size isn't precisely right, it may solve your memory problem without increasing the heap size and will definitely make your program run faster (or as fast as your file read lets it--use big file buffers).
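On the big-file-buffer point, that is a one-line change, something like this (the file name and buffer size are just placeholders):

BufferedReader bReader = new BufferedReader(new FileReader("children.csv"), 1 << 20); // ~1 MB buffer instead of the small default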

If you have no real idea how big your table might be, use a TreeMap. It's a bit slower but does not allocate huge arrays and is hence a lot kinder to the GC. I find them a lot more flexible and useful. You might even look at ConcurrentSkipListMap, which is slower than TreeMap but lets you add, read, and delete from multiple threads simultaneously.

But your best bet is something like:

hMap = new HashMap<String,ArrayList<String>>( 10000000 );
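And if the size really is unknown, the TreeMap/ConcurrentSkipListMap alternatives mentioned above are roughly drop-in replacements (ConcurrentSkipListMap only if several threads must write as well as read):

// unknown size, single writer:
TreeMap<String, ArrayList<String>> hMap = new TreeMap<String, ArrayList<String>>();

// unknown size, concurrent readers and writers:
ConcurrentSkipListMap<String, ArrayList<String>> cMap =
        new ConcurrentSkipListMap<String, ArrayList<String>>();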


Answer 3:

Inside your while loop you can save some space with something like this:

String[] tmpObj = myReader.split(delimiter);
// String + String + ... takes more space than String.format(...)
//String valuesArrayS=tmpObj[0]+";"+tmpObj[1]+";"+tmpObj[2]+";"+tmpObj[3]+";"+tmpObj[4];

// just add an empty list if there is none for this key yet
if (!hMap.containsKey(tmpObj[3])) {
    hMap.put(tmpObj[3], new ArrayList<String>());
}
// get the list from the map and add the new entry
List<String> values = hMap.get(tmpObj[3]);
values.add(String.format("%s;%s;%s;%s;%s", tmpObj[0], tmpObj[1], tmpObj[2], tmpObj[3], tmpObj[4]));

There is no need to clone the list.



Answer 4:

You are really testing the boundaries of what one can do with 1 GB of memory.

You could:

  1. Increase the heap space. 32-bit Windows will limit you to roughly 1.5 GB, but you still have a little wiggle room, and it might be enough to put you over the top.
  2. Build some type of pre-processor utility that pre-partitions the file into sizes you know will work and operates on them one at a time, perhaps hierarchically.
  3. Try restructuring your program. It has a lot of splitting and concatenating going on. In Java, Strings are immutable, and when you split strings and concatenate with the + operator you are creating new Strings all the time (in 9 out of 10 cases this doesn't matter, but in your case, where you are working with a very limited set of resources, it might make a difference).

As a less helpful side note: the real issue here is that you just don't have the resources to tackle this task, and optimization is only going to take you so far. It's like asking how to tunnel through a mountain more efficiently with a garden trowel. The real answer is probably the one you don't want to hear, which is: throw away the trowel and invest in some industrial equipment.

On a second, more helpful note (and fun, if you're like me): you may try hooking jVisualVM up to your application to understand where your heap is going, or use jhat and the -XX:+HeapDumpOnOutOfMemoryError JVM flag to see what was happening with the heap at crash time.
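For reference, a command line pulling those flags together might look roughly like this (the heap size, dump path, and jar name are placeholders):

java -Xmx1400m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=dump.hprof -jar myParser.jar

The resulting .hprof file can then be opened with jhat or jVisualVM to see which objects were filling the heap when the OutOfMemoryError was thrown.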