“OutOfMemoryError: GC overhead limit exceeded”: pa

Published 2020-04-23 03:24

Question:

I am trying to parse a large JSON file (more than 600 MB) with Java. My JSON file looks like this:

{
    "0" : {"link_id": "2381317", "overview": "mjklmklmklmklmk", "founded": "2015", "followers": "42", "type": "Gamer", "website": "http://www.google.com",  "name": "troll", "country": "United Kingdom", "sp": "Management Consulting" },
    "1" : {"link_id": "2381316", "overview": "mjklmklmklmklmk", "founded": "2015", "followers": "41", "type": "Gamer", "website": "http://www.google2.com",  "name": "troll2", "country": "United Kingdom", "sp": "Management Consulting" }
    [....]

    "345240" : {"link_id": "2381314", "overview": "mjklmklmklmklmk", "founded": "2015", "followers": "23", "type": "Gamer", "website": "http://www.google2.com",  "name": "troll2", "country": "United Kingdom", "sp": "Management Consulting" }
}

and my code looks like this:

import java.io.FileReader;
import java.io.IOException;
import java.util.Iterator;

import org.json.simple.JSONObject;
import org.json.simple.parser.JSONParser;
import org.json.simple.parser.ParseException;

public class dumpExtractor {

    private static final String filePath = "/home/troll/Documents/analyse/lol.json";

    public static void main(String[] args) {

        try {
            // read and parse the whole JSON file
            FileReader reader = new FileReader(filePath);
            JSONParser jsonParser = new JSONParser();
            JSONObject jsonObject = (JSONObject) jsonParser.parse(reader);
            Iterator<JSONObject> iterator = jsonObject.values().iterator();

            // print the name, type and sp fields of every record
            while (iterator.hasNext()) {
                JSONObject jsonChildObject = iterator.next();
                System.out.println("==========================");
                String name = (String) jsonChildObject.get("name");
                System.out.println("Industry name: " + name);

                String type = (String) jsonChildObject.get("type");
                if (type != null && !type.isEmpty()) {
                    System.out.println("type: " + type);
                }

                String sp = (String) jsonChildObject.get("sp");
                if (sp != null && !sp.isEmpty()) {
                    System.out.println("sp: " + sp);
                }
                System.out.println("==========================");
            }
            System.out.println("done ! ");
        } catch (IOException | ParseException ex) {
            // parse() also declares ParseException, so it must be handled too
            ex.printStackTrace();
        }
    }
}

I get this error:

Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.util.HashMap.createEntry(HashMap.java:897)
    at java.util.HashMap.addEntry(HashMap.java:884)
    at java.util.HashMap.put(HashMap.java:505)
    at org.json.simple.parser.JSONParser.parse(Unknown Source)
    at org.json.simple.parser.JSONParser.parse(Unknown Source)

How can I fix this?

Thanks in advance.

Answer 1:

If you have to read huge JSON files, you can't keep all of the information in memory. Increasing the heap may be enough for a 1 GB file, but what if tomorrow's file is 2 GB?

The right approach to this problem is to parse the JSON element by element using a streaming parser. Instead of loading the whole JSON document into memory and building one big object that represents it, you read single elements of the JSON and convert them to objects step by step.

Here is a nice article explaining how to do it with the Jackson library.
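
As a rough illustration of the element-by-element approach, here is a minimal sketch using Jackson's streaming API. It assumes Jackson 2.x (jackson-core) is on the classpath and that the file has the shape shown in the question: one top-level object whose values are flat records. The class name StreamingDumpReader and the method printRecords are made up for this example.

```java
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;

import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;

// Hypothetical class name; assumes Jackson 2.x (jackson-core) is available.
class StreamingDumpReader {

    /** Prints name/type/sp for each record and returns the number of records seen. */
    static int printRecords(Reader source) throws IOException {
        int count = 0;
        try (JsonParser parser = new JsonFactory().createParser(source)) {
            if (parser.nextToken() != JsonToken.START_OBJECT) {
                throw new IOException("Expected a JSON object at the top level");
            }
            // Each top-level field ("0", "1", ...) holds one record.
            while (parser.nextToken() == JsonToken.FIELD_NAME) {
                if (parser.nextToken() != JsonToken.START_OBJECT) {
                    parser.skipChildren(); // not a record: skip arrays, ignore scalars
                    continue;
                }
                String name = null, type = null, sp = null;
                while (parser.nextToken() == JsonToken.FIELD_NAME) {
                    String field = parser.getCurrentName();
                    parser.nextToken();                      // move to the field's value
                    String value = parser.getValueAsString(); // null for nested structures
                    parser.skipChildren();                   // no-op for scalar values
                    switch (field) {
                        case "name": name = value; break;
                        case "type": type = value; break;
                        case "sp":   sp = value;   break;
                        default:     break;        // fields we do not need
                    }
                }
                System.out.println("==========================");
                System.out.println("Industry name: " + name);
                if (type != null && !type.isEmpty()) System.out.println("type: " + type);
                if (sp != null && !sp.isEmpty()) System.out.println("sp: " + sp);
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to the JSON file
        try (Reader reader = new FileReader(args[0])) {
            System.out.println("done ! records: " + printRecords(reader));
        }
    }
}
```

Only the three strings of the current record are held at any time, so memory use should stay roughly constant however many records the file contains.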



Answer 2:

You have two choices:

  1. Give more memory to the Java program by specifying the -Xmx argument, e.g. -Xmx1g to give it a maximum heap of 1 GB.
  2. Use a "streaming" JSON parser. This scales to arbitrarily large JSON files, as long as each record is processed and then discarded.

json-simple has a streaming API. See https://code.google.com/p/json-simple/wiki/DecodingExamples#Example_5_-_Stoppable_SAX-like_content_handler
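
A minimal sketch of that SAX-like API, assuming json-simple 1.1.x on the classpath and the file layout from the question (a top-level object whose values are flat records). The names SimpleStreamingDump, RecordHandler and printRecords are invented for this example.

```java
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;

import org.json.simple.parser.ContentHandler;
import org.json.simple.parser.JSONParser;
import org.json.simple.parser.ParseException;

// Hypothetical class name; assumes json-simple 1.1.x is available.
class SimpleStreamingDump {

    /** Receives parse events one by one; only the current record's fields are kept. */
    static class RecordHandler implements ContentHandler {
        private int objectDepth = 0; // 1 = top-level object, 2 = one record
        private int arrayDepth = 0;
        private String currentKey;
        private String name, type, sp;
        int records = 0;

        private boolean inRecord() { return objectDepth == 2 && arrayDepth == 0; }

        public void startJSON() {}
        public void endJSON() {}

        public boolean startObject() {
            objectDepth++;
            if (inRecord()) { name = null; type = null; sp = null; }
            return true; // returning true tells the parser to continue
        }

        public boolean endObject() {
            if (inRecord()) {
                System.out.println("==========================");
                System.out.println("Industry name: " + name);
                if (type != null && !type.isEmpty()) System.out.println("type: " + type);
                if (sp != null && !sp.isEmpty()) System.out.println("sp: " + sp);
                records++;
            }
            objectDepth--;
            return true;
        }

        public boolean startObjectEntry(String key) { currentKey = key; return true; }
        public boolean endObjectEntry() { return true; }
        public boolean startArray() { arrayDepth++; return true; }
        public boolean endArray() { arrayDepth--; return true; }

        public boolean primitive(Object value) {
            // Only string fields directly inside a record are of interest.
            if (inRecord() && value instanceof String) {
                switch (currentKey) {
                    case "name": name = (String) value; break;
                    case "type": type = (String) value; break;
                    case "sp":   sp = (String) value;   break;
                    default:     break;
                }
            }
            return true;
        }
    }

    /** Streams the input through the handler and returns the number of records. */
    static int printRecords(Reader source) throws IOException, ParseException {
        RecordHandler handler = new RecordHandler();
        new JSONParser().parse(source, handler);
        return handler.records;
    }

    public static void main(String[] args) throws IOException, ParseException {
        // args[0] is the path to the JSON file
        try (Reader reader = new FileReader(args[0])) {
            System.out.println("done ! records: " + printRecords(reader));
        }
    }
}
```

Because the handler is called as the parser reads, no JSONObject for the whole file is ever built, which is what exhausted the heap in the original code.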

There are other libraries with good streaming parsers, e.g. Jackson.



Answer 3:

Increase the JVM heap space by setting the environment variable (Windows command prompt syntax; note that there must be no spaces around the equals sign):

SET _JAVA_OPTIONS=-Xms512m -Xmx1024m

But this can't be a permanent solution, as your file may grow in the future.
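
To check that the heap setting was actually picked up, a small program can print the limit the JVM is running with. This sketch uses only the standard library; the class name HeapCheck is made up for this example.

```java
// Hypothetical helper class; uses only the Java standard library.
class HeapCheck {

    /** Returns the maximum heap the JVM will try to use, in MiB. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```

Run with -Xmx1024m (or with _JAVA_OPTIONS set as above), it should report a value close to 1024; the exact figure can vary a little with the JVM and garbage collector in use.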