Are there any good examples to references where se

Posted 2019-07-27 15:42

Question:

I'm using ANTLR for a simple CSV parser. I'd like to use it on a 29 GB file, but it runs out of memory on the ANTLRInputStream call:

    CharStream cs = new ANTLRInputStream(new BufferedInputStream(input,8192));
    CSVLexer lexer = new CSVLexer(cs);
    CommonTokenStream tokens = new CommonTokenStream(lexer);
    CSVParser parser = new CSVParser(tokens);
    ParseTree tree = parser.file();
    ParseTreeWalker walker = new ParseTreeWalker();
    walker.walk(myListener, tree);

I tried changing it to use an unbuffered stream:

    CharStream cs = new UnbufferedCharStream(input);
    CSVLexer lexer = new CSVLexer(cs);
    lexer.setTokenFactory(new CommonTokenFactory(true));
    TokenStream tokens = new UnbufferedTokenStream(lexer);
    CSVParser parser = new CSVParser(tokens);

When I run walker.walk(), it does not process any records. If I try something like:

    parser.setBuildParseTree(false);
    parser.addParseListener(myListener);

It also fails. It seems like I have to parse the file differently if I don't build a parse tree, so I would like documentation or examples of how to do this.

If I use an unbuffered token stream without an unbuffered char stream, I get the error "Unbuffered stream cannot know its size". I tried different permutations, but they usually end in a Java heap error or "GC overhead limit exceeded".

I'm using this csv grammar

Answer 1:

I already answered a similar question here: https://stackoverflow.com/a/26120662/4094678

"It seems like I have to parse the file differently if I don't build a parse tree, so I would like documentation or examples of how to do this."

Look up grammar actions in the ANTLR book. As the linked answer says, forget about listeners, visitors, and building a parse tree, and do the work in actions embedded in the grammar instead. If that is still not enough, split the file into several smaller ones and parse each of them separately.
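Splitting the file could look roughly like the sketch below. It is only an illustration under a few assumptions that are not from the question: the first line is a header that every piece should repeat, no quoted field contains a line break (so cutting on line boundaries is safe), and the file is UTF-8. The names `CsvSplitter`, `split`, `linesPerChunk` and the `chunk-N.csv` file naming are made up for this example.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of ANTLR.
class CsvSplitter {

    /**
     * Streams source line by line and writes pieces of at most
     * linesPerChunk data lines each into outDir. The header line is
     * repeated at the top of every piece. Assumes no record spans
     * more than one line.
     */
    static List<Path> split(Path source, Path outDir, int linesPerChunk) throws IOException {
        if (linesPerChunk <= 0) {
            throw new IllegalArgumentException("linesPerChunk must be positive");
        }
        Files.createDirectories(outDir);
        List<Path> chunks = new ArrayList<>();
        try (BufferedReader in = Files.newBufferedReader(source, StandardCharsets.UTF_8)) {
            String header = in.readLine();
            if (header == null) {
                return chunks; // empty input: nothing to write
            }
            BufferedWriter out = null;
            int linesInChunk = 0;
            try {
                String line;
                while ((line = in.readLine()) != null) {
                    // Start a new piece for the first data line or when the current one is full.
                    if (out == null || linesInChunk == linesPerChunk) {
                        if (out != null) {
                            out.close();
                        }
                        Path chunk = outDir.resolve("chunk-" + chunks.size() + ".csv");
                        chunks.add(chunk);
                        out = Files.newBufferedWriter(chunk, StandardCharsets.UTF_8);
                        out.write(header);
                        out.newLine();
                        linesInChunk = 0;
                    }
                    out.write(line);
                    out.newLine();
                    linesInChunk++;
                }
            } finally {
                if (out != null) {
                    out.close();
                }
            }
        }
        return chunks;
    }
}
```

Only one line is held in memory at a time, so the size of the input does not matter. Each resulting piece can then be fed to the parser on its own, for example by looping over the returned paths.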
And of course, as mentioned in the comments, increase the Java VM's memory.
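To see how much heap the JVM is actually allowed to use before and after raising the limit, a small check like the following may help. The class name `HeapCheck` and the method `maxHeapMiB` are invented for this sketch; `Runtime.maxMemory()` is the standard call it relies on.

```java
// Hypothetical helper for checking the effective heap limit.
class HeapCheck {

    /** Returns the maximum amount of memory the JVM will attempt to use, in MiB. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```

The limit itself is set with the `-Xmx` option when launching the JVM, for example `java -Xmx8g HeapCheck` (the 8 GB figure is an arbitrary example, not a recommendation). A larger heap alone is unlikely to be enough if the whole 29 GB input is still loaded or a full parse tree is still built.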