I'm using ANTLR for a simple CSV parser. I'd like to run it on a 29 GB file, but it runs out of memory in the ANTLRInputStream call:
CharStream cs = new ANTLRInputStream(new BufferedInputStream(input, 8192));
CSVLexer lexer = new CSVLexer(cs);
CommonTokenStream tokens = new CommonTokenStream(lexer);
CSVParser parser = new CSVParser(tokens);
ParseTree tree = parser.file();
ParseTreeWalker walker = new ParseTreeWalker();
walker.walk(myListener, tree);
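For scale, here's a rough estimate of why this fails (my assumption: ANTLRInputStream buffers the entire input into a char[] before lexing begins, and Java chars are 2 bytes each, so an ASCII file roughly doubles in memory):

```java
public class HeapEstimate {
    public static void main(String[] args) {
        long fileBytes = 29L * 1024 * 1024 * 1024; // ~29 GiB input file
        long bytesPerChar = 2;                     // Java char is UTF-16
        // char[] backing the stream, before tokens or a tree exist
        long heapNeeded = fileBytes * bytesPerChar;
        System.out.println(heapNeeded / (1024L * 1024 * 1024) + " GiB");
    }
}
```

And that is before the token stream and the parse tree, which each add further per-record overhead on top of the raw characters.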
I tried changing it to use unbuffered streams:
CharStream cs = new UnbufferedCharStream(input);
CSVLexer lexer = new CSVLexer(cs);
lexer.setTokenFactory(new CommonTokenFactory(true));
TokenStream tokens = new UnbufferedTokenStream(lexer);
CSVParser parser = new CSVParser(tokens);
When I call walker.walk(), it doesn't process any records. If I instead try something like
parser.setBuildParseTree(false);
parser.addParseListener(myListener);
that also fails. It seems I have to drive the parse differently when I'm not building a parse tree, so I'd like documentation or examples of how to do this.
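Putting the pieces together, this is what I'm attempting (a sketch, assuming myListener extends the generated CSVBaseListener and input is a FileInputStream; I'm not certain the callback wiring is right, which may be the problem). It can't be compiled standalone since CSVLexer/CSVParser are ANTLR-generated:

```java
CharStream cs = new UnbufferedCharStream(input);
CSVLexer lexer = new CSVLexer(cs);
// copy token text out of the rolling char buffer before it is overwritten
lexer.setTokenFactory(new CommonTokenFactory(true));
TokenStream tokens = new UnbufferedTokenStream<CommonToken>(lexer);
CSVParser parser = new CSVParser(tokens);
parser.setBuildParseTree(false);     // don't accumulate a tree in memory
parser.addParseListener(myListener); // fire callbacks during the parse itself
parser.file();                       // no tree is returned, so no walker afterwards
```

My understanding is that with setBuildParseTree(false) there is no tree for ParseTreeWalker to walk afterwards, so the listener has to fire during parsing via addParseListener, but I can't find this spelled out anywhere.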
If I keep the buffered char stream but use the unbuffered token stream, I get the error "Unbuffered stream cannot know its size". I've tried various permutations, but they usually end in a Java heap error or "GC overhead limit exceeded".
I'm using this CSV grammar.