antlr generate ast for c and parse the ast

2020-02-13 04:09发布

I am doing static analyze on c program.And I search the antlr website ,there seems to be no appropriate grammar file that produce ast for c program.Does it mean I have to do it myself from the very start.Or is there a quicker method.I also need a tree parser that can traverse the ast created by the parser.

2条回答
三岁会撩人
2楼-- · 2020-02-13 04:31

The hardest part is writing the grammar. Mixing in rewrite rules to create an AST isn't that hard, and creating a tree grammar from a parser grammar that emits an AST isn't that hard too (compared to writing the parser grammar, that is).

Here's a previous Q&A that shows how to create a proper AST: How to output the AST built using ANTLR?

And I couldn't find a decent SO-Q&A that explains how to go about creating a tree grammar, so here's a link to my personal blog that explains this: http://bkiers.blogspot.com/2011/03/6-creating-tree-grammar.html

Good luck.

查看更多
▲ chillily
3楼-- · 2020-02-13 04:38

You indicated you want to do static analysis to detect buffer overflow.

First, writing a grammar for C is harder than it looks. There's all that stuff in the standard, and then there's what the real compilers actually accept. And you have to decide what to do about the preprocessor (and it varies from compiler to compiler!). If you don't get the grammar and preprocessing exactly right, you won't be able to parse real programs. (If you want to do toy languages, that's fine, but then you don't need a C grammar).

To do the analysis, you'll need far more machinery than an AST. You'll need symbol tables, control and data flow analysis, likely local and global points-to analysis, call graph extraction, and some type of range analysis.

People just don't seem to understand this.

** GETTING A PARSER IS A LONG WAY FROM DOING ANYTHING USEFUL WITH REAL LANGUAGES **

I'm shouting because I see this over, and over, and over.

If you want to get on with a specific program analysis or transformation task, unless you want to die of old age before you start your task, you better find a foundation that has most of what you need already. A foundation on a parser generator with a creaky grammar is not a foundation. (Don't get me wrong: ANTLR, YACC, JavaCC are all fine parser generators, and they're great for building a parser for a new language. They're great for implementing production parsers for real langauges when the investment gets made. But they produce parsers, and mostly people don't do the production part. And they don't provide the additional machinery by a long shot.)

Our DMS Software Reengineering Toolkit contains all the above machinery because it is almost always needed, and it is a royal headache to implement. (My team has 15 years invested so far.)

We've also instantiated that machinery is forms specifically useful for COBOL and Java, C, C++ (to somewhat lesser extent, the language is really hard), in a variety of dialects, so that others don't have to repeat this long process.

GCC and Clang are pretty mature for C and C++ as alternatives.

查看更多
登录 后发表回答