How to build a static code analysis tool?

Posted 2019-03-18 09:30

Question:

I'm in the process of understanding and building a static code analysis tool for a proprietary language from a big company. The reason: I have to review a rather large code base, and a static code analysis tool would help a lot, but none exists for the language so far.

I would like to know how one goes about building a static code analysis tool such as Lint or Splint for C.

Any books, articles, blogs, sites, etc. would help.

Thanks.

Answer 1:

I know this is an old post, but the answers don't seem very satisfactory. This article is a pretty good introduction to the technology behind static analysis tools and has several links to examples.

A good book is "Secure Programming with Static Analysis" by Brian Chess and Jacob West.



Answer 2:

You need good infrastructure: a parser, a tree builder, tree analyzers, symbol table builders, and flow analyzers. Then, to get on with your specific task, you need to code checks for the specific problems of interest to you, using all that infrastructure machinery.
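To make those layers concrete, here is a minimal Python sketch for a made-up toy language (only `let name = expr;` and `print expr;`, with `+` and `-`), since the proprietary language in the question isn't known. It wires together a regex lexer, a recursive-descent parser that builds an AST, a symbol table, and one specific check (a name used before it is defined). The grammar and every function name here are hypothetical, and a real tool would also need scopes plus control-flow and data-flow analysis.

```python
import re
from collections import namedtuple

# --- Lexer: source text -> tokens (hypothetical toy language, illustration only) ---
TOKEN_RE = re.compile(r"(?P<NUM>\d+)|(?P<NAME>[A-Za-z_]\w*)|(?P<OP>[=+\-;])"
                      r"|(?P<NEWLINE>\n)|(?P<SKIP>[ \t]+)|(?P<ERROR>.)")
KEYWORDS = {"let", "print"}
Token = namedtuple("Token", "kind text line")

def tokenize(src):
    tokens, line = [], 1
    for m in TOKEN_RE.finditer(src):
        kind, text = m.lastgroup, m.group()
        if kind == "NEWLINE":
            line += 1
        elif kind == "ERROR":
            raise SyntaxError(f"line {line}: unexpected character {text!r}")
        elif kind != "SKIP":  # keywords are re-tagged as LET / PRINT
            tokens.append(Token(text.upper() if text in KEYWORDS else kind, text, line))
    tokens.append(Token("EOF", "", line))
    return tokens

# --- AST node types: what the "tree builder" produces ---
Num = namedtuple("Num", "value")
Var = namedtuple("Var", "name line")
BinOp = namedtuple("BinOp", "op left right")
Let = namedtuple("Let", "name value line")  # let NAME = expr ;
Print = namedtuple("Print", "value")        # print expr ;

# --- Recursive-descent parser: tokens -> list of statement nodes ---
class Parser:
    def __init__(self, tokens):
        self.toks, self.pos = tokens, 0

    def expect(self, kind, text=""):
        tok = self.toks[self.pos]
        if tok.kind != kind or (text and tok.text != text):
            raise SyntaxError(f"line {tok.line}: expected {text or kind}, got {tok.text!r}")
        self.pos += 1
        return tok

    def program(self):
        stmts = []
        while self.toks[self.pos].kind != "EOF":
            if self.toks[self.pos].kind == "LET":
                self.pos += 1
                name = self.expect("NAME")
                self.expect("OP", "=")
                stmts.append(Let(name.text, self.expr(), name.line))
            else:
                self.expect("PRINT")
                stmts.append(Print(self.expr()))
            self.expect("OP", ";")
        return stmts

    def expr(self):  # atom (('+' | '-') atom)*, left-associative
        node = self.atom()
        while self.toks[self.pos].text in ("+", "-"):
            node = BinOp(self.expect("OP").text, node, self.atom())
        return node

    def atom(self):
        if self.toks[self.pos].kind == "NUM":
            return Num(int(self.expect("NUM").text))
        tok = self.expect("NAME")
        return Var(tok.text, tok.line)

# --- Symbol table + one specific check: use of a name before its definition ---
def var_uses(expr):
    """Yield every Var node in an expression tree."""
    if isinstance(expr, Var):
        yield expr
    elif isinstance(expr, BinOp):
        yield from var_uses(expr.left)
        yield from var_uses(expr.right)

def check_undefined(stmts):
    symbols = {}  # symbol table: name -> line of its definition
    problems = []
    for stmt in stmts:
        for var in var_uses(stmt.value):  # uses are checked first, so `let x = x;` is flagged
            if var.name not in symbols:
                problems.append(f"line {var.line}: '{var.name}' used before definition")
        if isinstance(stmt, Let):
            symbols[stmt.name] = stmt.line
    return problems

if __name__ == "__main__":
    src = "let x = 1 + 2;\nprint x + y;\nlet y = x;\n"
    for msg in check_undefined(Parser(tokenize(src)).program()):
        print(msg)  # prints: line 2: 'y' used before definition
```

Each layer only consumes the output of the one below it, which is why the answer suggests acquiring that foundation rather than rebuilding it for every new check.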

Building all that foundation machinery is actually pretty hard, and it doesn't help you do your specific task. People don't write the operating system for every application they code; why should you build all the infrastructure? Like an OS, it is better if you simply acquire good infrastructure.

People will tell you to use lex and yacc. That's kind of like suggesting you use the real-time kernel part of the OS: useful, but far from all the infrastructure you really need.

Our DMS Software Reengineering Toolkit provides all the necessary infrastructure. It has been used to define many language front ends as well as many tools for those languages.

Such infrastructure would allow you to define your specific nonstandard language relatively quickly, and then get on with your task of coding your special checks.



Answer 3:

  1. Obviously, you need a parser for the language. A good high-level AST is useful.
  2. You need to enumerate a set of "mistakes" in the language. Without knowing more about the language in question, we can't help here. For example: unallocated pointers in C, etc.
  3. Combine the two: walk the AST and flag every place that matches one of the mistakes from #2.
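As a rough illustration of steps 1 to 3, here is a minimal sketch that uses Python's standard `ast` module (Python 3.8+) as a stand-in parser, since the asker's proprietary language isn't known. The three "mistakes" are example Python rules chosen for illustration, not a recommended rule set; for another language, both the parser and the rules would be replaced.

```python
import ast
from typing import List, Tuple


class MistakeFinder(ast.NodeVisitor):
    """Walk a Python AST and record (line, message) for each known mistake pattern."""

    def __init__(self) -> None:
        self.findings: List[Tuple[int, str]] = []

    def report(self, node: ast.AST, message: str) -> None:
        self.findings.append((node.lineno, message))

    # Mistake 1: mutable default argument (one object shared by every call)
    def visit_FunctionDef(self, node: ast.FunctionDef) -> None:
        for default in node.args.defaults + node.args.kw_defaults:
            if isinstance(default, (ast.List, ast.Dict, ast.Set)):
                self.report(default, f"mutable default argument in '{node.name}'")
        self.generic_visit(node)  # keep walking into the function body

    # Mistake 2: bare 'except:' also catches KeyboardInterrupt and SystemExit
    def visit_ExceptHandler(self, node: ast.ExceptHandler) -> None:
        if node.type is None:
            self.report(node, "bare 'except:' clause")
        self.generic_visit(node)

    # Mistake 3: '== None' / '!= None' instead of 'is None' / 'is not None'
    def visit_Compare(self, node: ast.Compare) -> None:
        for op, right in zip(node.ops, node.comparators):
            if (isinstance(op, (ast.Eq, ast.NotEq))
                    and isinstance(right, ast.Constant) and right.value is None):
                self.report(node, "comparison to None should use 'is' / 'is not'")
        self.generic_visit(node)


def lint(source: str) -> List[Tuple[int, str]]:
    """Step 1: parse to an AST; step 3: walk it against the rules from step 2."""
    finder = MistakeFinder()
    finder.visit(ast.parse(source))
    return sorted(finder.findings)


SAMPLE = '''\
def add(item, bucket=[]):
    bucket.append(item)
    return bucket

try:
    x = add(1)
except:
    pass

if x == None:
    print("none")
'''

if __name__ == "__main__":
    for line, message in lint(SAMPLE):
        print(f"{line}: {message}")
```

For `SAMPLE`, this reports lines 1, 7, and 10. Keeping each rule in its own `visit_<NodeType>` method means the list of mistakes from step 2 can grow without touching the parser.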