static and dynamic code analysis

Posted 2019-01-23 20:23

Question:

I found several questions on this topic, all of them with many references, but I still don't have a clear picture, because most of the references discuss concrete tools rather than the general concepts behind the analysis. So I have some questions:

About static analysis: 1. I would like a reference, or a summary, of which techniques are successful and most relevant nowadays. 2. What can they really do in terms of discovering bugs? Can this be summarized, or does it depend on the tool?

About symbolic execution: 1. How should symbolic execution be classified? I guess it depends on the approach, but I would like to know whether it counts as dynamic analysis or as a mix of static and dynamic analysis, if that can be determined.

I find it hard to tell the two techniques apart in the tools, even though I think I know the theoretical difference.

I'm currently working with C. Thanks in advance.

Answer 1:

I'll try to give a short answer:

Static analysis looks at the syntactical structure of code and draws conclusions about the program behavior. These conclusions are not necessarily correct.

A typical example of static analysis is data flow analysis, where you compute, for every statement, sets such as the variables it uses, reads, and writes. This helps to find, for example, uses of uninitialized values.
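
To make the idea concrete, here is a minimal sketch of such an analysis in C. It is a toy, not how any particular tool works: the `Node` layout, the bitmask encoding of variables, and the names `find_maybe_uninit` and `analyze_example` are all assumptions made for illustration. Each block of a small control-flow graph carries a "def" and a "use" set, and a forward pass computes which variables are assigned on every path reaching a block.

```c
#include <stdint.h>

#define MAX_NODES 32   /* at most 32 blocks and 32 variables (one bit each) */

/* One block of a hypothetical control-flow graph. */
typedef struct {
    uint32_t def;   /* variables assigned in this block */
    uint32_t use;   /* variables read in this block, before any assignment here */
    int succ[2];    /* successor block indices, -1 if unused */
} Node;

/* Forward "definitely assigned" analysis. Block 0 is the entry.
   Returns a bitmask of blocks that read a variable which is not
   assigned on every path leading to them. Requires n <= MAX_NODES. */
uint32_t find_maybe_uninit(const Node *cfg, int n)
{
    uint32_t in[MAX_NODES], out[MAX_NODES];
    int changed = 1;

    /* Optimistic start: everything assigned, except at the entry. */
    for (int i = 0; i < n; i++) {
        in[i] = (i == 0) ? 0u : ~0u;
        out[i] = in[i] | cfg[i].def;
    }

    /* Iterate until nothing changes (the sets can only shrink). */
    while (changed) {
        changed = 0;
        for (int i = 1; i < n; i++) {
            uint32_t meet = ~0u;
            /* Intersect the OUT sets of all predecessors of block i. */
            for (int p = 0; p < n; p++)
                for (int k = 0; k < 2; k++)
                    if (cfg[p].succ[k] == i)
                        meet &= out[p];
            if (meet != in[i]) {
                in[i] = meet;
                out[i] = meet | cfg[i].def;
                changed = 1;
            }
        }
    }

    uint32_t flagged = 0;
    for (int i = 0; i < n; i++)
        if (cfg[i].use & ~in[i])
            flagged |= 1u << i;
    return flagged;
}

#define VAR_MODE  (1u << 0)
#define VAR_X     (1u << 1)
#define VAR_SCALE (1u << 2)

/* Hand-built graph (an assumption, normally produced by a parser) of:
     int scaled(int mode, int x) {
         int scale;                  // block 0: parameters are assigned
         if (mode == 1)              // block 1
             scale = 10;             // block 2
         else if (mode == 2)         // block 3
             scale = 100;            // block 4
         return x * scale;           // block 5
     }
   If init_scale is nonzero, block 0 also assigns scale. */
uint32_t analyze_example(int init_scale)
{
    Node cfg[6] = {
        { VAR_MODE | VAR_X | (init_scale ? VAR_SCALE : 0u), 0u, { 1, -1 } },
        { 0u, VAR_MODE, { 2, 3 } },
        { VAR_SCALE, 0u, { 5, -1 } },
        { 0u, VAR_MODE, { 4, 5 } },
        { VAR_SCALE, 0u, { 5, -1 } },
        { 0u, VAR_X | VAR_SCALE, { -1, -1 } },
    };
    return find_maybe_uninit(cfg, 6);
}
```

Calling `analyze_example(0)` flags block 5 (the `return`), because `scale` is not assigned on the path where `mode` is neither 1 nor 2; `analyze_example(1)` flags nothing. Note that the analysis never runs `scaled` itself, which is what makes it static.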

You can also analyze the code for code patterns. This way, these tools can be used to check whether you are complying with a specific coding standard. A prominent example of a coding standard is MISRA. It is used for safety-critical systems and avoids problematic constructs in C. This way you can already say a lot about the robustness of your applications against memory leaks, dangling pointers, etc.

Dynamic analysis does not look at the syntax only, but also takes state information into account. In symbolic execution, you attach assumptions about the possible values of all variables to the statements.
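
As a small illustration of those "assumptions attached to statements", consider the hypothetical C function below (the name `classify` and its logic are made up for this sketch). The comments show the path condition (PC) that a symbolic executor would accumulate when it treats `x` and `y` as symbols instead of concrete numbers.

```c
/* Hypothetical function under analysis. */
int classify(int x, int y)
{
    if (x > 10) {               /* PC: x > 10 */
        if (y == x - 7)         /* PC: x > 10 && y == x - 7 */
            return -1;          /* rare path, reached e.g. by x = 11, y = 4 */
        return 1;               /* PC: x > 10 && y != x - 7 */
    }
    return 0;                   /* PC: x <= 10 */
}
```

Random testing is unlikely to hit the `return -1` branch, whereas a constraint solver given the condition `x > 10 && y == x - 7` can produce a witness such as `classify(11, 4)` directly.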

The most expensive and most powerful method of dynamic analysis is model checking, where you really look at all possible execution states of the system. You can think of a model-checked system as a system that is tested with 100% coverage, but there are of course a lot of practical problems that prevent real systems from being checked that way.
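
The following is a minimal sketch of what "looking at all possible execution states" can mean, written as a toy explicit-state checker in C. The two-process lock protocol, the `State` layout, and the function names are assumptions invented for this example; real model checkers are far more elaborate.

```c
#include <string.h>

/* State of a hypothetical two-process lock protocol.
   pc: 0 = idle, 1 = has seen the lock free, 2 = in critical section. */
typedef struct { int pc[2]; int lock; } State;

/* Maps each of the 3 * 3 * 2 = 18 states to a unique index. */
static int encode(State s) { return (s.pc[0] * 3 + s.pc[1]) * 2 + s.lock; }

/* Computes the successor of s when process p takes one step.
   Returns 0 if p cannot move in s. If atomic is nonzero, testing
   and setting the lock happen in a single step. */
static int step(State s, int p, int atomic, State *next)
{
    *next = s;
    switch (s.pc[p]) {
    case 0:
        if (s.lock) return 0;               /* blocked: lock is taken */
        if (atomic) { next->lock = 1; next->pc[p] = 2; }
        else        { next->pc[p] = 1; }    /* test only; the set comes later */
        return 1;
    case 1:
        next->lock = 1; next->pc[p] = 2;    /* set the lock and enter */
        return 1;
    default:
        next->lock = 0; next->pc[p] = 0;    /* leave and release */
        return 1;
    }
}

/* Breadth-first exploration of every reachable state. Returns 1 if
   some reachable state has both processes in the critical section. */
int mutex_violated(int atomic)
{
    State queue[18];
    int seen[18];
    int head = 0, tail = 0;
    State init = { { 0, 0 }, 0 };

    memset(seen, 0, sizeof seen);
    queue[tail++] = init;
    seen[encode(init)] = 1;

    while (head < tail) {
        State s = queue[head++];
        if (s.pc[0] == 2 && s.pc[1] == 2)
            return 1;
        for (int p = 0; p < 2; p++) {
            State t;
            if (step(s, p, atomic, &t) && !seen[encode(t)]) {
                seen[encode(t)] = 1;
                queue[tail++] = t;
            }
        }
    }
    return 0;
}
```

With the non-atomic protocol, `mutex_violated(0)` finds the interleaving in which both processes test the lock before either sets it; with the atomic variant, `mutex_violated(1)` explores all reachable states and finds no violation. The state space here has only 18 states; the practical problem with real systems is that this number explodes.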

These methods are very powerful, and you can gain a lot from the static code analysis tools especially when combined with a good coding standard.

One feature my software team found really impressive is that such a tool will tell you when a C++ class with virtual methods does not have a virtual destructor. This is easy to check, but really helpful.

The commercial tools are very expensive, but worth the money once you have learned how to use them. A typical problem in the beginning is that you will get a lot of false alarms and won't know where to look for the real problem.

Note that nowadays g++ has some of these checks built in as compiler warnings, and that free tools such as Cppcheck exist. PC-lint, by contrast, is a commercial product.

Sorry - this is already getting quite long...hope it's interesting.



Answer 2:

The term "static analysis" means that the analysis does not actually run the code. "Dynamic analysis", on the other hand, runs the code and also requires some kind of real test inputs. That is the definition. Nothing more.

Static analysis employs various formal methods such as abstract interpretation, model checking, and symbolic execution. In general, abstract interpretation or model checking is suitable for software verification. Symbolic execution is more appropriate for the purpose of bug finding.

Symbolic execution is classified as static analysis. However, there is a hybrid method called concolic execution, which uses both symbolic execution and dynamic testing.

Added for Zane's comment:

Maybe my explanation was a little confusing.

The difference between software verification and bug finding is whether the analysis is sound or not. For example, when we say that a buffer overrun analyzer is sound, it means that the analyzer must report all possible buffer overruns. If the analyzer reports nothing, that proves the absence of buffer overruns in the target program. Because model checking is a method that guarantees soundness, it is mostly used for software verification.

On the other hand, symbolic execution, which is actively used by most of today's commercial static analyzers, does not guarantee soundness, since a sound analysis inherently issues a great many false positives. For the purpose of bug finding, it is more important to reduce false positives, even if some true positives are lost as well.

In summary,

  • soundness: there are no false negatives

  • completeness: there are no false positives

  • software verification: soundness is more important than completeness

  • bug finding: completeness is more important than soundness
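
To illustrate why a sound analysis tends to produce false positives, here is a minimal sketch in C using interval arithmetic, a simple form of abstract interpretation (not symbolic execution). The `Interval` type, the function names, and the analyzed fragment are all assumptions made for this example.

```c
/* A closed integer interval [lo, hi]: the abstract value of a variable. */
typedef struct { int lo, hi; } Interval;

/* Abstract subtraction: every concrete a - b with a in A and b in B
   lies inside the result. */
Interval interval_sub(Interval a, Interval b)
{
    Interval r = { a.lo - b.hi, a.hi - b.lo };
    return r;
}

/* Sound check: returns 1 if some value in idx could be outside [0, len). */
int may_overrun(Interval idx, int len)
{
    return idx.lo < 0 || idx.hi >= len;
}

/* Analyzes the hypothetical fragment
       int a = ...;        // known to be in [0, 9]
       int i = a - a;      // always 0 at run time
       buf[i] = 1;         // buf has 10 elements
   Returns 1 if the interval analysis raises an overrun warning. */
int analyzer_warns(void)
{
    Interval a = { 0, 9 };
    Interval i = interval_sub(a, a);    /* [-9, 9]: the link between both operands is lost */
    return may_overrun(i, 10);
}

/* Ground truth by enumeration: returns 1 if any concrete run overruns. */
int really_overruns(void)
{
    for (int a = 0; a <= 9; a++) {
        int i = a - a;
        if (i < 0 || i >= 10)
            return 1;
    }
    return 0;
}
```

Here `analyzer_warns()` returns 1 while `really_overruns()` returns 0: the analysis never misses a real overrun (no false negatives), but it pays for that with a false positive. A bug-finding tool would typically prefer to stay silent in such a case, at the risk of missing real bugs elsewhere.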