Programming Language Properties that facilitate re

Posted 2019-04-07 02:06

Question:

What are common traits/properties of programming languages that facilitate (simplify) the development of widely automated source code analysis and re-engineering (transformation) tools?

I am mostly thinking of programming language features that make it easier to develop static analysis and refactoring tools (e.g. compare Java with C++: the former has better support for refactoring).

In other words, if a programming language were explicitly designed from the beginning to support automated static analysis and refactoring, what characteristics should it have?

For example, Ada has ASIS:

The Ada Semantic Interface Specification (ASIS) is a layered, open architecture providing vendor-independent access to the Ada Library Environment. It allows for the static analysis of Ada programs and libraries. ASIS, the Ada Semantic Interface Specification, is a library that gives applications access to the complete syntactic and semantic structure of an Ada compilation unit. This library is typically used by tools that need to perform some sort of static analysis on an Ada program.

ASIS information: ASIS provides a standard way for tools to extract data that are best collected by an Ada compiler or other source code analyzer. Tools which use ASIS are themselves written in Ada, and can be very easily ported between Ada compilers which support ASIS. Using ASIS, developers can produce powerful code analysis tools with a high degree of portability. They can also save the considerable expense of implementing the algorithms that extract semantic information from the source program. For example, ASIS tools already exist that generate source-code metrics, check a program's conformance to coding styles or restrictions, make cross-references, and globally analyze programs for validation and verification.

See also the ASIS FAQ.

Can you think of other programming languages that provide a similarly comprehensive and complete interface to working with source code specifically for analysis/transformation purposes?

I am thinking about specific implementation techniques that provide the low-level hooks, for example core library functions that let a program inspect an AST (abstract syntax tree) or ASG (abstract semantic graph) at runtime.
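To make the kind of hook meant here concrete, below is a minimal sketch using Python's standard `ast` module, which exposes the parser's syntax tree as ordinary objects at runtime. It is only an illustration of the idea, not an equivalent of ASIS: it covers syntax only and carries no semantic information. The sample source and the helper name `list_functions` are made up for this example.

```python
import ast

# Hypothetical sample program to analyze.
SOURCE = """
def area(width, height):
    return width * height

def perimeter(width, height):
    return 2 * (width + height)
"""

def list_functions(source):
    """Return (name, parameter names) for every function definition in source."""
    tree = ast.parse(source)
    return [
        (node.name, [arg.arg for arg in node.args.args])
        for node in ast.walk(tree)
        if isinstance(node, ast.FunctionDef)
    ]

functions = list_functions(SOURCE)
print(functions)
```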

Answer 1:

The biggest has to be static typing. This allows tools to have much more insight into what the code is doing. Without it refactoring becomes many times more difficult.



Answer 2:

I think this is still a largely unexplored problem. The notion of "language design for tooling" seems to only have entered the fringes of the mainstream recently, though I think research in this area is more than two decades old. I agree with two of the other answers, namely that "static typing" and "self-similarity" are useful properties of a language to make refactoring support easier.



Answer 3:

It is true that the particular programming language can make analysis easier. If you want the easiest-to-analyze languages, pick a purely functional one.

But nobody in practice programs in purely functional languages. (The Haskell guys are going to jump up and down when they see this, but seriously, Haskell is used only extremely rarely.)

What makes a programming language analyzable is infrastructure designed to support analysis. Ada's ASIS, above, is a great example. Don't be distracted by the fact that ASIS was written for Ada, or is written in Ada; what counts is that somebody serious wanted to analyze Ada and invested the effort to build Ada analysis machinery.

I believe that the right cure is to build general analysis infrastructure and amortize it across lots of languages. While we're at it, we should build general transformation infrastructure, too, because once you have an analysis, you'll want to use it to effect change. (Doctor visits don't end with diagnosis; they end with cures). And I've bet my career on it.

The result is an engine I think ideal for analysis, refactoring, reengineering, etc: our DMS Software Engineering Toolkit.

It has generic parsing, tree building, prettyprinting, tree manipulation, source-to-source rewriting, attribute grammar evaluation, and control and data flow analysis. It has production-quality front ends for a number of widely used dialects of C and C++, for Java, C#, COBOL, and PHP, and even for Verilog and VHDL (many other languages too, but not quite at that level).

To give you some sense of its utility, it was used to convert JOVIAL code for the B-2 bomber into C... without us ever having seen the source code. See http://www.semdesigns.com/Products/Services/NorthropGrummanB2.html

Now, assuming one has analysis infrastructure, what language features help?

Static types help by limiting the set of possible values a variable can take, but only by adding a limited single-argument predicate, e.g., "X is an integer". I think what helps more are assertions in the code, because they capture predicates with more than one argument. These establish relationships between state variables that often cannot be found by inspecting the code (e.g., problem- or domain-specific information such as "X > Y+3"). The analysis infrastructure (and frankly, the programmers who read the code) can ideally take advantage of such additional facts to provide a more effective analysis.

Such assertions are commonly coded with special keywords such as "assert", "pre(condition)" and "post(condition)", which are inspired, with good reason, by the theorem-proving literature.

But even if you don't have assertions in your language, they are easy to encode anyway: just write an if statement whose condition is the denial of the assertion and whose body either calls an idiom indicating impossibility or violates the language semantics (e.g., dereferencing an obviously null pointer). For example, "if (x>0) fail();" encodes the assertion "x <= 0".
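As a small sketch of that encoding, here is the "X > Y+3" relationship from above written both ways in Python. The function names are hypothetical and chosen only for this illustration; the second form is the if-statement encoding described in the previous paragraph.

```python
def checked_difference_assert(x, y):
    # Direct form: the language's own assert keyword states the relationship
    # between the two variables. (Python drops asserts when run with -O.)
    assert x > y + 3, "expected x > y + 3"
    return x - y

def checked_difference_if(x, y):
    # Encoded form: test the denial of the assertion and fail if it holds.
    if not (x > y + 3):
        raise AssertionError("expected x > y + 3")
    return x - y

print(checked_difference_assert(10, 2))
print(checked_difference_if(10, 2))
```

Either way, a reader or an analysis tool can rely on x > y + 3 holding for all code after the check.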

So what's really needed isn't assertions in the language, but programmers who are willing to write them. Alas, that seems to be sadly lacking.



Answer 4:

Reflection built into the language/type system. This makes static analysis and refactoring much less painful.

This is part of why Java and .NET tools are so commonplace and nice. Reflection lets the tools discover the dependencies in source code quickly and reliably, which helps with static analysis.

In addition, you get the ability to analyze your compiled code as well.
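The answer is about Java and .NET, but the same idea can be sketched with Python's standard `inspect` module, which lets a tool ask a loaded class for its members and their signatures without parsing any source text. The `Account` class and the `describe` helper are hypothetical, made up for this illustration.

```python
import inspect

class Account:
    def __init__(self, owner: str, balance: int = 0):
        self.owner = owner
        self.balance = balance

    def deposit(self, amount: int) -> int:
        self.balance += amount
        return self.balance

def describe(cls):
    """Map each function defined directly on cls to its parameter names."""
    return {
        name: list(inspect.signature(member).parameters)
        for name, member in vars(cls).items()
        if inspect.isfunction(member)
    }

summary = describe(Account)
print(summary)
```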



Answer 5:

There is a language built on the "code is data" paradigm, i.e. every line of code is just data in terms of the language itself. This makes refactoring as basic an action as a primitive data operation. The name of this language is Lisp. ;)

Seriously speaking, a "language for programming" and a "language for the machine" are two different requirements, and a language that is perfect for analysis could be a nightmare for the programmer. Moreover, a language designed for some particular analysis might not be a programming language at all. (Last week I came across a language for pointer analysis; it has no textual representation and only two executable statements.)

And again: first you have to define the task, and then solve it. For example, if the task is "I want to write safe programs, e.g. I want to be sure that I will never mix integral and character operands", then you need a language with static types. If it is "I need to know at runtime what I can do with external libraries", reflection is your choice. "I need a universal programming language for interchange, transformation and analysis": most likely, this is not what you really want.
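To illustrate the "code is data" point from the start of this answer outside Lisp, here is a rough sketch in Python, where the standard `ast` module turns source into a tree that can be edited like any other data structure and then printed back as source. The `RenameVariable` class is hypothetical, and it deliberately ignores scoping: it renames every matching identifier, which is exactly the kind of shortcut a real refactoring tool cannot take. `ast.unparse` requires Python 3.9 or later.

```python
import ast

class RenameVariable(ast.NodeTransformer):
    """Rename every occurrence of one identifier (parameters and uses).

    Naive on purpose: no scope analysis is performed.
    """

    def __init__(self, old, new):
        self.old, self.new = old, new

    def visit_Name(self, node):
        # Uses of the variable inside expressions.
        if node.id == self.old:
            node.id = self.new
        return node

    def visit_arg(self, node):
        # Declarations of the variable as a function parameter.
        if node.arg == self.old:
            node.arg = self.new
        return node

source = "def scale(v, k):\n    return v * k\n"
tree = ast.parse(source)
tree = RenameVariable("k", "factor").visit(tree)
result = ast.unparse(tree)  # requires Python 3.9+
print(result)
```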



Answer 6:

For refactoring: self-similarity

The ability to accept code transplants without intrusive alteration or bizarre reinterpretation. Examples:

  • Extract a snippet of C++ to a new procedure, by using reference parameters to give it modifying access to variables.
  • Python, Javascript and Lua methods really are just functions that have a 'self' parameter. *
  • In any number of languages, a function that creates/populates a struct can be (more or less trivially) converted to a constructor.

Counterexamples...

  • Ruby: modules, classes, methods, lambda blocks and raw blocks. The differences in semantics are bewildering, to say the least (which is all I feel qualified to say for sure).

For the (to my mind) wildly different case of automatic mangling I'm a lot less sure, but the freedom from side-effects offered by functional programming languages is really it. (Ok, so how could we offer the same thing in a language for the rest of us?)

* Python is almost like that. (I forgot what the gotcha is. Probably something to do with whether the method was defined in the class or grafted on at runtime.)
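A possible candidate for the gotcha mentioned in the footnote, offered as a guess rather than as what the author had in mind: in Python a function becomes a method only when it is found on the class, not when it is stored on an individual instance. The sketch below uses hypothetical names (`Greeter`, `shout`, `whisper`).

```python
import types

class Greeter:
    def __init__(self, name):
        self.name = name

    def greet(self):
        return "hello " + self.name

def shout(self):
    return self.name.upper()

g = Greeter("ada")

# A method defined in the class is a plain function taking the instance.
same = Greeter.greet(g) == g.greet()

# Grafted onto the class: lookup through the instance supplies 'self'.
Greeter.shout = shout
via_class = g.shout()

# Grafted onto the instance: stored as a plain attribute, so 'self' is not
# supplied automatically and the call fails.
h = Greeter("Lisp")
h.whisper = lambda self: self.name.lower()
try:
    h.whisper()
    unbound_failed = False
except TypeError:
    unbound_failed = True

# Binding explicitly restores the method behaviour.
h.whisper = types.MethodType(h.whisper, h)
via_instance = h.whisper()

print(same, via_class, unbound_failed, via_instance)
```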



Answer 7:

IMO the most important property is that the language is completely specified and deterministic. For example, in C the behaviour of the following code is not defined by the language specification:

x = x++ + ++x;

If the code's behaviour is undefined, yet it compiles and does something, there is no safe way to automatically change it (i.e. refactor it) in a way that preserves that something.

The next important property is that the language doesn't allow access to variables (fields) outside their scope. Pointers make it possible, e.g. in C, to access any variable's value simply by "guessing" its address. In a language like that, there are cases where it is not possible to tell where in the code a certain variable's value is read and/or changed. Again, there is no safe way to automatically refactor a program that might do something like that.