What are common traits/properties of programming languages that facilitate (simplify) the development of widely automated source code analysis and re-engineering (transformation) tools?
I am mostly thinking in terms of programming language features that make it easier to develop static analysis and refactoring tools (i.e. compare Java vs. C++, the former of which has better support for refactoring).
In other words, if a programming language were explicitly designed from the beginning to support automated static analysis and refactoring, what characteristics should it have?
For example, for Ada, there's the ASIS:
The Ada Semantic Interface Specification (ASIS) is a layered, open architecture providing vendor-independent access to the Ada Library Environment. It allows for the static analysis of Ada programs and libraries.
ASIS, the Ada Semantic Interface Specification, is a library that gives applications access to the complete syntactic and semantic structure of an Ada compilation unit. This library is typically used by tools that need to perform some sort of static analysis on an Ada program.
ASIS information: ASIS provides a standard way for tools to extract data that are best collected by an Ada compiler or other source code analyzer. Tools which use ASIS are themselves written in Ada, and can be very easily ported between Ada compilers which support ASIS. Using ASIS, developers can produce powerful code analysis tools with a high degree of portability. They can also save the considerable expense of implementing the algorithms that extract semantic information from the source program. For example, ASIS tools already exist that generate source-code metrics, check a program's conformance to coding styles or restrictions, make cross-references, and globally analyze programs for validation and verification.
Also see the ASIS FAQ.
Can you think of other programming languages that provide a similarly comprehensive and complete interface to working with source code specifically for analysis/transformation purposes?
I am thinking about specific implementation techniques to provide the low level hooks, for example core library functions that provide a way to inspect an AST or ASG at runtime.
The biggest has to be static typing. This allows tools to have much more insight into what the code is doing. Without it refactoring becomes many times more difficult.
I think this is still a largely unexplored problem. The notion of "language design for tooling" seems to only have entered the fringes of the mainstream recently, though I think research in this area is more than two decades old. I agree with two of the other answers, namely that "static typing" and "self-similarity" are useful properties of a language to make refactoring support easier.
It is true that the particular programming language can make analysis easier.
If you want the easiest-to-analyze languages, pick a purely functional
one.
But almost nobody in practice programs in purely functional languages.
(The Haskell guys are going to jump up and down when they see
this, but seriously, Haskell is used only extremely rarely).
What makes a programming language analyzable is infrastructure
designed to support analysis. Ada's ASIS, above, is a great example.
Don't be misled by the fact that ASIS was written for Ada, or is
written in Ada; what counts is that somebody serious wanted
to analyze Ada and invested the effort to build Ada analysis
machinery.
I believe that the right cure is to build general analysis infrastructure
and amortize it across lots of languages. While
we're at it, we should build general transformation infrastructure,
too, because once you have an analysis, you'll want to use
it to effect change. (Doctor visits don't end with diagnosis;
they end with cures). And I've bet my career on it.
The result is an engine I think ideal for analysis,
refactoring, reengineering, etc:
our DMS Software Engineering Toolkit.
It has generic parsing, tree building, prettyprinting,
tree manipulation, source-to-source rewriting, attribute
grammar evaluations, control and data flow analysis.
It has production quality front ends for a number of widely used dialects
of C and C++, for Java, C#, COBOL, and PHP, and even
for Verilog and VHDL (many other languages too,
but not quite at that level).
To give you some sense of its utility, it was used
to convert JOVIAL code for the B-2 bomber into C...
without us ever having seen the source code.
See http://www.semdesigns.com/Products/Services/NorthropGrummanB2.html
Now, assuming one has analysis infrastructure, what language
features help?
Static types help by limiting the set of possible values a variable can take,
but only by adding a limited single-argument predicate, e.g., "X is an integer".
I think what helps more are assertions in the code, because they capture
predicates with more than one argument, which establish relationships between state variables that often cannot be found by inspecting
the code (problem- or domain-specific information, e.g., "X > Y+3").
The analysis infrastructure (and frankly, the programmers that read the code)
can ideally take advantage of such additional facts to provide a more
effective analysis.
Such assertions are commonly coded with special keywords such as "assert",
"pre(condition)" and "post(condition)" that are inspired, with good reason,
by the theorem proving literature.
But even if you don't have assertions in your language, they are
easy to encode anyway: just write an if statement whose condition is the denial of the assertion, and whose body either calls an idiom indicating
impossibility or violates the language semantics (e.g., dereferences an obviously null pointer).
For example, the assertion "x <= 0" can be encoded as "if (x>0) fail();"
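As a minimal sketch of that encoding (in Python rather than C, purely for illustration; `fail`, `withdraw_with_assert` and `withdraw_with_if` are hypothetical names, not part of any tool mentioned here), the built-in form and the if-encoded form state the same two-variable relationship:

```python
def fail():
    """Hypothetical 'impossibility' idiom: reaching this call means an invariant broke."""
    raise AssertionError("unreachable state reached")


def withdraw_with_assert(balance, amount):
    # Built-in form: states a relationship between two variables directly.
    assert amount <= balance, "amount must not exceed balance"
    return balance - amount


def withdraw_with_if(balance, amount):
    # Encoded form: test the denial of the assertion and signal impossibility.
    if amount > balance:
        fail()
    return balance - amount
```

Either way, an analyzer (or a reader) may assume `amount <= balance` on the line that follows the check. One difference in Python specifically: `assert` statements are removed when the interpreter runs with `-O`, while the if-encoded form always stays in the code.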
So what's really needed isn't assertions in the language, but programmers
who are willing to write them. Alas, that seems to be sadly lacking.
Reflection built into the language/type system. This makes static analysis and refactoring much less painful.
This is part of why Java and .NET tools are so commonplace and nice. Reflection lets tools work out the dependencies in source code quickly and reliably, which helps with static analysis.
In addition, you get the ability to do analysis of your compiled code, as well.
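To give a rough idea of what a tool gets from reflection, here is a small sketch using Python's standard `inspect` module as a stand-in for the Java and .NET reflection APIs. The `Account` class and the `describe_methods` helper are made up for the example.

```python
import inspect


class Account:
    """Hypothetical class used only to have something to inspect."""

    def __init__(self, owner: str, balance: int = 0):
        self.owner = owner
        self.balance = balance

    def deposit(self, amount: int) -> int:
        self.balance += amount
        return self.balance


def describe_methods(cls):
    """Map each function defined on cls to the list of its parameter names."""
    result = {}
    for name, func in inspect.getmembers(cls, inspect.isfunction):
        result[name] = list(inspect.signature(func).parameters)
    return result


summary = describe_methods(Account)
```

A tool can build this kind of summary without parsing any source text, which is the reliability gain described above.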
There is a language built on the "code is data" paradigm: every line of code is just data in terms of the language itself. This makes refactoring as basic an action as primitive data operations. And the name of this language is Lisp. ;)
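Lisp gets this for free. As a loose analogue in another language, the sketch below uses Python's standard `ast` module (with `ast.unparse`, so it assumes Python 3.9 or later) to treat a program as a tree and rename a variable. `RenameVariable` is a hypothetical helper and deliberately ignores scoping, so it is an illustration and not a safe refactoring.

```python
import ast


class RenameVariable(ast.NodeTransformer):
    """Rename every occurrence of one identifier; scoping is ignored on purpose."""

    def __init__(self, old, new):
        self.old = old
        self.new = new

    def visit_Name(self, node):
        # Replace matching Name nodes, keeping their load/store context.
        if node.id == self.old:
            return ast.copy_location(ast.Name(id=self.new, ctx=node.ctx), node)
        return node


source = "total = price * qty\nprint(total)"
tree = ast.parse(source)                          # code becomes data
tree = RenameVariable("total", "amount").visit(tree)
ast.fix_missing_locations(tree)
rewritten = ast.unparse(tree)                     # data becomes code again
```

The transformation itself is a handful of lines because the tree is an ordinary data structure; the hard part of a real tool is deciding which occurrences are safe to change.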
Seriously speaking, "language for programming" and "language for machine" are two different requirements. A perfect language for analysis could be a nightmare for the programmer. Moreover, a language designed for some particular analysis might not be a programming language at all. (Last week I came across a language for pointer analysis; it has no textual representation and only two executable statements.)
And again: first you have to define the task and then solve it. For example: if the task is "I want to write safe programs, e.g. I want to be sure that I will never try to mix integral and character operands", then you need a language with static types. Ok, "I need to know at runtime what I can do with external libraries" - reflection is your choice. "I need universal programming language for interchanging, transformations and analysis" - most likely, this is not what you really want.
For refactoring: self-similarity
The ability to accept code transplants without intrusive alteration or bizarre reinterpretation. Examples:
- Extract a snippet of C++ to a new procedure, using reference parameters to give it modifying access to variables.
- Python, Javascript and Lua methods really are just functions that have a 'self' parameter. *
- In any number of languages, a function that creates/populates a struct can be (more or less trivially) converted to a constructor.
Counterexamples...
- Ruby: modules vs. classes, and methods vs. lambdas vs. raw blocks. The differences in semantics are bewildering, to say the least (which is all I feel qualified to say for sure).
For the (to my mind) wildly different case of automatic mangling I'm a lot less sure, but the freedom from side-effects offered by functional programming languages is really it. (Ok, so how could we offer the same thing in a language for the rest of us?)
* Python is almost like that. (I forget what the gotcha is. Probably something to do with whether the method was defined in the class or grafted on at runtime.)
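A small sketch of the Python case, including one plausible candidate for the gotcha (I can't say it is the one originally meant): a function grafted onto a class behaves as a method, but one grafted onto a single instance is not bound to it automatically. `Greeter` and `shout` are made-up names.

```python
import types


class Greeter:
    def __init__(self, name):
        self.name = name

    def greet(self):
        return "hello " + self.name


def shout(self):
    # A plain function whose first parameter happens to be called 'self'.
    return self.name.upper()


g = Greeter("ada")

# A method defined in the class is an ordinary function taking 'self'.
same = Greeter.greet(g) == g.greet()

# Grafted onto the class: instances get a bound method as usual.
Greeter.shout = shout
via_class = g.shout()

# Grafted onto one instance: the function is stored as-is, not bound,
# so calling it without an explicit argument raises TypeError.
g.whisper = lambda self: self.name.lower()
try:
    g.whisper()
    bound_automatically = True
except TypeError:
    bound_automatically = False

# Binding explicitly restores method behaviour.
g.whisper = types.MethodType(lambda self: self.name.lower(), g)
via_instance = g.whisper()
```

For a refactoring tool this means "move function into class" is nearly mechanical, while "attach function to object" needs an extra binding step.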
IMO the most important property is that the language is completely specified and deterministic. For example, in C the behaviour of the following code is not defined by the language specification, because x is modified more than once without an intervening sequence point:
x = x++ + ++x;
If the code's behaviour is undefined, yet it compiles and does something, there is no safe way to automatically change it (i.e. refactor it) in a way that preserves that something.
The next important property is that the language doesn't allow access to variables (fields) beyond their scope. Pointers make it possible, e.g. in C, to access any variable's value simply by "guessing" its address. In a language like that, there are cases where it is not possible to tell where in the code a certain variable's value is read and/or changed. Again, there is no safe way to automatically refactor a program that might do something like that.