Generating strings and executing them as programs

Posted 2019-04-17 04:50

Question:

This is a tough question to word and I'm not sure what the proper term for it would be (if any). I'm curious what languages allow you to "build up" a string during program execution, and then execute it as part of the program. The only language that I know of that allows you to do this is Snobol.

Reading the Wikipedia entry for Tcl, however, it sounds like it may be able to do this as well?

I always thought this was a nifty feature even if it may not be used much. Thanks.

PS: Would tag this with Snobol, Spitbol, but don't have the reputation to create new tags.

Answer 1:

I'm curious what languages allow you to "build up" a string during program execution, and then execute it as part of the program.

Look for languages that support eval, or, more generally, runtime meta-programming. A great many languages support some form of eval (even strongly, statically typed languages like Haskell). Many runtimes for languages that are primarily implemented via bytecode interpretation (such as Lisp-like languages, Erlang or Java) support inserting new (byte)code at runtime. Once you can insert new code dynamically, you can write eval, or do "monkey patching".
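As a minimal sketch of what the question describes, here is the idea in Python (chosen only for illustration; the helper name `build_adder_source` is hypothetical). The program builds source text while it runs, then executes that text with the built-in `exec` and `eval`:

```python
def build_adder_source(n: int) -> str:
    """Return Python source text defining a function that adds n."""
    return f"def add_{n}(x):\n    return x + {n}\n"

# Build the program text while the program is running.
source = build_adder_source(5)

# Execute the generated text in a separate namespace so it cannot
# accidentally overwrite names in the surrounding module.
namespace = {}
exec(source, namespace)

add_5 = namespace["add_5"]
result = add_5(10)

# eval handles a single expression rather than whole statements.
expr = " + ".join(str(i) for i in range(1, 5))  # "1 + 2 + 3 + 4"
total = eval(expr)
```

`exec` runs statements (here, a function definition), while `eval` returns the value of one expression. Both accept an explicit namespace dictionary, which keeps generated names out of the caller's scope.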

Even in language implementations without specific support for full meta-programming, or even dynamic linking, there are often ways to dynamically generate code under programmer control, either via reflection mechanisms or code generation support libraries (such as LLVM).

Beyond just a simple one-stage eval, more generally, languages that support multi-stage computation allow for generation of programs from one stage to the next, for arbitrary numbers of stages, making it possible to safely, arbitrarily nest evals.

To quote Taha, whose thesis on multi-stage programming models introduces much of the theory:

Program generation is a powerful and pervasive technique for the development of software. It has been used to improve code reuse, product reliability and maintainability, performance and resource utilization, and developer productivity.

The languages you're looking for usually provide three primitives, in some form or another:

  • delay
  • splice
  • run

These are, respectively, for delaying a computation by one stage (e.g. quoting a fragment as a string), splicing it into a running program, and executing that fragment. In Lisp they correspond to back-quote, comma, and eval.
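The three primitives can be mimicked with plain strings. The following Python sketch is only an illustration: `delay`, `splice` and `run` are hypothetical helper names, not an existing library, and real multi-stage languages use typed code values rather than raw text.

```python
def delay(expr_text: str) -> str:
    """Delay: keep an expression as unevaluated text (a quoted fragment)."""
    return expr_text

def splice(template: str, **fragments: str) -> str:
    """Splice: insert delayed fragments into a larger delayed fragment."""
    # Parenthesise each fragment so operator precedence is preserved.
    wrapped = {name: f"({text})" for name, text in fragments.items()}
    return template.format(**wrapped)

def run(code_text: str, env: dict) -> object:
    """Run: evaluate a delayed fragment, looking names up in env."""
    return eval(code_text, {}, env)

inner = delay("x + 1")
outer = splice("{body} * 2", body=inner)  # "(x + 1) * 2"
value = run(outer, {"x": 4})
```

Nothing is computed until `run` is called, so `x` only needs a value at the final stage. String-based staging like this has no protection against name capture or malformed fragments, which is the weakness typed multi-stage systems aim to remove.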

Lisp and eval

  • McCarthy, John, History of LISP, SIGPLAN Not. 1978. -- introduces eval

Generalizing eval to multi-stage programming

On multi-stage programming:

  • Taha, Multi-Stage Programming: Its Theory and Applications
  • Nielson, Flemming and Nielson, Hanne Riis, Two-level functional languages -- introduced 2-level languages.
  • Taha, Walid and Sheard, Tim, Multi-stage programming with explicit annotations -- simple operators to support all runtime metaprogramming techniques.

Giving types to multi-stage programming

Formal descriptions of multi-stage computation are quite tricky, and involve unusual techniques (for programming languages) like modal logic.

Giving types to meta-programs:

  • Wickline, Philip and Lee, Peter and Pfenning, Frank and Davies, Rowan, Modal types as staging specifications for run-time code generation.

Security issues

The trickiness of formalizing the semantics of multi-stage programming explains why such systems are often confusing to work with, and why eval can open up so many security concerns: it becomes unclear what code is executing when, and exactly what data is being turned into code. Handling name capture correctly from one stage to the next is tricky, and mistakes lead to code injection attacks. Such complexity doesn't help security.
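To make the "data being turned into code" problem concrete, here is a small Python sketch. The function names are hypothetical and the payload is deliberately harmless. The first function pastes data into program text. The second passes the same data through the evaluation environment, so it is never parsed as code.

```python
def greet_unsafe(name: str) -> str:
    """Build code by pasting data straight into program text (unsafe)."""
    code = f"'Hello, ' + '{name}'"
    return eval(code)

def greet_safer(name: str) -> str:
    """Keep data as data: pass it via the environment, not the code text."""
    return eval("'Hello, ' + name", {}, {"name": name})

benign = "Ada"
# The input closes the string literal and appends its own expression.
malicious = "' + str(6 * 7) + '"

unsafe_result = greet_unsafe(malicious)  # the injected expression is executed
safer_result = greet_safer(malicious)    # the input stays an ordinary string
```

In the unsafe version the caller controls part of the program, so any expression could be injected in place of `str(6 * 7)`. The second version is safer only with respect to this one input. It is not a general sandbox for untrusted code.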



Answer 2:

This can definitely be done in many interpreted scripting languages, and some languages are specifically designed for it. To my knowledge, it can be done in:

  • Perl
  • PHP
  • Lisp (and dialects, like CL, Clojure, Scheme, etc.)
  • JavaScript


Answer 3:

It can be done in all Lisp dialects, where this feature originated under the name eval, as well as in Prolog (call/1) and any number of other languages. Most keep the name eval and most are dynamic languages.

That being said, this is hardly a nifty feature. I'd call it a major security issue, given how easy it is to abuse this feature. If you want dynamic code execution, then writing your own, restricted, micro-interpreter (or using something like Lua) is almost always a better idea.
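As a rough sketch of the "restricted micro-interpreter" idea, the Python below parses an expression with the standard `ast` module and evaluates only a whitelist of arithmetic nodes. The name `safe_arith` and the choice of permitted operators are assumptions made for this example.

```python
import ast
import operator

# Whitelist of permitted operators; anything else is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}

def safe_arith(text: str):
    """Evaluate +, -, *, / over numeric literals; reject everything else."""
    def walk(node: ast.AST):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](walk(node.operand))
        raise ValueError(f"disallowed syntax: {type(node).__name__}")
    return walk(ast.parse(text, mode="eval"))

answer = safe_arith("2 * (3 + 4) - 1")
```

Because the interpreter never calls `eval` and only handles node types it explicitly lists, input such as `__import__('os')` raises `ValueError` instead of running. The trade-off is that every feature users need has to be added to the whitelist by hand.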