Homoiconic and “unrestricted” self modifying code

2019-01-21 17:21发布

问题:

I will be forward in admiting that my knowledge of Lisp is extremely minimal. However I am extremely interested in the language and plan to begin seriously learning it in the near future. My understanding of these issues is no doubt flawed, so if I say anything which is blatently wrong, please comment and correct me rather than downvoting.

Truly Homoiconic and Self-modifiable languages

I'm looking for examples of programming languages which support both Homoiconicity (Code has the same representation as data) and unrestricted self modification (Unrestricted meaning that you can change every aspect of your running code, not merely emit new code or change function pointers/delegates.)

There are but three examples I have found so far which fit this criteria:

  1. Machine code. Homoiconic in that everything is a number. Unrestrictedly modifiable in that it includes pointers, which can be used to manipulate any memory address regardless of whether that address holds code or data.
  2. Malbolge. Same reasoning as Machine code. Every instruction modifies itself after being executed
  3. DNA. Not a programming language, but still interesting. It isn't self modifying in the same sense as machine code is; Where the actual instructions + data are modified in place. However it is self replicating and can mutate/evolve according to it's previous state (With side effects such as radiation screwing it up every now and then). This is just an indirect way of self modification anyway. In short, DNA can self modify, but it does so by reproducing itself in it's entirity along with the relevant mutations. A physical string of DNA is "immutable".

Why Lisp is not on this list

Lisp is not on that list because it appears to me that Lisp is only almost Homoiconic, and only supports restricted self modification. You can do something like

(+ 1 2 3)

which will do the same thing as

(eval '(+ 1 2 3))

In the first version (+ 1 2 3) is raw code, whereas in the second version it is data. By assuming the truth of this statement it can be argued that Lisp isn't even homiconic. The code has the same representation as data in the sense that they are both lists/trees/S-expressions. But the fact that you have to explicitly mark which of these lists/trees/S-expressions are code and which are data to me seems to say that Lisp is not homiconic after all. The representations are extremely similar, but they differ in the tiny detail that you have to actually say whether you are dealing with code or data. This is not in any way a bad thing (In fact anything else would be madness), but it highlights a difference between Lisp and machine code. In machine code you don't have to explicitly mark which numbers are instructions, which are pointers, and which are data. Everything is simply a number until an interpretation is actually required, at which point it could be any of those things.

It's an even stronger case against unrestricted self modification. Sure, you can take a list that represents some code and manipulate it. For example changing

'(+ 1 2 3)

to

'(+ 1 4 3)

And then you run it through eval. But when you do this you're just compiling some code and running it. You're not modifying existing code, you're just emitting and running new code. C# can do exactly the same thing using expression trees, even if in a less convenient format (which arises due to C# code having a different representation to its AST, as opposed to Lisp, which is its own AST). Can you actually take an entire source file and start modifying that entire source file as it is running, with the changes made to the source file having realtime effects on the program behaviour?

Unless there is some way to do this, Lisp is neither homiconic nor self modifying. (To put off an argument over definitions, Lisp is not homoiconic or self modifying to the same degree that machine code is.)

Ways to make Lisp Homoiconic/Unrestrictedly self-modifiable

I can see 3 potential ways to make Lisp as homoiconic/self-modifiable as machine code.

  1. Non-Von-Neumann architecture. If someone could invent some amazing hypothetical machine where the lowest level representation of programs is an AST which can be executed directly (No further compilation is necessary). On such a machine an AST would represent both executable instructions, as well as data. Unfortunately the problem won't have been solved, because the AST still has to be either code or data. The prescence of an eval function doesn't change this. In machine code you can flip back and forth between code and data as much as you want. Whereas with eval and Lisp once you've "evaled" some list from data to code and executed it, there's no way to get that list back as data again. In fact, that list is gone forever and has been replaced with it's value. We'd be missing something crucial, which just so happens to be pointers.
  2. List Labels. If it was a requirement that every list also have a unique label, it would be possible to do indirect self modification by running functions against a list with a given label. Combined with continuations this would finally allow for self modifying code in the same sense that machine code has it. The labels are equivilant to machine code memory addresses. As an example, consider a Lisp program where the top node of the AST has the label "main". Inside main you could then execute a function that takes a label, an Integer, an Atom, and copies the atom to the List with a label that matches the one supplied to the function, at the index specified by the Integer. Then just call with current continuation on main. There you go, self modifying code.
  3. Lisp Macros. I haven't taken the time to understand Lisp macros, and they may in fact do exactly what I'm thinking about.

Point 1. combined with 2. would produce a totally self-modifying Lisp. Provided that the magical Lisp machine described could be produced. 2. alone could produce a self-modifying Lisp, however the implementation on a Von Neumann architecture could be extremely innefficient.

The Questions

  1. Are the any languages other than machine code, dna and malbolge which can do total self modification and are homoiconic?
  2. (DO NOT bother answering if you did a tl;dr at the above text). Is lisp really homoiconic + self modifying? If you say so, can you quote exactly where in my argument I went astray?

Appendix

Languages with unrestricted self modification but no homiconicity

  1. Assembly. The code uses words as opposed to numbers, so loses homiconicity, but it still has pointers, which retains the total control over memory and allows for unrestricted self modification.
  2. Any language that uses raw pointers. For example C/C++/Objective C. Same argument as Assembly
  3. JIT languages that include virtual pointers. For example C#/.net running in an unsafe context. Same argument as Assembly.

Other concepts and languages that may be somehow relevant/interesting: Lisp, Ruby, Snobol, Forth and it's compile time metaprogramming, Smalltalk and it's reflection, the untyped lambda calculus with it's property that everything is a function (Which kind of implies that assuming we could invent a machine which executes lambda calculus directly, lambda calculus would be homoiconic and Von Neumann machine code would not when run on said machine. [And Godels theorem would be executable. Haha, scary thought :P])

回答1:

In the first version (+ 1 2 3) is raw code, whereas in the second version it is data. By assuming the truth of this statement it can be argued that Lisp isn't even homiconic. The code has the same representation as data in the sense that they are both lists/trees/S-expressions. But the fact that you have to explicitly mark which of these lists/trees/S-expressions are code and which are data to me seems to say that Lisp is not homiconic after all.

This is not true. In the first version, the list (+ 1 2 3), which is data, is being fed to the interpreter to be executed, i.e. to be interpreted as code. The fact that you have to mark s-expressions as being code or data in a specific context does not make Lisp non-homoiconic.

The point of homoiconicity is that all programs are data, not that all data are programs, so there is still a difference between the two. In Lisp, (1 2 3) is a valid list but not a valid program since an integer is not a function.

[If we look at the other great homoiconic programming language, Prolog, then we see the same phenomenon: we can build a data structure foo(X, 1, bar), but without a definition of foo, we can't execute it. Also, variables cannot be the names of predicates or facts, so X. is never a valid program.]

Lisp is self-modifying to a great degree. E.g., here's how to change the definition of a function:

[1]> (defun foo (x) (+ x 1))
FOO
[2]> (defun bar (x) (+ x 2))
BAR
[3]> (setf (symbol-function 'foo) #'bar)
#<FUNCTION BAR (X) (DECLARE (SYSTEM::IN-DEFUN BAR)) (BLOCK BAR (+ X 2))>
[4]> (foo 3)
5

Explanation: at [1], we defined the function foo to be the add-1 function. At [2], we defined bar to be the add-2 function. At [3], we reset foo to the add-2 function. At [4], we see that we've successfully modified foo.



回答2:

Lisp is a family of programming languages. Members of this family are widely different in their capabilities and implementation techniques. There are two different interpretations of this:

  • Lisp is a family of languages which share some overlapping feature set. This includes everything from the first Lisp, to Maclisp, Scheme, Logo, Common Lisp, Clojure and hundreds of different languages and their implementations.

  • Lisp also has a main branch of languages which also mostly have 'Lisp' in their name: Lisp, MacLisp, Common Lisp, Emacs Lisp, ...

Lots of research has been invested over time to improve the languages (make it more 'Functional' or make it more object-oriented, or both) and to improve implementation techniques.

Common Lisp for example supports compilation, various optimizations, and more to allow developers to use it in large projects where some balance between flexibility and features is needed. A compiled function is then machine code and no longer a data structure made of lists, symbols, strings and numbers.

Common Lisp allows implementations to create static code. At the same time it leaves some place for controlled runtime modifications (for example by using a runtime compiler, loading code, evaluating code, replacing code, ...).

OTOH if you have a Lisp implementation with an interpreter and additionally the interpreter might use Lisp data structures to represent source, then you are able to change running programs for example by changing the list structure. There are implementations of dialects of Lisp which allow that. One typical thing is the use of macros, which can be computed at runtime by such a system. Some other Lisps have so-called Fexprs, which are a similar mechanism (but which usually can't be effectively compiled).

In a Common Lisp based application (say, a CAD system written in Lisp) where much of the source information has been removed by a delivery tool, this would not be possible. One would have a single executable of machine code, which has much of the runtime flexibility removed.

Homoiconicity is also not a very well defined concept. For Lisp I like more that we say that source code can be turned into data, because the source code uses the external representation of s-expressions, which is a data serialization format. But not every s-expression is a valid Lisp program. Also an internal representation of source code as Lisp data is not 'iconic' in any way. Lisp has source code as external s-expressions and after reading the source code, it has an internal representation as Lisp data. READ reads it, PRINT prints it and EVAL evaluates it.

There are also other approaches in Lisp to provide access to the Lisp which is running your program and to the program:

  • the meta-object protocol in CLOS (the Common Lisp Object System) is such an example

  • 3Lisp provides an infinite tower of interpreters. There is a Lisp interpreter running your program. This Lisp interpreter is ran by another Lisp interpreter, which is again running in another one, and that one too...



回答3:

Macros can avoid quoting, if that's what you're after:

> (defmacro foo (x) (cdr x))
> (foo (+ - 5 2))
3

Is (+ - 5 2) code or data? At write-time it seems like data. After macro-expansion time it seems like code. And if the definition of foo was somewhere else, it could just as easily be (incorrectly) thought of as a function, in which case (+ - 5 2) will be thought of as code that behaves like data that behaves like code.



回答4:

I come across as a fanboy here, and I've said it before, but read Paul Graham's On Lisp if you want to learn about Lisp Macros. They're a big deal in terms of enabling modifications that would be otherwise infeasible. In addition, I think it's important to distinguish here between Lisp the language family, a given Lisp, and a given Lisp implementation.

I suppose the main issue I take with your argument comes in the first paragraph after "Why Lisp is not on this list.", and has to do with the REPL in Lisp. When you enter the exp (+ 1 2 3) into the interpreter, you are in fact calling the function EVAL on the list (+ 1 2 3). The "raw code" you describe is in fact "data", being fed to other lisp code, it's just data in a specific context.