What are the uses of self-modifying code?

Posted 2019-01-16 07:42

Question:

Is there any real use for self-modifying code?

I know that it can be used to build worms and viruses, but I was wondering whether there is any good reason a programmer might have to use self-modifying code.

Any ideas? Hypothetical situations are welcome too.

Answer 1:

Turns out that the Wikipedia entry on "self-modifying code" has a great list:

  1. Semi-automatic optimization of a state-dependent loop.
  2. Runtime code generation, or specialization of an algorithm at run time or load time (which is popular, for example, in the domain of real-time graphics), such as a general sort utility preparing code to perform the key comparison described in a specific invocation.
  3. Altering of inlined state of an object, or simulating the high-level construction of closures.
  4. Patching of subroutine call addresses, as is usually done at load time of dynamic libraries, or, on each invocation, patching the subroutine's internal references to its parameters so as to use their actual addresses. Whether this is regarded as 'self-modifying code' or not is a matter of terminology.
  5. Evolutionary computing systems such as genetic programming.
  6. Hiding of code to prevent reverse engineering, as through use of a disassembler or debugger.
  7. Hiding of code to evade detection by virus/spyware scanning software and the like.
  8. Filling 100% of memory (in some architectures) with a rolling pattern of repeating opcodes, to erase all programs and data, or to burn-in hardware.
  9. Compression of code to be decompressed and executed at runtime, e.g., when memory or disk space is limited.
  10. Some very limited instruction sets leave no option but to use self-modifying code to achieve certain functionality. For example, a "One Instruction Set Computer" machine that uses only the subtract-and-branch-if-negative "instruction" cannot do an indirect copy (something like the equivalent of "*a = **b" in the C programming language) without using self-modifying code.
  11. Altering instructions for fault tolerance.
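Item 10 is easy to check with a tiny simulator. Below is a minimal Python sketch of a subtract-and-branch-if-negative machine: each instruction is three cells (a, b, c), meaning `mem[b] -= mem[a]`, then jump to c if the result is negative. It performs the single-level indirect copy q = *p (simpler than the `*a = **b` in the list) by first writing p into the source-operand field of a later instruction. The memory layout, the addresses, and the convention that execution stops when pc leaves the code area are all assumptions made for this sketch.

```python
def run(mem, code_len):
    """Run a subtract-and-branch-if-negative machine until pc leaves [0, code_len).

    Each instruction occupies three cells (a, b, c): mem[b] -= mem[a], then
    jump to c if the new mem[b] is negative, otherwise fall through to pc + 3.
    """
    pc = 0
    while 0 <= pc < code_len:
        a, b, c = mem[pc], mem[pc + 1], mem[pc + 2]
        mem[b] -= mem[a]
        pc = c if mem[b] < 0 else pc + 3
    return mem


def indirect_copy(data, index):
    """Return data[index] by executing q = *p on the machine above."""
    Z, P, Q = 24, 25, 26   # scratch cell (kept at 0), pointer p, destination q
    PATCH = 15             # the 'a' field of the instruction that starts at cell 15
    DATA = 27              # the data area follows the three variables
    # Every branch target c is simply the next instruction, so the
    # conditional jump never changes the control flow in this program.
    code = [
        PATCH, PATCH, 3,   # clear the operand field we are about to patch
        P, Z, 6,           # Z = -p
        Z, PATCH, 9,       # PATCH = p   <- the program rewrites its own code
        Z, Z, 12,          # Z = 0
        Q, Q, 15,          # q = 0
        0, Z, 18,          # Z = -mem[p]  (the 0 is replaced at run time)
        Z, Q, 21,          # q = mem[p]
        Z, Z, 24,          # Z = 0, then fall off the end of the code
    ]
    mem = code + [0, DATA + index, 99] + list(data)   # q starts as junk (99)
    run(mem, code_len=len(code))
    return mem[Q]


print(indirect_copy([7, 42, 13], 1))  # 42
```

Because the machine has no indirect addressing mode, the only way to read through p is to write p into an instruction and then execute that instruction.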

On the point about thwarting hackers using self-modifying code:

Over the course of several firmware updates, DirecTV slowly assembled a program on their smart cards that destroyed cards that had been hacked to receive unpaid channels illegally. See Jeff Atwood's Coding Horror article on the Black Sunday hack for more information.



Answer 2:

I've seen self-modifying code used for:

  1. speed optimisation, by having the program write more code for itself on the fly

  2. obfuscation, to make reverse engineering much harder



Answer 3:

In the days when RAM was scarce, self-modifying code was used to save memory. Nowadays, executable packers such as UPX decompress (and thereby rewrite) the application's own code after loading a compressed image of it.
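UPX does this with native machine code. As a rough analogy only, the Python sketch below ships a compiled function as a zlib-compressed `marshal` blob and decompresses and executes it at run time. The names `PAYLOAD_SOURCE` and `unpack_and_load` are hypothetical; note also that `marshal` data is only valid for the Python version that produced it, and a payload this small is too tiny to actually benefit from compression.

```python
import marshal
import zlib

# Hypothetical payload; at "build time" it is compiled, serialized and compressed.
PAYLOAD_SOURCE = '''
def checksum(data):
    return sum(data) % 256
'''
PACKED = zlib.compress(marshal.dumps(compile(PAYLOAD_SOURCE, "<payload>", "exec")), 9)


def unpack_and_load(packed):
    """Decompress a code object and execute it, returning the names it defines."""
    code = marshal.loads(zlib.decompress(packed))
    namespace = {}
    exec(code, namespace)
    return namespace


loaded = unpack_and_load(PACKED)
print(loaded["checksum"](b"\x01\x02\xff"))  # 2
```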



Answer 4:

Because the Commodore 64 has few registers and a 1 MHz processor. When you need to read from a memory address offset by some value, it is often easier to modify the operand of the instruction itself.

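Reader:
LDA $C000     ; load a byte; the operand $C000 is the part that gets patched
STA $D020     ; store it in the border colour register
INC Reader+1  ; bump the low byte of LDA's operand ($C0FF wraps to $C000, no carry)
JMP Reader    ; loop forever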

That's the last time I wrote self-modifying code anyway :-)
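The loop can be traced with a small Python sketch that emulates just the four 6502 instructions it uses (absolute-mode `LDA`, `STA`, `INC` and `JMP`; no flags, no cycle timing). The load address $1000 and the test data are arbitrary assumptions. Each pass through the loop reads the next byte because `INC` rewrites the low byte of `LDA`'s operand in memory.

```python
LDA_ABS, STA_ABS, INC_ABS, JMP_ABS = 0xAD, 0x8D, 0xEE, 0x4C
ORIGIN = 0x1000        # assumed load address of the loop
BORDER = 0xD020        # border colour register


def load_program(mem):
    """Place the Reader loop at ORIGIN (operands are little-endian)."""
    mem[ORIGIN:ORIGIN + 12] = bytes([
        LDA_ABS, 0x00, 0xC0,                               # Reader: LDA $C000
        STA_ABS, BORDER & 0xFF, BORDER >> 8,               # STA $D020
        INC_ABS, (ORIGIN + 1) & 0xFF, (ORIGIN + 1) >> 8,   # INC Reader+1
        JMP_ABS, ORIGIN & 0xFF, ORIGIN >> 8,               # JMP Reader
    ])


def run(mem, steps):
    """Execute `steps` instructions and return every value stored to BORDER."""
    pc, a, border_writes = ORIGIN, 0, []
    for _ in range(steps):
        op = mem[pc]
        addr = mem[pc + 1] | (mem[pc + 2] << 8)
        if op == LDA_ABS:
            a = mem[addr]
            pc += 3
        elif op == STA_ABS:
            mem[addr] = a
            if addr == BORDER:
                border_writes.append(a)
            pc += 3
        elif op == INC_ABS:
            mem[addr] = (mem[addr] + 1) & 0xFF   # here: patches the LDA operand
            pc += 3
        elif op == JMP_ABS:
            pc = addr
        else:
            raise ValueError(f"unsupported opcode {op:#04x}")
    return border_writes


mem = bytearray(0x10000)
load_program(mem)
mem[0xC000:0xC005] = bytes([10, 20, 30, 40, 50])
print(run(mem, steps=20))      # [10, 20, 30, 40, 50]
print(hex(mem[ORIGIN + 1]))    # 0x5
```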



Answer 5:

Because it's really really cool, and sometimes that's reason enough.



Answer 6:

1960s-era assembly languages used self-modifying code to implement function calls without a stack.

Knuth, The Art of Computer Programming, Vol. 1, 1st ed., p. 182:

MAX100  STJ   EXIT   ;Subroutine linkage
        ENT3  100    ;M1. Initialize
        JMP   2F
1H      CMPA  X,3    ;M3. Compare
        JGE   *+3
2H      ENT2  0,3    ;M4. Change m
        LDA   X,3    ;(New maximum found)
        DEC3  1      ;M5. Decrease k
        J3P   1B     ;M2. All tested?
EXIT    JMP   *      ;Return to main program

In a larger program containing this coding as a subroutine, the single instruction "JMP MAX100" would cause register A to be set to the current maximum value of locations X + 1 through X + 100, and the position of the maximum would appear in rI2. Subroutine linkage in this case is achieved by the instructions "MAX100 STJ EXIT" and, later, "EXIT JMP *". Because of the way the J-register operates, the exit instruction will then jump to the location following the place where the original reference to MAX100 was made.

Edit: It may be hard to see what's going on, even with the brief explanation here. In the line MAX100 STJ EXIT, MAX100 is a label for the instruction (and thus for the procedure as a whole), STJ means "store the J register" (which holds the address we just came from), and EXIT names the memory location that is the target of the store. EXIT, as we see later, is the label of the last instruction. So it's overwriting code! However, STJ by default stores only into the sign and address fields (0:2) of the target word, so the JMP opcode remains untouched. The * (which in MIXAL means "this location") is just a placeholder: there is nothing meaningful to put there, since it will be overwritten anyway.
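To make the linkage concrete, here is a minimal Python sketch of a hypothetical toy machine (not MIX itself): `JMP` records the address of the following instruction in a J register, and `STJ` copies J into the operand field of another instruction. The subroutine uses no stack; it returns by having its own final `JMP` patched, just like MAX100's `EXIT JMP *`.

```python
def run(program, max_steps=1000):
    """Execute a toy program; each instruction is a mutable [opcode, operand] list."""
    pc, rj, acc = 0, 0, 0
    patched_returns = []            # operand values written by STJ, for inspection
    for _ in range(max_steps):
        op, arg = program[pc]
        if op == "JMP":             # as in MIX: J := address of the next instruction
            rj, pc = pc + 1, arg
        elif op == "STJ":           # overwrite only the operand of instruction `arg`
            program[arg][1] = rj
            patched_returns.append(rj)
            pc += 1
        elif op == "ADD":
            acc += arg
            pc += 1
        elif op == "HLT":
            return acc, patched_returns
        else:
            raise ValueError(f"unknown opcode {op!r}")
    raise RuntimeError("step limit exceeded")


program = [
    ["JMP", 3],      # 0: call SUB
    ["JMP", 3],      # 1: call SUB again, from a different site
    ["HLT", None],   # 2: stop
    ["STJ", 5],      # 3: SUB: store the return address into instruction 5
    ["ADD", 10],     # 4: the subroutine's "work"
    ["JMP", None],   # 5: EXIT: like "JMP *", operand filled in at run time
]
print(run(program))  # (20, [1, 2])
```

Each call leaves a different return address in instruction 5, which is why this scheme cannot support recursion: a nested call would overwrite the pending return address.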


Self-modifying code is also used where register-indirect addressing is not available, and yet the address you need is sitting right there in the register. PDP-1 LISP:

dap .+1  ;deposit address part of accumulator in (IP+1)
lac xy   ;load accumulator with (ADDRESS) [xy is a dummy symbol, just like * above]

These two instructions perform ACC := (ACC) by modifying the operand of the load instruction.

Modifications like these are relatively safe, and on antique architectures, they are necessary.
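A bit-level Python sketch of the PDP-1 idiom, assuming 18-bit words whose low 12 bits are the address part. `dap` replaces only those 12 bits of the target word, so the `lac` opcode bits survive. The octal opcode values 20 (`lac`) and 26 (`dap`) should be treated as assumptions; the sketch works with any two distinct values, and only these two instructions are modelled.

```python
ADDR_MASK = 0o7777        # low 12 bits of an 18-bit word: the address part
WORD_MASK = 0o777777      # all 18 bits
LAC, DAP = 0o20, 0o26     # opcode field values (assumed; any distinct values work)


def step(mem, pc, ac):
    """Execute one instruction; only lac and dap are modelled."""
    word = mem[pc]
    op, y = word >> 12, word & ADDR_MASK
    if op == DAP:    # deposit address part: overwrite only the low 12 bits of mem[y]
        mem[y] = (mem[y] & (WORD_MASK & ~ADDR_MASK)) | (ac & ADDR_MASK)
    elif op == LAC:  # load accumulator from mem[y]
        ac = mem[y]
    else:
        raise ValueError(f"unsupported opcode {op:o}")
    return pc + 1, ac


def load_through_accumulator(mem, ac, pc=0o100):
    """Perform ACC := (ACC) with the two-instruction dap/lac idiom."""
    mem[pc] = (DAP << 12) | (pc + 1)      # dap .+1
    mem[pc + 1] = (LAC << 12) | 0         # lac xy (xy: dummy address, 0 here)
    pc, ac = step(mem, pc, ac)
    pc, ac = step(mem, pc, ac)
    return ac


mem = [0] * 0o10000                           # 4096 words of core
mem[0o2000] = 12345
print(load_through_accumulator(mem, 0o2000))  # 12345
print(oct(mem[0o101]))                        # 0o202000
```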



Answer 7:

Lots of reasons. Off the top of my head:

  • Runtime class construction and metaprogramming. For example, a class factory that takes a connection to an SQL table and generates a client class specialized for that table (with accessors for the columns, find methods, etc.).

  • Then of course there's the famous bitblt example, and the regexp analogs.

  • Dynamically optimizing based on run-time information, à la tracing JITs.

  • Subtype specialization of Ada-style generic functions in an accretive environment.
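A minimal Python sketch of the first bullet, using the standard-library `sqlite3` module: it reads a table's column names with `PRAGMA table_info` and builds a class at run time with `type()`, adding one `find_by_<column>` class method per column. The table, its contents and the name `make_table_class` are hypothetical, and the table name is assumed to come from trusted code, since it is interpolated into SQL.

```python
import sqlite3


def make_table_class(conn, table):
    """Generate a client class for `table` at run time from its schema."""
    # PRAGMA table_info yields one row per column: (cid, name, type, notnull, dflt_value, pk).
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    select = f"SELECT {', '.join(columns)} FROM {table}"

    def __init__(self, *values):
        for column, value in zip(columns, values):
            setattr(self, column, value)

    def __repr__(self):
        fields = ", ".join(f"{c}={getattr(self, c)!r}" for c in columns)
        return f"{type(self).__name__}({fields})"

    attrs = {"__init__": __init__, "__repr__": __repr__, "columns": tuple(columns)}
    for column in columns:
        # Bind `column` now via a default argument; a plain closure would see only the last one.
        def finder(cls, value, _column=column):
            rows = conn.execute(f"{select} WHERE {_column} = ?", (value,))
            return [cls(*row) for row in rows]
        attrs[f"find_by_{column}"] = classmethod(finder)

    return type(table.capitalize(), (object,), attrs)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?, ?)",
                 [(1, "ada", "ada@example.com"), (2, "bob", "bob@example.com")])
Users = make_table_class(conn, "users")
print(Users.find_by_name("ada"))  # [Users(id=1, name='ada', email='ada@example.com')]
```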

-- MarkusQ



Answer 8:

Artificial Intelligence?



Answer 9:

Dynamic linking is a kind of self-modification (patching absolute and/or relative jump locations) ... that's normally done by the O/S's program loader, though.
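On ELF systems with lazy binding, for instance, each function's GOT entry initially sends the first call, via the PLT, to a resolver in the dynamic linker, which looks up the symbol and overwrites the entry so that later calls go straight to the target. The Python sketch below mimics that pattern with a dictionary of call slots; `LazyBindingTable` and the `LIBRARY` mapping are hypothetical stand-ins, and no real linker is involved.

```python
import math

# Stand-in for a shared library's exported symbols (hypothetical).
LIBRARY = {"sqrt": math.sqrt, "hypot": math.hypot}


class LazyBindingTable:
    """Call slots that start out pointing at a resolver stub, like lazy PLT/GOT entries."""

    def __init__(self, library, names):
        self.library = library
        self.resolutions = 0
        self.slots = {name: self._make_stub(name) for name in names}

    def _make_stub(self, name):
        def stub(*args):
            target = self.library[name]   # symbol lookup, done once per name
            self.slots[name] = target     # patch the slot: later calls skip the stub
            self.resolutions += 1
            return target(*args)
        return stub

    def call(self, name, *args):
        return self.slots[name](*args)


table = LazyBindingTable(LIBRARY, ["sqrt", "hypot"])
print(table.call("sqrt", 16.0), table.call("sqrt", 9.0))  # 4.0 3.0
print(table.resolutions)                                  # 1
```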



Answer 10:

Neural networks are, loosely speaking, a kind of self-modifying code.

Then there are evolutionary algorithms, which modify themselves.



Answer 11:

LOL, I've written self-modifying code on two occasions:

  1. when first learning assembly language, before I understood indirect indexed access
  2. accidentally, as pointer bugs in assembly language and C

I can imagine that there may be scenarios where self-modifying code would be more efficient than the alternatives, but nothing obvious leaps to mind. In general, this is something to avoid (debugging nightmare, etc.) unless you are deliberately trying to obfuscate, as mentioned above.



Answer 12:

Mike Abrash described the Pixomatic code generator, part of a software 3D rasterizer compatible with (I think) DirectX 7, in Dr. Dobb's Journal a while back: http://www.ddj.com/architect/184405807



Answer 13:

Applications which implement their own scripting languages often do this. For example, database servers often compile stored procedures (or queries) this way.
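As a toy illustration (not how any particular database does it), the Python sketch below turns a list of hypothetical `(column, operator, constant)` conditions into Python source, compiles it once with `compile()`, and returns a specialized predicate function. Operators are whitelisted and constants are restricted to numbers and strings so that the generated source stays well-formed; `compile_predicate` is a made-up name.

```python
_ALLOWED_OPS = {"==", "!=", "<", "<=", ">", ">="}


def compile_predicate(conditions):
    """Generate, compile and return a function testing a conjunction of conditions."""
    clauses = []
    for column, op, value in conditions:
        if op not in _ALLOWED_OPS:
            raise ValueError(f"unsupported operator {op!r}")
        if not isinstance(value, (int, float, str)):
            raise TypeError(f"unsupported constant {value!r}")
        # repr() turns the column name and the constant into Python literals.
        clauses.append(f"row[{column!r}] {op} {value!r}")
    body = " and ".join(clauses) or "True"
    source = f"def predicate(row):\n    return {body}\n"
    namespace = {}
    exec(compile(source, "<query>", "exec"), namespace)
    return namespace["predicate"], source


rows = [{"name": "ada", "age": 36}, {"name": "bob", "age": 25}, {"name": "cy", "age": 41}]
predicate, source = compile_predicate([("age", ">", 30), ("name", "!=", "cy")])
print(source)
print([r["name"] for r in rows if predicate(r)])  # ['ada']
```

The filter is translated once and then runs as ordinary compiled code for every row, instead of re-interpreting the condition list on each row.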



Answer 14:

Dynamic code generation in SwiftShader is a form of self-modifying code that enables it to implement Direct3D 9 efficiently on the CPU.