How does backpatching work with markers?

2019-07-12 10:49发布

问题:

I've searched all over the Internet and could not find a proper explanation of how backpatching works?

Can you please explain me how does backpatching works? How does it work with the markers?

I know it has 2 main types of markers:

  1. with next-quad in it
  2. with next-list in it

I found this code, in which they are taking an input file and creating a file with RISKI language.

In their first roll they have:

PROGRAM : N FUNCTION M MAIN_FUNCTION

and you can see that N and M are markers (they are empty rolls).

回答1:

One-pass code generation has a small problem with generating code for conditionals. A typical if statement:

if CONDITION then ALTERNATIVE_1 else ALTERNATIVE_2

needs to be compiled into something like this:

  compute CONDITION
  JUMP_IF_TRUE label1
  JUMP_IF_FALSE label2

label1:
  code for ALTERNATIVE_1
  JUMP label3

label2:
  code for ALTERNATIVE_2
  JUMP label3

label3:
  next statement

But when the code for CONDITION is being generated, it is not known where label1 and label2 are, and when the code for ALTERNATIVE_1 and ALTERNATIVE_2 is being generated, it is not known where label3 is.

One way to do this is to use symbolic names for the the labels, as in the above pseudocode, and fill in the actual values when they are known. That requires storing a symbolic name in the jump statement, which complicates the datastructures (in particular, you can't just use a binary assembler code). It also requires a second pass, just to fill in jump targets.

A (possibly) simpler way is to just remember the address of the jump statements, and patch in the target address when it is known. This is known as "backpatching", because you go back and patch the generated code.

It turns out that in many cases, you end up with multiple branches to the same label. A typical case is "short-circuit" booleans like the C-family's && and || operators. For example, extending the original example:

if (CONDITION_1 and CONDITION_2) or CONDITION_3 then ALTERNATIVE_1 else ALTERNATIVE_2

  compute CONDITION_1
  JUMP_IF_TRUE label1
  JUMP_IF_FALSE label2

label1:
  compute CONDITION_2
  JUMP_IF_TRUE label3
  JUMP_IF_FALSE label2

label2:
  compute CONDITION_3
  JUMP_IF_TRUE label3
  JUMP_IF_FALSE label4

label3:
  code for ALTERNATIVE_1
  JUMP label5

label4:
  code for ALTERNATIVE_2
  JUMP label5

label5:
  next statement

It turns out that for simple languages, it is only necessary to remember two incomplete jump statements (often called "true" and "false"). Because there might be multiple jumps to the same target, each of these is actually a linked list of incomplete jump statements, where the target address is used to point to the next jump in the list. So the backpatching walks back through the list, patching in the correct target and using the original target to find the previous statement which needs to be patched.

What you call markers (which are an instance of what yacc/bison refers to as "Mid-Rule Productions") are not really related to backpatching. They can be used for many purposes. In one-pass code-generation, it is often necessary to perform some action in the middle of a rule, and backpatching is just one example.

For example, in the hypothetical if statement, it would be necessary to initialize the backpatch lists before the CONDITION is parsed and then backpatch at the beginning of the THEN and ELSE clauses. (Another backpatch will be triggered at the end of the parse of the entire if statement, but that one will be in the rule's finnal action.)

The easiest way to perform actions in the middle of a rule is to insert a mid-rule action, which is equivalent to inserting an empty "marker" production with an action, as in the example bison file you point to.