I've searched all over the Internet and could not find a proper explanation of how backpatching works?
Can you please explain me how does backpatching works? How does it work with the markers?
I know it has 2 main types of markers:
- with next-quad in it
- with next-list in it
I found this code, in which they are taking an input file and creating a file with RISKI language.
In their first roll they have:
PROGRAM : N FUNCTION M MAIN_FUNCTION
and you can see that N and M are markers (they are empty rolls).
One-pass code generation has a small problem with generating code for conditionals. A typical if
statement:
if CONDITION then ALTERNATIVE_1 else ALTERNATIVE_2
needs to be compiled into something like this:
compute CONDITION
JUMP_IF_TRUE label1
JUMP_IF_FALSE label2
label1:
code for ALTERNATIVE_1
JUMP label3
label2:
code for ALTERNATIVE_2
JUMP label3
label3:
next statement
But when the code for CONDITION
is being generated, it is not known where label1
and label2
are, and when the code for ALTERNATIVE_1
and ALTERNATIVE_2
is being generated, it is not known where label3
is.
One way to do this is to use symbolic names for the the labels, as in the above pseudocode, and fill in the actual values when they are known. That requires storing a symbolic name in the jump statement, which complicates the datastructures (in particular, you can't just use a binary assembler code). It also requires a second pass, just to fill in jump targets.
A (possibly) simpler way is to just remember the address of the jump statements, and patch in the target address when it is known. This is known as "backpatching", because you go back and patch the generated code.
It turns out that in many cases, you end up with multiple branches to the same label. A typical case is "short-circuit" booleans like the C-family's &&
and ||
operators. For example, extending the original example:
if (CONDITION_1 and CONDITION_2) or CONDITION_3 then ALTERNATIVE_1 else ALTERNATIVE_2
compute CONDITION_1
JUMP_IF_TRUE label1
JUMP_IF_FALSE label2
label1:
compute CONDITION_2
JUMP_IF_TRUE label3
JUMP_IF_FALSE label2
label2:
compute CONDITION_3
JUMP_IF_TRUE label3
JUMP_IF_FALSE label4
label3:
code for ALTERNATIVE_1
JUMP label5
label4:
code for ALTERNATIVE_2
JUMP label5
label5:
next statement
It turns out that for simple languages, it is only necessary to remember two incomplete jump statements (often called "true" and "false"). Because there might be multiple jumps to the same target, each of these is actually a linked list of incomplete jump statements, where the target address is used to point to the next jump in the list. So the backpatching walks back through the list, patching in the correct target and using the original target to find the previous statement which needs to be patched.
What you call markers (which are an instance of what yacc/bison refers to as "Mid-Rule Productions") are not really related to backpatching. They can be used for many purposes. In one-pass code-generation, it is often necessary to perform some action in the middle of a rule, and backpatching is just one example.
For example, in the hypothetical if
statement, it would be necessary to initialize the backpatch lists before the CONDITION
is parsed and then backpatch at the beginning of the THEN
and ELSE
clauses. (Another backpatch will be triggered at the end of the parse of the entire if
statement, but that one will be in the rule's finnal action.)
The easiest way to perform actions in the middle of a rule is to insert a mid-rule action, which is equivalent to inserting an empty "marker" production with an action, as in the example bison file you point to.