My collegue suggested me to write a visitor pattern to navigate the AST. Can anyone tell me more how would I start writing it?
As far as I understand, each Node in AST would have visit()
method (?) that would somehow get called (from where?). That about concludes my understanding.
To simplify everything, suppose I have nodes Root
, Expression
, Number
, Op
and the tree looks like this:
Root
|
Op(+)
/ \
/ \
Number(5) \
Op(*)
/ \
/ \
/ \
Number(2) Number(444)
Can anyone think of how the visitor pattern would visit this tree to produce output:
5 + 2 * 444
Thanks, Boda Cydo.
Wikipedia has a great overview of how the Visitor pattern works, although the sample implementation that they use is in Java. You can easily port that to Python, though, no?
Basically, you want to implement a mechanism for double dispatch. Each node in your AST would need to implement an accept()
method (NOT a visit()
method). The method takes, as an argument, a visitor object. In the implementation of this accept()
method, you call a visit()
method of the visitor object (there will be one for each AST node type; in Java, you'll use parameter overloading, in Python I suppose you can use different visit_*()
methods). The correct visitor will then be dispatched with the correct Node type as argument.
See the docs for ast.NodeVisitor
, e.g. a crude possibility might be:
import ast
class MyVisitor(ast.NodeVisitor):
def visit_BinaryOp(self, node):
self.visit(node.left)
print node.op,
self.visit(node.right)
def visit_Num(self, node):
print node.n,
of course this doesn't emit parentheses even where needed, etc, so there's actually more work done, but, it's a start;-).
The two variants to implement the Visitor pattern in Python that I encountered on the Internet most often:
- One-to-one translation of the example from the Desigh Patterns book by Gamma et al.
- Using additional modules for double-dispatch
Translated Example from the Desigh Patterns Book
This variant uses accept()
methods in the data structure classes and corresponding visit_Type()
methods in the visitors.
The data structure
class Operation(object):
def __init__(self, op, arg1, arg2):
self.op = op
self.arg1 = arg1
self.arg2 = arg2
def accept(self, visitor):
visitor.visitOperation(self)
class Integer(object):
def __init__(self, num):
self.num = num
def accept(self, visitor):
visitor.visitInteger(self)
class Float(object):
def __init__(self, num):
self.num = num
def accept(self, visitor):
visitor.visitFloat(self)
expression = Operation('+', Integer('5'),
Operation('*', Integer('2'), Float('444.1')))
Infix Print Visitor
class InfixPrintVisitor(object):
def __init__(self):
self.expression_string = ''
def visitOperation(self, operation):
operation.arg1.accept(self)
self.expression_string += ' ' + operation.op + ' '
operation.arg2.accept(self)
def visitInteger(self, number):
self.expression_string += number.num
def visitFloat(self, number):
self.expression_string += number.num
Prefix Print Visitor
class PrefixPrintVisitor(object):
def __init__(self):
self.expression_string = ''
def visitOperation(self, operation):
self.expression_string += operation.op + ' '
operation.arg1.accept(self)
self.expression_string += ' '
operation.arg2.accept(self)
def visitInteger(self, number):
self.expression_string += number.num
def visitFloat(self, number):
self.expression_string += number.num
Test
infixPrintVisitor = InfixPrintVisitor()
expression.accept(infixPrintVisitor)
print(infixPrintVisitor.expression_string)
prefixPrintVisitor = PrefixPrintVisitor()
expression.accept(prefixPrintVisitor)
print(prefixPrintVisitor.expression_string)
Output
5 + 2 * 444.1
+ 5 * 2 444.1
Using additional modules
This variant uses @functools.singledispatch()
decorator (available in the Python Standard Library since Python v3.4).
The data structure
class Operation(object):
def __init__(self, op, arg1, arg2):
self.op = op
self.arg1 = arg1
self.arg2 = arg2
class Integer(object):
def __init__(self, num):
self.num = num
class Float(object):
def __init__(self, num):
self.num = num
expression = Operation('+', Integer('5'),
Operation('*', Integer('2'), Float('444.1')))
Infix Print Visitor
from functools import singledispatch
@singledispatch
def visitor_print_infix(obj):
pass
@visitor_print_infix.register(Operation)
def __(operation):
return visitor_print_infix(operation.arg1) + ' ' \
+ operation.op + ' ' \
+ visitor_print_infix(operation.arg2)
@visitor_print_infix.register(Integer)
@visitor_print_infix.register(Float)
def __(number):
return number.num
Prefix Print Visitor
from functools import singledispatch
@singledispatch
def visitor_print_prefix(obj):
pass
@visitor_print_prefix.register(Operation)
def __(operation):
return operation.op + ' ' \
+ visitor_print_prefix(operation.arg1) + ' ' \
+ visitor_print_prefix(operation.arg2)
@visitor_print_prefix.register(Integer)
@visitor_print_prefix.register(Float)
def __(number):
return number.num
Test
print(visitor_print_infix(expression))
print(visitor_print_prefix(expression))
Output
5 + 2 * 444.1
+ 5 * 2 444.1
The reason I prefer this variant is that it eliminates the accept()
methods and completely separates the data structure from the operations implemented in the visitors. Extending the data structure with new elements does not require changing the visitors. The visitors ignore the unknown element types by default (see the definitions with the pass
keyword). A drawback of this method is that singledispatch
decorator cannot be used with instance methods directly, although, there are ways to make it work.
For Python before v3.4 multimethods module can be used similar to the singledispatch decorator. One drawback of the multimethods module is that the visitor method that is applied to a given data-structure element is selected not only based on the element's type but also on the order in which the methods are declared. Keeping the method definitions in the right order can be cumbersome and error prone for data structures with a complex inheritance hierarchy.