Plotting expression trees in R

2019-02-08 17:45发布

问题:

I know that I can create an expression tree in R using the substitute function. Let's say that I generate the following expression tree:

expT <- substitute(a+(2*b+c))

Is it possible to visualize the expression tree in R, producing something like:

I know that ( is also a function in R, but I would like to omit that in the plot.

回答1:

Here is an approach taking advantage of the function utils::getParseData and borrowing from a function written for the parser package and using igraph for the visuals. The linked function almost does what you wanted, but the data returned by the getParseData function has blank nodes with the numerical values/symbols/operators etc. on the leaves. This makes sense if you try to parse functions or ternary expressions or more complicated things.

This function simply creates an edgelist from the parse data.

## https://github.com/halpo/parser/blob/master/R/plot.parser.R
## Modified slightly to return graph instead of print/add attr
parser2graph <- function(y, ...){
    y$new.id <- seq_along(y$id)
    h <- graph.tree(0) + vertices(id = y$id, label= y$text)
    for(i in 1:nrow(y)){
        if(y[i, 'parent'])
            h <- h + edge(c(y[y$id == y[i, 'parent'], 'new.id'], y[i, 'new.id']))
    }
    h <- set_edge_attr(h, 'color', value='black')
    return(h)
}

The next function collapses the parse tree by removing all the '(){}' and remaining gaps. The idea is to first move all the labels up one level in the tree, then clip the leaves. And finally all the gaps from nested expressions ('(){}') are removed by creating/destroying edges. I colored the edges blue where levels of nesting from brackets/braces were removed.

## Function to collapse the parse tree (removing () and {})
parseTree <- function(string, ignore=c('(',')','{','}'), ...) {
    dat <- utils::getParseData(parse(text=string))
    g <- parser2graph(dat[!(dat$text %in% ignore), ])
    leaves <- V(g)[!degree(g, mode='out')]                             # tree leaves
    preds <- sapply(leaves, neighbors, g=g, mode="in")                 # their predecessors
    vertex_attr(g, 'label', preds) <- vertex_attr(g, 'label', leaves)  # bump labels up a level
    g <- g - leaves                                                    # remove the leaves
    gaps <- V(g)[!nchar(vertex_attr(g, 'label'))]                      # gaps where ()/{} were
    nebs <- c(sapply(gaps, neighbors, graph=g, mode='all'))            # neighbors of gaps
    g <- add_edges(g, nebs, color='blue')                              # edges around the gaps
    g <- g - V(g)[!nchar(vertex_attr(g, 'label'))]                     # remove leaves/gaps
    plot(g, layout=layout.reingold.tilford, ...)
    title(string, cex.main=2.5)
}

An example, slightly more nested expression. The animation shows how original tree is collapsed.

## Example string
library(igraph)
string <- "(a/{5})+(2*b+c)"

parseTree(string,  # plus some graphing stuff
          vertex.color="#FCFDBFFF", vertex.frame.color=NA,
          vertex.label.font=2, vertex.label.cex=2.5,
          vertex.label.color="darkred", vertex.size=25,
          asp=.7, edge.width=3, margin=-.05)



回答2:

The following gets most of the way there. It mimics pryr:::tree to recursively examine the call tree, then assigns data.tree Nodes. I would have preferred igraph but it is intolerant of duplicate node names (e.g. + appearing twice). I also cannot get dendrogram to label any of the branches other than the root.

#install.packages("data.tree")
library(data.tree)

make_tree <- function(x) {
  if (is.atomic(x) && length(x) == 1) {
    as.character(deparse(x)[1])
  } else if (is.name(x)) {
    x <- as.character(x)
    if (x %in% c("(", ")")) {
      NULL
    } else {
      x
    }
  } else if (is.call(x)) {
    call_items <- as.list(x)
    node <- call_items[[1]]
    call_items <- call_items[-1]
    while (as.character(node) == "(" && length(call_items) > 0) {
      node <- call_items[[1]]
      call_items <- call_items[-1]
    }
    if (length(call_items) == 0) 
      return(make_tree(node))
    call_node <- Node$new(as.character(node))
    for (i in 1:length(call_items)) {
      res <- make_tree(call_items[[i]])
      if (is.environment(res))
        call_node$AddChildNode(res)
      else
        call_node$AddChild(res)
    }
    call_node
  } else
    typeof(x)
}

tree <- make_tree(quote(a+(2*b+c)))
print(tree)
plot(as.dendrogram(tree, edgetext = T), center = T, type = "triangle", yaxt = "n")

Which gives a reasonable text output:

      levelName
1 +            
2  ¦--a        
3  °--+        
4      ¦--*    
5      ¦   ¦--2
6      ¦   °--b
7      °--c    

and a graphic. The multiplication symbol doesn't appear in the mid-tree node (I can't figure out why) but otherwise, I think this does the job.



回答3:

Here's some code and results that may be helpful and least to the point of being able to "walk" the "parse tree":

> parse( text="a+(2*b+c)")
expression(a+(2*b+c))
> parse( text="a+(2*b+c)")[[1]]
a + (2 * b + c)
> parse( text="a+(2*b+c)")[[1]][[1]]
`+`
> parse( text="a+(2*b+c)")[[1]][[2]]
a
> parse( text="a+(2*b+c)")[[1]][[3]]
(2 * b + c)
> parse( text="a+(2*b+c)")[[1]][[4]]
Error in parse(text = "a+(2*b+c)")[[1]][[4]] : subscript out of bounds
> parse( text="a+(2*b+c)")[[1]][[3]][[1]]
`(`
> parse( text="a+(2*b+c)")[[1]][[3]][[2]]
2 * b + c
> parse( text="a+(2*b+c)")[[1]][[3]][[2]][[1]]
`+`
> parse( text="a+(2*b+c)")[[1]][[3]][[2]][[2]]
2 * b
> parse( text="a+(2*b+c)")[[1]][[3]][[2]][[3]]
c
> parse( text="a+(2*b+c)")[[1]][[3]][[2]][[2]][[1]]
`*`
> parse( text="a+(2*b+c)")[[1]][[3]][[2]][[2]][[2]]
[1] 2
> parse( text="a+(2*b+c)")[[1]][[3]][[2]][[2]][[3]]
b

I thought that I had seen a posting in R-help or r-devel by Thomas Lumley or Luke Tierney that did this, but have so far failed to locate it. I did find a posting by @G.Grothendieck that programmatically pulls apart a parse tree that you might build upon:

 e <- parse(text = "a+(2*b+c)") 
my.print <- function(e) { 
  L <- as.list(e) 
  if (length(L) == 0) return(invisible()) 
  if (length(L) == 1) 
     print(L[[1]]) 
     else sapply(L, my.print) 
return(invisible()) } 
my.print(e[[1]])
#----- output-----
`+`
a
`(`
`+`
`*`
[1] 2
b
c


回答4:

It’s definitely possible but I am not aware of an existing function to do so. That said, it’s a nice exercise. Have a look at Walking the AST with recursive functions (and do read the whole chapter) for basic instructions on how to operate on an expression tree.

From that, the rest is “relatively” straightforward:

  • For each node, determine the symbol to be printed.
  • Maintain a (relative) coordinate for the current node. When recursing the expression, this coordinate gets updated depending on what you do; for instance, you know that the arguments of a function call need to be centred below its call, so you can update the y coordinate accordingly, and then calculate x depending on how many arguments there are. Operators are just a special case of that.

Finally, you can use the symbols alongside their coordinates thus calculated to plot them, relative to each other.