I know that I can create an expression tree in R using the substitute
function. Let's say that I generate the following expression tree:
expT <- substitute(a+(2*b+c))
Is it possible to visualize the expression tree in R, producing something like:
I know that (
is also a function in R, but I would like to omit that in the plot.
Here is an approach taking advantage of the function utils::getParseData
and borrowing from a function written for the parser
package and using igraph
for the visuals. The linked function almost does what you wanted, but the data returned by the getParseData
function has blank nodes with the numerical values/symbols/operators etc. on the leaves. This makes sense if you try to parse functions or ternary expressions or more complicated things.
This function simply creates an edgelist from the parse data.
## https://github.com/halpo/parser/blob/master/R/plot.parser.R
## Modified slightly to return graph instead of print/add attr
parser2graph <- function(y, ...){
y$new.id <- seq_along(y$id)
h <- graph.tree(0) + vertices(id = y$id, label= y$text)
for(i in 1:nrow(y)){
if(y[i, 'parent'])
h <- h + edge(c(y[y$id == y[i, 'parent'], 'new.id'], y[i, 'new.id']))
}
h <- set_edge_attr(h, 'color', value='black')
return(h)
}
The next function collapses the parse tree by removing all the '(){}' and remaining gaps. The idea is to first move all the labels up one level in the tree, then clip the leaves. And finally all the gaps from nested expressions ('(){}') are removed by creating/destroying edges. I colored the edges blue where levels of nesting from brackets/braces were removed.
## Function to collapse the parse tree (removing () and {})
parseTree <- function(string, ignore=c('(',')','{','}'), ...) {
dat <- utils::getParseData(parse(text=string))
g <- parser2graph(dat[!(dat$text %in% ignore), ])
leaves <- V(g)[!degree(g, mode='out')] # tree leaves
preds <- sapply(leaves, neighbors, g=g, mode="in") # their predecessors
vertex_attr(g, 'label', preds) <- vertex_attr(g, 'label', leaves) # bump labels up a level
g <- g - leaves # remove the leaves
gaps <- V(g)[!nchar(vertex_attr(g, 'label'))] # gaps where ()/{} were
nebs <- c(sapply(gaps, neighbors, graph=g, mode='all')) # neighbors of gaps
g <- add_edges(g, nebs, color='blue') # edges around the gaps
g <- g - V(g)[!nchar(vertex_attr(g, 'label'))] # remove leaves/gaps
plot(g, layout=layout.reingold.tilford, ...)
title(string, cex.main=2.5)
}
An example, slightly more nested expression. The animation shows how original tree is collapsed.
## Example string
library(igraph)
string <- "(a/{5})+(2*b+c)"
parseTree(string, # plus some graphing stuff
vertex.color="#FCFDBFFF", vertex.frame.color=NA,
vertex.label.font=2, vertex.label.cex=2.5,
vertex.label.color="darkred", vertex.size=25,
asp=.7, edge.width=3, margin=-.05)
The following gets most of the way there. It mimics pryr:::tree
to recursively examine the call tree, then assigns data.tree
Nodes. I would have preferred igraph
but it is intolerant of duplicate node names (e.g. +
appearing twice). I also cannot get dendrogram
to label any of the branches other than the root.
#install.packages("data.tree")
library(data.tree)
make_tree <- function(x) {
if (is.atomic(x) && length(x) == 1) {
as.character(deparse(x)[1])
} else if (is.name(x)) {
x <- as.character(x)
if (x %in% c("(", ")")) {
NULL
} else {
x
}
} else if (is.call(x)) {
call_items <- as.list(x)
node <- call_items[[1]]
call_items <- call_items[-1]
while (as.character(node) == "(" && length(call_items) > 0) {
node <- call_items[[1]]
call_items <- call_items[-1]
}
if (length(call_items) == 0)
return(make_tree(node))
call_node <- Node$new(as.character(node))
for (i in 1:length(call_items)) {
res <- make_tree(call_items[[i]])
if (is.environment(res))
call_node$AddChildNode(res)
else
call_node$AddChild(res)
}
call_node
} else
typeof(x)
}
tree <- make_tree(quote(a+(2*b+c)))
print(tree)
plot(as.dendrogram(tree, edgetext = T), center = T, type = "triangle", yaxt = "n")
Which gives a reasonable text output:
levelName
1 +
2 ¦--a
3 °--+
4 ¦--*
5 ¦ ¦--2
6 ¦ °--b
7 °--c
and a graphic. The multiplication symbol doesn't appear in the mid-tree node (I can't figure out why) but otherwise, I think this does the job.
Here's some code and results that may be helpful and least to the point of being able to "walk" the "parse tree":
> parse( text="a+(2*b+c)")
expression(a+(2*b+c))
> parse( text="a+(2*b+c)")[[1]]
a + (2 * b + c)
> parse( text="a+(2*b+c)")[[1]][[1]]
`+`
> parse( text="a+(2*b+c)")[[1]][[2]]
a
> parse( text="a+(2*b+c)")[[1]][[3]]
(2 * b + c)
> parse( text="a+(2*b+c)")[[1]][[4]]
Error in parse(text = "a+(2*b+c)")[[1]][[4]] : subscript out of bounds
> parse( text="a+(2*b+c)")[[1]][[3]][[1]]
`(`
> parse( text="a+(2*b+c)")[[1]][[3]][[2]]
2 * b + c
> parse( text="a+(2*b+c)")[[1]][[3]][[2]][[1]]
`+`
> parse( text="a+(2*b+c)")[[1]][[3]][[2]][[2]]
2 * b
> parse( text="a+(2*b+c)")[[1]][[3]][[2]][[3]]
c
> parse( text="a+(2*b+c)")[[1]][[3]][[2]][[2]][[1]]
`*`
> parse( text="a+(2*b+c)")[[1]][[3]][[2]][[2]][[2]]
[1] 2
> parse( text="a+(2*b+c)")[[1]][[3]][[2]][[2]][[3]]
b
I thought that I had seen a posting in R-help or r-devel by Thomas Lumley or Luke Tierney that did this, but have so far failed to locate it. I did find a posting by @G.Grothendieck that programmatically pulls apart a parse tree that you might build upon:
e <- parse(text = "a+(2*b+c)")
my.print <- function(e) {
L <- as.list(e)
if (length(L) == 0) return(invisible())
if (length(L) == 1)
print(L[[1]])
else sapply(L, my.print)
return(invisible()) }
my.print(e[[1]])
#----- output-----
`+`
a
`(`
`+`
`*`
[1] 2
b
c
It’s definitely possible but I am not aware of an existing function to do so. That said, it’s a nice exercise. Have a look at Walking the AST with recursive functions (and do read the whole chapter) for basic instructions on how to operate on an expression tree.
From that, the rest is “relatively” straightforward:
- For each node, determine the symbol to be printed.
- Maintain a (relative) coordinate for the current node. When recursing the expression, this coordinate gets updated depending on what you do; for instance, you know that the arguments of a function call need to be centred below its call, so you can update the
y
coordinate accordingly, and then calculate x
depending on how many arguments there are. Operators are just a special case of that.
Finally, you can use the symbols alongside their coordinates thus calculated to plot them, relative to each other.