“for” loop only adds the final ggplot layer

Posted 2019-01-01 13:30

Summary: When I use a "for" loop to add layers to a violin plot (in ggplot), the only layer that gets added is the one created by the final loop iteration. Yet when I write out explicitly the code that the loop should produce, all the layers are added.

Details: I am trying to create violin graphs with overlapping layers, to show the extent to which estimate distributions do or do not overlap for several survey question responses, stratified by place. I want to be able to include any number of places, so I have one column in my dataframe for each place, and am trying to use a "for" loop to generate one ggplot layer per place. But the loop only adds the layer from the loop's final iteration.

This code illustrates the problem, and some suggested approaches that failed:

library(ggplot2) 

# Create a dataframe with 500 random normal values for responses to 3 survey questions from two cities
topic <- c("Poverty %","Mean Age","% Smokers")
place <- c("Chicago","Miami")
n <- 500
mean <- c(35,  40,58,  50, 25,20)
var  <- c( 7, 1.5, 3, .25, .5, 1)
df <- data.frame( topic=rep(topic,rep(n,length(topic)))
                 ,c(rnorm(n,mean[1],var[1]),rnorm(n,mean[3],var[3]),rnorm(n,mean[5],var[5]))
                 ,c(rnorm(n,mean[2],var[2]),rnorm(n,mean[4],var[4]),rnorm(n,mean[6],var[6]))
                )
names(df)[2:dim(df)[2]] <- place  # Name those last two columns with the corresponding place name.
head(df) 

# This "for" loop seems to add only the layer from the final iteration (i.e., where p=3)
g <- ggplot(df, aes(factor(topic), df[,2]))
for (p in 2:dim(df)[2]) {
  g <- g + geom_violin(aes(y = df[,p], colour = place[p-1]), alpha = 0.3)
}
g

# But mimicking what the for loop does in explicit code works fine, resulting in both "place"s being displayed in the graph.
g <- ggplot(df, aes(factor(topic), df[,2]))
g <-   g + geom_violin(aes(y = df[,2], colour = place[2-1]), alpha = 0.3)
g <-   g + geom_violin(aes(y = df[,3], colour = place[3-1]), alpha = 0.3)
g

## per http://stackoverflow.com/questions/18444620/set-layers-in-ggplot2-via-loop , I tried 
g <- ggplot(df, aes(factor(topic), df[,2]))
for (p in 2:dim(df)[2]) {
  df1 <- df[,c(1,p)]
  g <- g + geom_violin(aes(y = df1[,2], colour = place[p-1]), alpha = 0.3)
}
g
# but got the same undesired result

# per http://stackoverflow.com/questions/15987367/how-to-add-layers-in-ggplot-using-a-for-loop , I tried
g <- ggplot(df, aes(factor(topic), df[,2]))
for (p in names(df)[-1]) {
  cat(p,"\n")
  g <- g + geom_violin(aes_string(y = p, colour = p), alpha = 0.3)  # produced this error: Error in unit(tic_pos.c, "mm") : 'x' and 'units' must have length > 0
  # g <- g + geom_violin(aes_string(y = p            ), alpha = 0.3)  # produced this error: Error: stat_ydensity requires the following missing aesthetics: y
}
g
# but that failed to produce any graphic, per the errors noted in the "for" loop above

3 answers
其实,你不懂
Reply #2 · 2019-01-01 14:13

You can do it without a loop:

library(reshape2)  # for melt()
df.2 <- melt(df, id.vars = "topic")  # long format: topic, variable (city), value
gg <- ggplot(df.2, aes(x=topic, y=value))
gg <- gg + geom_violin(position="identity", aes(color=variable), alpha=0.3)
gg
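If you would rather not depend on reshape2, base R's `stack()` produces the same long format. A minimal sketch, assuming a data frame shaped like the question's (the data are regenerated here with an arbitrary seed so the block runs on its own; `long` is just an illustrative name):

```r
library(ggplot2)

# Data shaped like the question's: one column per place
# (regenerated with an arbitrary seed so this block is self-contained)
set.seed(42)
n <- 500
place <- c("Chicago", "Miami")
df <- data.frame(
  topic   = rep(c("Poverty %", "Mean Age", "% Smokers"), each = n),
  Chicago = c(rnorm(n, 35, 7),   rnorm(n, 58, 3),    rnorm(n, 25, 0.5)),
  Miami   = c(rnorm(n, 40, 1.5), rnorm(n, 50, 0.25), rnorm(n, 20, 1))
)

# stack() puts the place columns into `values`, with `ind` naming the source column;
# topic is repeated once per place so it lines up with the stacked rows
long <- data.frame(topic = rep(df$topic, length(place)), stack(df[place]))

gg <- ggplot(long, aes(x = topic, y = values, colour = ind)) +
  geom_violin(position = "identity", alpha = 0.3)
gg
```

The plotting call is the same as with `melt()`; only the column names differ (`values`/`ind` instead of `value`/`variable`).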


流年柔荑漫光年
Reply #3 · 2019-01-01 14:28

This happens because of ggplot's "lazy evaluation". It is a common problem when ggplot is used this way (building the layers separately in a loop, rather than having ggplot do it for you, as in @hrbrmstr's solution).

ggplot stores the arguments to aes(...) as expressions, and only evaluates them when the plot is rendered. So, in your loops, something like

aes(y = df[,p], colour = place[p-1])

gets stored as is and evaluated only when you render the plot, after the loop has completed. By then p=3, so every layer is rendered with p=3.
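A minimal sketch of that deferred evaluation, using hypothetical toy data (the data frame `d` and its columns `a` and `b` are made up for illustration) and `layer_data()` to inspect what each layer actually plots once the plot is built:

```r
library(ggplot2)

# Hypothetical toy data: two columns we want as separate layers
d <- data.frame(x = 1:3, a = c(1, 2, 3), b = c(10, 20, 30))
cols <- c("a", "b")

g <- ggplot(d, aes(x = x))
for (i in seq_along(cols)) {
  # aes() records the expression d[[cols[i]]]; it is not evaluated yet
  g <- g + geom_point(aes(y = d[[cols[i]]]))
}

# Building the plot evaluates both mappings now, when i == 2,
# so layer 1 also ends up with the values of column "b"
layer_data(g, 1)$y
layer_data(g, 2)$y
```

Both layers contain the values 10, 20 and 30 (column `b`), even though the first layer was meant to show column `a`.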

So the "right" way to do this is to use melt(...) from the reshape2 package to convert your data from wide to long format, and let ggplot manage the layers for you. I put "right" in quotes because in this particular case there is a subtlety. When computing the violin densities from the melted data frame, ggplot scales every violin relative to the largest density across both Chicago and Miami, rather than per city. If you want each city's violins scaled individually, you need separate layers, i.e. a loop (sadly).
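One way to check that scaling difference is to compare the `violinwidth` column that the violin stat computes. This sketch assumes the default `scale = "area"`, which, as far as I know, divides each density by the largest density within the layer's panel. The data are regenerated to match the question's, and `one_layer`/`per_place` are illustrative names:

```r
library(ggplot2)

set.seed(42)
n <- 500
place <- c("Chicago", "Miami")
df <- data.frame(
  topic   = rep(c("Poverty %", "Mean Age", "% Smokers"), each = n),
  Chicago = c(rnorm(n, 35, 7),   rnorm(n, 58, 3),    rnorm(n, 25, 0.5)),
  Miami   = c(rnorm(n, 40, 1.5), rnorm(n, 50, 0.25), rnorm(n, 20, 1))
)
long <- data.frame(topic = rep(df$topic, length(place)), stack(df[place]))

# One layer over the long data: both cities are normalised together
one_layer <- ggplot(long, aes(topic, values, colour = ind)) +
  geom_violin(position = "identity")

# One layer per city, each with its own data= (normalised independently)
per_place <- ggplot(df, aes(x = topic))
for (p in place) {
  d <- data.frame(topic = df$topic, value = df[[p]], city = p)
  per_place <- per_place + geom_violin(data = d, aes(y = value, colour = city))
}

# Largest violinwidth reached by each city's violins
ld <- layer_data(one_layer)
w_one <- tapply(ld$violinwidth, ld$colour, max)
w_per <- sapply(seq_along(place), function(i) max(layer_data(per_place, i)$violinwidth))
w_one  # one city reaches 1, the other stays well below it
w_per  # each layer reaches 1 on its own
```

Here Miami's narrow "Mean Age" distribution has the tallest density, so in the single-layer version Chicago's violins come out noticeably thinner than in the per-city version.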

The way around the lazy evaluation problem is to put any reference to the loop index in the data=... definition. That argument is not stored as an expression; the actual data is stored in the plot definition. So you could do this:

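g <- ggplot(df, aes(x=topic))
for (p in 2:length(df)) {
  # Build this city's data now; the data frame itself (not an expression) is stored in the layer
  gg.data <- data.frame(topic=df$topic, value=df[,p], city=names(df)[p])
  g <- g + geom_violin(data=gg.data, aes(y=value, color=city), alpha=0.3)
}
g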

which gives the same result as your explicit version. Note that the index p does not appear inside aes(...).
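Another option, assuming ggplot2 3.0.0 or later (where aes() supports tidy-evaluation injection), is to keep the loop and the aes() call but inject the loop variable's current value with `!!`, so the stored mapping no longer refers to `p` at all. A minimal sketch on regenerated data matching the question's:

```r
library(ggplot2)

set.seed(42)
n <- 500
place <- c("Chicago", "Miami")
df <- data.frame(
  topic   = rep(c("Poverty %", "Mean Age", "% Smokers"), each = n),
  Chicago = c(rnorm(n, 35, 7),   rnorm(n, 58, 3),    rnorm(n, 25, 0.5)),
  Miami   = c(rnorm(n, 40, 1.5), rnorm(n, 50, 0.25), rnorm(n, 20, 1))
)

g <- ggplot(df, aes(x = topic))
for (p in place) {
  # !!p is replaced by the current string ("Chicago", then "Miami") when aes()
  # is called; .data[["Chicago"]] then looks that column up at draw time
  g <- g + geom_violin(aes(y = .data[[!!p]], colour = !!p), alpha = 0.3)
}
g
```

Without the two `!!`, both layers would again pick up the last value of `p` and draw Miami twice.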


Update: A note about scale="width" (suggested in a comment). This makes all the violins the same width, which is not the scaling used in the OP's original code. IMO this is not a great way to visualize the data, as it suggests there is much more data in the Chicago group.

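library(reshape2)
gg <- melt(df, id.vars = "topic")  # long format: topic, variable (city), value
ggplot(gg) + geom_violin(aes(x=topic, y=value, color=variable),
                         alpha=0.3, position="identity", scale="width")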

千与千寻千般痛.
Reply #4 · 2019-01-01 14:29

Just avoid using the for loop then. How about lapply instead:

g <- g + lapply(2:ncol(df), function(p) {
  geom_violin(aes(y = df[,p], colour = place[p-1]), alpha = 0.3)
})

EDIT: This doesn't actually work. I had p <- 2 in my workspace before running it, so it produced a graph with only the Chicago data. The principle should still work, though (although melt is probably the better option):

g <- ggplot(df, aes(x=factor(topic)))
g + lapply(place, function(p) {
  # aes_string() turns the current value of p into a column reference immediately
  geom_violin(aes_string(y = p), alpha = 0.3, color = which(p==place))
})