Loop to generate plots for variables in a data frame

Posted 2019-07-14 08:30

So I have a data frame, set up a bit like this:

Sample V1  V2 V3  Group1 Group2
bob    12  32  12  G1      G2
susan  43  23  54  G2      G2
mary   23  65  34  G1      G2

I am able to make a grouped boxplot of each variable (V1, V2, V3) individually, grouped by the Group1 and Group2 variables, but my real dataset has WAY more variables, and coding each plot individually would be tedious. Is there a way to loop over the variables and automate plot generation and export? For loops are still a bit of an obscure topic for me.

Here is the code I use to generate an individual plot:

png(filename= "filename.jpg")
ggplot(aes(y=data$V1, x=data$Group1, fill=data$Group2), data=data) + geom_boxplot()
dev.off()

Thanks!

1 answer
啃猪蹄的小仙女
#2 · 2019-07-14 09:15

Here are several approaches for you. I'm guessing there is a duplicate, but if you're just starting out it's not always easy to apply those answers to your data.

library(reshape2)
library(ggplot2)
# create some data
set.seed(100)
n <- 500

dat <- data.frame(sample = sample(LETTERS[1:10], n, TRUE),
                  V1 = sample(50, n, TRUE),
                  V2 = sample(50, n, TRUE),
                  V3 = sample(50, n, TRUE),
                  Group1 = paste0("G", sample(3, n, TRUE)),
                  Group2 = paste0("G", sample(5, n, TRUE)))

Approach 1: melt and facet

dat_m <- melt(dat,measure.vars = c("V1","V2","V3"))

p1 <- ggplot(dat_m, aes(x = Group1,y = value,fill = Group2))+
  geom_boxplot() + facet_wrap(~variable)
p1

[Figure: boxplots of value by Group1, filled by Group2, with one facet per variable (V1, V2, V3)]

As you can see, this stops being feasible when you have too many variables, since each one adds another facet to an already crowded grouped plot.

Approach 2: a separate plot/image per variable, still using the long data. I split the long data by variable and create a plot for each chunk. As written, the code only displays the plots; the file-saving code is commented out.

lapply(split(dat_m, dat_m$variable), function(chunk){
  # one file name per variable, e.g. plot_V1.png
  myfilename <- sprintf("plot_%s.png", unique(chunk$variable))

  p <- ggplot(chunk, aes(x = Group1,y = value,fill = Group2)) +
    geom_boxplot() + labs(title = myfilename)
  p
  # to save to a file, use these lines instead of the bare `p`:
  # png(filename = myfilename)
  # print(p)
  # dev.off()
})
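
To actually write the files in approach 2, the commented-out lines can be switched on. Here is a minimal sketch of that, not the only way to do it. It rebuilds a small stand-in data set so it runs on its own, and `out_dir` and `saved` are names I made up for illustration; point `out_dir` at whatever folder you want.

```r
library(reshape2)
library(ggplot2)

# Small stand-in data set with the same layout as `dat` above
set.seed(100)
n <- 60
dat <- data.frame(V1 = sample(50, n, TRUE),
                  V2 = sample(50, n, TRUE),
                  V3 = sample(50, n, TRUE),
                  Group1 = paste0("G", sample(3, n, TRUE)),
                  Group2 = paste0("G", sample(5, n, TRUE)))
dat_m <- melt(dat, measure.vars = c("V1", "V2", "V3"))

# Hypothetical output folder (an assumption, not part of the original code)
out_dir <- file.path(tempdir(), "boxplots")
dir.create(out_dir, showWarnings = FALSE)

saved <- lapply(split(dat_m, dat_m$variable), function(chunk) {
  varname <- as.character(chunk$variable[1])
  myfilename <- file.path(out_dir, sprintf("plot_%s.png", varname))

  p <- ggplot(chunk, aes(x = Group1, y = value, fill = Group2)) +
    geom_boxplot() + labs(title = varname)

  png(filename = myfilename)
  print(p)     # inside a function the plot must be printed explicitly
  dev.off()

  myfilename   # return the path so you can see what was written
})
```

The explicit `print(p)` matters: at the top level R prints a ggplot object automatically, but inside a function or loop nothing is drawn unless you print it, so the device would be closed on an empty image.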

And a third approach is to use the strings of columns you're interested in:

#vector of columns you want to plot
mycols <- c("V1","V2","V3")

#plotting for each column. Note that I've put the 'fixed' variables
#inside aes in the main call to ggplot, and the 'varying' variable
#inside aes_string in the call to geom_boxplot

lapply(mycols, function(cc){
  myfilename <- sprintf("plot_%s.png",cc)
  p <- ggplot(dat, aes(x = Group1,fill = Group2)) +
    geom_boxplot(aes_string(y = cc)) + labs(title = myfilename)
  p
  # to save to a file, use these lines instead of the bare `p`:
  # png(filename = myfilename)
  # print(p)
  # dev.off()
})
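
Since the question asked about for loops specifically, the third approach can also be written as a plain loop. This is a sketch under a few assumptions: `aes_string()` is soft-deprecated in newer ggplot2 releases, so I use the `.data[[cc]]` pronoun inside `aes()` instead, and I save with `ggsave()` rather than `png()`/`dev.off()`. The `out_dir` location and the 6 x 4 inch size are placeholders I picked, and the data is a small stand-in so the snippet runs on its own.

```r
library(ggplot2)

# Small stand-in data set with the same layout as `dat` above
set.seed(100)
n <- 60
dat <- data.frame(V1 = sample(50, n, TRUE),
                  V2 = sample(50, n, TRUE),
                  V3 = sample(50, n, TRUE),
                  Group1 = paste0("G", sample(3, n, TRUE)),
                  Group2 = paste0("G", sample(5, n, TRUE)))

# Columns to plot; with many variables you could build this vector from
# names(dat) instead of typing it out
mycols <- c("V1", "V2", "V3")

# Hypothetical output folder (an assumption, change as needed)
out_dir <- tempdir()

for (cc in mycols) {
  # .data[[cc]] looks up the column whose name is stored in the string cc
  p <- ggplot(dat, aes(x = Group1, y = .data[[cc]], fill = Group2)) +
    geom_boxplot() +
    labs(y = cc, title = cc)

  # ggsave opens and closes the graphics device for you
  ggsave(file.path(out_dir, sprintf("plot_%s.png", cc)), plot = p,
         width = 6, height = 4)
}
```

Each plot is saved inside the same iteration that builds it. If you would rather collect the plots in a list and draw them later, build them inside a function (as in the `lapply` versions above), because a plot made in a bare for loop looks up `cc` only when it is drawn and would then see the last value.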