In R, plotting wide form data with ggplot2 or base

Posted 2019-07-20 19:53

I have a data frame that looks like this (though thousands of times larger).

df<-data.frame(sample(1:100,10,replace=FALSE),sample(1:100,10,replace=FALSE),runif(10,0,1),runif(10,0,1),runif(10,0,1), rep(c("none","summer","winter","spring","allyear"),2))
names(df)<-c("Mother","ID","Wavelength1","Wavelength2","Wavelength3","WaterTreatment")
df
   Mother ID Wavelength1 Wavelength2 Wavelength3 WaterTreatment
1       2 34   0.9143670  0.03077356  0.82859497           none
2      24 75   0.6173382  0.05958151  0.66552338         summer
3      62 77   0.2655572  0.63731302  0.30267893         winter
4      30 98   0.9823510  0.45690437  0.40818031         spring
5       4 11   0.7503750  0.93737900  0.24909228        allyear
6      55 76   0.6451885  0.60138475  0.86044856           none
7      97 21   0.5711019  0.99732068  0.04706894         summer
8      87 14   0.7699293  0.81617911  0.18940531         winter
9      92 30   0.5855559  0.70152698  0.73375917         spring
10     93 44   0.1040359  0.85259166  0.37882469        allyear

I want to plot the measured values on the y axis against the wavelength (1, 2, 3) on the x axis, with one line per row. I have tried two ways of doing this.

First method: this works, but it uses base plot and requires more code than should be necessary:

colors=c("red","blue","green","orange","yellow")
plot(0,0,xlim=c(1,3),ylim=c(0,1),type="l")
for (i in 1:10) {
  if      (df$WaterTreatment[i]=="none"){
    a<-1
  } else if (df$WaterTreatment[i]=="allyear") {
    a<-2
  }else if (df$WaterTreatment[i]=="summer") {
    a<-3
  }else if (df$WaterTreatment[i]=="winter") {
    a<-4
  }else if (df$WaterTreatment[i]=="spring") {
    a<-5
  }
  lines(seq(1,3,1),df[i,3:5],type="l",col=colors[a])
}

Second method: I melt the data into long form and then use ggplot2. The resulting plot is wrong because it draws one line per water treatment rather than one line per "Mother"/"ID" pair (the unique identifier, i.e. each row of the original data frame).

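library(reshape2)  # provides melt()
library(ggplot2)   # provides qplot()
df_m<-melt(df,id.vars=c("Mother","ID","WaterTreatment"))
df_m$variable<-as.numeric(df_m$variable)  # factor levels Wavelength1..Wavelength3 become 1..3
# No grouping is specified here, so the lines are drawn per colour (per WaterTreatment)
qplot(x=variable,y=value,data=df_m,color=WaterTreatment,geom='line')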

There is probably something simple I'm missing about ggplot2 that would fix the plotting of the lines. I'm a newbie with ggplot, but am working to get more familiar with it and would like to use it in this application.

But more broadly, is there an efficient way to plot this type of wide form data in ggplot2? Melting the full data set takes a very long time, and I'm wondering whether it is worth it, or whether there is some workaround that avoids the redundant cells created by melting.

Thanks for your help. If you need more clarity on this question, please let me know and I can edit.

Tags: r ggplot2 melt
2 Answers
淡お忘
Reply #2 · 2019-07-20 20:37

It looks like you want a separate line for each ID, but you want the lines colored based on the value of WaterTreatment. If so, you can do it like this in ggplot:

ggplot(df_m, aes(x=variable, y=value, group=ID, colour=WaterTreatment)) + 
       geom_line() + geom_point()

You can also use faceting to make it easier to see the different levels of WaterTreatment:

ggplot(df_m, aes(x=variable, y=value, group=ID, colour=WaterTreatment)) + 
    geom_line() + geom_point() + 
    facet_grid(WaterTreatment ~ .)
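The question names the Mother/ID pair as the row identifier, so grouping on ID alone would merge two rows into one line if the same ID ever appeared under two mothers. A minimal sketch of grouping on the pair instead, assuming that pair is unique per row. The names `long` and `line_id` are illustrative, the data are rebuilt with a fixed seed, base `reshape()` stands in for `melt()`, and the plot is only drawn if ggplot2 is installed.

```r
set.seed(42)
df <- data.frame(
  Mother = sample(1:100, 10),
  ID = sample(1:100, 10),
  Wavelength1 = runif(10),
  Wavelength2 = runif(10),
  Wavelength3 = runif(10),
  WaterTreatment = rep(c("none", "summer", "winter", "spring", "allyear"), 2)
)

# Long form via base reshape(): one row per (original row, wavelength).
long <- reshape(
  df,
  direction = "long",
  varying = c("Wavelength1", "Wavelength2", "Wavelength3"),
  v.names = "value",
  timevar = "variable",
  times = 1:3,
  idvar = c("Mother", "ID")
)

# One grouping key per Mother/ID pair (illustrative column name).
long$line_id <- interaction(long$Mother, long$ID, drop = TRUE)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(long, aes(x = variable, y = value,
                        group = line_id, colour = WaterTreatment)) +
    geom_line() +
    geom_point()
  print(p)
}
```

Writing `group = interaction(Mother, ID)` directly inside `aes()` should behave the same way without the extra column.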

To answer your general question: ggplot is set up to work most easily and powerfully with a "long" (i.e., melted) data frame. I guess you could work with a "wide" data frame and plot separate layers for each combination of factors you want to plot. But that would be a lot of extra work compared to a single melt command to get your data into the right format.
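On the concern about redundant cells: one possible way to keep the long table small is to carry a single integer row key instead of repeating every identifier column, and to look up only the attribute the plot needs. This is a sketch in base R under that assumption, not a benchmarked recommendation; `row`, `long`, and `wave_cols` are illustrative names.

```r
set.seed(42)
df <- data.frame(
  Mother = sample(1:100, 10),
  ID = sample(1:100, 10),
  Wavelength1 = runif(10),
  Wavelength2 = runif(10),
  Wavelength3 = runif(10),
  WaterTreatment = rep(c("none", "summer", "winter", "spring", "allyear"), 2)
)

wave_cols <- grep("^Wavelength", names(df), value = TRUE)
n <- nrow(df)

# as.matrix() + as.vector() flattens column by column:
# all of Wavelength1, then all of Wavelength2, and so on.
long <- data.frame(
  row = rep(seq_len(n), times = length(wave_cols)),
  variable = rep(seq_along(wave_cols), each = n),
  value = as.vector(as.matrix(df[wave_cols]))
)

# Attach only the attribute needed for colouring, via the row key.
long$WaterTreatment <- df$WaterTreatment[long$row]
```

The result can be passed to ggplot with `group = row` and `colour = WaterTreatment`; Mother and ID stay in the wide table and can be recovered through `row` if needed.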

一夜七次
Reply #3 · 2019-07-20 20:47

I'd like to point out that you are basically re-inventing an existing base plotting function, namely matplot. This could replace your plot and for-loop:

# factor() is needed in R >= 4.0, where data.frame() no longer converts strings to factors
matplot(1:3, t( df[ ,3:5] ), type="l",col=colors[ as.numeric(factor(df$WaterTreatment))] )

With that in mind, you might want to search SO for: [r] matplot ggplot2, as I did, and see if any of the hits are effective.
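Indexing the colours by factor code follows the alphabetical level order, which differs from the none/allyear/summer/winter/spring order used in the question's loop. A minimal sketch that keeps the loop's colour assignment by matching against an explicit level vector; `treat_levels` and `line_cols` are illustrative names and the data are rebuilt with a fixed seed.

```r
set.seed(42)
df <- data.frame(
  Mother = sample(1:100, 10),
  ID = sample(1:100, 10),
  Wavelength1 = runif(10),
  Wavelength2 = runif(10),
  Wavelength3 = runif(10),
  WaterTreatment = rep(c("none", "summer", "winter", "spring", "allyear"), 2)
)

# Same treatment-to-colour order as the if/else chain in the question.
treat_levels <- c("none", "allyear", "summer", "winter", "spring")
colors <- c("red", "blue", "green", "orange", "yellow")
line_cols <- colors[match(df$WaterTreatment, treat_levels)]

# matplot() draws one line per column, so transpose: rows become columns.
matplot(1:3, t(df[, 3:5]), type = "l", lty = 1, col = line_cols,
        xlab = "Wavelength", ylab = "Value")
legend("topright", legend = treat_levels, col = colors, lty = 1)
```

`lty = 1` keeps every line solid, as in the original loop; by default `matplot()` cycles through line types as well as colours.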
