I have a data frame that looks like this (though thousands of times larger).
df <- data.frame(sample(1:100, 10, replace = FALSE), sample(1:100, 10, replace = FALSE), runif(10, 0, 1), runif(10, 0, 1), runif(10, 0, 1), rep(c("none", "summer", "winter", "spring", "allyear"), 2))
names(df) <- c("Mother", "ID", "Wavelength1", "Wavelength2", "Wavelength3", "WaterTreatment")
df
Mother ID Wavelength1 Wavelength2 Wavelength3 WaterTreatment
1 2 34 0.9143670 0.03077356 0.82859497 none
2 24 75 0.6173382 0.05958151 0.66552338 summer
3 62 77 0.2655572 0.63731302 0.30267893 winter
4 30 98 0.9823510 0.45690437 0.40818031 spring
5 4 11 0.7503750 0.93737900 0.24909228 allyear
6 55 76 0.6451885 0.60138475 0.86044856 none
7 97 21 0.5711019 0.99732068 0.04706894 summer
8 87 14 0.7699293 0.81617911 0.18940531 winter
9 92 30 0.5855559 0.70152698 0.73375917 spring
10 93 44 0.1040359 0.85259166 0.37882469 allyear
I want to plot the measured values on the y axis and the wavelength (1, 2, 3) on the x axis, with one line per row of the data frame. I have two ways of doing this.
The first method works, but it uses base graphics and requires more code than should be necessary:
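# Colours indexed by treatment: 1 = none, 2 = allyear, 3 = summer, 4 = winter, 5 = spring
colors <- c("red", "blue", "green", "orange", "yellow")
# Empty plotting region (type = "n" draws nothing); the lines are added in the loop
plot(0, 0, xlim = c(1, 3), ylim = c(0, 1), type = "n", xlab = "Wavelength", ylab = "Value")
for (i in 1:nrow(df)) {
  if (df$WaterTreatment[i] == "none") {
    a <- 1
  } else if (df$WaterTreatment[i] == "allyear") {
    a <- 2
  } else if (df$WaterTreatment[i] == "summer") {
    a <- 3
  } else if (df$WaterTreatment[i] == "winter") {
    a <- 4
  } else if (df$WaterTreatment[i] == "spring") {
    a <- 5
  }
  # One line per row: x = wavelength index 1..3, y = that row's three measurements
  lines(1:3, unlist(df[i, 3:5]), type = "l", col = colors[a])
}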
Second method: I melt the data into long form and then plot it with ggplot2. The resulting plot is wrong: it draws one line per water treatment rather than one line per Mother/ID combination (the unique identifier, i.e. the rows of the original data frame).
library(reshape2)
library(ggplot2)
# Long form: one row per Mother/ID/WaterTreatment and wavelength
df_m <- melt(df, id.vars = c("Mother", "ID", "WaterTreatment"))
df_m$variable <- as.numeric(df_m$variable) # converts the Wavelength1..3 factor to 1..3
qplot(x = variable, y = value, data = df_m, color = WaterTreatment, geom = "line")
There is probably something simple I'm missing about ggplot2 that would fix how the lines are drawn. I'm new to ggplot2, but I'm working to get more familiar with it and would like to use it for this.
More broadly, is there an efficient way to plot this kind of wide-form data in ggplot2? Melting the data takes an enormous amount of time, and I'm wondering whether it is worth it, or whether there is a workaround that avoids the redundant cells created when melting.
Thanks for your help. If anything needs clarifying, please let me know and I can edit the question.
It looks like you want a separate line for each ID, with the lines colored by the value of WaterTreatment. If so, in ggplot you can map the group aesthetic to the ID and the colour aesthetic to WaterTreatment.
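A minimal sketch of that approach, assuming reshape2 and ggplot2 are installed. The set.seed call is a hypothetical addition only to make the example reproducible, and grouping by interaction(Mother, ID) rather than ID alone is an assumption based on the question calling the Mother/ID pair the unique identifier:

```r
library(reshape2)
library(ggplot2)

set.seed(1)  # hypothetical seed, only for reproducibility of this sketch
df <- data.frame(
  Mother = sample(1:100, 10),
  ID = sample(1:100, 10),
  Wavelength1 = runif(10),
  Wavelength2 = runif(10),
  Wavelength3 = runif(10),
  WaterTreatment = rep(c("none", "summer", "winter", "spring", "allyear"), 2)
)

# Long form: one row per (Mother, ID, wavelength) combination
df_m <- melt(df, id.vars = c("Mother", "ID", "WaterTreatment"))
df_m$variable <- as.numeric(df_m$variable)  # Wavelength1..3 -> 1..3

# group decides which points are joined into one line (one per Mother/ID pair);
# colour only decides what colour each line is drawn in
p <- ggplot(df_m, aes(x = variable, y = value,
                      group = interaction(Mother, ID),
                      colour = WaterTreatment)) +
  geom_line() +
  labs(x = "Wavelength", y = "Value")
print(p)
```

Without an explicit group, ggplot groups by the discrete colour variable, which is why the original qplot call joined all rows sharing a treatment into one line.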
You can also use faceting (one panel per level) to make it easier to see the different levels of WaterTreatment.
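A sketch of the faceted version, again assuming reshape2 and ggplot2 and a hypothetical seed; facet_wrap(~ WaterTreatment) splits the same long-form data into one panel per treatment:

```r
library(reshape2)
library(ggplot2)

set.seed(1)  # hypothetical seed, only for reproducibility of this sketch
df <- data.frame(
  Mother = sample(1:100, 10),
  ID = sample(1:100, 10),
  Wavelength1 = runif(10),
  Wavelength2 = runif(10),
  Wavelength3 = runif(10),
  WaterTreatment = rep(c("none", "summer", "winter", "spring", "allyear"), 2)
)
df_m <- melt(df, id.vars = c("Mother", "ID", "WaterTreatment"))
df_m$variable <- as.numeric(df_m$variable)  # Wavelength1..3 -> 1..3

# Same lines as before, but each WaterTreatment level gets its own panel
p2 <- ggplot(df_m, aes(x = variable, y = value,
                       group = interaction(Mother, ID),
                       colour = WaterTreatment)) +
  geom_line() +
  facet_wrap(~ WaterTreatment) +
  labs(x = "Wavelength", y = "Value")
print(p2)
```

Keeping the colour mapping is redundant once each panel shows a single treatment, but it keeps the panels visually consistent with the unfaceted plot.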
To answer your general question: ggplot is designed to work most easily and powerfully with a "long" (i.e., melted) data frame. I suppose you could work with a "wide" data frame and plot separate layers for each combination of factors you want to plot, but that would be a lot of extra work compared to a single melt command that puts your data into the right format.
I'd like to point out that you are basically re-inventing an existing base plotting function, namely matplot, which could replace your plot and for-loop.
With that in mind, you might want to search SO for [r] matplot ggplot2, as I did, and see whether any of the hits help.
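A sketch of how matplot could replace the plot-and-loop code from the question, using base graphics only. The named colour vector (treatment_colors) is a hypothetical replacement for the if/else chain, keeping the question's treatment-to-colour assignment, and the seed is again only for reproducibility:

```r
set.seed(1)  # hypothetical seed, only for reproducibility of this sketch
df <- data.frame(
  Mother = sample(1:100, 10),
  ID = sample(1:100, 10),
  Wavelength1 = runif(10),
  Wavelength2 = runif(10),
  Wavelength3 = runif(10),
  WaterTreatment = rep(c("none", "summer", "winter", "spring", "allyear"), 2)
)

# Lookup keyed by treatment name, replacing the if/else chain
treatment_colors <- c(none = "red", allyear = "blue", summer = "green",
                      winter = "orange", spring = "yellow")
line_cols <- treatment_colors[as.character(df$WaterTreatment)]

# matplot draws one line per column of y, so transpose the wide block:
# each row of df becomes a column of three values
wave <- as.matrix(df[, c("Wavelength1", "Wavelength2", "Wavelength3")])
matplot(1:3, t(wave), type = "l", lty = 1, col = line_cols,
        ylim = c(0, 1), xlab = "Wavelength", ylab = "Value")
```

This works directly on the wide data, so no melt step is needed; the trade-off is losing ggplot features such as automatic legends and faceting.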