I have a dataset which looks like this
VegType 87MIN 87MAX 87Q25 87Q50 87Q75 96MIN 96MAX 96Q25 96Q50 96Q75 00MIN 00MAX 00Q25 00Q50 00Q75
1 0.02 0.32 0.11 0.12 0.13 0.02 0.26 0.08 0.09 0.10 0.02 0.28 0.10 0.11 0.12
2 0.02 0.45 0.12 0.13 0.13 0.02 0.20 0.09 0.10 0.11 0.02 0.26 0.11 0.12 0.12
3 0.02 0.29 0.13 0.14 0.14 0.02 0.27 0.11 0.11 0.12 0.02 0.26 0.12 0.13 0.13
4 0.02 0.41 0.13 0.13 0.14 0.02 0.58 0.10 0.11 0.12 0.02 0.34 0.12 0.13 0.13
5 0.02 0.42 0.12 0.13 0.14 0.02 0.46 0.10 0.11 0.11 0.02 0.28 0.12 0.12 0.13
6 0.02 0.32 0.13 0.14 0.14 0.02 0.52 0.12 0.12 0.13 0.02 0.29 0.13 0.14 0.14
7 0.02 0.55 0.12 0.13 0.14 0.02 0.24 0.10 0.11 0.11 0.02 0.37 0.12 0.12 0.13
8 0.02 0.55 0.12 0.13 0.14 0.02 0.19 0.10 0.11 0.12 0.02 0.22 0.11 0.12 0.13
In reality I have 26 variables and 5 years (87, 96 and 00 in the column names are years). Ideally I would like a lattice-like graph with 26 plots, one per variable, with each plot containing 5 boxes, i.e. one per year. I understand that it is not possible to do this in lattice because lattice won't accept predefined statistics. Is there a fairly painless way to do this in R with predefined stats? I have used bxp for simple boxplots plotting all the variables for one year in a single plot, e.g.
dat01 = read.csv('dat.csv',header=TRUE)
bxp(list(stats=dat01, n=rep(26, ncol(dat01))),ylim=c(0.07,0.2))
but I don't know how to go from there to what I need.
This can be done, at least using ggplot2, but you'll have to reshape your data quite a bit. You also need data in which the quantiles actually make sense, and yours do not: for example, Var1 has 01Max = 0.26 and 01Q75 = .67, so the 75th percentile exceeds the maximum.
First, I'll recreate valid data:
n <- c("01Min", "01Max", "01Med", "01Q25", "01Q75", "02Min",
"02Max", "02Med", "02Q25", "02Q75")
v1 <- c(0.03, 0.76, 0.41, 0.13, 0.67, 0.10, 0.43, 0.27, 0.2, 0.33)
v2 <- c(0.03, 0.28, 0.14, 0.08, 0.20, 0.02, 0.77, 0.13, 0.06, 0.44)
df <- data.frame(v1=v1, v2=v2)
df <- as.data.frame(t(df))
names(df) <- n
df <- cbind(var=c("v1","v2"), df)
> df
# var 01Min 01Max 01Med 01Q25 01Q75 02Min 02Max 02Med 02Q25 02Q75
# v1 v1 0.03 0.76 0.41 0.13 0.67 0.10 0.43 0.27 0.20 0.33
# v2 v2 0.03 0.28 0.14 0.08 0.20 0.02 0.77 0.13 0.06 0.44
Next, we'll reshape the data:
library(reshape2)  # provides melt() and dcast()
df.m <- melt(df, id="var")
# capture the leading digits (the year) in the first group, (...), and the
# rest of the name in the second; "\\1" replaces the whole string with group 1
df.m$year <- gsub("^([0-9]+)(.*)$", "\\1", df.m$variable)
# same pattern, but keep the second group (the statistic name) via "\\2"
df.m$quan <- gsub("^([0-9]+)(.*)$", "\\2", df.m$variable)
df.f <- dcast(df.m, var+year ~ quan, value.var="value")
To get to this format:
> df.f
# var year Max Med Min Q25 Q75
# 1 v1 01 0.76 0.41 0.03 0.13 0.67
# 2 v1 02 0.43 0.27 0.10 0.20 0.33
# 3 v2 01 0.28 0.14 0.03 0.08 0.20
# 4 v2 02 0.77 0.13 0.02 0.06 0.44
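Since the whole approach depends on the statistics being ordered sensibly, it may be worth checking that before plotting. The following is only a sketch: the helper name stats_ordered is made up for illustration, and df.f is rebuilt by hand from the table above so the snippet runs on its own.

```r
# Long-format statistics, copied from the df.f table above.
df.f <- data.frame(
  var  = c("v1", "v1", "v2", "v2"),
  year = c("01", "02", "01", "02"),
  Max  = c(0.76, 0.43, 0.28, 0.77),
  Med  = c(0.41, 0.27, 0.14, 0.13),
  Min  = c(0.03, 0.10, 0.03, 0.02),
  Q25  = c(0.13, 0.20, 0.08, 0.06),
  Q75  = c(0.67, 0.33, 0.20, 0.44)
)

# Hypothetical helper: TRUE for each row where Min <= Q25 <= Med <= Q75 <= Max.
stats_ordered <- function(d) {
  with(d, Min <= Q25 & Q25 <= Med & Med <= Q75 & Q75 <= Max)
}

ok <- stats_ordered(df.f)
df.f[!ok, ]   # rows that would give a nonsensical box; empty for this data
```

Any row listed by the last line is one where a box could not be drawn meaningfully, which is the problem with the values originally posted.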
Now we can plot by mapping each column of precomputed statistics directly to the corresponding boxplot aesthetic and using stat="identity", as follows:
library(ggplot2)
p <- ggplot(df.f, aes(x=var, ymin=`Min`, lower=`Q25`, middle=`Med`,
                      upper=`Q75`, ymax=`Max`))
p <- p + geom_boxplot(aes(fill=year), stat="identity")
# if you want faceting:
p + facet_wrap( ~ var, scales="free")
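The question asks for one panel per variable with one box per year. One way to get closer to that layout, offered only as a sketch, is to put year on the x axis and let facet_wrap split by var. The data frame below is typed in by hand from the df.f table so the snippet is self-contained; the choice of scales="free_y" is my assumption, not something the question specifies.

```r
library(ggplot2)

# Long-format statistics, copied from the df.f table above.
df.f <- data.frame(
  var  = c("v1", "v1", "v2", "v2"),
  year = c("01", "02", "01", "02"),
  Max  = c(0.76, 0.43, 0.28, 0.77),
  Med  = c(0.41, 0.27, 0.14, 0.13),
  Min  = c(0.03, 0.10, 0.03, 0.02),
  Q25  = c(0.13, 0.20, 0.08, 0.06),
  Q75  = c(0.67, 0.33, 0.20, 0.44)
)

# year on the x axis gives one box per year inside each panel;
# facet_wrap() then draws one panel per variable.
p <- ggplot(df.f, aes(x = year, ymin = Min, lower = Q25, middle = Med,
                      upper = Q75, ymax = Max)) +
  geom_boxplot(aes(fill = year), stat = "identity") +
  facet_wrap(~ var, scales = "free_y")
print(p)
```

With 26 variables, the ncol argument of facet_wrap can be used to control how the panels are arranged.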
You can now plot all years for each var in a separate plot by subsetting the data inside an lapply, as follows:
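# one PDF per variable; unique() also works when var is character, not a factor
lapply(unique(as.character(df.f$var)), function(x) {
  p <- ggplot(df.f[df.f$var == x, ],
              aes(x=var, ymin=`Min`, lower=`Q25`,
                  middle=`Med`, upper=`Q75`, ymax=`Max`))
  p <- p + geom_boxplot(aes(fill=year), stat="identity")
  # pass the plot explicitly instead of relying on last_plot()
  ggsave(paste0(x, ".pdf"), plot=p)
})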
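If you would rather stay with bxp from base graphics, the same long-format table can probably be fed to it one variable at a time. This is a rough sketch rather than a tested recipe for the real data: make_bxp_input is a hypothetical helper, the n values are placeholders because the true sample sizes are not known, and df.f is retyped from the table above.

```r
# Long-format statistics, copied from the df.f table above.
df.f <- data.frame(
  var  = c("v1", "v1", "v2", "v2"),
  year = c("01", "02", "01", "02"),
  Max  = c(0.76, 0.43, 0.28, 0.77),
  Med  = c(0.41, 0.27, 0.14, 0.13),
  Min  = c(0.03, 0.10, 0.03, 0.02),
  Q25  = c(0.13, 0.20, 0.08, 0.06),
  Q75  = c(0.67, 0.33, 0.20, 0.44)
)

# Hypothetical helper: build the list bxp() expects for one variable.
# stats must be a 5-row matrix (lower whisker, lower hinge, median,
# upper hinge, upper whisker) with one column per box, here one per year.
make_bxp_input <- function(d) {
  stats <- t(as.matrix(d[, c("Min", "Q25", "Med", "Q75", "Max")]))
  list(stats = stats,
       n     = rep(1, ncol(stats)),  # placeholder: real sample sizes unknown
       names = d$year)
}

vars <- unique(df.f$var)
op <- par(mfrow = c(1, length(vars)))  # one panel per variable
for (v in vars) {
  bxp(make_bxp_input(df.f[df.f$var == v, ]), main = v, xlab = "year")
}
par(op)                                # restore the previous layout
```

For 26 variables a grid such as par(mfrow = c(5, 6)) would be a more sensible layout than a single row.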
Edit: Your new data differs in a few respects from the data you provided earlier. So, here's the version of the code for your new data:
# change var to VegType everywhere
df.m <- melt(df, id="VegType")
df.m$year <- gsub("^X([0-9]+)(.*)$", "\\1", df.m$variable) # names now start with an X
df.m$quan <- gsub("^X([0-9]+)(.*)$", "\\2", df.m$variable) # names now start with an X
df.f <- dcast(df.m, VegType+year ~ quan, value.var="value")
df.f$VegType <- factor(df.f$VegType) # convert integer to factor
p <- ggplot(df.f, aes(x=VegType, ymin=`MIN`, lower=`Q25`, middle=`Q50`,
upper=`Q75`, ymax=`MAX`))
p <- p + geom_boxplot(aes(fill=year), stat="identity")
You can facet, or write separate plots to file, using the same code as before.
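The X in the patterns above is there because read.csv, with its default check.names=TRUE, passes the header through make.names, which prefixes names that start with a digit. A small illustration follows; the vector raw is just a few of your column names typed in by hand.

```r
# A few of the original header names, typed in by hand.
raw <- c("VegType", "87MIN", "96Q25", "00Q75")

# read.csv() applies make.names() to the header when check.names = TRUE,
# so names beginning with a digit get an "X" prepended.
cols <- make.names(raw)
cols

# Split the statistic columns into year and statistic name.
stat_cols <- cols[-1]
year <- sub("^X([0-9]+)(.*)$", "\\1", stat_cols)
quan <- sub("^X([0-9]+)(.*)$", "\\2", stat_cols)
data.frame(column = stat_cols, year = year, quan = quan)
```

Reading the file with check.names=FALSE should keep the original names, in which case the patterns without the X apply.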