I have measurements for different treatments of an experiment that ran over several rounds, like so:
set.seed(1)
df <- data.frame(treatment = rep(c('baseline', 'treatment 1', 'treatment 2'),
                                 times = 5),
                 round = rep(1:5, each = 3),
                 measurement1 = rep(1:5, each = 3) + rnorm(15),
                 measurement2 = rep(1:5, each = 3) + rnorm(15))
df
# treatment round measurement1 measurement2
# 1 baseline 1 0.3735462 0.9550664
# 2 treatment 1 1 1.1836433 0.9838097
# 3 treatment 2 1 0.1643714 1.9438362
# 4 baseline 2 3.5952808 2.8212212
# 5 treatment 1 2 2.3295078 2.5939013
# 6 treatment 2 2 1.1795316 2.9189774
# 7 baseline 3 3.4874291 3.7821363
# 8 treatment 1 3 3.7383247 3.0745650
# 9 treatment 2 3 3.5757814 1.0106483
# 10 baseline 4 3.6946116 4.6198257
# 11 treatment 1 4 5.5117812 3.9438713
# 12 treatment 2 4 4.3898432 3.8442045
# 13 baseline 5 4.3787594 3.5292476
# 14 treatment 1 5 2.7853001 4.5218499
# 15 treatment 2 5 6.1249309 5.4179416
What I would like is a data.frame that contains the differences in the two measurements between each of the treatments and the baseline for each round. That is, grouped by round, I would like the respective measurement in the baseline treatment subtracted from each of the two measurements.
I'd prefer a dplyr solution if one exists but will accept anything that borders on elegant.
You can use mutate_each for that (mydf is a data frame with the same structure as the one in the question; its values differ):
mydf %>%
  group_by(round) %>%
  mutate_each(funs(. - .[treatment == "baseline"]), -treatment) %>%
  filter(treatment != "baseline")
which gives:
Source: local data frame [10 x 4]
Groups: round [5]
treatment round measurement1 measurement2
(fctr) (int) (dbl) (dbl)
1 treatment1 1 1.558820 -0.6584485
2 treatment2 1 -0.068677 1.3364462
3 treatment1 2 1.769312 -0.2732490
4 treatment2 2 0.801357 -1.4852449
5 treatment1 3 -1.064394 -1.1513703
6 treatment2 3 2.433222 -0.7939903
7 treatment1 4 0.448744 0.1394982
8 treatment2 4 -1.066922 -1.1410085
9 treatment1 5 1.182761 -0.8311095
10 treatment2 5 0.138005 0.2622119
If you want to add the differences to your dataframe (just as @akrun did in his dplyr / tidyr alternative), you could also do:
mydf %>%
  group_by(round) %>%
  mutate(diff1 = measurement1 - measurement1[treatment == "baseline"],
         diff2 = measurement2 - measurement2[treatment == "baseline"]) %>%
  filter(treatment != "baseline")
which gives:
Source: local data table [10 x 6]
treatment round measurement1 measurement2 diff1 diff2
(fctr) (int) (dbl) (dbl) (dbl) (dbl)
1 treatment1 1 2.630392 -0.104258 1.558820 -0.6584485
2 treatment2 1 1.002895 1.890637 -0.068677 1.3364462
3 treatment1 2 3.822473 3.147443 1.769312 -0.2732490
4 treatment2 2 2.854518 1.935447 0.801357 -1.4852449
5 treatment1 3 1.520553 3.291122 -1.064394 -1.1513703
6 treatment2 3 5.018169 3.648502 2.433222 -0.7939903
7 treatment1 4 4.956380 4.544908 0.448744 0.1394982
8 treatment2 4 3.440714 3.264401 -1.066922 -1.1410085
9 treatment1 5 4.672056 5.082310 1.182761 -0.8311095
10 treatment2 5 3.627300 6.175631 0.138005 0.2622119
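In current dplyr releases, mutate_each() and funs() are deprecated. A minimal sketch of the same grouped subtraction with across(), assuming dplyr 1.0.0 or later and the df from the question (with its 'treatment 1' / 'treatment 2' labels); the name res is only for illustration:

```r
library(dplyr)

# Example data as defined in the question
set.seed(1)
df <- data.frame(treatment = rep(c("baseline", "treatment 1", "treatment 2"), times = 5),
                 round = rep(1:5, each = 3),
                 measurement1 = rep(1:5, each = 3) + rnorm(15),
                 measurement2 = rep(1:5, each = 3) + rnorm(15))

res <- df %>%
  group_by(round) %>%
  # within each round, subtract the baseline row's value from every measurement column
  mutate(across(starts_with("measurement"),
                ~ .x - .x[treatment == "baseline"])) %>%
  ungroup() %>%
  filter(treatment != "baseline")
```

This overwrites the measurement columns with the differences, as the mutate_each version does. It assumes exactly one baseline row per round.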
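We can use data.table:
library(data.table)
# relies on 'baseline' sorting first within each round, followed by exactly two treatments
setDT(df)[order(round, treatment), tail(.SD, 2) - head(.SD, 1)[rep(1, 2)],
          by = round, .SDcols = 3:4]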
Or another option with data.table is:
setDT(df)[, lapply(.SD[, grep("^measurement", names(.SD)), with = FALSE],
                   function(x) x[treatment != "baseline"] - x[treatment == "baseline"]),
          by = round]
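Both data.table calls above return only round and the differences. If the treatment label should be kept as well, one possible variant is sketched below. The names dt and out are illustrative, and the code assumes one baseline row per round:

```r
library(data.table)

# Example data as defined in the question, built directly as a data.table
set.seed(1)
dt <- data.table(treatment = rep(c("baseline", "treatment 1", "treatment 2"), times = 5),
                 round = rep(1:5, each = 3),
                 measurement1 = rep(1:5, each = 3) + rnorm(15),
                 measurement2 = rep(1:5, each = 3) + rnorm(15))

out <- dt[, c(list(treatment = treatment[treatment != "baseline"]),
              # per column: non-baseline values minus the single baseline value
              lapply(.SD, function(x) x[treatment != "baseline"] -
                                      x[treatment == "baseline"])),
          by = round, .SDcols = c("measurement1", "measurement2")]
```

Naming the columns in .SDcols keeps treatment out of .SD, so it can be returned as a label alongside the differences.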
Or using dplyr/tidyr:
library(dplyr)
library(tidyr)
gather(df, var, val, measurement1:measurement2) %>%
  spread(treatment, val) %>%
  mutate(diff1 = `treatment 1` - baseline,
         diff2 = `treatment 2` - baseline)
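gather() and spread() are superseded in newer tidyr versions. A sketch of the same reshape with pivot_longer() and pivot_wider(), assuming tidyr 1.0.0 or later; wide is a name chosen here for illustration:

```r
library(dplyr)
library(tidyr)

# Example data as defined in the question
set.seed(1)
df <- data.frame(treatment = rep(c("baseline", "treatment 1", "treatment 2"), times = 5),
                 round = rep(1:5, each = 3),
                 measurement1 = rep(1:5, each = 3) + rnorm(15),
                 measurement2 = rep(1:5, each = 3) + rnorm(15))

wide <- df %>%
  # stack both measurement columns into (var, val) pairs
  pivot_longer(starts_with("measurement"), names_to = "var", values_to = "val") %>%
  # one column per treatment, one row per round/var combination
  pivot_wider(names_from = treatment, values_from = val) %>%
  mutate(diff1 = `treatment 1` - baseline,
         diff2 = `treatment 2` - baseline)
```

The result has one row per round and measurement, with diff1 and diff2 holding the differences for 'treatment 1' and 'treatment 2' respectively.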