I have a data frame that looks like this:
#df
ID DRUG FED AUC0t Tmax Cmax
1 1 0 100 5 20
2 1 1 200 6 25
3 0 1 NA 2 30
4 0 0 150 6 65
Ans so on. I want to summarize some statistics on AUC, Tmax and Cmax by drug DRUG
and FED STATUS FED
. I use dplyr. For example: for the AUC:
CI90lo <- function(x) quantile(x, probs=0.05, na.rm=TRUE)
CI90hi <- function(x) quantile(x, probs=0.95, na.rm=TRUE)
summary <- df %>%
group_by(DRUG,FED) %>%
summarize(mean=mean(AUC0t, na.rm=TRUE),
low = CI90lo(AUC0t),
high= CI90hi(AUC0t),
min=min(AUC0t, na.rm=TRUE),
max=max(AUC0t,na.rm=TRUE),
sd= sd(AUC0t, na.rm=TRUE))
However, the output is not grouped by DRUG and FED. It gives only one line containing the statistics of all by not faceted on DRUG and FED.
Any idea why? and how can I make it do the right thing?
I believe you've loaded plyr after dplyr, which is why you are getting an overall summary instead of a grouped summary.
This is what happens with plyr loaded last.
library(dplyr)
library(plyr)
df %>%
group_by(DRUG,FED) %>%
summarize(mean=mean(AUC0t, na.rm=TRUE),
low = CI90lo(AUC0t),
high= CI90hi(AUC0t),
min=min(AUC0t, na.rm=TRUE),
max=max(AUC0t,na.rm=TRUE),
sd= sd(AUC0t, na.rm=TRUE))
mean low high min max sd
1 150 105 195 100 200 50
Now remove plyr and try again and you get the grouped summary.
detach(package:plyr)
df %>%
group_by(DRUG,FED) %>%
summarize(mean=mean(AUC0t, na.rm=TRUE),
low = CI90lo(AUC0t),
high= CI90hi(AUC0t),
min=min(AUC0t, na.rm=TRUE),
max=max(AUC0t,na.rm=TRUE),
sd= sd(AUC0t, na.rm=TRUE))
Source: local data frame [4 x 8]
Groups: DRUG
DRUG FED mean low high min max sd
1 0 0 150 150 150 150 150 NaN
2 0 1 NaN NA NA NA NA NaN
3 1 0 100 100 100 100 100 NaN
4 1 1 200 200 200 200 200 NaN
A variant of aosmith's answer that might help some folks out. Direct R to call dplyr's functions directly. Good trick when one package interferes with another.
df %>%
dplyr::group_by(DRUG,FED) %>%
dplyr::summarize(mean=mean(AUC0t, na.rm=TRUE),
low = CI90lo(AUC0t),
high= CI90hi(AUC0t),
min=min(AUC0t, na.rm=TRUE),
max=max(AUC0t,na.rm=TRUE),
sd= sd(AUC0t, na.rm=TRUE))
Or you could consider using data.table
library(data.table)
setDT(df) # set the data frame as data table
df[, list(mean = mean(AUC0t, na.rm=TRUE),
low = CI90lo(AUC0t),
high = CI90hi(AUC0t),
min = as.double(min(AUC0t, na.rm=TRUE)),
max = as.double(max(AUC0t, na.rm=TRUE)),
sd = sd(AUC0t, na.rm=TRUE)),
by=list(DRUG, FED)]
# DRUG FED mean low high min max sd
# 1: 1 0 100 100 100 100 100 NA
# 2: 1 1 200 200 200 200 200 NA
# 3: 0 1 NaN NA NA Inf -Inf NA
# 4: 0 0 150 150 150 150 150 NA
# Warning messages:
# 1: In min(AUC0t, na.rm = TRUE) :
# no non-missing arguments to min; returning Inf
# 2: In max(AUC0t, na.rm = TRUE) :
# no non-missing arguments to max; returning -Inf
Try sqldf is best way and easy to learn for grouping the data.
Below is example to your need.all kinds of data sample grouping sqldf library is very helpful.
install.packages("sqldf")
library(sqldf)
dat1 <- sqldf("select x,y,
y/sum(y) as Z
from dat
group by x")