I am trying to understand dplyr. I am splitting the values in my data frame by group, bin, and sign, and I want the mean value for each group/bin/sign combination. I would like to output a data frame with the count for each group/bin/sign combination and the total count for each group. I think I have it, but sometimes the values I get in base R differ from the dplyr output. Am I doing this correctly? It is also very contorted. Is there a more direct way?
library(ggplot2)
df <- data.frame(
id = sample(LETTERS[1:3], 1000, replace=TRUE),
tobin = rnorm(1000),
value = rnorm(1000)
)
df$tobin[sample(nrow(df), 10)]=0
df$bin = cut_interval(abs(df$tobin), length=1)
df$sign = ifelse(df$tobin==0, "NULL", ifelse(df$tobin>0, "+", "-"))
# Find mean of value by group, bin, and sign using dplyr
library(dplyr)
res <- df %>% group_by(id, bin, sign) %>%
summarise(Num = length(bin), value=mean(value,na.rm=TRUE))
# Total count per group; assigned so that `total` exists for the lines below
total <- res %>% group_by(id) %>%
summarise(total= sum(Num))
res=data.frame(res)
total=data.frame(total)
res$total = total[match(res$id, total$id),"total"]
res[res$id=="A" & res$bin=="[0,1]" & res$sign=="NULL",]
# Check in base R whether the mean by group, bin, and sign matches (sometimes it does not?)
groupA = df[df$id=="A" & df$bin=="[0,1]" & df$sign=="NULL",]
mean(groupA$value, na.rm=TRUE)
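For comparison, here is a minimal sketch of a more compact version of the same computation. It is not a confirmed fix. It assumes the `summarise` being called should be dplyr's, which is why the calls carry explicit `dplyr::` prefixes. It also uses base `cut()` with unit-width breaks as a stand-in for `cut_interval()`, so it runs without ggplot2. The seed and the use of `mutate()` to attach `total` are illustrative choices, not part of the original code.

```r
library(dplyr)

set.seed(1)  # arbitrary seed, only so the sketch is reproducible
df <- data.frame(
  id    = sample(LETTERS[1:3], 1000, replace = TRUE),
  tobin = rnorm(1000),
  value = rnorm(1000)
)
df$tobin[sample(nrow(df), 10)] <- 0

# Base-R binning of |tobin| into unit-width bins (stand-in for cut_interval)
df$bin  <- cut(abs(df$tobin),
               breaks = 0:ceiling(max(abs(df$tobin))),
               include.lowest = TRUE)
df$sign <- ifelse(df$tobin == 0, "NULL", ifelse(df$tobin > 0, "+", "-"))

# Count and mean per id/bin/sign, then the per-id total added as a column
res <- df %>%
  dplyr::group_by(id, bin, sign) %>%
  dplyr::summarise(Num = dplyr::n(),
                   value = mean(value, na.rm = TRUE),
                   .groups = "drop") %>%
  dplyr::group_by(id) %>%
  dplyr::mutate(total = sum(Num)) %>%
  dplyr::ungroup()

# Compare one id/bin/sign cell against a plain base-R subset
one <- res[1, ]
chk <- df[df$id == one$id &
          df$bin == as.character(one$bin) &
          df$sign == one$sign, ]
all.equal(one$value, mean(chk$value, na.rm = TRUE))
```

Using `mutate()` after regrouping by `id` avoids the separate `total` data frame and the `match()` step.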
I am going crazy because it doesn't work on my own data, where this command just repeats the mean of the whole dataset for every row:
ddply(df, .(id, bin, sign), summarize, mean = mean(value,na.rm=TRUE))
Here every value in the mean column equals mean(df$value, na.rm=TRUE), so the grouping is completely ignored. All the grouping columns are factors, and value is numeric.
This however works:
with(df, aggregate(value, by = list(id = id, bin = bin, sign = sign), FUN = mean))
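One possible check, offered as an assumption rather than a confirmed diagnosis: plyr and dplyr both export `summarise`/`summarize`, so an unqualified call may resolve to whichever package was attached last. The sketch below attaches only dplyr and uses the built-in `mtcars` data purely for illustration. It shows how to see which function an unqualified `summarise` refers to, and how to qualify the call explicitly.

```r
library(dplyr)

# Attached packages that provide `summarise`, in search-path order
utils::find("summarise")

# Namespace of the function an unqualified call would actually use
environmentName(environment(summarise))

# Qualifying the calls removes any ambiguity about which package is used
by_cyl <- dplyr::summarise(dplyr::group_by(mtcars, cyl),
                           mean_mpg = mean(mpg))
by_cyl
```

If the second call reports something other than "dplyr" in a session where both packages are loaded, the unqualified `summarise` in a `group_by()` pipeline is not dplyr's.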
Any help would be appreciated.