I'm trying to use dplyr to calculate grouped correlations, but something is clearly wrong, since the code below works only in the R console:
library(dplyr)
set.seed(123)
xx = data.frame(group = rep(1:4, 100), a = rnorm(400), b = rnorm(400))
gp = group_by(xx, group)
summarize(gp, cor(a, b))
  group   cor(a, b)
1     1 -0.02073084
2     2  0.12803353
3     3  0.06236264
4     4 -0.06181904
If I run the same code in RStudio, I get a single correlation instead of one per group:
   cor(a, b)
1 0.02739193
What's happening?
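What you are experiencing comes from having both plyr and dplyr loaded at the same time. Both packages export a summarize() function, and an unqualified call resolves to whichever package was attached most recently. In your RStudio session plyr was presumably attached after dplyr, so its summarize() masked dplyr's; plyr's version ignores the grouping created by group_by() and returns a single correlation over all 400 rows. To avoid such conflicts, specify explicitly which package you want to use. For the example data, this means: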
library(dplyr)
set.seed(123)
xx = data.frame(group = rep(1:4, 100), a = rnorm(400), b = rnorm(400))
Using dplyr as intended:
gp = group_by(xx, group)
dplyr::summarize(gp, cor(a, b))
# Source: local data frame [4 x 2]
#
#   group   cor(a, b)
# 1     1 -0.02073084
# 2     2  0.12803353
# 3     3  0.06236264
# 4     4 -0.06181904
Or, using plyr:
gp = group_by(xx, group)
plyr::summarize(gp, cor(a, b))
#    cor(a, b)
# 1 0.02739193
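To confirm which summarize() an unqualified call actually picks up, and why the plyr result is a single number, something like the following sketch should work. It assumes both packages are installed and deliberately attaches plyr after dplyr to reproduce the RStudio situation; find() and environmentName() are base/utils functions, and the column name r is just an illustrative choice.

```r
library(dplyr)
library(plyr)  # attached after dplyr, so plyr's summarize() masks dplyr's

set.seed(123)
xx <- data.frame(group = rep(1:4, 100), a = rnorm(400), b = rnorm(400))
gp <- group_by(xx, group)

# Attached packages that define `summarize`, in search-path order;
# an unqualified call resolves to the first one listed
where <- find("summarize")
print(where)

# Name of the package the unqualified summarize() comes from
print(environmentName(environment(summarize)))

# plyr's version knows nothing about dplyr's grouping, so cor() is
# evaluated once over all 400 rows
res_plyr <- summarize(gp, r = cor(a, b))
print(res_plyr)
print(all.equal(res_plyr$r, cor(xx$a, xx$b)))
```

The first printed vector should list "package:plyr" before "package:dplyr", which is exactly the masking described above.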
So either avoid loading both packages at the same time, or name the package explicitly with the package::function syntax.
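A minimal sketch of the fix, assuming both packages are genuinely needed in the same session: attaching plyr first and dplyr second should let dplyr's summarize() mask plyr's (the most recently attached package wins), and namespace-qualified calls work regardless of load order. The variable names are illustrative only.

```r
# Attach plyr first so that dplyr's verbs (summarize, mutate, ...) mask plyr's
library(plyr)
library(dplyr)

set.seed(123)
xx <- data.frame(group = rep(1:4, 100), a = rnorm(400), b = rnorm(400))

# Unqualified summarize() now resolves to dplyr and respects the grouping
res_unqualified <- xx %>% group_by(group) %>% summarize(r = cor(a, b))

# Qualified calls do not depend on the order in which packages were attached
res_qualified <- xx %>% dplyr::group_by(group) %>% dplyr::summarize(r = cor(a, b))

print(res_unqualified)
```

Both results should contain one correlation per group, four rows in total. The explicit dplyr:: prefix is the more robust choice in scripts that may be run in sessions where other packages are already attached.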