I am using R 3.3.3, dplyr 0.7.4, and Hmisc 4.1-1. I noticed that the order in which I load packages affects whether or not a dplyr::summarise call works. I understand that loading packages in a different order masks certain functions, but I am using the package::function() syntax to avoid that issue. The exact issue revolves around labeled variables. I know that there have been issues in the past with tidyverse and variable labels, but none seem to address why this particular situation is occurring.
First example, which works: I load only Hmisc, then dplyr, and I am able to summarise the data.
#this works fine
library(Hmisc)
library(dplyr)
Hmisc::label(iris$Petal.Width) <- "Petal Width"
sumpct <- iris %>%
  dplyr::group_by(Species) %>%
  dplyr::summarise(med = median(Petal.Width),
                   A40 = round(100 * ecdf(Petal.Width)(.40), 1),
                   A50 = round(100 * ecdf(Petal.Width)(.50), 1),
                   mns = mean(Petal.Width),
                   lowermean = mean(Petal.Width) - sd(Petal.Width),
                   lowermedian = median(Petal.Width) - sd(Petal.Width))
The second example below breaks. I start a new session and load tidyverse after Hmisc, still using the package::function() syntax, but this throws the error:
Error in summarise_impl(.data, dots) : Evaluation error: `x` and `labels` must be same type.
Second example:
###restart session
#this example does not work
library(Hmisc)
library(tidyverse)
Hmisc::label(iris$Petal.Width) <- "Petal Width"
sumpct <- iris %>%
  dplyr::group_by(Species) %>%
  dplyr::summarise(med = median(Petal.Width),
                   A40 = round(100 * ecdf(Petal.Width)(.40), 1),
                   A50 = round(100 * ecdf(Petal.Width)(.50), 1),
                   mns = mean(Petal.Width),
                   lowermean = mean(Petal.Width) - sd(Petal.Width),
                   lowermedian = median(Petal.Width) - sd(Petal.Width))
However, the third example does work: I just restart the session and load tidyverse before Hmisc.
Third example:
###switch order of loading packages and this works
library(tidyverse)
library(Hmisc)
Hmisc::label(iris$Petal.Width) <- "Petal Width"
sumpct <- iris %>%
  dplyr::group_by(Species) %>%
  dplyr::summarise(med = median(Petal.Width),
                   A40 = round(100 * ecdf(Petal.Width)(.40), 1),
                   A50 = round(100 * ecdf(Petal.Width)(.50), 1),
                   mns = mean(Petal.Width),
                   lowermean = mean(Petal.Width) - sd(Petal.Width),
                   lowermedian = median(Petal.Width) - sd(Petal.Width))
So my question is: why does the order in which I load packages matter when I am using the package::function() syntax, specifically with respect to labeled variables and tidyverse?
Update: session info below for the error:
sessionInfo()
R version 3.3.3 (2017-03-06)
Running under: Windows 7 x64

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] bindrcpp_0.2    forcats_0.3.0   stringr_1.3.0   dplyr_0.7.4
 [5] purrr_0.2.4     readr_1.1.1     tidyr_0.8.0     tibble_1.4.2
 [9] tidyverse_1.2.1 Hmisc_4.1-1     ggplot2_2.2.1   Formula_1.2-2
[13] survival_2.41-3 lattice_0.20-35

loaded via a namespace (and not attached):
 [1] reshape2_1.4.3      splines_3.3.3       haven_1.1.1
 [4] colorspace_1.3-2    htmltools_0.3.6     base64enc_0.1-3
 [7] rlang_0.2.0         pillar_1.2.1        foreign_0.8-69
[10] glue_1.2.0          RColorBrewer_1.1-2  readxl_1.0.0
[13] modelr_0.1.1        plyr_1.8.4          bindr_0.1.1
[16] cellranger_1.1.0    munsell_0.4.3       gtable_0.2.0
[19] rvest_0.3.2         htmlwidgets_1.0     psych_1.7.8
[22] latticeExtra_0.6-28 knitr_1.20          parallel_3.3.3
[25] htmlTable_1.11.2    broom_0.4.3         Rcpp_0.12.16
[28] acepack_1.4.1       scales_0.5.0        backports_1.1.2
[31] checkmate_1.8.5     jsonlite_1.5        gridExtra_2.3
[34] mnormt_1.5-5        hms_0.4.2           digest_0.6.15
[37] stringi_1.1.7       grid_3.3.3          cli_1.0.0
[40] tools_3.3.3         magrittr_1.5        lazyeval_0.2.1
[43] cluster_2.0.6       crayon_1.3.4        pkgconfig_2.0.1
[46] Matrix_1.2-12       xml2_1.2.0          data.table_1.10.4-3
[49] lubridate_1.7.3     assertthat_0.2.0    httr_1.3.1
[52] rstudioapi_0.7      R6_2.2.2            rpart_4.1-13
[55] nnet_7.3-12         nlme_3.1-131.1
UPDATE: As of haven version 2.0.0 this issue has been resolved, as the haven "labelled" class was renamed to "haven_labelled" to avoid conflicts with Hmisc.

tl;dr: Order matters.
For a more detailed answer, let's first reproduce the error:
After removing elements piece by piece from the original summarise example, I managed to reduce the reproduction of the error to just these lines of code:
We can have a look at the traceback to see if we can locate a function that could be causing the error:
The [.labelled call looks suspicious. Why is it even called?
Ah, setting a label for Petal.Width with Hmisc::label also added the "labelled" S3 class. We can inspect where the method is defined with getAnywhere:
Indeed, both haven and Hmisc define the method. And since haven is loaded after Hmisc, its definition is found first, and thus gets used:
haven expects labelled objects to have a labels attribute, which Hmisc::label doesn't create:
And that's where the error comes from.
But wait: why is haven even loaded? It's not attached with library(tidyverse). It turns out that haven is listed as an imported package in tidyverse, which causes it to be loaded when the package is attached (see e.g. here). And loading a package, among other things, registers its S3 methods, which is where the conflict comes from.
As it is, if you want to use both Hmisc and tidyverse, order matters. Addressing the issue further would likely require source-level changes in the packages' use of the labelled S3 class.
Created on 2018-03-21 by the reprex package (v0.2.0).
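As a supplement to the loaded-versus-attached point in the answer, here is a small base-R sketch for inspecting a session. The helper name labelled_subset_owner is made up for this example, and the "tools" package (which ships with R) is used only as a stand-in to show that a namespace can be loaded without appearing on the search path.

```r
# Loading a namespace does not attach it: "tools" is a stand-in for haven here.
loadNamespace("tools")
tools_loaded   <- isNamespaceLoaded("tools")       # namespace is loaded
tools_attached <- "package:tools" %in% search()    # only TRUE after library(tools)

# The same check for haven works even when haven is not installed.
haven_loaded <- isNamespaceLoaded("haven")

# Hypothetical helper: report which environment provides the `[.labelled`
# method that would currently be found, or NA if there is none.
labelled_subset_owner <- function() {
  m <- utils::getS3method("[", "labelled", optional = TRUE)
  if (is.null(m)) NA_character_ else environmentName(environment(m))
}

owner <- labelled_subset_owner()
print(list(tools_loaded = tools_loaded,
           tools_attached = tools_attached,
           haven_loaded = haven_loaded,
           owner = owner))
```

Running this after each of the load orders from the question should show whether haven is loaded and which package's method is picked up. The exact output depends on the installed package versions.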