Evaluation Error when tidyverse is loaded after Hm

Posted 2019-03-25 14:31

Question:

I am using R 3.3.3, dplyr 0.7.4, and Hmisc 4.1-1. I noticed that the order in which I load packages affects whether a dplyr::summarise call works. I understand that loading packages in a different order can mask certain functions, but I am using the package::function() syntax to avoid that issue. The exact issue revolves around labeled variables. I know that there have been issues in the past with tidyverse and variable labels, but none of the reports I found explain why this particular situation occurs.

First example, which works: I load only Hmisc and then dplyr, and I am able to summarise the data.

#this works fine
library(Hmisc)
library(dplyr)

Hmisc::label(iris$Petal.Width) <- "Petal Width"

sumpct <- iris %>% 
  dplyr::group_by(Species) %>% 
  dplyr::summarise(med = median(Petal.Width),
                   A40 = round(100 * ecdf(Petal.Width)(.40), 1),
                   A50 = round(100 * ecdf(Petal.Width)(.50), 1),
                   mns = mean(Petal.Width),
                   lowermean = mean(Petal.Width) - sd(Petal.Width),
                   lowermedian = median(Petal.Width) - sd(Petal.Width))

The second example below breaks. I start a new session, load tidyverse after Hmisc, and still use the package::function() syntax, but this throws the error:

Error in summarise_impl(.data, dots) : Evaluation error: x and labels must be same type.

Second example:

###restart session 
#this example does not work

library(Hmisc)
library(tidyverse)


Hmisc::label(iris$Petal.Width) <- "Petal Width"

sumpct <- iris %>% 
  dplyr::group_by(Species) %>% 
  dplyr::summarise(med = median(Petal.Width),
                   A40 = round(100 * ecdf(Petal.Width)(.40), 1),
                   A50 = round(100 * ecdf(Petal.Width)(.50), 1),
                   mns = mean(Petal.Width),
                   lowermean = mean(Petal.Width) - sd(Petal.Width),
                   lowermedian = median(Petal.Width) - sd(Petal.Width))

However, the third example does work: I restart the session and load tidyverse before Hmisc.

Third example:

###switch order of loading packages and this works

library(tidyverse)
library(Hmisc)


Hmisc::label(iris$Petal.Width) <- "Petal Width"

sumpct <- iris %>% 
  dplyr::group_by(Species) %>% 
  dplyr::summarise(med = median(Petal.Width),
                   A40 = round(100 * ecdf(Petal.Width)(.40), 1),
                   A50 = round(100 * ecdf(Petal.Width)(.50), 1),
                   mns = mean(Petal.Width),
                   lowermean = mean(Petal.Width) - sd(Petal.Width),
                   lowermedian = median(Petal.Width) - sd(Petal.Width))

So my question is: why does the order in which I load packages matter when I am using the package::function() syntax, specifically with respect to labeled variables and tidyverse?

Update: session info below for the error:

sessionInfo()

R version 3.3.3 (2017-03-06)
Running under: Windows 7 x64

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] bindrcpp_0.2    forcats_0.3.0   stringr_1.3.0   dplyr_0.7.4
 [5] purrr_0.2.4     readr_1.1.1     tidyr_0.8.0     tibble_1.4.2
 [9] tidyverse_1.2.1 Hmisc_4.1-1     ggplot2_2.2.1   Formula_1.2-2
[13] survival_2.41-3 lattice_0.20-35

loaded via a namespace (and not attached):
 [1] reshape2_1.4.3      splines_3.3.3       haven_1.1.1
 [4] colorspace_1.3-2    htmltools_0.3.6     base64enc_0.1-3
 [7] rlang_0.2.0         pillar_1.2.1        foreign_0.8-69
[10] glue_1.2.0          RColorBrewer_1.1-2  readxl_1.0.0
[13] modelr_0.1.1        plyr_1.8.4          bindr_0.1.1
[16] cellranger_1.1.0    munsell_0.4.3       gtable_0.2.0
[19] rvest_0.3.2         htmlwidgets_1.0     psych_1.7.8
[22] latticeExtra_0.6-28 knitr_1.20          parallel_3.3.3
[25] htmlTable_1.11.2    broom_0.4.3         Rcpp_0.12.16
[28] acepack_1.4.1       scales_0.5.0        backports_1.1.2
[31] checkmate_1.8.5     jsonlite_1.5        gridExtra_2.3
[34] mnormt_1.5-5        hms_0.4.2           digest_0.6.15
[37] stringi_1.1.7       grid_3.3.3          cli_1.0.0
[40] tools_3.3.3         magrittr_1.5        lazyeval_0.2.1
[43] cluster_2.0.6       crayon_1.3.4        pkgconfig_2.0.1
[46] Matrix_1.2-12       xml2_1.2.0          data.table_1.10.4-3
[49] lubridate_1.7.3     assertthat_0.2.0    httr_1.3.1
[52] rstudioapi_0.7      R6_2.2.2            rpart_4.1-13
[55] nnet_7.3-12         nlme_3.1-131.1

Answer 1:

UPDATE: As of haven version 2.0.0 this issue has been resolved, as the haven "labelled" class was renamed to "haven_labelled" to avoid conflicts with Hmisc.


tl;dr: Order matters.

For a more detailed answer, let's first reproduce the error:

library(Hmisc)
#> Loading required package: lattice
#> Loading required package: survival
#> Loading required package: Formula
#> Loading required package: ggplot2
#> 
#> Attaching package: 'Hmisc'
#> The following objects are masked from 'package:base':
#> 
#>     format.pval, units
library(tidyverse)
#> Warning: package 'forcats' was built under R version 3.4.4

By removing elements piece by piece from the original summarise example, I was able to reduce the reproduction to just these lines of code:

Hmisc::label(iris$Petal.Width) <- "Petal Width"
head(iris)
#> Error: `x` and `labels` must be same type

We can have a look at the traceback to see if we can locate a function that could be causing the error:

traceback()
#> 8: stop("`x` and `labels` must be same type", call. = FALSE)
#> 7: labelled(NextMethod(), attr(x, "labels"))
#> 6: `[.labelled`(xj, i)
#> 5: xj[i]
#> 4: `[.data.frame`(x, seq_len(n), , drop = FALSE)
#> 3: x[seq_len(n), , drop = FALSE]
#> 2: head.data.frame(iris)
#> 1: head(iris)

The [.labelled call looks suspicious. Why is it even called?

lapply(iris, class)
#> $Sepal.Length
#> [1] "numeric"
#> 
#> $Sepal.Width
#> [1] "numeric"
#> 
#> $Petal.Length
#> [1] "numeric"
#> 
#> $Petal.Width
#> [1] "labelled" "numeric" 
#> 
#> $Species
#> [1] "factor"

Ah, setting a label for Petal.Width with Hmisc::label also added the S3 class labelled to that column. We can inspect where the `[` method for that class is defined with getAnywhere:

getAnywhere("[.labelled")
#> 2 differing objects matching '[.labelled' were found
#> in the following places
#>   registered S3 method for [ from namespace haven
#>   namespace:Hmisc
#>   namespace:haven
#> Use [] to view one of them

Indeed, both haven and Hmisc define the method. And since haven is loaded after Hmisc, its definition is found first, and thus gets used:

getAnywhere("[.labelled")[1]
#> function (x, ...) 
#> {
#>     labelled(NextMethod(), attr(x, "labels"))
#> }
#> <environment: namespace:haven>
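
The "last one registered gets used" behaviour can be imitated in base R without either package. This is only a sketch under assumptions: the class name demo_labelled and both method bodies are made-up stand-ins, and registerS3method() is called by hand to play the role that loading a namespace normally plays.

```r
# Two stand-in methods for the same generic/class pair.
# Neither is the real Hmisc or haven code; they only report which one ran.
first_method  <- function(x, ...) "method registered first"
second_method <- function(x, ...) "method registered second"

# Register them one after the other for `[` on a made-up class.
# The second registration replaces the first in the S3 registry.
registerS3method("[", "demo_labelled", first_method)
registerS3method("[", "demo_labelled", second_method)

# Subsetting an object of that class dispatches to whatever is registered now.
x <- structure(c(0.2, 0.4, 0.6), class = "demo_labelled")
which_method <- x[1]
print(which_method)
```

Swapping the two registerS3method() calls swaps which method runs, which is the same effect as swapping the two library() calls in the question.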

haven expects labelled objects to have a labels attribute, which Hmisc::label doesn't create:

attr(iris$Petal.Width, "labels")
#> NULL

And that's where the error comes from.
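
To make the failure concrete without loading either package, here is a small base R sketch. It rests on an assumption: the type comparison inside toy_labelled() is inferred from the error message and the traceback above, it is not copied from haven's source, and toy_labelled is a made-up name.

```r
# Hypothetical stand-in for a constructor that insists on a `labels`
# attribute of the same type as the data (assumed from the error message).
toy_labelled <- function(x, labels) {
  if (typeof(x) != typeof(labels)) {
    stop("`x` and `labels` must be same type", call. = FALSE)
  }
  structure(x, labels = labels, class = "toy_labelled")
}

# Hmisc-style labelling stores a single `label` attribute, not `labels`.
x <- c(0.2, 0.4, 0.6)
attr(x, "label") <- "Petal Width"

# attr(x, "labels") is NULL, so a double vector is compared with NULL.
result <- tryCatch(
  toy_labelled(x, attr(x, "labels")),
  error = function(e) conditionMessage(e)
)
print(result)
```

The object never had the attribute the other package's method relies on, so the check fails as soon as any subsetting happens, even in something as innocent as head().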


But wait: why is haven even loaded? It is not attached by library(tidyverse). It turns out that haven is listed as an imported package in tidyverse, which causes its namespace to be loaded when tidyverse is attached. And loading a package, among other things, registers its S3 methods, which is where the conflict comes from. This is also why the package::function() syntax does not help: it controls which function you call, not which S3 method gets dispatched for the object's class.
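
The difference between "loaded" and "attached" can be checked in any session. The sketch below uses splines, which ships with R, purely as a stand-in for haven (that substitution is my assumption, chosen so the code runs without extra packages): its namespace is loaded without being attached, and one of its registered S3 methods is still found.

```r
# Load (but do not attach) a package that ships with R.
loadNamespace("splines")

# Loaded means the namespace is in memory; attached means it is on the
# search path, which is what library() does.
is_loaded   <- "splines" %in% loadedNamespaces()
is_attached <- "package:splines" %in% search()

# S3 methods registered by the namespace are available to dispatch even
# though the package was never attached; predict.bs is one of them.
method_registered <- is.function(getS3method("predict", "bs", optional = TRUE))

print(c(loaded = is_loaded, attached = is_attached,
        s3_registered = method_registered))
```

In a fresh session the package shows up as loaded but not attached, yet its method is registered, which is the situation haven is in after library(tidyverse).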

As it is, if you want to use both Hmisc and tidyverse, order matters. Addressing the issue further would likely require source-level changes in the packages' use of the labelled S3 class.
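
If you are stuck on the older versions and cannot control the load order, one possible workaround (my suggestion, not something either package documents) is to drop the labelled class from the affected columns while keeping the label attribute, so that no `[.labelled` method is dispatched at all. A minimal base R sketch, where drop_labelled_class is a hypothetical helper name and the data frame is a toy stand-in for iris:

```r
# Hypothetical helper: remove the "labelled" class from every column that
# has it, leaving all other attributes (including `label`) untouched.
drop_labelled_class <- function(data) {
  data[] <- lapply(data, function(col) {
    if (inherits(col, "labelled")) {
      oldClass(col) <- setdiff(oldClass(col), "labelled")
    }
    col
  })
  data
}

# Toy data with one column marked up the way Hmisc-style labelling does it.
df <- data.frame(id = 1:3, width = c(0.2, 0.4, 0.6))
attr(df$width, "label") <- "Petal Width"
class(df$width) <- c("labelled", class(df$width))

cleaned <- drop_labelled_class(df)
print(class(cleaned$width))
print(attr(cleaned$width, "label"))
```

The trade-off is that anything relying on the labelled class itself (for example Hmisc's own print or subsetting behaviour for labelled vectors) no longer applies to those columns.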

Created on 2018-03-21 by the reprex package (v0.2.0).