Pipe '.' dot causes trouble in glm call

Posted 2019-08-26 09:25

Question:

The pipe (%>%) that dplyr re-exports from magrittr does not pass along the names of the objects sent down the chain; the right-hand side only sees the placeholder dot. This is well known. However, it leads to unexpected complications after you fit a glm model: functions that work on glm objects expect the stored call to contain the correct name of the object holding the data.

    # sample data
    library(pacman)  # provides p_load()
    p_load(ISLR, dplyr)
    mydata <- ISLR::Default

    # fit glm
    fitted <-
      mydata %>%
      select(default, income) %>%
      glm(default ~ ., data = ., family = binomial)

    # the stored call records the data as a dot
    fitted$call

    # pscl's pR2 pseudo-R2 function does not work
    p_load(pscl)
    pR2(fitted)
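
The symptom can be reproduced without any packages. The sketch below is only an illustration: it uses the built-in mtcars data instead of Default, an anonymous function whose argument is literally named `.` as a stand-in for the pipe, and base R's `update()` as a stand-in for `pR2`, on the assumption that `pR2` re-evaluates the stored call in a similar way.

```r
# Stand-in for the pipe: inside this function the data is only known as `.`
fit_dot <- (function(.) {
  glm(am ~ ., data = ., family = binomial)
})(mtcars[, c("am", "wt")])

# The stored call refers to `.`, not to mtcars
print(fit_dot$call)

# Re-evaluating the call outside the function fails, because no object
# named `.` exists in the calling environment
refit <- tryCatch(update(fit_dot, . ~ 1), error = function(e) e)
if (inherits(refit, "error")) print(conditionMessage(refit))
```

Any helper that refits the model from the call runs into the same problem, which is why the fixes focus on what ends up in the data argument of the call.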

How can I fix this behavior? I want to keep using pipes, including the select function. I also want to obtain a glm object in fitted that can be used with pR2 or other functions that need a working call.

One can re-arrange the data-preprocessing into the glm call, but it takes away the elegance of the code.

    fitted <-
      glm(default ~ .,
          data = mydata %>%
            select(default, income),
          family = binomial)

Answer 1:

1) Since you already spell out all the variables in the select call, you can just as easily spell them out in the formula and drop the select. You can keep the select if you like, but it is pointless when the formula already names the variables explicitly. With magrittr's %$% operator, which exposes the columns of Default to the glm call, this works:

    library(dplyr)
    library(magrittr)  # provides %$%
    library(pscl)
    library(ISLR)

    fitted <- Default %$% glm(default ~ income, family = binomial)

    fitted %>% pR2
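
For intuition, here is a rough base R analogue. It assumes that `%$%` behaves much like `with()`, and it swaps in the built-in mtcars data and `update()` purely for illustration. The call ends up with no data argument at all, and the variables are found through the environment attached to the formula.

```r
# with() exposes the columns of mtcars to the glm call, much as %$% does
fit_with <- with(mtcars, glm(am ~ wt, family = binomial))

# No data argument is recorded, so there is no dot to trip over
print(fit_with$call)

# Refitting from the stored call works: the formula remembers where
# am and wt live
null_fit <- update(fit_with, . ~ 1)
print(formula(null_fit))
```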

2) Another possibility is to invert it: instead of putting glm inside the pipe, put the pipe inside glm:

    fitted <-
      glm(default ~ ., data = Default %>% select(income, default), family = binomial)

    fitted %>% pR2
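
A package-free sketch of why this variant keeps such helpers working: the whole data expression is stored in the call, so it can be evaluated again later. The built-in mtcars data and base subsetting replace Default and select here, purely for illustration.

```r
# The preprocessing expression is passed directly as the data argument
fit_inline <- glm(am ~ ., data = mtcars[, c("wt", "am")], family = binomial)

# The call keeps the full expression rather than a dot
print(fit_inline$call$data)

# Refitting re-runs that expression, so it succeeds
null_fit <- update(fit_inline, . ~ 1)
print(formula(null_fit))
```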

3) A third approach is to generate the formula argument of glm rather than the data argument.

    fitted <- Default %>%
      select(starts_with("inc")) %>%
      names %>%
      reformulate("default") %>%
      glm(data = Default, family = binomial)

    fitted %>% pR2
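
To see what the formula-building step produces, here is a minimal base R sketch. It uses mtcars and `startsWith()` as substitutes for Default and `starts_with()`; the variable names are illustrative only.

```r
# Pick predictor names by prefix, as select(starts_with(...)) %>% names does
predictors <- names(mtcars)[startsWith(names(mtcars), "w")]

# reformulate() builds a formula from character vectors
f <- reformulate(predictors, response = "am")
print(f)

# The data argument now names the real object, so the call can be reused
fit_formula <- glm(f, data = mtcars, family = binomial)
print(fit_formula$call)
```

The formula argument in the call is still just a symbol (here `f`, in the piped version a dot), which is what the cosmetic tweak to the Call: line addresses.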

To make the Call: line in the output look nice, replace the glm line with this:

    { do.call("glm", list(., data = quote(Default), family = quote(binomial))) }

or using purrr (invoke is deprecated in recent purrr releases):

    { invoke("glm", list(., data = expr(Default), family = expr(binomial))) }
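
The effect of the quoting can be checked with a small base R sketch, again on mtcars as a stand-in for Default. With `quote()`, the call stores the name of the data; without it, `do.call` evaluates the argument first and the call carries the whole data frame.

```r
f <- am ~ wt

# Quoted arguments: the call records the symbols mtcars and binomial
fit_quoted <- do.call("glm", list(f, data = quote(mtcars), family = quote(binomial)))
print(fit_quoted$call)

# Unquoted arguments: the data frame itself is embedded in the call,
# so printing the call would dump every row
fit_unquoted <- do.call("glm", list(f, data = mtcars, family = binomial))
print(is.data.frame(fit_unquoted$call$data))
```

Both fits are numerically the same; only the stored call differs, so the quoted form is purely about readable output.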


Tags: r dplyr