R sapply vs apply vs lapply + as.data.frame

Posted 2019-08-30 11:49

Question:

I'm working with some Date columns and trying to cleanse them of obviously incorrect dates. I've written a function using the safe.ifelse function mentioned here.

Here's my toy data set:

df1 <- data.frame(id = 1:25
    , month1 = seq(as.Date('2012-01-01'), as.Date('2014-01-01'), by = 'month'  )
    , month2 = seq(as.Date('2012-01-01'), as.Date('2014-01-01'), by = 'month'  )
    , month3 = seq(as.Date('2012-01-01'), as.Date('2014-01-01'), by = 'month'  )
    , letter1 = letters[1:25]
    )

This works fine for a single column:

df1$month1 <- safe.ifelse(df1$month1 > as.Date('2013-10-01'), as.Date('2013-10-01'), df1$month1)

As I have multiple columns I'd like to use a function and apply to take care of all Date columns at once:

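capDate <- function(x) {
    today1 <- Sys.Date()
    # ifelse() drops attributes, so restore the class of `yes` afterwards
    safe.ifelse <- function(cond, yes, no) {
        class.y <- class(yes)
        X <- ifelse(cond, yes, no)
        class(X) <- class.y
        return(X)
    }
    # Cap any date later than today at today and return the result visibly
    safe.ifelse(as.Date(x) > today1, today1, as.Date(x))
}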
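The class-restoring step in safe.ifelse matters because base ifelse() returns a value without the attributes of yes or no. A minimal sketch of that behaviour, using made-up dates and a fixed cap (both are illustrative assumptions, not part of the original data):

```r
# Illustrative values only; `d` and `cap` are hypothetical names
d   <- as.Date(c("2013-01-01", "2014-01-01"))
cap <- as.Date("2013-10-01")

# Plain ifelse() strips the Date class and leaves the underlying day counts
plain <- ifelse(d > cap, cap, d)
class(plain)     # "numeric"

# Re-attaching the class, as safe.ifelse does, recovers proper dates
restored <- plain
class(restored) <- class(d)
class(restored)  # "Date"
```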

However when I try to use sapply()

df1[,dateCols1] <- sapply(df1[,dateCols1], capDate)

or apply()

df1[,dateCols1] <- apply(df1[,dateCols1], 2, capDate)

the Date columns lose their Date class. The only way I've found to get around this is to use lapply() and then convert the result back to a data frame with as.data.frame(). Can anyone explain this?

df1[,dateCols1] <- as.data.frame(lapply(df1[,dateCols1], capDate))
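The difference between the three calls can be reproduced on a smaller data set. The sketch below is only an illustration: the question never shows how dateCols1 was built, so its definition here is an assumption, and capAt is a hypothetical stand-in for capDate that uses a fixed cap date so the result does not depend on the day it is run.

```r
# Cut-down toy data with two Date columns
df1 <- data.frame(
  id     = 1:3,
  month1 = as.Date(c("2012-01-01", "2013-06-01", "2014-01-01")),
  month2 = as.Date(c("2012-02-01", "2013-07-01", "2014-02-01"))
)

# Assumption: dateCols1 holds the names of the Date columns
dateCols1 <- names(df1)[sapply(df1, inherits, what = "Date")]

# Hypothetical stand-in for capDate with a fixed, reproducible cap
capAt <- function(x, cap = as.Date("2013-10-01")) pmin(as.Date(x), cap)

viaSapply <- sapply(df1[, dateCols1], capAt)    # result simplified to a matrix
viaApply  <- apply(df1[, dateCols1], 2, capAt)  # input coerced to a matrix first
viaLapply <- lapply(df1[, dateCols1], capAt)    # list, one element per column

class(viaSapply[, 1])   # "numeric"
class(viaLapply[[1]])   # "Date"
```

Only the lapply() result still carries the Date class, which is why wrapping it in as.data.frame() gives back usable date columns.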

Answer 1:

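Both sapply and apply simplify their result to a matrix, and a matrix cannot carry the Date class, so the columns come back as plain numbers (days since 1970-01-01). lapply returns a list in which each element keeps its class, so as.data.frame(lapply(...)) is a safe way to loop over data frame columns.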

as.data.frame(
  lapply(
    df1, 
    function(column) 
    {
      if(inherits(column, "Date")) 
      {
        pmin(column, Sys.Date())
      } else column
    }
  )
)
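A related idiom, offered here as one possible variant rather than as part of the original answer, assigns the lapply() result back into the selected columns, which skips the as.data.frame() call and leaves the other columns untouched. The names df2, cap and isDate are hypothetical, and a fixed cap date stands in for Sys.Date() to keep the result reproducible.

```r
# Small example data frame with one Date column among others
df2 <- data.frame(
  id      = 1:3,
  month1  = as.Date(c("2012-01-01", "2013-06-01", "2014-01-01")),
  letter1 = c("a", "b", "c")
)

cap <- as.Date("2013-10-01")  # fixed stand-in for Sys.Date()

# Logical flag per column: TRUE where the column is a Date
isDate <- vapply(df2, inherits, logical(1), what = "Date")

# Replace only the Date columns; each keeps its class because the
# list returned by lapply() is assigned column by column
df2[isDate] <- lapply(df2[isDate], pmin, cap)

class(df2$month1)  # "Date"
```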

It's a little cleaner with ddply from plyr.

library(plyr)
ddply(
  df1, 
  .(id), 
  colwise(
    function(column) 
    {
      if(inherits(column, "Date")) 
      { 
        pmin(column, Sys.Date()) 
      } else column
    }
  )
)