What is the difference between as.tibble(), as_dat

Posted 2019-03-25 03:38

Question:

I remember reading somewhere that as.tibble() is an alias for as_data_frame(), but I don't know what exactly an alias is in programming terminology. Is it similar to a wrapper?

So I guess my question probably comes down to the difference in possible usages between tbl_df() and as_data_frame(): what are the differences between them, if any?

More specifically, given a (non-tibble) data frame df, I often turn it into a tibble by using:

df <- tbl_df(df)

Wouldn't

df <- as_data_frame(df)

do the same thing? If so, are there other cases where the two functions tbl_df() and as_data_frame() cannot be used interchangeably to get the same result?

The R documentation says that

tbl_df() forwards the argument to as_data_frame()

Does that mean that tbl_df() is a wrapper or alias for as_data_frame()? The R documentation doesn't seem to say anything about as.tibble(), and I forgot where I read that it was an alias for as_data_frame(). Also, apparently as_tibble() is another alias for as_data_frame().

If these four functions really are all the same function, what is the sense in giving one function four different names? Isn't that more confusing than helpful?

Answer 1:

To answer your question of "whether it is confusing", I think so :) .

as.tibble and as_tibble are the same; both are generics that simply dispatch to the as_tibble S3 methods:

> as.tibble
function (x, ...) 
{
    UseMethod("as_tibble")
}
<environment: namespace:tibble>

as_data_frame and tbl_df are not exactly the same; tbl_df calls as_data_frame:

> tbl_df
function (data) 
{
    as_data_frame(data)
}
<environment: namespace:dplyr>

Note tbl_df is in dplyr while as_data_frame is in the tibble package:

> as_data_frame
function (x, ...) 
{
    UseMethod("as_data_frame")
}
<environment: namespace:tibble>

but since tbl_df simply forwards its argument to as_data_frame, the two are effectively "the same", or aliases as you say.
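To make the alias/wrapper distinction concrete, here is a minimal base-R sketch of the same pattern. The names as_thing, as.thing and thing_df are hypothetical stand-ins for as_tibble, as.tibble and tbl_df; they are not part of tibble or dplyr.

```r
# Hypothetical generic, standing in for as_tibble().
as_thing <- function(x, ...) {
  UseMethod("as_thing")
}

# One method for data frames: tag the object with an extra class.
as_thing.data.frame <- function(x, ...) {
  class(x) <- c("thing", "data.frame")
  x
}

# "Alias" in the sense used above: a second generic that dispatches
# to the very same set of methods.
as.thing <- function(x, ...) {
  UseMethod("as_thing")
}

# Wrapper: an ordinary function that forwards its argument.
thing_df <- function(data) {
  as_thing(data)
}

df <- data.frame(a = 1:3, b = letters[1:3])

r1 <- as_thing(df)
r2 <- as.thing(df)
r3 <- thing_df(df)

identical(r1, r2)  # TRUE
identical(r1, r3)  # TRUE
```

All three entry points end up in as_thing.data.frame, so for a plain data frame the caller cannot tell them apart. The only structural difference is that the wrapper adds one extra function call before dispatch happens.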

Now, we can look at the differences between the generic functions as_tibble and as_data_frame. First, we look at the methods of each:

> methods(as_tibble)
[1] as_tibble.data.frame* as_tibble.default*    as_tibble.list*       as_tibble.matrix*     as_tibble.NULL*      
[6] as_tibble.poly*       as_tibble.table*      as_tibble.tbl_df*     as_tibble.ts*        
see '?methods' for accessing help and source code
> methods(as_data_frame)
[1] as_data_frame.data.frame* as_data_frame.default*  as_data_frame.grouped_df* as_data_frame.list*      
[5] as_data_frame.matrix*     as_data_frame.NULL*       as_data_frame.table*      as_data_frame.tbl_cube*  
[9] as_data_frame.tbl_df*    
see '?methods' for accessing help and source code

If you check out the source code for as_tibble, you can see the definitions for many of the as_data_frame methods there as well. as_tibble defines two additional methods which aren't defined for as_data_frame: as_tibble.ts and as_tibble.poly. I'm not really sure why they couldn't also be defined for as_data_frame.
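The methods marked with an asterisk are not exported, so typing their name at the prompt does not print them. A small sketch of how one might inspect them, assuming the tibble package is installed (the exact list of methods depends on the installed version and on which other packages are loaded):

```r
library(tibble)

# List the S3 methods currently registered for the generic.
m <- methods(as_tibble)
print(m)

# Fetch a single non-exported method by generic name and class.
fn <- getS3method("as_tibble", "data.frame")
is.function(fn)

# Printing fn shows the source of the method.
print(fn)
```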

as_data_frame has two additional methods, which are both defined in dplyr: as_data_frame.tbl_cube and as_data_frame.grouped_df.

as_data_frame.tbl_cube uses the weaker checking of as.data.frame (yes, bear with me) and then calls as_data_frame on the result:

> getAnywhere(as_data_frame.tbl_cube)
function (x, ...) 
{
    as_data_frame(as.data.frame(x, ..., stringsAsFactors = FALSE))
}
<environment: namespace:dplyr>

while as_data_frame.grouped_df ungroups the passed dataframe.
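What "ungrouping" means can be checked directly. The sketch below assumes dplyr is installed and uses ungroup() explicitly rather than relying on a conversion method, since what the conversion functions do with grouped input may depend on the installed versions of dplyr and tibble.

```r
library(dplyr)

# A grouped data frame carries the extra class "grouped_df".
grouped <- group_by(mtcars, cyl)
is_grouped_df(grouped)  # TRUE
class(grouped)

# Dropping the grouping explicitly leaves an ordinary tibble.
plain <- ungroup(grouped)
is_grouped_df(plain)    # FALSE
class(plain)
```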

Overall, it seems that as_data_frame should be seen as providing additional functionality over as_tibble, unless you are dealing with ts or poly objects.



Answer 2:

According to the introduction to tibble, it seems like tibbles supersede tbl_df.

I’m pleased to announce tibble, a new package for manipulating and printing data frames in R. Tibbles are a modern reimagining of the data.frame, keeping what time has proven to be effective, and throwing out what is not. The name comes from dplyr: originally you created these objects with tbl_df(), which was most easily pronounced as “tibble diff”.

[...]This package extracts out the tbl_df class associated functions from dplyr.

To add to the confusion, tbl_df now calls as_tibble, which is the preferred alias for as_data_frame and as.tibble (see Hadley Wickham's comment on the issue, and the as_tibble docs):

> tbl_df
function (data) 
{
    as_tibble(data, .name_repair = "check_unique")
}

According to the help description of tbl_df(), it is deprecated and tibble::as_tibble() should be used instead. as_data_frame and as.tibble help pages both redirect to as_tibble.

When calling class on a tibble, the class name still shows up as tbl_df:

> as_tibble(mtcars) %>% class
[1] "tbl_df"     "tbl"        "data.frame"
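Following that recommendation, a minimal sketch of the conversion using only as_tibble(), assuming the tibble package is installed:

```r
library(tibble)

# Convert a plain data frame with the recommended function.
tb <- as_tibble(mtcars)

# The class vector still contains "tbl_df", as noted above.
class(tb)

# Three equivalent ways of checking what we got back.
is_tibble(tb)            # TRUE
inherits(tb, "tbl_df")   # TRUE
is.data.frame(tb)        # TRUE: a tibble is still a data frame
```

Because a tibble keeps "data.frame" in its class vector, code written for ordinary data frames will generally still accept it.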