different behavior for group_by for data.table vs.

Posted 2020-05-03 12:32

Question:

When dplyr::mutate is used on a grouped data.table, the grouping is subsequently lost. This behavior does not occur for data.frame. Is this a bug? I am using dplyr_0.4.1 and data.table_1.9.4.

require(data.table)
require(dplyr)

by_cyl_df <- group_by( mtcars, cyl ) %>%
    dplyr::mutate( . , 
        maxmpg = max( mpg )
    )
groups( by_cyl_df )

[[1]]
cyl

by_cyl_dt   <- group_by( as.data.table(mtcars), cyl ) %>%
    dplyr::mutate( . , 
        maxmpg = max( mpg )
    )
groups( by_cyl_dt )

NULL
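For reference, the column that both pipelines compute (the maximum `mpg` within each `cyl` group) can be reproduced in base R with `ave()`, which carries no grouping metadata at all. This is a swapped-in base-R technique shown only to make the intended result concrete. It is not how dplyr computes `mutate()`, and it returns a plain data.frame rather than a grouped table.

```r
# Base-R equivalent of the grouped mutate() in the question: ave() applies
# max() within each row's cyl group and returns a vector aligned with the
# original rows, so it can be assigned directly as a new column.
cars <- datasets::mtcars
cars$maxmpg <- ave(cars$mpg, cars$cyl, FUN = max)

# One row per distinct (cyl, maxmpg) pair:
# cyl 4 -> 33.9, cyl 6 -> 21.4, cyl 8 -> 19.2
unique(cars[c("cyl", "maxmpg")])
```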

Answer 1:

This is an open dplyr issue: `mutate()` on a grouped data.table drops the grouping. You can see this happening by comparing the class of the object before and after the call.

by_cyl_dt_gg   <- group_by( as.data.table(mtcars), cyl )

class(by_cyl_dt_gg)
# [1] "grouped_dt" "tbl_dt"     "tbl"        "data.table" "data.frame"
class(by_cyl_dt_gg %>% mutate(max=max(mpg)))
# [1] "tbl_dt"     "tbl"        "data.table" "data.frame"

Because the result is no longer grouped (the `grouped_dt` class has been dropped), `groups()` dispatches to the method for `tbl_dt`, which always returns NULL:

> dplyr:::groups.tbl_dt
function (x) 
{
    NULL
}
<environment: namespace:dplyr>
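The mechanism is ordinary S3 dispatch: `groups()` picks its method from the first class in the object's class vector that has one, so once `grouped_dt` is gone the `tbl_dt` method wins. The sketch below models this with hypothetical stand-ins (`toy_groups()` and a `vars` attribute are illustrative assumptions, not dplyr's actual internals); only the class names are taken from the output above.

```r
# Hypothetical generic standing in for dplyr's groups(); not dplyr code.
toy_groups <- function(x) UseMethod("toy_groups")

# Assumed method for grouped tables: return the stored grouping variables.
toy_groups.grouped_dt <- function(x) attr(x, "vars")

# Ungrouped tables: always NULL, like dplyr:::groups.tbl_dt shown above.
toy_groups.tbl_dt <- function(x) NULL

# A toy object carrying the same class vector as the grouped data.table.
grouped <- structure(
  list(cyl = c(4, 6, 8)),
  vars = list(quote(cyl)),
  class = c("grouped_dt", "tbl_dt", "tbl", "data.table", "data.frame")
)

# Simulate what mutate() did: drop the leading "grouped_dt" class.
# The "vars" attribute is still present, but dispatch no longer reaches it.
mutated <- grouped
class(mutated) <- setdiff(class(grouped), "grouped_dt")

toy_groups(grouped)  # dispatches to toy_groups.grouped_dt
toy_groups(mutated)  # dispatches to toy_groups.tbl_dt, returns NULL
```

Note that the grouping information is not what decides the result here; the class vector alone determines which method runs.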