summarizing counts of a factor with dplyr

I want to group a data frame by a column (owner) and output a new data frame that has counts of each type of a factor at each observation. The real data frame is fairly large, and there are 10 different factors.

Here is some example input:

library(dplyr)
df = tbl_df(data.frame(owner=c(0,0,1,1), obs1=c("quiet", "loud", "quiet", "loud"), obs2=c("loud", "loud", "quiet", "quiet")))

  owner  obs1  obs2
1     0 quiet  loud
2     0  loud  loud
3     1 quiet quiet
4     1  loud quiet

I was looking for output that looks like this:

out = data.frame(owner=c("0", "0", "1", "1"), observation=c("obs1", "obs2", "obs1", "obs2"), quiet=c(1, 0, 1, 2), loud=c(1, 2, 1, 0))

  owner observation quiet loud
1     0        obs1     1    1
2     0        obs2     0    2
3     1        obs1     1    1
4     1        obs2     2    0

Melting gets me partway there:

melted = tbl_df(melt(df, id=c("owner")))

  owner variable value
1     0     obs1 quiet
2     0     obs1  loud
3     1     obs1 quiet
4     1     obs1  loud
5     0     obs2  loud
6     0     obs2  loud
7     1     obs2 quiet
8     1     obs2 quiet

But what's the last step? If 'value' was a numeric, I'd just go:

melted %>% group_by(owner, variable) %>% summarise(counts=sum(value))

Thanks so much!

标签： r dplyr

3条回答

何必那么认真

2楼-- · 2020-02-19 08:10

In 2017 the answer is

library(dplyr)
library(tidyr)

gather(df, key, value, -owner) %>%
  group_by(owner, key, value) %>%
  tally %>% 
  spread(value, n, fill = 0)

Which gives output

Source: local data frame [4 x 4]
Groups: owner, key [4]

  owner   key  loud quiet
* <dbl> <chr> <dbl> <dbl>
1     0  obs1     1     1
2     0  obs2     2     0
3     1  obs1     1     1
4     1  obs2     0     2

In 2019 the answer is:

gather(df, key, value, -owner) %>% 
    count(owner, key, value) %>% 
    spread(value, n, fill = 0)

0人赞添加讨论(0) 举报

SAY GOODBYE

3楼-- · 2020-02-19 08:16

You could use tidyr with dplyr

library(dplyr)
library(tidyr)

 df %>%
 gather(observation, Val, obs1:obs2) %>% 
 group_by(owner,observation, Val) %>% 
 summarise(n= n()) %>%
 ungroup() %>%
 spread(Val, n, fill=0)

which gives the output

  #    owner observation loud quiet
  #1     0        obs1    1     1
  #2     0        obs2    2     0
  #3     1        obs1    1     1
  #4     1        obs2    0     2

0人赞添加讨论(0) 举报

beautiful°

4楼-- · 2020-02-19 08:29

If you wanted to forego the dplyr, you can split into lists.

df <- split(df, list(df[[obs1]], df[[obs2]])

If you wanted the count, you just create an sapply or lapply call to run through the lists and get the count of each one. Or literally any other function you want.

0人赞添加讨论(0) 举报

summarizing counts of a factor with dplyr

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间