summarizing counts of a factor with dplyr

2020-02-19 07:39发布

I want to group a data frame by a column (owner) and output a new data frame that has counts of each type of a factor at each observation. The real data frame is fairly large, and there are 10 different factors.

Here is some example input:

library(dplyr)
df = tbl_df(data.frame(owner=c(0,0,1,1), obs1=c("quiet", "loud", "quiet", "loud"), obs2=c("loud", "loud", "quiet", "quiet")))

  owner  obs1  obs2
1     0 quiet  loud
2     0  loud  loud
3     1 quiet quiet
4     1  loud quiet

I was looking for output that looks like this:

out = data.frame(owner=c("0", "0", "1", "1"), observation=c("obs1", "obs2", "obs1", "obs2"), quiet=c(1, 0, 1, 2), loud=c(1, 2, 1, 0))

  owner observation quiet loud
1     0        obs1     1    1
2     0        obs2     0    2
3     1        obs1     1    1
4     1        obs2     2    0

Melting gets me partway there:

melted = tbl_df(melt(df, id=c("owner")))

  owner variable value
1     0     obs1 quiet
2     0     obs1  loud
3     1     obs1 quiet
4     1     obs1  loud
5     0     obs2  loud
6     0     obs2  loud
7     1     obs2 quiet
8     1     obs2 quiet

But what's the last step? If 'value' was a numeric, I'd just go:

melted %>% group_by(owner, variable) %>% summarise(counts=sum(value))

Thanks so much!

标签: r dplyr
3条回答
何必那么认真
2楼-- · 2020-02-19 08:10

In 2017 the answer is

library(dplyr)
library(tidyr)

gather(df, key, value, -owner) %>%
  group_by(owner, key, value) %>%
  tally %>% 
  spread(value, n, fill = 0)

Which gives output

Source: local data frame [4 x 4]
Groups: owner, key [4]

  owner   key  loud quiet
* <dbl> <chr> <dbl> <dbl>
1     0  obs1     1     1
2     0  obs2     2     0
3     1  obs1     1     1
4     1  obs2     0     2

In 2019 the answer is:

gather(df, key, value, -owner) %>% 
    count(owner, key, value) %>% 
    spread(value, n, fill = 0)
查看更多
SAY GOODBYE
3楼-- · 2020-02-19 08:16

You could use tidyr with dplyr

library(dplyr)
library(tidyr)

 df %>%
 gather(observation, Val, obs1:obs2) %>% 
 group_by(owner,observation, Val) %>% 
 summarise(n= n()) %>%
 ungroup() %>%
 spread(Val, n, fill=0)

which gives the output

  #    owner observation loud quiet
  #1     0        obs1    1     1
  #2     0        obs2    2     0
  #3     1        obs1    1     1
  #4     1        obs2    0     2
查看更多
beautiful°
4楼-- · 2020-02-19 08:29

If you wanted to forego the dplyr, you can split into lists.

df <- split(df, list(df[[obs1]], df[[obs2]])

If you wanted the count, you just create an sapply or lapply call to run through the lists and get the count of each one. Or literally any other function you want.

查看更多
登录 后发表回答