R: Count unique values by category

2020-02-08 09:29发布

I have data in R that looks like this:

 Cnty   Yr   Plt       Spp  DBH Ht Age
 1  185 1999 20001 Bitternut  8.0 54  47
 2  185 1999 20001 Bitternut  7.2 55  50
 3   31 1999 20001    Pignut  7.4 71  60
 4   31 1999 20001    Pignut 11.4 85 114
 5  189 1999 20001        WO 14.5 80  82
 6  189 1999 20001        WO 12.1 72  79

I would like to know the quantity of unique species (Spp) in each county (Cnty). "unique(dfname$Spp)" gives me a total count of unique species in the data frame, but I would like it by county.

Any help is appreciated! Sorry for the weird formatting, this is my first ever question on SO.

Thanks.

7条回答
祖国的老花朵
2楼-- · 2020-02-08 09:51

I wanted to add on to what A Handcart And Mohair mentioned. For those of you wanting to get the results of the code below into a data frame(helpful in R studio)...

with(mydf, table(Spp, Cnty))
#            Cnty
# Spp         185 189 31
#   Bitternut   2   1  0
#   Pignut      2   1  1
#   WO          1   0  2
ftable(mydf, row.vars="Spp", col.vars=c("Cnty", "Yr"))
#           Cnty  185       189        31     
#           Yr   1999 2000 1999 2000 1999 2000
# Spp                                         
# Bitternut         1    1    0    1    0    0
# Pignut            2    0    0    1    0    1
# WO                0    1    0    0    2    0

You'll need to put the as.data.frame.matrix modifier in front of your code like so:

as.data.frame.matrix(with(mydf, table(Spp, Cnty)))

I was pretty new to R when I came upon this post, and it took me a long time to figure that out, so I thought I'd share.

查看更多
老娘就宠你
3楼-- · 2020-02-08 09:52

As Justin mentioned aggregate is probably what you want. If you call your data frame foo, then the following should give you what you want, namely the number of individuals per species assuming that each row with Butternut represents a unique individual belonging to the butternut species. Note I used foo$Age to calculate the length of the vector, i.e. the number of individuals (row) belonging to each species, but you could use foo$Ht or foo$DBH etc.

aggregate(foo$Age, by = foo[c('Spp','Cnty')], length)

Cheers,

Danny

查看更多
We Are One
4楼-- · 2020-02-08 09:55

We can now use the tally function to make this easier.

tally(group_by(mydf, Spp, Cnty))

        Spp   Cnty     n
     <fctr> <fctr> <int>
1 Bitternut    185     2
2 Bitternut    189     1
3    Pignut    185     2
4    Pignut    189     1
5    Pignut     31     1
6        WO    185     1
7        WO     31     2
查看更多
ら.Afraid
5楼-- · 2020-02-08 10:06

I've tried to make your sample data a little bit more interesting. Your sample data presently has just one unique "Spp" per "Cnty".

set.seed(1)
mydf <- data.frame(
  Cnty = rep(c("185", "31", "189"), times = c(5, 3, 2)),
  Yr = c(rep(c("1999", "2000"), times = c(3, 2)), 
         "1999", "1999", "2000", "2000", "2000"),
  Plt = "20001",
  Spp = sample(c("Bitternut", "Pignut", "WO"), 10, replace = TRUE),
  DBH = runif(10, 0, 15)
)
mydf
#    Cnty   Yr   Plt       Spp       DBH
# 1   185 1999 20001 Bitternut  3.089619
# 2   185 1999 20001    Pignut  2.648351
# 3   185 1999 20001    Pignut 10.305343
# 4   185 2000 20001        WO  5.761556
# 5   185 2000 20001 Bitternut 11.547621
# 6    31 1999 20001        WO  7.465489
# 7    31 1999 20001        WO 10.764278
# 8    31 2000 20001    Pignut 14.878591
# 9   189 2000 20001    Pignut  5.700528
# 10  189 2000 20001 Bitternut 11.661678

Next, as suggested, tapply is a good candidate here. Combine unique and length to get the data you are looking for.

with(mydf, tapply(Spp, Cnty, FUN = function(x) length(unique(x))))
# 185 189  31 
#   3   2   2 
with(mydf, tapply(Spp, list(Cnty, Yr), FUN = function(x) length(unique(x))))
#     1999 2000
# 185    2    2
# 189   NA    2
# 31     1    1

If you're interested in simple tabulation (not of unique values), then you can explore table and ftable:

with(mydf, table(Spp, Cnty))
#            Cnty
# Spp         185 189 31
#   Bitternut   2   1  0
#   Pignut      2   1  1
#   WO          1   0  2
ftable(mydf, row.vars="Spp", col.vars=c("Cnty", "Yr"))
#           Cnty  185       189        31     
#           Yr   1999 2000 1999 2000 1999 2000
# Spp                                         
# Bitternut         1    1    0    1    0    0
# Pignut            2    0    0    1    0    1
# WO                0    1    0    0    2    0
查看更多
该账号已被封号
6楼-- · 2020-02-08 10:07

A simple solution using the data.table approach.

library(data.table)

output <- setDT(mydf)[ , .(count=.N) , by = .(Spp,Cnty)]

in case you want to reshape the output into a nicer table format:

library(tidyr)

spread(data=a, key =Spp, count)

#   Cnty Bitternut Pignut WO
# 1:  185         2      2  1
# 2:  189         1      1 NA
# 3:   31        NA      1  2

# or perhaps like this:

spread(data=a, key =Cnty, count)

#          Spp 185 189 31
# 1: Bitternut   2   1 NA
# 2:    Pignut   2   1  1
# 3:        WO   1  NA  2
查看更多
等我变得足够好
7楼-- · 2020-02-08 10:14
set.seed(1)
mydf <- data.frame(
  Cnty = rep(c("185", "31", "189"), times = c(5, 3, 2)),
  Yr = c(rep(c("1999", "2000"), times = c(3, 2)), 
         "1999", "1999", "2000", "2000", "2000"),
  Plt = "20001",
  Spp = sample(c("Bitternut", "Pignut", "WO"), 10, replace = TRUE),
  DBH = runif(10, 0, 15)
)
mydf

The dplyr::count() function looks like a simple solution:

library(dplyr)
count(mydf, Spp, Cnty)
# A tibble: 7 x 3
# Spp       Cnty      n
# <fct>     <fct> <int>
# 1 Bitternut 185       2
# 2 Bitternut 189       1
# 3 Pignut    185       2
# 4 Pignut    189       1
# 5 Pignut    31        1
# 6 WO        185       1
# 7 WO        31        2
查看更多
登录 后发表回答