calculating the frequency of occurrences in every

I'm trying to count the frequency of a specific value in every column.

Basically, I am looking at how different bacterial isolates (represented by each row) respond to treatment with different antibiotics (represented by each column). "1" means the isolate is resistant to the antibiotic, while "0" means the isolate is susceptible to the antibiotic.

antibiotic1 <- c(1, 1, 0, 1, 0, 1, NA, 0, 1)
antibiotic2 <- c(0, 0, NA, 0, 1, 1, 0, 0, 0)
antibiotic3 <- c(0, 1, 1, 0, 0, NA, 1, 0, 0)

ab <- data.frame(antibiotic1, antibiotic2, antibiotic3)

ab
  antibiotic1 antibiotic2 antibiotic3
1           1           0           0
2           1           0           1
3           0          NA           1
4           1           0           0
5           0           1           0
6           1           1          NA
7          NA           0           1
8           0           0           0
9           1           0           0

So looking at the first row, isolate 1 is resistant to antibiotic 1, sensitive to antibiotic 2, and sensitive to antibiotic 3.

I want to calculate the % of isolates resistant to each antibiotic, i.e. sum the number of "1"s in each column and divide by the number of isolates in each column (excluding NAs from the denominator).

I know how to get counts:

apply(ab, 2, count)

$antibiotic1
   x   freq
1  0    3
2  1    5
3 NA    1

$antibiotic2
   x freq
1  0    6
2  1    2
3 NA    1

$antibiotic3
   x freq
1  0    5
2  1    3
3 NA    1

But my actual dataset contains many different antibiotics and hundreds of isolates, so I want to be able to run a function across all columns at the same time to yield a dataframe.

I've tried

counts <- ldply(ab, function(x) sum(x=="1")/(sum(x=="1") +  sum(x=="0")))

but that yields NAs:

          .id V1
1 antibiotic1 NA
2 antibiotic2 NA
3 antibiotic3 NA

I've also tried:

library(dplyr)
ab %>%
 summarise_each(n = n()) %>%
 mutate(prop.resis = n/sum(n))

but get an error message that reads:

Error in n() : This function should not be called directly

Any advice would be much appreciated.

Tags: r apply

4 Answers

够拽才男人

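#2 · 2019-07-20 19:31

I would just vectorize this using colMeans. Because the values are coded 0/1, the column mean with NAs removed is exactly the proportion of resistant isolates: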

colMeans(ab, na.rm = TRUE)
# antibiotic1 antibiotic2 antibiotic3 
#       0.625       0.250       0.375

As a side note, this can easily be generalized to calculate the frequency of any value. If, for instance, you were looking for the frequency of the number 2 in all columns, you could simply change the call to colMeans(ab == 2, na.rm = TRUE).
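As a rough sketch of that generalization on the question's data (the helper name value_freq is made up for illustration and is not part of any package), the same idea gives the share of susceptible isolates by asking for the value 0:

```r
# Example data from the question
ab <- data.frame(
  antibiotic1 = c(1, 1, 0, 1, 0, 1, NA, 0, 1),
  antibiotic2 = c(0, 0, NA, 0, 1, 1, 0, 0, 0),
  antibiotic3 = c(0, 1, 1, 0, 0, NA, 1, 0, 0)
)

# value_freq() is a hypothetical helper name.
# df == value gives a logical matrix (NA stays NA); colMeans() then averages
# TRUE/FALSE as 1/0 over the non-missing entries of each column.
value_freq <- function(df, value) colMeans(df == value, na.rm = TRUE)

susceptible <- value_freq(ab, 0)
susceptible
# antibiotic1 antibiotic2 antibiotic3
#       0.375       0.750       0.625
```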

Or, similarly, use sapply (this avoids the conversion to a matrix, at the cost of evaluating column by column):

sapply(ab, mean, na.rm = TRUE)
# antibiotic1 antibiotic2 antibiotic3 
#       0.625       0.250       0.375
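As for why the attempt in the question returned NA: comparing an NA with == yields NA, and sum() returns NA as soon as any element is NA. A minimal sketch of the same ratio with na.rm = TRUE added, assuming the ab data frame from the question and using base sapply in place of plyr's ldply:

```r
# Example data from the question
ab <- data.frame(
  antibiotic1 = c(1, 1, 0, 1, 0, 1, NA, 0, 1),
  antibiotic2 = c(0, 0, NA, 0, 1, 1, 0, 0, 0),
  antibiotic3 = c(0, 1, 1, 0, 0, NA, 1, 0, 0)
)

# The missing value propagates through the comparison and the sum
na_sum <- sum(ab$antibiotic1 == 1)                 # NA
ok_sum <- sum(ab$antibiotic1 == 1, na.rm = TRUE)   # 5

# Same ratio as in the question, with na.rm = TRUE in every sum()
prop_resistant <- sapply(ab, function(x) {
  ones  <- sum(x == 1, na.rm = TRUE)
  zeros <- sum(x == 0, na.rm = TRUE)
  ones / (ones + zeros)
})
prop_resistant
# antibiotic1 antibiotic2 antibiotic3
#       0.625       0.250       0.375
```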


beautiful°

#3 · 2019-07-20 19:31

Here's one way to do it. Copy the data below to the clipboard:

antibiotic1 antibiotic2 antibiotic3
1           0           0
1           0           1
0          NA           1
1           0           0
0           1           0
1           1          NA
NA          0           1
0           0           0
1           0           0

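# Read the copied table from the clipboard (file = "clipboard" works on Windows)
dat <- read.table(file = "clipboard", header = TRUE)
# Drop NAs, convert counts to proportions, keep the second entry (the share of 1s)
sapply(dat, function(x) prop.table(table(x, useNA = "no"))[[2]])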

antibiotic1 antibiotic2 antibiotic3 
      0.625       0.250       0.375
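One caveat: [[2]] assumes every column contains both 0 and 1. If some antibiotic has no resistant isolates at all, the table has a single entry and the index fails. A possible sketch that fixes the levels up front, assuming the question's data plus a made-up fourth column (antibiotic4 is hypothetical and only there to show the edge case):

```r
# Example data from the question
ab <- data.frame(
  antibiotic1 = c(1, 1, 0, 1, 0, 1, NA, 0, 1),
  antibiotic2 = c(0, 0, NA, 0, 1, 1, 0, 0, 0),
  antibiotic3 = c(0, 1, 1, 0, 0, NA, 1, 0, 0)
)
# Hypothetical extra column in which no isolate is resistant
ab$antibiotic4 <- c(0, 0, 0, NA, 0, 0, 0, 0, 0)

prop_resistant <- sapply(ab, function(x) {
  # factor() with fixed levels keeps a (possibly zero) count for both 0 and 1;
  # table() drops NA by default
  tab <- table(factor(x, levels = c(0, 1)))
  as.numeric(prop.table(tab)["1"])
})
prop_resistant
# antibiotic1 antibiotic2 antibiotic3 antibiotic4
#       0.625       0.250       0.375       0.000
```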


我想做一个坏孩纸

#4 · 2019-07-20 19:41

More simply, using base R, you could do:

apply(sapply(ab, table), 2, prop.table)

This gives you the proportions of 0 and 1 for each antibiotic, excluding NAs:

#   antibiotic1 antibiotic2 antibiotic3
# 0       0.375        0.75       0.625
# 1       0.625        0.25       0.375

If you're interested only in the proportion of 1, select the second row by adding [2, ] at the end of the line.
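A small sketch of that selection on the question's data, picking the row by its name "1" instead of its position, which reads a little more clearly. It relies on every column containing both 0 and 1, as is the case here:

```r
# Example data from the question
ab <- data.frame(
  antibiotic1 = c(1, 1, 0, 1, 0, 1, NA, 0, 1),
  antibiotic2 = c(0, 0, NA, 0, 1, 1, 0, 0, 0),
  antibiotic3 = c(0, 1, 1, 0, 0, NA, 1, 0, 0)
)

# 2 x 3 matrix of proportions; row names "0" and "1" come from table()
props <- apply(sapply(ab, table), 2, prop.table)

# If a column lacked one of the two values, sapply() would return a list
# instead of a matrix and the apply() call above would fail.
resistant <- props["1", ]
resistant
# antibiotic1 antibiotic2 antibiotic3
#       0.625       0.250       0.375
```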


欢心

#5 · 2019-07-20 19:53

Another answer to the question. Is this what you want?

antibiotic1 <- c(1, 1, 0, 1, 0, 1, NA, 0, 1)
antibiotic2 <- c(0, 0, NA, 0, 1, 1, 0, 0, 0)
antibiotic3 <- c(0, 1, 1, 0, 0, NA, 1, 0, 0)

ab <- data.frame(antibiotic1, antibiotic2, antibiotic3)


result <- numeric(ncol(ab))
for (i in seq_along(ab)) {
    # proportion of 1s among the non-missing values in column i
    result[i] <- sum(ab[[i]], na.rm = TRUE) / sum(!is.na(ab[[i]]))
    print(result[i])
}

result

0.625 0.250 0.375
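Since the question asks for a data frame, here is one possible way to collect the result along with its denominators. The column names n_tested, n_resistant and pct_resistant are my own choice for illustration:

```r
# Example data from the question
ab <- data.frame(
  antibiotic1 = c(1, 1, 0, 1, 0, 1, NA, 0, 1),
  antibiotic2 = c(0, 0, NA, 0, 1, 1, 0, 0, 0),
  antibiotic3 = c(0, 1, 1, 0, 0, NA, 1, 0, 0)
)

summary_df <- data.frame(
  antibiotic  = names(ab),
  # isolates with a non-missing result for each antibiotic
  n_tested    = unname(colSums(!is.na(ab))),
  # isolates coded 1 (resistant), ignoring NAs
  n_resistant = unname(colSums(ab == 1, na.rm = TRUE))
)
summary_df$pct_resistant <- 100 * summary_df$n_resistant / summary_df$n_tested
summary_df
#    antibiotic n_tested n_resistant pct_resistant
# 1 antibiotic1        8           5          62.5
# 2 antibiotic2        8           2          25.0
# 3 antibiotic3        8           3          37.5
```

Keeping n_tested next to the percentage makes it easy to spot antibiotics for which only a few isolates were actually tested.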
