Count the most frequent word in each row in R

Posted 2019-07-25 06:55

Question:

I have the following table:

    Name    Mon    Tue    Wed    Thu    Fri    Sat    Sun
1   John  Apple Orange  Apple Banana  Apple  Apple Orange
2  Ricky Banana  Apple Banana Banana Banana Banana  Apple
3   Alex  Apple Orange Orange  Apple  Apple Orange Orange
4 Robbin  Apple  Apple  Apple  Apple  Apple Banana Banana
5  Sunny Banana Banana  Apple  Apple  Apple Banana Banana
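To make the answers below reproducible, the table can be entered as an R data frame. This is a minimal sketch: the name df matches the table(df$Mon) call further down (the answers call the same data df1 and dat), and stringsAsFactors = FALSE is an assumption so that the fruit columns are character rather than factor.

```r
# The question's table as a data frame (assumed name: df; character columns assumed)
df <- data.frame(
  Name = c("John", "Ricky", "Alex", "Robbin", "Sunny"),
  Mon  = c("Apple",  "Banana", "Apple",  "Apple",  "Banana"),
  Tue  = c("Orange", "Apple",  "Orange", "Apple",  "Banana"),
  Wed  = c("Apple",  "Banana", "Orange", "Apple",  "Apple"),
  Thu  = c("Banana", "Banana", "Apple",  "Apple",  "Apple"),
  Fri  = c("Apple",  "Banana", "Apple",  "Apple",  "Apple"),
  Sat  = c("Apple",  "Banana", "Orange", "Banana", "Banana"),
  Sun  = c("Orange", "Apple",  "Orange", "Banana", "Banana"),
  stringsAsFactors = FALSE
)
```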

For each person, I want to find the most frequent fruit and how many times it occurs, and add those values as new columns.

For example:

    Name    Mon    Tue    Wed    Thu    Fri    Sat    Sun Max_Acc Count
1   John  Apple Orange  Apple Banana  Apple  Apple Orange   Apple     4
2  Ricky Banana  Apple Banana Banana Banana Banana  Apple  Banana     5
3   Alex  Apple Orange Orange  Apple  Apple Orange Orange  Orange     4
4 Robbin  Apple  Apple  Apple  Apple  Apple Banana Banana   Apple     5
5  Sunny Banana Banana  Apple  Apple  Apple Banana Banana  Banana     4

I am having trouble doing this across rows. Within a single column I can get the frequencies with the table() function:

> table(df$Mon)

 Apple Banana
     3      2

But what I want is the name of the most frequent fruit in each row, stored in a new column.

Answer 1:

If we need the "Count" and the "Names" corresponding to the maximum count, we can loop through the rows of the dataset with apply (MARGIN = 1), use table to get the frequency of each fruit in the row, and extract the maximum count together with the fruit name it belongs to. The per-row results are then stacked with rbind and attached to the original dataset (df1 here) with cbind.

# For each row (MARGIN = 1), tabulate the fruits and keep the most frequent one
cbind(df1, do.call(rbind, apply(df1[-1], 1, function(x) {
  x1 <- table(x)
  data.frame(Count = max(x1), Names = names(x1)[which.max(x1)])
})))

#    Name    Mon    Tue    Wed    Thu    Fri    Sat    Sun Count  Names
#1   John  Apple Orange  Apple Banana  Apple  Apple Orange     4  Apple
#2  Ricky Banana  Apple Banana Banana Banana Banana  Apple     5 Banana
#3   Alex  Apple Orange Orange  Apple  Apple Orange Orange     4 Orange
#4 Robbin  Apple  Apple  Apple  Apple  Apple Banana Banana     5  Apple
#5  Sunny Banana Banana  Apple  Apple  Apple Banana Banana     4 Banana
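The anonymous function passed to apply does all the per-row work. Run by hand on a single row (John's fruits, copied from the question's table with the Name column dropped), it behaves as follows; the variable names count and fruit are only for illustration.

```r
# John's fruits: row 1 of the question's table without the Name column
x <- c("Apple", "Orange", "Apple", "Banana", "Apple", "Apple", "Orange")

x1 <- table(x)                      # Apple 4, Banana 1, Orange 2
count <- max(x1)                    # highest frequency in the row
fruit <- names(x1)[which.max(x1)]   # fruit that has that frequency

data.frame(Count = count, Names = fruit)
```

apply() calls this function once per row, and do.call(rbind, ...) stacks the resulting one-row data frames before cbind attaches them to df1.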

Alternatively, we can use data.table:

library(data.table)
# Group by Name; .SD holds that person's fruit columns
setDT(df1)[, c("Names", "Count") := {
  tbl <- table(unlist(.SD))
  .(names(tbl)[which.max(tbl)], max(tbl))
}, by = Name]
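Both versions above pick the winner with which.max, which returns the position of the first maximum only. Because table sorts its names, a tie is resolved in favour of the fruit that sorts first. A short sketch of that behaviour, plus one way to keep every tied fruit instead; the input row is hypothetical (none of the rows in the question has a tie).

```r
# A hypothetical row with a tie between Apple and Banana
x <- c("Banana", "Apple", "Banana", "Apple", "Orange")
tbl <- table(x)                       # Apple 2, Banana 2, Orange 1

names(tbl)[which.max(tbl)]            # "Apple": which.max keeps only the first maximum
tied <- names(tbl)[tbl == max(tbl)]   # every fruit sharing the maximum count
paste(tied, collapse = "/")           # "Apple/Banana"
```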


Answer 2:

Another approach is to loop over all unique fruits, count how often each one occurs in every row, and then pick the fruit with the highest count:

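# as.character() avoids factor level mismatches if the fruit columns are factors
fruits_unique <- unique(as.character(unlist(dat[-1])))
# One column per fruit: how often that fruit appears in each row
occurrence <- sapply(fruits_unique, function(x) rowSums(dat[-1] == x))
# Use this matrix to create the resulting columns
ind <- apply(occurrence, 1, which.max)               # column of the most frequent fruit
dat$Names <- fruits_unique[ind]
dat$Count <- occurrence[cbind(seq_along(ind), ind)]  # count at (row i, column ind[i])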

Result:

    Name    Mon    Tue    Wed    Thu    Fri    Sat    Sun  Names Count
1   John  Apple Orange  Apple Banana  Apple  Apple Orange  Apple     4
2  Ricky Banana  Apple Banana Banana Banana Banana  Apple Banana     5
3   Alex  Apple Orange Orange  Apple  Apple Orange Orange Orange     4
4 Robbin  Apple  Apple  Apple  Apple  Apple Banana Banana  Apple     5
5  Sunny Banana Banana  Apple  Apple  Apple Banana Banana Banana     4
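The key intermediate object in this answer is the occurrence matrix, with one row per person and one column per fruit. A self-contained sketch using only the first two rows of the question's data (John and Ricky), assuming character columns:

```r
dat <- data.frame(
  Name = c("John", "Ricky"),
  Mon = c("Apple", "Banana"), Tue = c("Orange", "Apple"),
  Wed = c("Apple", "Banana"), Thu = c("Banana", "Banana"),
  Fri = c("Apple", "Banana"), Sat = c("Apple", "Banana"),
  Sun = c("Orange", "Apple"),
  stringsAsFactors = FALSE
)

fruits_unique <- unique(unlist(dat[-1], use.names = FALSE))  # "Apple" "Banana" "Orange"

# Each column counts how often one fruit appears in each row
occurrence <- sapply(fruits_unique, function(x) rowSums(dat[-1] == x))
occurrence

# Column index of the maximum in each row, then pick element (row i, column ind[i])
ind <- apply(occurrence, 1, which.max)
occurrence[cbind(seq_along(ind), ind)]   # 4 5
```

Because the comparison dat[-1] == x is vectorised over the whole data frame, the loop runs once per distinct fruit rather than once per row.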