Is there a built-in function for finding the mode?

Posted 2018-12-31 04:02

In R, mean() and median() are standard functions which do what you'd expect. mode() tells you the internal storage mode of the object, not the value that occurs the most in its argument. But is there a standard library function that implements the statistical mode for a vector (or list)?
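
To illustrate the distinction, here is a minimal sketch; the vector `x` is made up for the example, and the last line is only one possible way to get the most frequent value, not a dedicated built-in:

```r
x <- c(1, 2, 2, 3)

mode(x)                     # storage mode of the object: "numeric"
storage.mode(x)             # "double"

# most frequent value, returned as a character label: "2"
names(which.max(table(x)))
```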

29 answers
萌妹纸的霸气范
Reply #2 · 2018-12-31 04:45

The mode is not meaningful in every situation, for example when two or more values tie for the highest frequency, so the function should handle that case. Try the following function.

Mode <- function(v) {
  # unique values in the input
  uniqv <- unique(v)
  # frequency of each unique value, in order of first appearance
  counts <- tabulate(match(v, uniqv))
  n <- length(counts)
  # frequency of the most frequent value in the input data
  m1 <- max(counts)
  if (n > 1) {
    # frequency of the second most frequent value in the input data
    m2 <- sort(counts, partial = n - 1)[n - 1]
    if (m1 != m2) {
      # return the most frequent value
      mode <- uniqv[which.max(counts)]
    } else {
      mode <- "Two or more values have same frequency. So mode can't be calculated."
    }
  } else {
    # all elements are the same (this check also works for non-numeric input)
    mode <- uniqv
  }
  return(mode)
}

Output:

x1 <- c(1,2,3,3,3,4,5)
Mode(x1)
# [1] 3

x2 <- c(1,2,3,4,5)
Mode(x2)
# [1] "Two or more values have same frequency. So mode can't be calculated."

x3 <- c(1,1,2,3,3,4,5)
Mode(x3)
# [1] "Two or more values have same frequency. So mode can't be calculated."
ら面具成の殇う
Reply #3 · 2018-12-31 04:49

Another possible solution:

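Mode <- function(x) {
    # only numeric input is handled; anything else returns NULL
    if (is.numeric(x)) {
        # table() counts each distinct value; its names are the values as strings
        x_table <- table(x)
        # which.max() picks the first value with the highest count
        return(as.numeric(names(x_table)[which.max(x_table)]))
    }
}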

Usage:

set.seed(100)
v <- sample(x = 1:100, size = 1000000, replace = TRUE)
system.time(Mode(v))

Output:

   user  system elapsed 
   0.32    0.00    0.31 
回忆,回不去的记忆
Reply #4 · 2018-12-31 04:50

This builds on jprockbelly's answer by adding a speed-up for very short vectors. This is useful when applying the mode to a data.frame or data.table with lots of small groups:

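Mode <- function(x) {
   # shortcut: with one or two elements the first one is always a mode
   # (this runs before NA removal, so a leading NA is returned as-is)
   if ( length(x) <= 2 ) return(x[1])
   # drop missing values before counting
   if ( anyNA(x) ) x <- x[!is.na(x)]
   ux <- unique(x)
   # count occurrences of each unique value and return the first most frequent one
   ux[which.max(tabulate(match(x, ux)))]
}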
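
As a rough sketch of the per-group use case, base R's tapply() can apply the function to each group. The data frame `df` and its columns `g` and `v` are made up for illustration, and the function is repeated so the snippet runs on its own:

```r
Mode <- function(x) {
  if (length(x) <= 2) return(x[1])
  if (anyNA(x)) x <- x[!is.na(x)]
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# hypothetical example data: two small groups
df <- data.frame(g = c("a", "a", "a", "b", "b"),
                 v = c(1, 2, 2, 5, NA))

# one mode per group
res <- tapply(df$v, df$g, Mode)
res  # a: 2, b: 5 (group b has two elements, so the shortcut returns its first)
```

Note that for groups of one or two elements the shortcut returns the first element before any NA handling, so a group such as c(NA, 5) would yield NA.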
骚的不知所云
Reply #5 · 2018-12-31 04:51

The mode is mostly computed for factor variables, in which case we can use

names(which.max(table(HouseVotes84$V1)))

HouseVotes84 is a dataset available in the 'mlbench' package.

This returns the label with the highest count. Using built-in functions directly like this is easier than writing a separate function.
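
As a minimal sketch of finding the most frequent level with built-in functions only, here is a made-up factor, so it runs without installing any package. The name `votes` and its values are assumptions for illustration, not data from mlbench:

```r
# made-up votes, standing in for a factor column such as HouseVotes84$V1
votes <- factor(c("y", "n", "y", "y", "n"))

counts <- table(votes)             # counts per level: n = 2, y = 3
top <- names(which.max(counts))    # label of the first level with the highest count
top                                # "y"
```

If several levels tie, which.max() reports only the first of them.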

还给你的自由
Reply #6 · 2018-12-31 04:52

While I like Ken Williams' simple function, I would like to retrieve all of the modes if there are several. With that in mind, I use the following function, which returns every mode when there are ties and the single mode otherwise.

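rmode <- function(x) {
  x <- sort(x)   # sort() also drops NA values
  u <- unique(x)
  # count how many times each unique value occurs
  counts <- vapply(u, function(val) sum(x == val), integer(1))
  # keep every value whose count equals the maximum
  u[counts == max(counts)]
}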
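
The tie-returning behaviour can also be sketched with tabulate() and match(), as used in other answers here. `all_modes` is a hypothetical name chosen for this example, not a base R function, and unlike rmode it returns the modes in order of first appearance rather than sorted:

```r
# all_modes() is a hypothetical helper, not part of base R
all_modes <- function(x) {
  ux <- unique(x)
  counts <- tabulate(match(x, ux))  # occurrences of each unique value
  ux[counts == max(counts)]         # every value that reaches the maximum count
}

all_modes(c(1, 1, 2, 3, 3, 4, 5))   # 1 3
all_modes(c("a", "b", "b"))         # "b"
```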