Is there a built-in function for finding the mode?-第3页回答

In R, mean() and median() are standard functions which do what you'd expect. mode() tells you the internal storage mode of the object, not the value that occurs the most in its argument. But is there is a standard library function that implements the statistical mode for a vector (or list)?

标签： r statistics r-faq

29条回答

笑指拈花

2楼-- · 2018-12-31 04:35

Here, another solution:

freq <- tapply(mySamples,mySamples,length)
#or freq <- table(mySamples)
as.numeric(names(freq)[which.max(freq)])

0人赞添加讨论(0) 举报

永恒的永恒

3楼-- · 2018-12-31 04:36

Here is a function to find the mode:

mode <- function(x) {
  unique_val <- unique(x)
  counts <- vector()
  for (i in 1:length(unique_val)) {
    counts[i] <- length(which(x==unique_val[i]))
  }
  position <- c(which(counts==max(counts)))
  if (mean(counts)==max(counts)) 
    mode_x <- 'Mode does not exist'
  else 
    mode_x <- unique_val[position]
  return(mode_x)
}

0人赞添加讨论(0) 举报

高级女魔头

4楼-- · 2018-12-31 04:36

I was looking through all these options and started to wonder about their relative features and performances, so I did some tests. In case anyone else are curious about the same, I'm sharing my results here.

Not wanting to bother about all the functions posted here, I chose to focus on a sample based on a few criteria: the function should work on both character, factor, logical and numeric vectors, it should deal with NAs and other problematic values appropriately, and output should be 'sensible', i.e. no numerics as character or other such silliness.

I also added a function of my own, which is based on the same rle idea as chrispy's, except adapted for more general use:

library(magrittr)

Aksel <- function(x, freq=FALSE) {
    z <- 2
    if (freq) z <- 1:2
    run <- x %>% as.vector %>% sort %>% rle %>% unclass %>% data.frame
    colnames(run) <- c("freq", "value")
    run[which(run$freq==max(run$freq)), z] %>% as.vector   
}

set.seed(2)

F <- sample(c("yes", "no", "maybe", NA), 10, replace=TRUE) %>% factor
Aksel(F)

# [1] maybe yes  

C <- sample(c("Steve", "Jane", "Jonas", "Petra"), 20, replace=TRUE)
Aksel(C, freq=TRUE)

# freq value
#    7 Steve

I ended up running five functions, on two sets of test data, through microbenchmark. The function names refer to their respective authors:

Chris' function was set to method="modes" and na.rm=TRUE by default to make it more comparable, but other than that the functions were used as presented here by their authors.

In matter of speed alone Kens version wins handily, but it is also the only one of these that will only report one mode, no matter how many there really are. As is often the case, there's a trade-off between speed and versatility. In method="mode", Chris' version will return a value iff there is one mode, else NA. I think that's a nice touch. I also think it's interesting how some of the functions are affected by an increased number of unique values, while others aren't nearly as much. I haven't studied the code in detail to figure out why that is, apart from eliminating logical/numeric as a the cause.

0人赞添加讨论(0) 举报

高级女魔头

5楼-- · 2018-12-31 04:36

Another simple option that gives all values ordered by frequency is to use rle:

df = as.data.frame(unclass(rle(sort(mySamples))))
df = df[order(-df$lengths),]
head(df)

0人赞添加讨论(0) 举报

只靠听说

6楼-- · 2018-12-31 04:37

I've written the following code in order to generate the mode.

MODE <- function(dataframe){
    DF <- as.data.frame(dataframe)

    MODE2 <- function(x){      
        if (is.numeric(x) == FALSE){
            df <- as.data.frame(table(x))  
            df <- df[order(df$Freq), ]         
            m <- max(df$Freq)        
            MODE1 <- as.vector(as.character(subset(df, Freq == m)[, 1]))

            if (sum(df$Freq)/length(df$Freq)==1){
                warning("No Mode: Frequency of all values is 1", call. = FALSE)
            }else{
                return(MODE1)
            }

        }else{ 
            df <- as.data.frame(table(x))  
            df <- df[order(df$Freq), ]         
            m <- max(df$Freq)        
            MODE1 <- as.vector(as.numeric(as.character(subset(df, Freq == m)[, 1])))

            if (sum(df$Freq)/length(df$Freq)==1){
                warning("No Mode: Frequency of all values is 1", call. = FALSE)
            }else{
                return(MODE1)
            }
        }
    }

    return(as.vector(lapply(DF, MODE2)))
}

Let's try it:

MODE(mtcars)
MODE(CO2)
MODE(ToothGrowth)
MODE(InsectSprays)

0人赞添加讨论(0) 举报

只若初见

7楼-- · 2018-12-31 04:37

An easy way to calculate MODE of a vector 'v' containing discrete values is:

names(sort(table(v)))[length(sort(table(v)))]

0人赞添加讨论(0) 举报

Is there a built-in function for finding the mode?

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间