In R, mean() and median() are standard functions which do what you'd expect. mode() tells you the internal storage mode of the object, not the value that occurs most often in its argument. But is there a standard library function that implements the statistical mode for a vector (or list)?
Here, another solution:
Here is a function to find the mode:
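The function itself is not shown here, so what follows is only a minimal sketch of one common way to do it in base R, not necessarily what the poster wrote. The name mode_single is made up for illustration. It returns a single value, and on ties the value that appears first in the input wins.

```r
# Hypothetical helper (name chosen for this example): most frequent value of x.
# On ties, the value that occurs first in x is returned.
mode_single <- function(x) {
  ux <- unique(x)                    # distinct values, in order of first appearance
  counts <- tabulate(match(x, ux))   # number of occurrences of each distinct value
  ux[which.max(counts)]              # which.max() picks the first maximum
}

mode_single(c(2, 3, 3, 5, 5, 5, 7))   # 5
mode_single(c("a", "b", "b", "c"))    # "b"
```

Because it indexes back into unique(x), the result keeps the type of the input instead of being turned into a character string.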
I was looking through all these options and started to wonder about their relative features and performance, so I did some tests. In case anyone else is curious about the same, I'm sharing my results here.
Not wanting to bother with all the functions posted here, I chose to focus on a sample based on a few criteria: the function should work on character, factor, logical and numeric vectors, it should deal with NAs and other problematic values appropriately, and the output should be 'sensible', i.e. no numerics as character or other such silliness.
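To make those criteria concrete, here is a rough sketch of how a candidate could be checked against them. Both mode_all and the small test inputs are assumptions made up for this example; they are not the functions or data used in the tests described in this answer.

```r
# Hypothetical candidate: returns every value tied for the highest count.
mode_all <- function(x, na.rm = TRUE) {
  if (na.rm) x <- x[!is.na(x)]       # drop NAs before counting
  if (length(x) == 0) return(x)      # nothing left to count
  ux <- unique(x)
  counts <- tabulate(match(x, ux))
  ux[counts == max(counts)]
}

# One small input per type named in the criteria, each containing an NA.
inputs <- list(
  character = c("a", "b", "b", NA),
  factor    = factor(c("lo", "hi", "hi", NA)),
  logical   = c(TRUE, TRUE, FALSE, NA),
  numeric   = c(1.5, 2, 2, NA)
)

for (type in names(inputs)) {
  result <- mode_all(inputs[[type]])
  # A 'sensible' result keeps the class of its input.
  cat(type, ":", format(result), "| class:", class(result), "\n")
}
```

Each line of output should show the class of the input it came from, which is the 'no numerics as character' criterion in practice.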
I also added a function of my own, which is based on the same rle idea as chrispy's, except adapted for more general use:
I ended up running five functions, on two sets of test data, through microbenchmark. The function names refer to their respective authors:
Chris' function was set to method="modes" and na.rm=TRUE by default to make it more comparable, but other than that the functions were used as presented here by their authors.
In terms of speed alone, Ken's version wins handily, but it is also the only one of these that will report only one mode, no matter how many there really are. As is often the case, there's a trade-off between speed and versatility. With method="mode", Chris' version will return a value iff there is one mode, else NA. I think that's a nice touch. I also think it's interesting how some of the functions are affected by an increased number of unique values, while others aren't nearly as much. I haven't studied the code in detail to figure out why that is, apart from eliminating logical/numeric as the cause.
Another simple option that gives all values ordered by frequency is to use rle:
I've written the following code in order to generate the mode.
Let's try it:
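The code for this answer is not shown here, so the following is only a sketch of one possible version, built on sort() and rle(), with a short trial run. The name mode_rle is hypothetical. It returns all values tied for the highest count. It will not work on factors as written, since rle() rejects them; those would need as.character() first.

```r
# Hypothetical helper built on sort() and rle(); returns all modes of x.
mode_rle <- function(x) {
  runs <- rle(sort(x))   # sort() drops NAs by default; rle() counts runs of equal values
  runs$values[runs$lengths == max(runs$lengths)]
}

mode_rle(c(4, 1, 2, 2, 3, 4, 4))        # one mode: 4
mode_rle(c("x", "y", "y", "x", "z"))    # two values tied: "x" "y"
mode_rle(c(TRUE, FALSE, TRUE, NA))      # NA is dropped by sort(): TRUE
```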
An easy way to calculate MODE of a vector 'v' containing discrete values is:
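The expression itself is missing here. One short base R way to do this, offered as a guess at the kind of one-liner meant rather than the poster's exact code, goes through table(). The example vector is made up.

```r
v <- c(3, 1, 3, 2, 3, 2)                   # example data, chosen for illustration

counts <- table(v)                         # frequency of each distinct value
top <- names(counts)[which.max(counts)]    # label of the most frequent value

top               # "3", a character string, because table() keeps values as names
as.numeric(top)   # 3, converted back since v is numeric
```

Note that this reports only one value even when several are tied, and that the result comes back as character unless converted.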