I know this is a basic question, but for some strange reason I am unable to find an answer.
How should I apply basic statistical functions like mean, median, etc. over an entire array, matrix or data frame, so that I get a single answer rather than a vector over rows or columns?
Since this comes up a fair bit, I'm going to treat this a little more comprehensively, to include the 'etc.' piece in addition to `mean` and `median`.

For a matrix or array, as the others have stated, `mean` and `median` will return a single value. However, `var` will compute the covariances between the columns of a two-dimensional matrix. Interestingly, for a multi-dimensional array, `var` goes back to returning a single value. `sd` on a 2-d matrix used to work but was deprecated, returning the standard deviation of each column (current versions of R instead return a single value computed over all elements). Even better, `mad` returns a single value on both a 2-d matrix and a multi-dimensional array. If you want a single value returned, the safest route is to coerce using `as.vector()` first. Having fun yet?

For a `data.frame`, `mean` is deprecated, but will again act on the columns separately. `median` requires that you coerce to a vector or `unlist` first. As before, `var` will return the covariances, and `sd` was again deprecated but returned the standard deviation of the columns (current versions of R signal an error instead). `mad` also requires that you coerce to a vector or `unlist` first. In general, if you want something to act on all values of a `data.frame`, you will just `unlist` it first.

Edit: Late-breaking news: in R 3.0.0, `mean.data.frame` is defunct, so `mean()` on a data frame now returns `NA` with a warning instead of column means.
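To see these behaviors side by side, here is a minimal sketch using made-up data: a 4 x 6 matrix, a 2 x 3 x 4 array and a two-column data frame, all holding the values 1 to 24, so the flattened summaries agree. The `sd(m)` line reflects current R, where a matrix is flattened rather than summarised by column:

```r
# Made-up example data: the same 24 values in three shapes
m   <- matrix(1:24, nrow = 4)           # 4 x 6 matrix
arr <- array(1:24, dim = c(2, 3, 4))    # 2 x 3 x 4 array
df  <- data.frame(a = 1:12, b = 13:24)  # two numeric columns

# var(): column covariances for a matrix, one value for a 3-d array
cov_m <- var(m)    # 6 x 6 covariance matrix
var_a <- var(arr)  # not a 2-d matrix, so treated as one long vector

# sd() and mad() on a matrix: one value each in current R
sd_m  <- sd(m)
mad_m <- mad(m)

# The safest route: flatten first, then summarise
v <- as.vector(m)
flat_stats <- c(mean = mean(v), median = median(v),
                var = var(v), sd = sd(v), mad = mad(v))

# Data frame: unlist() to a plain numeric vector first
x <- unlist(df, use.names = FALSE)
df_stats <- c(mean = mean(x), median = median(x), sd = sd(x), mad = mad(x))

print(dim(cov_m))
print(flat_stats)
print(df_stats)
```

Flattening explicitly with `as.vector()` or `unlist()` makes the intent obvious and sidesteps the per-function differences listed above.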
By default, `mean` and `median` work over an entire array or matrix, returning a single value.
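For example, with a made-up matrix and 3-d array, both functions collapse everything to one number; per-column results need an explicit call such as `colMeans()`:

```r
m   <- matrix(c(2, 4, 6, 8, 10, 12), nrow = 2)  # 2 x 3 matrix, filled by column
arr <- array(1:8, dim = c(2, 2, 2))            # 2 x 2 x 2 array

mean(m)      # 7: mean of all six values
median(m)    # 7
mean(arr)    # 4.5
median(arr)  # 4.5

# For comparison, column-wise means must be asked for explicitly
colMeans(m)  # 3 7 11
```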
For data frames, you can coerce them to a matrix first with `as.matrix()`. (The reason data frames are handled column by column by default is that a data frame can have columns of strings, which you can't take the mean of.)
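A minimal sketch with a made-up, all-numeric data frame: after `as.matrix()`, `mean` and `median` see every cell at once:

```r
df <- data.frame(x = c(1, 2, 3), y = c(10, 20, 30))

mat <- as.matrix(df)  # 3 x 2 numeric matrix
mean(mat)             # 11: mean over all six cells
median(mat)           # 6.5
```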
Just be careful that your data frame has only numeric columns before coercing it to a matrix (otherwise `as.matrix()` produces a character matrix), or exclude the non-numeric ones.
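To see why this matters, here is a sketch with a made-up data frame that has a string column. Coercing the whole thing gives a character matrix; keeping only the numeric columns first (here with base R's `Filter()`, one option among several) avoids that:

```r
df <- data.frame(id = c("a", "b"), x = c(1, 3), y = c(5, 7))

typeof(as.matrix(df))  # "character": the string column forces this

# Keep only the numeric columns, then coerce and summarise
num_df <- Filter(is.numeric, df)
mat <- as.matrix(num_df)
mean(mat)    # 4
median(mat)  # 4
```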
You can use the `dplyr` package (install it via `install.packages('dplyr')`) and then:
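One possible sketch, assuming dplyr 1.0.0 or later (for `where()` and `across()`) and a made-up data frame `df`: `select(where(is.numeric))` drops the non-numeric columns and `unlist()` flattens what is left, so each statistic is computed over every value. `summarise(across(...))` is shown only for contrast, since it gives one value per column:

```r
library(dplyr)

df <- data.frame(id = c("a", "b", "c"), x = c(1, 2, 3), y = c(4, 5, 6))

# Whole-data-frame statistics: keep numeric columns, then flatten
all_values <- df %>%
  select(where(is.numeric)) %>%
  unlist(use.names = FALSE)

mean(all_values)    # 3.5
median(all_values)  # 3.5

# Per-column statistics, for contrast
per_column <- df %>% summarise(across(where(is.numeric), mean))
per_column          # one row: x = 2, y = 5
```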