If I have a list like this:
results=[-14.82381293, -0.29423447, -13.56067979, -1.6288903, -0.31632439,
0.53459687, -1.34069996, -1.61042692, -4.03220519, -0.24332097]
I want to calculate the variance of this list in Python, i.e. the average of the squared differences from the mean.
How can I go about this? Accessing the elements of the list to compute the squared differences is what confuses me.
Well, there are two ways of defining the variance. You have the variance n that you use when you have a full set, and the variance n-1 that you use when you have a sample.
The difference between the two is whether the value
m = sum(xi) / n
is the real average or just an approximation of what the average should be.
Example 1: you want to know the average height of the students in a class and its variance. Here the value m = sum(xi) / n is the real average, and the formulas given by Cleb are OK (variance n).
Example 2: you want to know the average time at which a bus passes the bus stop and its variance. You note the time for a month and get 30 values. Here the value m = sum(xi) / n is only an approximation of the real average, and that approximation becomes more accurate with more values. In that case the best approximation of the actual variance is the variance n-1.
OK, this has nothing to do with Python, but it does have an impact on statistical analysis, and the question is tagged statistics and variance.
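The "by hand" version of the variance n-1 might look like the sketch below. It reuses the results list from the question; the names n, m and var_sample are illustrative only and are not taken from the original answer.

```python
results = [-14.82381293, -0.29423447, -13.56067979, -1.6288903, -0.31632439,
           0.53459687, -1.34069996, -1.61042692, -4.03220519, -0.24332097]

n = len(results)
m = sum(results) / n  # m = sum(xi) / n, the (approximate) average

# Variance n-1: divide the sum of squared differences by n - 1 instead of n
var_sample = sum((xi - m) ** 2 for xi in results) / (n - 1)
print(var_sample)  # about 32.02
```

Dividing by n instead of n - 1 in the last step gives the variance n.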
Note: libraries differ in which definition they use by default. numpy uses the variance n by default for both var and std, so check the documentation of the function you call.
You can use numpy's built-in function var, which gives you 28.822364260579157 for the list above.
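The numpy call described above is presumably along these lines. This is a minimal sketch that assumes numpy is installed and reuses the list from the question; the variable name population_variance is made up for illustration.

```python
import numpy as np

results = [-14.82381293, -0.29423447, -13.56067979, -1.6288903, -0.31632439,
           0.53459687, -1.34069996, -1.61042692, -4.03220519, -0.24332097]

# np.var divides by n by default (ddof=0), i.e. the variance n
population_variance = np.var(results)
print(population_variance)  # about 28.82
```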
If, for whatever reason, you cannot use numpy and/or you don't want to use a built-in function for it, you can also calculate it "by hand" using e.g. a list comprehension, which gives you the identical result.
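The list-comprehension approach mentioned above could be sketched as follows. The original snippet is not preserved here, so the names m, squared_diffs and var_res are assumptions.

```python
results = [-14.82381293, -0.29423447, -13.56067979, -1.6288903, -0.31632439,
           0.53459687, -1.34069996, -1.61042692, -4.03220519, -0.24332097]

m = sum(results) / len(results)  # mean of the list

# One squared difference per element, collected with a list comprehension
squared_diffs = [(xi - m) ** 2 for xi in results]

# Variance n: average of the squared differences
var_res = sum(squared_diffs) / len(results)
print(var_res)  # about 28.82
```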
If you are interested in the standard deviation, you can use numpy.std:
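A minimal sketch of that call, assuming numpy is installed; the variable name std_n is illustrative only.

```python
import numpy as np

results = [-14.82381293, -0.29423447, -13.56067979, -1.6288903, -0.31632439,
           0.53459687, -1.34069996, -1.61042692, -4.03220519, -0.24332097]

# np.std is the square root of np.var and also uses ddof=0 by default
std_n = np.std(results)
print(std_n)  # about 5.37
```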
@Serge Ballesta explained the difference between variance n and n-1 very well. In numpy you can easily set this parameter using the option ddof; its default is 0, so for the n-1 case you can simply pass ddof=1. The "by hand" solution is given in @Serge Ballesta's answer.
Both approaches yield 32.024849178421285. You can also set the parameter for std.
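The same option applied to the standard deviation, again as a sketch assuming numpy is installed; sample_std is an illustrative name.

```python
import numpy as np

results = [-14.82381293, -0.29423447, -13.56067979, -1.6288903, -0.31632439,
           0.53459687, -1.34069996, -1.61042692, -4.03220519, -0.24332097]

# Square root of the variance n-1
sample_std = np.std(results, ddof=1)
print(sample_std)  # about 5.66
```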
Numpy has a method which will do this for you, and that's the easiest way. Or you could write your own function.
Starting with Python 3.4, the standard library comes with the variance function (sample variance, or variance n-1) as part of the statistics module. The population variance (or variance n) can be obtained using the pvariance function.
Also note that if you already know the mean of your list, the variance and pvariance functions take a second argument (xbar and mu, respectively) to avoid recomputing the mean of the sample (which is part of the variance computation).
Numpy is indeed the most elegant and fast way to do it.
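For the statistics module functions described above, usage might look like the sketch below. It needs Python 3.4 or later; the variable names are illustrative only.

```python
import statistics

results = [-14.82381293, -0.29423447, -13.56067979, -1.6288903, -0.31632439,
           0.53459687, -1.34069996, -1.61042692, -4.03220519, -0.24332097]

sample_var = statistics.variance(results)       # variance n-1, about 32.02
population_var = statistics.pvariance(results)  # variance n, about 28.82

# If the mean is already known, pass it as the second argument
# (xbar for variance, mu for pvariance) so it is not recomputed
m = statistics.mean(results)
sample_var_known_mean = statistics.variance(results, m)
population_var_known_mean = statistics.pvariance(results, m)

print(sample_var, population_var)
```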
I think the actual question was about how to access the individual elements of a list to do such a calculation yourself, which can be done with a plain loop over the list.