I'm having a difficult time understanding what these statistics functions do and how they work. I'm having an even more difficult time understanding how stdev differs from stdevp, and likewise for the var equivalents. Can someone please break these down in simple terms for me?
Question:
Answer 1:
In statistics, standard deviation and variance are measures of how much the values in a population deviate from the mean (the arithmetic average). The variance is defined as the average of the squared differences from the mean, and the standard deviation is defined as the square root of the variance, i.e.:
For a population of size n: x1, x2, ..., xn with mean: xmean
Stdevp = sqrt( ((x1-xmean)^2 + (x2-xmean)^2 + ... + (xn-xmean)^2)/n )
When values for the whole population are not available (most of the time) and you only have a sample, it is customary to apply Bessel's correction to get a better estimate of the actual standard deviation for the whole population. Bessel's correction is merely dividing by n-1 instead of by n when computing the variance, i.e.:
Stdev = sqrt( ((x1-xmean)^2 + (x2-xmean)^2 + ... + (xn-xmean)^2)/(n-1) )
Note that the two results differ only by a factor of sqrt(n/(n-1)), so for large enough data sets it won't really matter which function is used.
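To make the two formulas and the "large enough data set" remark concrete outside SQL Server, here is a minimal Python sketch. The function names population_stdev and sample_stdev are hypothetical helpers written for this illustration, and the data 0, 1, ..., n-1 is arbitrary; only the statistics module calls are real library functions.

```python
import math
import statistics


def population_stdev(values):
    """Divide the sum of squared deviations by n (the Stdevp formula)."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((x - mean) ** 2 for x in values) / len(values))


def sample_stdev(values):
    """Divide by n - 1 instead of n (the Stdev formula, Bessel's correction)."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((x - mean) ** 2 for x in values) / (len(values) - 1))


# Cross-check the hand-written formulas against Python's standard library.
small = [2, 3, 4]
print(population_stdev(small), statistics.pstdev(small))
print(sample_stdev(small), statistics.stdev(small))

# The ratio between the two is sqrt(n / (n - 1)), which tends to 1 as n grows.
for n in (3, 30, 3000):
    data = list(range(n))  # arbitrary illustrative data
    ratio = sample_stdev(data) / population_stdev(data)
    print(n, ratio, math.sqrt(n / (n - 1)))
```

For three values the sample figure is noticeably larger than the population figure, while for a few thousand values the difference is negligible for most practical purposes.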
You can verify my answer by running the following T-SQL script:
-- temporary data set with values 2, 3, 4
declare @t table([val] int);
insert into @t values
(2),(3),(4);
select avg(val) as [avg], -- equals 3 (an integer average, because val is an int column)
-- Estimate of the population standard deviation from a sample, using Bessel's correction:
-- sqrt( ((x1 - xmean)^2 + (x2 - xmean)^2 + ... + (xn - xmean)^2)/(n-1) )
stdev(val) as [stdev],
sqrt( (square(2-3.0) + square(3-3) + square(4-3))/2) as [stdev calculated], -- calculated with values 2, 3, 4
-- Population standard deviation:
-- sqrt( ((x1 - xmean)^2 + (x2 - xmean)^2 + ... + (xn - xmean)^2)/n )
stdevp(val) as [stdevp],
sqrt( (square(2-3.0) + square(3-3) + square(4-3))/3) as [stdevp calculated] -- calculated with values 2, 3, 4
from @t;
Further reading: the Wikipedia articles on standard deviation and Bessel's correction.
Answer 2:
STDEVP computes the standard deviation of a data set treated as the entire population. STDEV estimates the standard deviation of a population from which your data is a sample.
If your input is the entire population, then the population standard deviation is computed with STDEVP. More typically, your data set is a sample of a much larger population. In this case the standard deviation of the data set itself would not represent the true standard deviation of the population, since it will usually be biased too low. A better estimate for the standard deviation of the population based on a sample is obtained with STDEV, which divides by n-1 instead of n.
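The situation with VAR and VARP is the same: VAR is the sample variance (dividing by n-1) and VARP is the population variance (dividing by n). In each case the standard deviation is the square root of the corresponding variance.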
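As a rough illustration of the variance counterparts, the following Python sketch uses the standard library's statistics module on the values 2, 3, 4 from the first answer. Mapping pvariance to VARP and variance to VAR is an analogy assumed here for illustration; the code does not call SQL Server.

```python
import math
import statistics

data = [2.0, 3.0, 4.0]  # same values as in the first answer's script

pop_var = statistics.pvariance(data)    # divides by n, analogous to VARP
sample_var = statistics.variance(data)  # divides by n - 1, analogous to VAR

print("population variance:", pop_var)
print("sample variance:", sample_var)

# Each standard deviation is the square root of the matching variance.
print(math.isclose(math.sqrt(pop_var), statistics.pstdev(data)))
print(math.isclose(math.sqrt(sample_var), statistics.stdev(data)))
```

The sum of squared deviations is 2 in both cases; only the divisor (3 versus 2) changes.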
For a more thorough discussion of the topic, please see this Wikipedia article.