可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

Using Python, assume I'm running through a known quantity of items I, and have the ability to time how long it takes to process each one t, as well as a running total of time spent processing T and the number of items processed so far c. I'm currently calculating the average on the fly A = T / c but this can be skewed by say a single item taking an extraordinarily long time to process (a few seconds compared to a few milliseconds).

I would like to show a running Standard Deviation. How can I do this without keeping a record of each t?

回答1:

I use Welford's Method, which gives more accurate results. This link points to John D. Cook's overview. Here's a paragraph from it that summarizes why it is a preferred approach:

This better way of computing variance goes back to a 1962 paper by B. P. Welford and is presented in Donald Knuth’s Art of Computer Programming, Vol 2, page 232, 3rd edition. Although this solution has been known for decades, not enough people know about it. Most people are probably unaware that computing sample variance can be difficult until the first time they compute a standard deviation and get an exception for taking the square root of a negative number.

回答2:

As outlined in the Wikipedia article on the standard deviation, it is enough to keep track of the following three sums:

s0 = sum(1 for x in samples)
s1 = sum(x for x in samples)
s2 = sum(x*x for x in samples)

These sums are easily updated as new values arrive. The standard deviation can be calculated as

std_dev = math.sqrt((s0 * s2 - s1 * s1)/(s0 * (s0 - 1)))

Note that this way of computing the standard deviation can be numerically ill-conditioned if your samples are floating point numbers and the standard deviation is small compared to the mean of the samples. If you expect samples of this type, you should resort to Welford's method (see the accepted answer).

回答3:

Based on Welford's algorithm:

import numpy as np

class OnlineVariance(object):
    """
    Welford's algorithm computes the sample variance incrementally.
    """

    def __init__(self, iterable=None, ddof=1):
        self.ddof, self.n, self.mean, self.M2 = ddof, 0, 0.0, 0.0
        if iterable is not None:
            for datum in iterable:
                self.include(datum)

    def include(self, datum):
        self.n += 1
        self.delta = datum - self.mean
        self.mean += self.delta / self.n
        self.M2 += self.delta * (datum - self.mean)

    @property
    def variance(self):
        return self.M2 / (self.n - self.ddof)

    @property
    def std(self):
        return np.sqrt(self.variance)

Update the variance with each new piece of data:

N = 100
data = np.random.random(N)
ov = OnlineVariance(ddof=0)
for d in data:
    ov.include(d)
std = ov.std
print(std)

Check our result against the standard deviation computed by numpy:

assert np.allclose(std, data.std())

Computing Standard Deviation in a stream

问题:

回答1:

回答2:

回答3:

收藏的人(0)

Computing Standard Deviation in a stream

问题:

回答1:

回答2:

回答3:

收藏的人(0)

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮