compute mean in python for a generator

I'm doing some statistics work, I have a (large) collection of random numbers to compute the mean of, I'd like to work with generators, because I just need to compute the mean, so I don't need to store the numbers.

The problem is that numpy.mean breaks if you pass it a generator. I can write a simple function to do what I want, but I'm wondering if there's a proper, built-in way to do this?

It would be nice if I could say "sum(values)/len(values)", but len doesn't work for genetators, and sum already consumed values.

here's an example:

import numpy 

def my_mean(values):
    n = 0
    Sum = 0.0
    try:
        while True:
            Sum += next(values)
            n += 1
    except StopIteration: pass
    return float(Sum)/n

X = [k for k in range(1,7)]
Y = (k for k in range(1,7))

print numpy.mean(X)
print my_mean(Y)

these both give the same, correct, answer, buy my_mean doesn't work for lists, and numpy.mean doesn't work for generators.

I really like the idea of working with generators, but details like this seem to spoil things.

标签： python generator mean

10条回答

闹够了就滚

2楼-- · 2020-06-09 05:48

Your approach is a good one, but you should instead use the for x in y idiom instead of repeatedly calling next until you get a StopIteration. This works for both lists and generators:

def my_mean(values):
    n = 0
    Sum = 0.0

    for value in values:
        Sum += value
        n += 1
    return float(Sum)/n

0人赞添加讨论(0) 举报

爷的心禁止访问

3楼-- · 2020-06-09 05:48

Try:

import itertools

def mean(i):
    (i1, i2) = itertools.tee(i, 2)
    return sum(i1) / sum(1 for _ in i2)

print mean([1,2,3,4,5])

tee will duplicate your iterator for any iterable i (e.g. a generator, a list, etc.), allowing you to use one duplicate for summing and the other for counting.

(Note that 'tee' will still use intermediate storage).

0人赞添加讨论(0) 举报

我命由我不由天

4楼-- · 2020-06-09 05:51

def my_mean(values):
    n = 0
    sum = 0
    for v in values:
        sum += v
        n += 1
    return sum/n

The above is very similar to your code, except by using for to iterate values you are good no matter if you get a list or an iterator. The python sum method is however very optimized, so unless the list is really, really long, you might be more happy temporarily storing the data.

(Also notice that since you are using python3, you don't need float(sum)/n)

0人赞添加讨论(0) 举报

太酷不给撩

5楼-- · 2020-06-09 05:54

Just one simple change to your code would let you use both. Generators were meant to be used interchangeably to lists in a for-loop.

def my_mean(values):
    n = 0
    Sum = 0.0
    for v in values:
        Sum += v
        n += 1
    return Sum / n

0人赞添加讨论(0) 举报

来，给爷笑一个

6楼-- · 2020-06-09 06:00

You can use reduce without knowing the size of the array:

from itertools import izip, count
reduce(lambda c,i: (c*(i[1]-1) + float(i[0]))/i[1], izip(values,count(1)),0)

0人赞添加讨论(0) 举报

【Aperson】

7楼-- · 2020-06-09 06:01

def my_mean(values):
    total = 0
    for n, v in enumerate(values, 1):
        total += v
    return total / n

print my_mean(X)
print my_mean(Y)

There is statistics.mean() in Python 3.4 but it calls list() on the input:

def mean(data):
    if iter(data) is data:
        data = list(data)
    n = len(data)
    if n < 1:
        raise StatisticsError('mean requires at least one data point')
    return _sum(data)/n

where _sum() returns an accurate sum (math.fsum()-like function that in addition to float also supports Fraction, Decimal).

0人赞添加讨论(0) 举报

1 2 下一页

compute mean in python for a generator

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间