I'm doing some statistics work and have a (large) collection of random numbers to compute the mean of. I'd like to work with generators: I only need the mean, so I don't need to store the numbers.
The problem is that numpy.mean breaks if you pass it a generator. I can write a simple function to do what I want, but I'm wondering if there's a proper, built-in way to do this?
It would be nice if I could say "sum(values)/len(values)", but len doesn't work for generators, and sum would already have consumed values.
Here's an example:
import numpy

def my_mean(values):
    n = 0
    Sum = 0.0
    try:
        while True:
            # next() only works on iterators, which is why lists fail here
            Sum += next(values)
            n += 1
    except StopIteration:
        pass
    return float(Sum)/n

X = [k for k in range(1,7)]  # a list
Y = (k for k in range(1,7))  # a generator
print(numpy.mean(X))
print(my_mean(Y))
These both give the same, correct, answer, but my_mean doesn't work for lists, and numpy.mean doesn't work for generators.
I really like the idea of working with generators, but details like this seem to spoil things.
Your approach is a good one, but you should use the for x in y idiom instead of repeatedly calling next until you get a StopIteration. This works for both lists and generators:
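A minimal sketch of that for-loop version, assuming the same my_mean name as in the question (total stands in for Sum, and an empty input would still raise ZeroDivisionError):

```python
def my_mean(values):
    """Mean of any iterable (list, generator, ...) in a single pass."""
    n = 0
    total = 0.0
    # A for loop calls iter() and next() for us and stops at StopIteration,
    # so lists and generators are handled the same way.
    for value in values:
        total += value
        n += 1
    return total / n


list_result = my_mean([k for k in range(1, 7)])
generator_result = my_mean(k for k in range(1, 7))
```

Both calls should give 3.5, and nothing beyond the running total and count is stored.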
Another option is itertools.tee. tee will duplicate your iterator for any iterable i (e.g. a generator, a list, etc.), allowing you to use one duplicate for summing and the other for counting. (Note that tee will still use intermediate storage.)
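The code that went with this answer isn't shown here, so the following is only a sketch of how the tee idea might look. The name tee_mean and the variable names are made up for illustration:

```python
from itertools import tee


def tee_mean(i):
    """Mean of any iterable, using two independent copies from tee."""
    for_sum, for_count = tee(i)
    # sum() runs the first copy to the end, so tee has to buffer every
    # item until the second copy catches up: memory use is O(n) here.
    total = sum(for_sum)
    count = sum(1 for _ in for_count)
    return total / count


tee_result = tee_mean(k for k in range(1, 7))
```

Because of the buffering, this is about as memory-hungry as calling list() on the input first. It mostly saves you from writing the loop by hand.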
A version that loops with for is very similar to your code, except that by using for to iterate over values you are fine whether you get a list or an iterator. Python's built-in sum function is highly optimized, however, so unless the list is really, really long, you might be happier temporarily storing the data. (Also note that since you are using Python 3, where / is true division, you don't need float(sum)/n.)
Just one simple change to your code would let you use both. Generators were meant to be used interchangeably with lists in a for loop.
You can use reduce without knowing the size of the array:
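The original snippet is missing here, so this is a guess at the general shape rather than the answerer's exact code. It assumes Python 3, where reduce lives in functools, and the name reduce_mean is made up:

```python
from functools import reduce


def reduce_mean(values):
    """Mean of any iterable, folding a (total, count) pair in one pass."""
    total, count = reduce(
        lambda acc, x: (acc[0] + x, acc[1] + 1),  # add the value, bump the count
        values,
        (0.0, 0),  # initial (total, count)
    )
    return total / count


reduce_result = reduce_mean(k for k in range(1, 7))
```

Like the plain for loop, this keeps only the running total and count, but it builds a new tuple per element, so it is unlikely to be faster.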
There is statistics.mean() in Python 3.4, but it calls list() on the input. Its internal _sum() helper returns an accurate sum (a math.fsum()-like function that, in addition to float, also supports Fraction and Decimal).
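For what it's worth, here is a small usage sketch showing that statistics.mean() accepts a generator directly. Whether the input gets materialized internally depends on the Python version (the description above is for 3.4), so don't count on memory savings:

```python
import statistics
from decimal import Decimal
from fractions import Fraction

# A generator can be passed straight in; no list needed on the caller's side.
gen_mean = statistics.mean(k for k in range(1, 7))

# Fraction and Decimal inputs keep their type instead of becoming floats.
frac_mean = statistics.mean(Fraction(1, d) for d in (2, 4))
dec_mean = statistics.mean(Decimal(s) for s in ("0.5", "1.5"))

print(gen_mean)   # 3.5
print(frac_mean)  # 3/8
```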