Is there any pythonic way to find average of speci

Posted 2020-03-01 03:16

Question:

I want to write this code in a more pythonic way. My real array is much bigger than this example.

( 5+10+20+3+2 ) / 5

Calling print(np.mean(array, key=lambda x: x[1])) fails with:

TypeError: mean() got an unexpected keyword argument 'key'

array = [('a', 5) , ('b', 10), ('c', 20), ('d', 3), ('e', 2)]

total = 0  # avoid naming this `sum`, which would shadow the builtin sum()
for i in range(len(array)):
    total = total + array[i][1]

average = total / len(array)
print(average)

import numpy as np
print(np.mean(array,key=lambda x:x[1]))

How can I avoid this? I want to use the second example.

I'm using Python 3.7

Answer 1:

If you are using Python 3.4 or above, you could use the statistics module:

from statistics import mean

average = mean(value[1] for value in array)
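A self-contained version of the statistics approach, run on the question's sample list, might look like the sketch below. One practical difference from dividing by len() is that statistics.mean raises StatisticsError on empty input instead of ZeroDivisionError:

```python
from statistics import StatisticsError, mean

array = [('a', 5), ('b', 10), ('c', 20), ('d', 3), ('e', 2)]

# Average of the second element of each tuple
average = mean(value[1] for value in array)
print(average)  # 8

# An empty input raises StatisticsError rather than ZeroDivisionError
try:
    mean(value[1] for value in [])
except StatisticsError as exc:
    print("empty input:", exc)
```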

Or if you're using a version of Python older than 3.4:

average = sum(value[1] for value in array) / len(array)

These solutions both use a nice feature of Python called a generator expression. The loop

value[1] for value in array

produces the values lazily, one at a time, instead of building an intermediate list, which makes it both fast and memory efficient. See PEP 289 -- Generator Expressions.
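To make the laziness visible, the sketch below keeps a reference to the generator. sum() pulls values from it one at a time, and once it has been consumed it is empty, so unlike a list it cannot be iterated a second time:

```python
array = [('a', 5), ('b', 10), ('c', 20), ('d', 3), ('e', 2)]

# Parentheses create a generator object; nothing is computed yet
values = (value[1] for value in array)
print(type(values).__name__)  # generator

# sum() consumes the generator value by value, never building a list
total = sum(values)
print(total)  # 40

# The generator is now exhausted, so a second pass sees no values
leftover = sum(values)
print(leftover)  # 0
```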

If you're using Python 2 and summing integers, / performs integer division, which discards the fractional part of the result, e.g.:

>>> 25 / 4
6

>>> 25 / float(4)
6.25
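For comparison, in Python 3 (and in Python 2 after from __future__ import division) / always performs true division, and the Python 2 integer behaviour is spelled // instead. A quick check, runnable on Python 3:

```python
# True division: always returns a float in Python 3
true_div = 25 / 4
print(true_div)  # 6.25

# Floor division: the Python 3 spelling of Python 2's integer `/`
floor_div = 25 // 4
print(floor_div)  # 6

# // rounds toward negative infinity, so negative results are floored
negative_floor = -25 // 4
print(negative_floor)  # -7
```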

To avoid integer division, we could set the start value of sum to the float 0.0. However, this means the generator expression must then be wrapped in its own parentheses (otherwise it is a syntax error), which is less pretty, as noted in the comments:

average = sum((value[1] for value in array), 0.0) / len(array)

It's probably best to use fsum from the math module, which always returns a float:

from math import fsum

average = fsum(value[1] for value in array) / len(array)
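Besides always returning a float, math.fsum tracks intermediate partial sums exactly, so it avoids the rounding error that the builtin sum accumulates on float data. A small illustration:

```python
from math import fsum

floats = [0.1] * 10

# The builtin sum accumulates floating-point rounding error
print(sum(floats))   # 0.9999999999999999

# fsum keeps exact partial sums and rounds only once at the end
print(fsum(floats))  # 1.0

array = [('a', 5), ('b', 10), ('c', 20), ('d', 3), ('e', 2)]
average = fsum(value[1] for value in array) / len(array)
print(average)  # 8.0
```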


Answer 2:

If you do want to use numpy, convert the list with numpy.array and select the column you want using numpy indexing:

import numpy as np

array = np.array([('a', 5) , ('b', 10), ('c', 20), ('d', 3), ('e', 2)])
print(array[:,1].astype(float).mean())
# 8.0

The cast to a numeric type is needed because the original list contains both strings and numbers, so NumPy converts every element to a string and the resulting array has a Unicode string dtype rather than a numeric one. In this case you could use float or int; the mean is the same either way.
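One way to confirm which dtype NumPy picked is to inspect dtype.kind. For this input it reports 'U' (Unicode string), which is why the column has to be converted before calling mean():

```python
import numpy as np

array = np.array([('a', 5), ('b', 10), ('c', 20), ('d', 3), ('e', 2)])

# Mixed str/int input is coerced to a single Unicode string dtype
print(array.dtype.kind)  # U
print(array[0])          # ['a' '5']

# Column 1 holds strings such as '5', so convert before averaging
numbers = array[:, 1].astype(float)
print(numbers.mean())  # 8.0
```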



Answer 3:

If you're open to more golf-like solutions, you can transpose your array with vanilla python, get a list of just the numbers, and calculate the mean with

# zip() returns an iterator in Python 3, so convert it to a list before indexing
sum(list(zip(*array))[1]) / len(array)
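The transpose trick reads more clearly in Python 3 with tuple unpacking: zip(*array) groups all first elements together and all second elements together. Note that this materialises every column as a tuple, so it is convenient rather than memory efficient for a very large array:

```python
array = [('a', 5), ('b', 10), ('c', 20), ('d', 3), ('e', 2)]

# zip(*array) pairs up the first elements and the second elements
letters, numbers = zip(*array)
print(letters)  # ('a', 'b', 'c', 'd', 'e')
print(numbers)  # (5, 10, 20, 3, 2)

average = sum(numbers) / len(numbers)
print(average)  # 8.0
```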


Answer 4:

With pure Python:

from operator import itemgetter

acc = 0
count = 0

for value in map(itemgetter(1), array):
    acc += value
    count += 1

mean = acc / count

An iterative approach can be preferable if your data cannot fit in memory as a list (since you said it was big). If it can, prefer a declarative approach:

data = [sub[1] for sub in array]
mean = sum(data) / len(data)
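The iterative loop can also be wrapped in a reusable function that accepts any iterable, including a lazy map or a generator reading from a file. The helper name running_mean below is hypothetical (not a library function), and raising ValueError on empty input is an assumed design choice:

```python
from operator import itemgetter


def running_mean(values):
    """One-pass mean of any iterable, without building a list.

    Hypothetical helper for illustration; not part of any library.
    """
    total = 0
    count = 0
    for value in values:
        total += value
        count += 1
    if count == 0:
        # Assumed behaviour: refuse to average an empty iterable
        raise ValueError("running_mean() of an empty iterable")
    return total / count


array = [('a', 5), ('b', 10), ('c', 20), ('d', 3), ('e', 2)]

# Works on a lazy iterator, so the numbers never need to exist as a list
print(running_mean(map(itemgetter(1), array)))  # 8.0
```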

If you are open to using numpy, I find this cleaner:

import numpy as np

a = np.array(array)

mean = a[:, 1].astype(int).mean()
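If the goal is only the numeric column, another NumPy option is np.fromiter, which builds a typed array directly from a generator, so the labels never pass through a string array and no astype() conversion is needed. A sketch, assuming the same list of (label, number) pairs:

```python
import numpy as np

array = [('a', 5), ('b', 10), ('c', 20), ('d', 3), ('e', 2)]

# Build a float array straight from a generator over the second elements
numbers = np.fromiter((value for _, value in array), dtype=float, count=len(array))
print(numbers.mean())  # 8.0
```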


Answer 5:

You can use map instead of a list comprehension:

sum(map(lambda x:int(x[1]), array)) / len(array)

Or functools.reduce (in Python 2.x, the builtin reduce can be used directly instead of functools.reduce):

import functools
functools.reduce(lambda acc, y: acc + y[1], array, 0) / len(array)
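The reduce version still calls len(array). If the data arrives as an iterator with no length, one option is to carry a (total, count) pair in the accumulator so both are computed in a single pass. A sketch of that variation:

```python
import functools

array = [('a', 5), ('b', 10), ('c', 20), ('d', 3), ('e', 2)]

# The accumulator is a (running_total, running_count) pair
total, count = functools.reduce(
    lambda acc, item: (acc[0] + item[1], acc[1] + 1),
    array,
    (0, 0),
)
print(total, count)   # 40 5
print(total / count)  # 8.0
```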


Answer 6:

You can simply use:

print(sum(tup[1] for tup in array) / len(array))

Or for Python 2:

print(sum(tup[1] for tup in array) / float(len(array)))

Or, a little more concisely, for Python 2:

from math import fsum

print(fsum(tup[1] for tup in array) / len(array))
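A small variation unpacks each tuple inside the generator expression, which avoids the bare index [1]; the underscore is just the conventional name for a value that is ignored:

```python
array = [('a', 5), ('b', 10), ('c', 20), ('d', 3), ('e', 2)]

# Unpack (label, value) pairs and ignore the label
average = sum(value for _, value in array) / len(array)
print(average)  # 8.0
```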


Answer 7:

You could use map:

import numpy as np

np.mean(list(map(lambda x: x[1], array)))



Answer 8:

Just compute the average from the sum and the number of elements in the list:

array = [('a', 5) , ('b', 10), ('c', 20), ('d', 3), ('e', 2)]
avg = float(sum(value[1] for value in array)) / float(len(array))
print(avg)
# 8.0


Answer 9:

The problem here is that you cannot directly compute the mean of the list of tuples as an ndarray because all values will be cast to str.

One way around this, however, is to define a structured array from the list of tuples, so that you can associate a different datatype with each element in the tuples.

So you can define a structured array from the list of tuples with:

import numpy as np

l = [('a', 5), ('b', 10), ('c', 20), ('d', 3), ('e', 2)]
a = np.array(l, dtype=[('str', '<U1'), ('num', '<i4')])

And then simply take the np.mean of the numerical field, i.e. the second element in the tuples:

np.mean(a['num'])
# 8.0
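To see what the structured dtype buys you, the sketch below inspects the fields of the same array. Each named field is a view with its own dtype, so numeric reductions work on a['num'] directly, with no string conversion:

```python
import numpy as np

l = [('a', 5), ('b', 10), ('c', 20), ('d', 3), ('e', 2)]
a = np.array(l, dtype=[('str', '<U1'), ('num', '<i4')])

# Field names and per-field dtypes are preserved
print(a.dtype.names)   # ('str', 'num')
print(a['num'].dtype)  # int32
print(a['str'])        # ['a' 'b' 'c' 'd' 'e']

# Reductions operate on the numeric field only
print(a['num'].mean())  # 8.0
print(a['num'].sum())   # 40
```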