可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效,请关闭广告屏蔽插件后再试):
问题:
I want to write this code as pythonic. My real array much bigger than this example.
( 5+10+20+3+2 ) / 5
print(np.mean(array,key=lambda x:x[1]))
TypeError: mean() got an unexpected keyword argument 'key'
array = [('a', 5) , ('b', 10), ('c', 20), ('d', 3), ('e', 2)]
sum = 0
for i in range(len(array)):
sum = sum + array[i][1]
average = sum / len(array)
print(average)
import numpy as np
print(np.mean(array,key=lambda x:x[1]))
How can avoid this?
I want to use second example.
I'm using Python 3.7
回答1:
If you are using Python 3.4 or above, you could use the statistics
module:
from statistics import mean
average = mean(value[1] for value in array)
Or if you're using a version of Python older than 3.4:
average = sum(value[1] for value in array) / len(array)
These solutions both use a nice feature of Python called a generator expression. The loop
value[1] for value in array
creates a new sequence in a timely and memory efficient manner. See PEP 289 -- Generator Expressions.
If you're using Python 2, and you're summing integers, we will have integer division, which will truncate the result, e.g:
>>> 25 / 4
6
>>> 25 / float(4)
6.25
To ensure we don't have integer division we could set the starting value of sum
to be the float
value 0.0
. However, this also means we have to make the generator expression explicit with parentheses, otherwise it's a syntax error, and it's less pretty, as noted in the comments:
average = sum((value[1] for value in array), 0.0) / len(array)
It's probably best to use fsum
from the math
module which will return a float
:
from math import fsum
average = fsum(value[1] for value in array) / len(array)
回答2:
If you do want to use numpy
, cast it to a numpy.array
and select the axis you want using numpy
indexing:
import numpy as np
array = np.array([('a', 5) , ('b', 10), ('c', 20), ('d', 3), ('e', 2)])
print(array[:,1].astype(float).mean())
# 8.0
The cast to a numeric type is needed because the original array contains both strings and numbers and is therefore of type object
. In this case you could use float
or int
, it makes no difference.
回答3:
If you're open to more golf-like solutions, you can transpose your array with vanilla python, get a list of just the numbers, and calculate the mean with
sum(zip(*array)[1])/len(array)
回答4:
With pure Python:
from operator import itemgetter
acc = 0
count = 0
for value in map(itemgetter(1), array):
acc += value
count += 1
mean = acc / count
An iterative approach can be preferable if your data cannot fit in memory as a list
(since you said it was big). If it can, prefer a declarative approach:
data = [sub[1] for sub in array]
mean = sum(data) / len(data)
If you are open to using numpy
, I find this cleaner:
a = np.array(array)
mean = a[:, 1].astype(int).mean()
回答5:
you can use map
instead of list comprehension
sum(map(lambda x:int(x[1]), array)) / len(array)
or functools.reduce
(if you use Python2.X just reduce
not functools.reduce
)
import functools
functools.reduce(lambda acc, y: acc + y[1], array, 0) / len(array)
回答6:
You can simply use:
print(sum(tup[1] for tup in array) / len(array))
Or for Python 2:
print(sum(tup[1] for tup in array) / float(len(array)))
Or little bit more concisely for Python 2:
from math import fsum
print(fsum(tup[1] for tup in array) / len(array))
回答7:
You could use map
:
np.mean(list(map(lambda x: x[1], array)))
回答8:
Just find the average using sum and number of elements of the list.
array = [('a', 5) , ('b', 10), ('c', 20), ('d', 3), ('e', 2)]
avg = float(sum(value[1] for value in array)) / float(len(array))
print(avg)
#8.0
回答9:
The problem here is that you cannot directly compute the mean of the list of tuples as an ndarray
because all values will be cast to str
.
Onw way around this however would be to define a structured array from the list of tuples, so that you can associate a different datatype to each element in the tuples.
So you can define a structured array from the list of tuples with:
l = [('a', 5) , ('b', 10), ('c', 20), ('d', 3), ('e', 2)]
a = np.array(l, dtype=([('str', '<U1'), ('num', '<i4')]))
And then simply take the np.mean
of the numerical field, i.e the second element in the tuples:
np.mean(a['num'])
# 8.0