Question:
I have a list of vectors (in Python) that I want to normalize, while at the same time removing the vectors that originally had small norms.
The input list is, for example:
a = [(1,1),(1,2),(2,2),(3,4)]
and for each vector I need the output to be (x*n, y*n), with n = (x**2+y**2)**-0.5.
If I just needed the norms, for example, that would be easy with a list comprehension:
an = [ (x**2+y**2)**0.5 for x,y in a ]
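For reference, n is the reciprocal of the norm computed by the comprehension above, so (x*n, y*n) always has length 1. A minimal check for a single vector (the helper name inv_norm is hypothetical, not from the question):

```python
import math

def inv_norm(x, y):
    """Return n = (x**2 + y**2) ** -0.5, i.e. 1 / (Euclidean norm of (x, y))."""
    return (x**2 + y**2) ** -0.5

x, y = 3, 4
n = inv_norm(x, y)                 # 1/5, up to floating-point rounding
unit = (x * n, y * n)              # close to (0.6, 0.8)
print(n, unit, math.hypot(*unit))  # the last value is 1.0 up to rounding
```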
It would also be easy to store just the normalized x, for example, but what I want is a temporary variable n that I can use in two calculations and then throw away.
I can't just use a lambda function either, because I also need n to filter the list. So what is the best way?
Right now I am using this nested comprehension, with a generator expression on the inside:
a = [(1,1),(1,2),(2,2),(3,4)]
[(x*n,y*n) for (n,x,y) in (( (x**2.+y**2.)**-0.5 ,x,y) for x,y in a) if n < 0.4]
# Output:
# [(0.70710678118654757, 0.70710678118654757),
# (0.60000000000000009, 0.80000000000000004)]
The inner generator expression produces tuples with an extra value (n), which I then use for both the calculations and the filtering. Is this really the best way? Are there any terrible inefficiencies I should be aware of?
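Another common one-liner idiom, not taken from the post itself: bind n with a one-element inner for clause instead of building (n, x, y) tuples in an inner generator. It works on any Python 3 version, and according to the Python 3.9 release notes, CPython 3.9+ compiles "for n in [expr]" inside a comprehension down to a simple assignment. The function name below is hypothetical:

```python
def normalize_long(vectors, max_inv_norm=0.4):
    # "for n in [expr]" iterates exactly once, so it simply binds n for the current item
    return [(x * n, y * n)
            for x, y in vectors
            for n in [(x**2. + y**2.) ** -0.5]
            if n < max_inv_norm]

a = [(1, 1), (1, 2), (2, 2), (3, 4)]
print(normalize_long(a))  # keeps only (2, 2) and (3, 4), each scaled to unit length
```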
Answer 1:
Is this really the best way?
Well, it does work efficiently, and if you really, really want to write one-liners, it's the best you can do.
On the other hand, a simple generator function does the same thing much more clearly:
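def normfilter(vecs, max_inv_norm):
    """Yield normalized (x, y) pairs, skipping vectors whose norm is too small."""
    for x, y in vecs:
        n = (x**2. + y**2.)**-0.5   # n is the reciprocal of the norm
        if n < max_inv_norm:        # same test as the question: keeps norms > 1/max_inv_norm
            yield (x*n, y*n)

normalized = list(normfilter(a, 0.4))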
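Btw, watch the direction of the comparison: n is the reciprocal of the norm, so n < 0.4 keeps the long vectors (norm greater than 2.5) and drops the short ones, which matches your description. Testing the reverse (n > 0.4) would do the opposite.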
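If a threshold on the norm itself is easier to reason about than one on its reciprocal, the same generator can take the minimum norm directly. A sketch with the hypothetical name normfilter_by_norm; a cutoff of 2.5 on the norm corresponds to n < 0.4 in the question. It also sidesteps the zero vector, for which (x**2.+y**2.)**-0.5 raises ZeroDivisionError:

```python
import math

def normfilter_by_norm(vecs, min_norm):
    """Yield unit vectors for every (x, y) whose Euclidean norm exceeds min_norm."""
    for x, y in vecs:
        norm = math.hypot(x, y)
        if norm > min_norm:  # drops short vectors; with min_norm >= 0 it also skips (0, 0)
            yield (x / norm, y / norm)

a = [(1, 1), (1, 2), (2, 2), (3, 4)]
print(list(normfilter_by_norm(a, 2.5)))
```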
Answer 2:
Starting with Python 3.8 and the introduction of assignment expressions (PEP 572, the := operator), it's possible to use a temporary variable within a list comprehension to avoid evaluating the same expression more than once.
In our case, we can name the result of (x**2.+y**2.)**-.5 as a variable n, use it to filter the list (keeping the items where n is less than 0.4), and then reuse n to produce the mapped value:
# vectors = [(1, 1), (1, 2), (2, 2), (3, 4)]
[(x*n, y*n) for x, y in vectors if (n := (x**2.+y**2.)**-.5) < .4]
# [(0.7071067811865476, 0.7071067811865476), (0.6000000000000001, 0.8)]
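One consequence worth knowing: PEP 572 specifies that the target of := inside a comprehension is bound in the enclosing scope rather than in the comprehension itself, so n is still defined after the comprehension finishes and holds the value computed for the last item. A small check (Python 3.8+):

```python
vectors = [(1, 1), (1, 2), (2, 2), (3, 4)]
normalized = [(x*n, y*n) for x, y in vectors if (n := (x**2. + y**2.)**-.5) < .4]

# n leaked out of the comprehension: it holds 1/norm of the last vector, (3, 4)
print(n)
print(normalized)
```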
Answer 3:
The timings below suggest that a plain for loop might be the fastest way. Be sure to check the timeit results on your own machine, though, as they can vary with a number of factors (hardware, OS, Python version, length of a, etc.).
a = [(1,1),(1,2),(2,2),(3,4)]

def two_lcs(a):
    # n is 1/norm, as in the question
    an = [((x**2+y**2)**-0.5, x, y) for x, y in a]
    an = [(x*n, y*n) for n, x, y in an if n < 0.4]
    return an

def using_forloop(a):
    result = []
    for x, y in a:
        n = (x**2+y**2)**-0.5
        if n < 0.4:
            result.append((x*n, y*n))
    return result

def using_lc(a):
    return [(x*n, y*n)
            for (n, x, y) in (((x**2.+y**2.)**-0.5, x, y) for x, y in a) if n < 0.4]
With this code saved as test.py, timeit gives these results:
% python -mtimeit -s'import test' 'test.using_forloop(test.a)'
100000 loops, best of 3: 3.29 usec per loop
% python -mtimeit -s'import test' 'test.two_lcs(test.a)'
100000 loops, best of 3: 4.52 usec per loop
% python -mtimeit -s'import test' 'test.using_lc(test.a)'
100000 loops, best of 3: 6.97 usec per loop
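The same comparison can also be run from inside Python with the standard timeit module. A sketch using the inverse-norm versions of two of the functions, so both return the same list; the call count is arbitrary and no timings are claimed here:

```python
import functools
import timeit

a = [(1, 1), (1, 2), (2, 2), (3, 4)]

def using_forloop(a):
    result = []
    for x, y in a:
        n = (x**2 + y**2) ** -0.5  # n = 1/norm
        if n < 0.4:
            result.append((x*n, y*n))
    return result

def using_lc(a):
    return [(x*n, y*n)
            for (n, x, y) in (((x**2. + y**2.) ** -0.5, x, y) for x, y in a)
            if n < 0.4]

NUMBER = 10_000  # calls per measurement (arbitrary)
candidates = {"using_forloop": using_forloop, "using_lc": using_lc}

# Sanity check first: timings only mean something if every variant returns the same list.
reference = using_forloop(a)
assert all(func(a) == reference for func in candidates.values())

for name, func in candidates.items():
    # best of 3 repeats, converted to microseconds per call
    best = min(timeit.repeat(functools.partial(func, a), number=NUMBER, repeat=3))
    print(f"{name}: {best / NUMBER * 1e6:.2f} usec per call")
```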
Answer 4:
Borrowing the code from unutbu's answer, here is a larger test (81 vectors) that also includes a NumPy version and the iterator version. Note that converting a list to a NumPy array has a cost of its own.
import numpy

# a = [(1,1),(1,2),(2,2),(3,4)]
a = []
for k in range(1, 10):
    for j in range(1, 10):
        a.append((float(k), float(j)))
# converted once, at import time, so the conversion is outside the timed calls
npa = numpy.array(a)

# with these inputs n <= 0.71, so the 5.0 cutoff below keeps every vector

def two_lcs(a):
    an = [((x**2+y**2)**-0.5, x, y) for x, y in a]
    an = [(x*n, y*n) for n, x, y in an if n < 5.0]
    return an

def using_iterator(a):
    def normfilter(vecs, min_norm):
        for x, y in vecs:
            n = (x**2.+y**2.)**-0.5
            if n < min_norm:
                yield (x*n, y*n)
    return list(normfilter(a, 5.0))

def using_forloop(a):
    result = []
    for x, y in a:
        n = (x**2+y**2)**-0.5
        if n < 5.0:
            result.append((x*n, y*n))
    return result

def using_lc(a):
    return [(x*n, y*n)
            for (n, x, y) in (((x**2.+y**2.)**-0.5, x, y) for x, y in a) if n < 5.0]

def using_numpy(npa):
    n = (npa[:, 0]**2 + npa[:, 1]**2)**-0.5
    where = n < 5.0
    npa = npa[where]  # boolean indexing returns a copy, so the global array is untouched
    n = n[where]
    npa[:, 0] = npa[:, 0]*n
    npa[:, 1] = npa[:, 1]*n
    return npa
and the result...
nlw@pathfinder:~$ python -mtimeit -s'import test' 'test.two_lcs(test.a)'
10000 loops, best of 3: 65.8 usec per loop
nlw@pathfinder:~$ python -mtimeit -s'import test' 'test.using_lc(test.a)'
10000 loops, best of 3: 65.6 usec per loop
nlw@pathfinder:~$ python -mtimeit -s'import test' 'test.using_forloop(test.a)'
10000 loops, best of 3: 64.1 usec per loop
nlw@pathfinder:~$ python -mtimeit -s'import test' 'test.using_iterator(test.a)'
10000 loops, best of 3: 59.6 usec per loop
nlw@pathfinder:~$ python -mtimeit -s'import test' 'test.using_numpy(test.npa)'
10000 loops, best of 3: 48.7 usec per loop
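A more compact NumPy sketch of the same operation (not part of the benchmark above and not timed here): compute all norms at once with numpy.linalg.norm(..., axis=1), keep the rows above a threshold with a boolean mask, and divide by the kept norms using broadcasting. The function name and the min_norm parameter (a threshold on the norm, not on its reciprocal) are assumptions for illustration:

```python
import numpy as np

def normalize_and_filter(vecs, min_norm):
    """Return the rows of vecs whose norm exceeds min_norm, scaled to unit length."""
    arr = np.asarray(vecs, dtype=float)    # shape (N, 2); converting a list has a cost
    norms = np.linalg.norm(arr, axis=1)    # shape (N,)
    keep = norms > min_norm                # boolean mask; with min_norm >= 0 it excludes zero rows
    return arr[keep] / norms[keep][:, np.newaxis]  # (M, 2) / (M, 1) broadcasts row-wise

a = [(1, 1), (1, 2), (2, 2), (3, 4)]
print(normalize_and_filter(a, 2.5))
```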