What does it mean to vectorize for-loops in Python? Is there another way to write nested for-loops?
I am new to Python, and in my research I keep coming across the NumPy library.
Python for loops are inherently slower than their C counterparts. This is why NumPy offers vectorized operations on NumPy arrays: it pushes the for loop you would usually write in Python down to the C level, which is much faster. In other words, NumPy provides vectorized ("C-level for loop") alternatives to operations that would otherwise have to be done element-wise ("Python-level for loop").
import numpy as np
from timeit import Timer
li = list(range(500000))
nump_arr = np.array(li)
# Pure-Python loop (list comprehension) over the list
def python_for():
    return [num + 1 for num in li]

# Vectorized NumPy addition: the loop runs at the C level
def numpy_add():
    return nump_arr + 1
print(min(Timer(python_for).repeat(10, 10)))
print(min(Timer(numpy_add).repeat(10, 10)))
# 0.725692612368003
# 0.010465986942008954
The vectorized NumPy addition is roughly 70 times faster (0.7257 / 0.0105 ≈ 69).
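Addition is only one example; the same pattern applies to most element-wise operations and reductions. A small illustrative sketch (the array values here are made up purely for the illustration):
import numpy as np

a = np.array([1.0, 4.0, 9.0, 16.0])

# Each expression below runs as a single C-level loop over the whole array:
print(a * 2)       # [ 2.  8. 18. 32.]
print(np.sqrt(a))  # [1. 2. 3. 4.]
print(a > 5)       # [False False  True  True]
print(a.sum())     # 30.0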
Here's a definition from Wes McKinney:
Arrays are important because they enable you to express batch
operations on data without writing any for loops. This is usually
called vectorization. Any arithmetic operations between equal-size
arrays applies the operation elementwise.
Vectorized version:
>>> import numpy as np
>>> arr = np.array([[1., 2., 3.], [4., 5., 6.]])
>>> arr * arr
array([[ 1.,  4.,  9.],
       [16., 25., 36.]])
The same thing with loops on a native Python (nested) list:
>>> arr = arr.tolist()
>>> res = [[0., 0., 0.], [0., 0., 0.]]
>>> for idx1, row in enumerate(arr):
...     for idx2, val2 in enumerate(row):
...         res[idx1][idx2] = val2 * val2
...
>>> res
[[1.0, 4.0, 9.0], [16.0, 25.0, 36.0]]
How do these two operations compare? The NumPy version takes about 436 ns; the Python version takes about 3.52 µs (3,520 ns). Differences at this scale fall under microperformance, and they start to matter once you work with larger data or repeat an operation thousands or millions of times.
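Those specific numbers come from one machine (presumably measured with something like IPython's %timeit); the absolute values will differ on yours, but the ranking should not. A minimal timeit-based sketch for reproducing the comparison yourself:
import timeit
import numpy as np

arr = np.array([[1., 2., 3.], [4., 5., 6.]])
lst = arr.tolist()

def numpy_square():
    # One vectorized multiplication; the loop runs in C
    return arr * arr

def python_square():
    # Explicit nested Python-level loops, as in the example above
    res = [[0., 0., 0.], [0., 0., 0.]]
    for i, row in enumerate(lst):
        for j, val in enumerate(row):
            res[i][j] = val * val
    return res

n = 100_000
# Best of the repeats, reported as seconds per call
print(min(timeit.repeat(numpy_square, number=n)) / n)
print(min(timeit.repeat(python_square, number=n)) / n)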