可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效,请关闭广告屏蔽插件后再试):
问题:
I need to evaluate b-splines in python. To do so i wrote the code below which works very well.
import numpy as np
import scipy.interpolate as si
def scipy_bspline(cv,n,degree):
""" bspline basis function
c = list of control points.
n = number of points on the curve.
degree = curve degree
"""
# Create a range of u values
c = cv.shape[0]
kv = np.clip(np.arange(c+degree+1)-degree,0,c-degree)
u = np.linspace(0,c-degree,n)
# Calculate result
return np.array(si.splev(u, (kv,cv.T,degree))).T
Giving it 6 control points and asking it to evaluate 100k points on curve is a breeze:
# Control points
cv = np.array([[ 50., 25., 0.],
[ 59., 12., 0.],
[ 50., 10., 0.],
[ 57., 2., 0.],
[ 40., 4., 0.],
[ 40., 14., 0.]])
n = 100000 # 100k Points
degree = 3 # Curve degree
points_scipy = scipy_bspline(cv,n,degree) #cProfile clocks this at 0.012 seconds
Here's a plot of "points_scipy":
Now, to make this faster i could compute a basis for all 100k points on the curve, store that in memory, and when i need to draw a curve all i would need to do is multiply the new control point positions with the stored basis to get the new curve. To prove my point i wrote a function that uses DeBoor's algorithm to compute my basis:
def basis(c, n, degree):
""" bspline basis function
c = number of control points.
n = number of points on the curve.
degree = curve degree
"""
# Create knot vector and a range of samples on the curve
kv = np.array([0]*degree + range(c-degree+1) + [c-degree]*degree,dtype='int') # knot vector
u = np.linspace(0,c-degree,n) # samples range
# Cox - DeBoor recursive function to calculate basis
def coxDeBoor(u, k, d):
# Test for end conditions
if (d == 0):
if (kv[k] <= u and u < kv[k+1]):
return 1
return 0
Den1 = kv[k+d] - kv[k]
Den2 = 0
Eq1 = 0
Eq2 = 0
if Den1 > 0:
Eq1 = ((u-kv[k]) / Den1) * coxDeBoor(u,k,(d-1))
try:
Den2 = kv[k+d+1] - kv[k+1]
if Den2 > 0:
Eq2 = ((kv[k+d+1]-u) / Den2) * coxDeBoor(u,(k+1),(d-1))
except:
pass
return Eq1 + Eq2
# Compute basis for each point
b = np.zeros((n,c))
for i in xrange(n):
for k in xrange(c):
b[i][k%c] += coxDeBoor(u[i],k,degree)
b[n-1][-1] = 1
return b
Now lets use this to compute a new basis, multiply that by the control points and confirm that we're getting the same results as splev:
b = basis(len(cv),n,degree) #5600011 function calls (600011 primitive calls) in 10.975 seconds
points_basis = np.dot(b,cv) #3 function calls in 0.002 seconds
print np.allclose(points_basis,points_scipy) # Returns True
My extremely slow function returned 100k basis values in 11 seconds, but since these values only need to be computed once, calculating the points on curve ended up being 6 times faster this way than doing it via splev.
The fact that I was able to get the exact same results from both my method and splev leads me to believe that internally splev probably calculates a basis like i do, except much faster.
So my goal is to find out how to calculate my basis fast, store it in memory and just use np.dot() to compute the new points on curve, and my question is: Is it possible to use spicy.interpolate to get the basis values that (i presume) splev uses to compute it's result? And if so, how?
[ADDENDUM]
Following unutbu's and ev-br's very useful insight on how scipy calculates a spline's basis, I looked up the fortran code and wrote a equivalent to the best of my abilities:
def fitpack_basis(c, n=100, d=3, rMinOffset=0, rMaxOffset=0):
""" fitpack's spline basis function
c = number of control points.
n = number of points on the curve.
d = curve degree
"""
# Create knot vector
kv = np.array([0]*d + range(c-d+1) + [c-d]*d, dtype='int')
# Create sample range
u = np.linspace(rMinOffset, rMaxOffset + c - d, n) # samples range
# Create buffers
b = np.zeros((n,c)) # basis
bb = np.zeros((n,c)) # basis buffer
left = np.clip(np.floor(u),0,c-d-1).astype(int) # left knot vector indices
right = left+d+1 # right knot vector indices
# Go!
nrange = np.arange(n)
b[nrange,left] = 1.0
for j in xrange(1, d+1):
crange = np.arange(j)[:,None]
bb[nrange,left+crange] = b[nrange,left+crange]
b[nrange,left] = 0.0
for i in xrange(j):
f = bb[nrange,left+i] / (kv[right+i] - kv[right+i-j])
b[nrange,left+i] = b[nrange,left+i] + f * (kv[right+i] - u)
b[nrange,left+i+1] = f * (u - kv[right+i-j])
return b
Testing against unutbu's version of my original basis function:
fb = fitpack_basis(c,n,d) #22 function calls in 0.044 seconds
b = basis(c,n,d) #81 function calls (45 primitive calls) in 0.013 seconds ~5 times faster
print np.allclose(b,fb) # Returns True
My function is 5 times slower, but still relatively fast. What I like about it is that it lets me use sample ranges that go beyond the boundaries, which is something of use in my application. For example:
print fitpack_basis(c,5,d,rMinOffset=-0.1,rMaxOffset=.2)
[[ 1.331 -0.3468 0.0159 -0.0002 0. 0. ]
[ 0.0208 0.4766 0.4391 0.0635 0. 0. ]
[ 0. 0.0228 0.4398 0.4959 0.0416 0. ]
[ 0. 0. 0.0407 0.3621 0.5444 0.0527]
[ 0. 0. -0.0013 0.0673 -0.794 1.728 ]]
So for that reason I will probably use fitpack_basis since it's relatively fast. But i would love suggestions for improving its performance and hopefully get closer to unutbu's version of the original basis function i wrote.
回答1:
fitpack_basis
uses a double loop which iteratively modifies elements in bb
and b
. I don't see a way to use NumPy to vectorize these loops since the value
of bb
and b
at each stage of the iteration depends on the values from
previous iterations. In situations like this sometimes Cython can be used to
improve the performance of the loops.
Here is a Cython-ized version of fitpack_basis
, which runs as fast as
bspline_basis. The main ideas
used to boost speed using Cython is to declare the type of every variable, and
rewrite all uses of NumPy fancy indexing as loops using plain integer indexing.
See this
page for
instructions on how to build the code and run it from python.
import numpy as np
cimport numpy as np
cimport cython
ctypedef np.float64_t DTYPE_f
ctypedef np.int64_t DTYPE_i
@cython.boundscheck(False)
@cython.wraparound(False)
@cython.nonecheck(False)
def cython_fitpack_basis(int c, int n=100, int d=3,
double rMinOffset=0, double rMaxOffset=0):
""" fitpack's spline basis function
c = number of control points.
n = number of points on the curve.
d = curve degree
"""
cdef Py_ssize_t i, j, k, l
cdef double f
# Create knot vector
cdef np.ndarray[DTYPE_i, ndim=1] kv = np.array(
[0]*d + range(c-d+1) + [c-d]*d, dtype=np.int64)
# Create sample range
cdef np.ndarray[DTYPE_f, ndim=1] u = np.linspace(
rMinOffset, rMaxOffset + c - d, n)
# basis
cdef np.ndarray[DTYPE_f, ndim=2] b = np.zeros((n,c))
# basis buffer
cdef np.ndarray[DTYPE_f, ndim=2] bb = np.zeros((n,c))
# left knot vector indices
cdef np.ndarray[DTYPE_i, ndim=1] left = np.clip(np.floor(u), 0, c-d-1).astype(np.int64)
# right knot vector indices
cdef np.ndarray[DTYPE_i, ndim=1] right = left+d+1
for k in range(n):
b[k, left[k]] = 1.0
for j in range(1, d+1):
for l in range(j):
for k in range(n):
bb[k, left[k] + l] = b[k, left[k] + l]
b[k, left[k]] = 0.0
for i in range(j):
for k in range(n):
f = bb[k, left[k]+i] / (kv[right[k]+i] - kv[right[k]+i-j])
b[k, left[k]+i] = b[k, left[k]+i] + f * (kv[right[k]+i] - u[k])
b[k, left[k]+i+1] = f * (u[k] - kv[right[k]+i-j])
return b
Using this timeit code to benchmark it's performance,
import timeit
import numpy as np
import cython_bspline as CB
import numpy_bspline as NB
c = 6
n = 10**5
d = 3
fb = NB.fitpack_basis(c, n, d)
bb = NB.bspline_basis(c, n, d)
cfb = CB.cython_fitpack_basis(c,n,d)
assert np.allclose(bb, fb)
assert np.allclose(cfb, fb)
# print(NB.fitpack_basis(c,5,d,rMinOffset=-0.1,rMaxOffset=.2))
timing = dict()
timing['NB.fitpack_basis'] = timeit.timeit(
stmt='NB.fitpack_basis(c, n, d)',
setup='from __main__ import NB, c, n, d',
number=10)
timing['NB.bspline_basis'] = timeit.timeit(
stmt='NB.bspline_basis(c, n, d)',
setup='from __main__ import NB, c, n, d',
number=10)
timing['CB.cython_fitpack_basis'] = timeit.timeit(
stmt='CB.cython_fitpack_basis(c, n, d)',
setup='from __main__ import CB, c, n, d',
number=10)
for func_name, t in timing.items():
print "{:>25}: {:.4f}".format(func_name, t)
it appears Cython can make the fitpack_basis
code run as fast as (and perhaps a bit faster) than bspline_basis:
NB.bspline_basis: 0.3322
CB.cython_fitpack_basis: 0.2939
NB.fitpack_basis: 0.9182
回答2:
Here is a version of coxDeBoor
which is (on my machine) 840x faster than the original.
In [102]: %timeit basis(len(cv), 10**5, degree)
1 loops, best of 3: 24.5 s per loop
In [104]: %timeit bspline_basis(len(cv), 10**5, degree)
10 loops, best of 3: 29.1 ms per loop
import numpy as np
import scipy.interpolate as si
def scipy_bspline(cv, n, degree):
""" bspline basis function
c = list of control points.
n = number of points on the curve.
degree = curve degree
"""
# Create a range of u values
c = len(cv)
kv = np.array(
[0] * degree + range(c - degree + 1) + [c - degree] * degree, dtype='int')
u = np.linspace(0, c - degree, n)
# Calculate result
arange = np.arange(n)
points = np.zeros((n, cv.shape[1]))
for i in xrange(cv.shape[1]):
points[arange, i] = si.splev(u, (kv, cv[:, i], degree))
return points
def memo(f):
# Peter Norvig's
"""Memoize the return value for each call to f(args).
Then when called again with same args, we can just look it up."""
cache = {}
def _f(*args):
try:
return cache[args]
except KeyError:
cache[args] = result = f(*args)
return result
except TypeError:
# some element of args can't be a dict key
return f(*args)
_f.cache = cache
return _f
def bspline_basis(c, n, degree):
""" bspline basis function
c = number of control points.
n = number of points on the curve.
degree = curve degree
"""
# Create knot vector and a range of samples on the curve
kv = np.array([0] * degree + range(c - degree + 1) +
[c - degree] * degree, dtype='int') # knot vector
u = np.linspace(0, c - degree, n) # samples range
# Cox - DeBoor recursive function to calculate basis
@memo
def coxDeBoor(k, d):
# Test for end conditions
if (d == 0):
return ((u - kv[k] >= 0) & (u - kv[k + 1] < 0)).astype(int)
denom1 = kv[k + d] - kv[k]
term1 = 0
if denom1 > 0:
term1 = ((u - kv[k]) / denom1) * coxDeBoor(k, d - 1)
denom2 = kv[k + d + 1] - kv[k + 1]
term2 = 0
if denom2 > 0:
term2 = ((-(u - kv[k + d + 1]) / denom2) * coxDeBoor(k + 1, d - 1))
return term1 + term2
# Compute basis for each point
b = np.column_stack([coxDeBoor(k, degree) for k in xrange(c)])
b[n - 1][-1] = 1
return b
# Control points
cv = np.array([[50., 25., 0.],
[59., 12., 0.],
[50., 10., 0.],
[57., 2., 0.],
[40., 4., 0.],
[40., 14., 0.]])
n = 10 ** 6
degree = 3 # Curve degree
points_scipy = scipy_bspline(cv, n, degree)
b = bspline_basis(len(cv), n, degree)
points_basis = np.dot(b, cv)
print np.allclose(points_basis, points_scipy)
The majority of the speedup is achieved by making coxDeBoor compute a vector of
results instead of a single value at a time. Notice that u
is removed from the
arguments passed to coxDeBoor
. Instead, the new coxDeBoor(k, d)
calculates
what was before np.array([coxDeBoor(u[i], k, d) for i in xrange(n)])
.
Since NumPy array arithmetic has the same syntax as scalar arithmetic, very
little code needed to change. The only syntactic change was in the end
condition:
if (d == 0):
return ((u - kv[k] >= 0) & (u - kv[k + 1] < 0)).astype(int)
(u - kv[k] >= 0)
and (u - kv[k + 1] < 0)
are boolean arrays. astype
changes the array values to 0 and 1. So whereas before a single 0 or 1 was
returned, now a whole array of 0s and 1s are returned -- one for each value in
u
.
Memoization can also improve performance, since the recurrence relations
causes coxDeBoor(k, d)
to be called for the same values of k
and d
more than once. The decorator syntax
@memo
def coxDeBoor(k, d):
...
is equivalent to
def coxDeBoor(k, d):
...
coxDeBoor = memo(coxDeBoor)
and the memo
decorator causes coxDeBoor
to record in a cache
a mapping
from (k, d)
pairs of arguments to returned values. If coxDeBoor(k, d)
is
called again, then value from the cache
will be returned instead of
re-computing the value.
scipy_bspline
is still faster, but at least bspline_basis
plus np.dot
is in the ballpark,
and may be useful if you want to re-use b
with many control points, cv
.
In [109]: %timeit scipy_bspline(cv, 10**5, degree)
100 loops, best of 3: 17.2 ms per loop
In [104]: %timeit b = bspline_basis(len(cv), 10**5, degree)
10 loops, best of 3: 29.1 ms per loop
In [111]: %timeit points_basis = np.dot(b, cv)
100 loops, best of 3: 2.46 ms per loop