Efficient matching of two arrays (how to use KDTre

2019-02-19 10:57发布

I have two 2d arrays, obs1 and obs2. They represent two independent measurement series, and both have dim0 = 2, and slightly different dim1, say obs1.shape = (2, 250000), and obs2.shape = (2, 250050). obs1[0] and obs2[0] signify time, and obs1[1] and obs2[1] signify some spatial coordinate. Both arrays are (more or less) sorted by time. The times and coordinates should be identical between the two measurement series, but in reality they aren't. Also, not each measurement from obs1 has a corresponding value in obs2 and vice-versa. Another problem is that there might be a slight offset in the times.

I'm looking for an efficient algorithm to associate the best matching value from obs2 to each measurement in obs1. Currently, I do it like this:

define dt = some_maximum_time_difference
define dx = 3
j = 0
i = 0
matchresults = np.empty(obs1.shape[1])
for j in obs1.shape[1]:
    while obs1[0, j] - obs2[0, j] < dt:
        i += 1
    matchresults[j] = i - dx + argmin(abs(obs1[1, i] - obs2[1, i-dx:i+dx+1]))

This yields good results. However, it is extremely slow, running in a loop.

I would be very thankful for ideas on how to improve this algorithm speed-wise, e.g. using KDtree or something similar.

标签： python numpy pandas scipy kdtree

1条回答

走好不送

2楼-- · 2019-02-19 11:23

Using cKDTree for this case would look like:

from scipy.spatial import cKDTree

obs2 = array with shape (2, m)
obs1 = array with shape (2, n)

kdt = cKDTree(obs2.T)
dist, indices = kdt.query(obs1.T)

where indices will contain the column indices in obs2 corresponding to each observation in obs1. Note that I had to transpose obs1 and obs2.

0人赞添加讨论(0) 举报

Efficient matching of two arrays (how to use KDTre

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间