可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

I have trajectory data, where each trajectory consists of a sequence of coordinates(x, y points) and each trajectory is identified by a unique ID.

These trajectories are in x - y plane, and I want to divide the whole plane into equal sized grid (square grid). This grid is obviously invisible but is used to divide trajectories into sub-segments. Whenever a trajectory intersects with a grid line, it is segmented there and becomes a new sub-trajectory with new_id.

I have included a simple handmade graph to make clear what I am expecting.

It can be seen how the trajectory is divided at the intersections of the grid lines, and each of these segments has new unique id.

I am working on Python, and seek some python implementation links, suggestions, algorithms, or even a pseudocode for the same.

Please let me know if anything is unclear.

UPDATE

In order to divide the plane into grid , cell indexing is done as following:

#finding cell id for each coordinate
#cellid = (coord / cellSize).astype(int)
cellid = (coord / 0.5).astype(int)
cellid
Out[] : array([[1, 1],
              [3, 1],
              [4, 2],
              [4, 4],
              [5, 5],
              [6, 5]])
#Getting x-cell id and y-cell id separately 
x_cellid = cellid[:,0]
y_cellid = cellid[:,1]

#finding total number of cells
xmax = df.xcoord.max()
xmin = df.xcoord.min()
ymax = df.ycoord.max()
ymin = df.ycoord.min()
no_of_xcells = math.floor((xmax-xmin)/ 0.5)
no_of_ycells = math.floor((ymax-ymin)/ 0.5)
total_cells = no_of_xcells * no_of_ycells
total_cells
Out[] : 25

Since the plane is now divided into 25 cells each with a cellid. In order to find intersections, maybe I could check the next coordinate in the trajectory, if the cellid remains the same, then that segment of the trajectory is in the same cell and has no intersection with grid. Say, if x_cellid[2] is greater than x_cellid[0], then segment intersects vertical grid lines. Though, I am still unsure how to find the intersections with the grid lines and segment the trajectory on intersections giving them new id.

回答1:

This can be solved by shapely:

%matplotlib inline
import pylab as pl
from shapely.geometry import MultiLineString, LineString
import numpy as np
from matplotlib.collections import LineCollection

x0, y0, x1, y1 = -10, -10, 10, 10
n = 11

lines = []
for x in np.linspace(x0, x1, n):
    lines.append(((x, y0), (x, y1)))

for y in np.linspace(y0, y1, n):
    lines.append(((x0, y), (x1, y)))

grid = MultiLineString(lines)

x = np.linspace(-9, 9, 200)
y = np.sin(x)*x
line = LineString(np.c_[x, y])

fig, ax = pl.subplots()
for i, segment in enumerate(line.difference(grid)):
    x, y = segment.xy
    pl.plot(x, y)
    pl.text(np.mean(x), np.mean(y), str(i))

lc = LineCollection(lines, color="gray", lw=1, alpha=0.5)
ax.add_collection(lc);

The result:

To not use shapely, and do it yourself:

import pylab as pl
import numpy as np
from matplotlib.collections import LineCollection

x0, y0, x1, y1 = -10, -10, 10, 10
n = 11
xgrid = np.linspace(x0, x1, n)
ygrid = np.linspace(y0, y1, n)
x = np.linspace(-9, 9, 200)
y = np.sin(x)*x
t = np.arange(len(x))

idx_grid, idx_t = np.where((xgrid[:, None] - x[None, :-1]) * (xgrid[:, None] - x[None, 1:]) <= 0)
tx = idx_t + (xgrid[idx_grid] - x[idx_t]) / (x[idx_t+1] - x[idx_t])

idx_grid, idx_t = np.where((ygrid[:, None] - y[None, :-1]) * (ygrid[:, None] - y[None, 1:]) <= 0)
ty = idx_t + (ygrid[idx_grid] - y[idx_t]) / (y[idx_t+1] - y[idx_t])

t2 = np.sort(np.r_[t, tx, tx, ty, ty])

x2 = np.interp(t2, t, x)
y2 = np.interp(t2, t, y)

loc = np.where(np.diff(t2) == 0)[0] + 1

xlist = np.split(x2, loc)
ylist = np.split(y2, loc)


fig, ax = pl.subplots()
for i, (xp, yp) in enumerate(zip(xlist, ylist)):
    pl.plot(xp, yp)
    pl.text(np.mean(xp), np.mean(yp), str(i))


lines = []
for x in np.linspace(x0, x1, n):
    lines.append(((x, y0), (x, y1)))

for y in np.linspace(y0, y1, n):
    lines.append(((x0, y), (x1, y)))

lc = LineCollection(lines, color="gray", lw=1, alpha=0.5)
ax.add_collection(lc);

回答2:

You're asking a lot. You should attack most of the design and coding yourself, once you have a general approach. Algorithm identification is reasonable for Stack Overflow; asking for design and reference links is not.

I suggest that you put the point coordinates into a list. use the NumPy and SciKit capabilities to interpolate the grid intersections. You can store segments in a list (of whatever defines a segment in your data design). Consider making a dictionary that allows you to retrieve the segments by grid coordinates. For instance, if segments are denoted only by the endpoints, and points are a class of yours, you might have something like this, using the lower-left corner of each square as its defining point:

grid_seg = {
    (0.5, 0.5): [p0, p1],
    (1.0, 0.5): [p1, p2],
    (1.0, 1.0): [p2, p3],
    ...
}

where p0, p1, etc. are the interpolated crossing points.

回答3:

Each trajectory is composed of a series of straight line segments. You therefore need a routine to break each line segment into sections that lie completely within a grid cell. The basis for such a routine would be the Digital Differential Analyzer (DDA) algorithm, though you'll need to modify the basic algorithm since you need endpoints of the line within each cell, not just which cells are visited.

A couple of things you have to be careful of:

1) If you're working with floating point numbers, beware of rounding errors in the calculation of the step values, as these can cause the algorithm to fail. For this reason many people choose to convert to an integer grid, obviously with a loss of precision. This is a good discussion of the issues, with some working code (though not python).

2) You'll need to decide which of the 4 grid lines surrounding a cell belong to the cell. One convention would be to use the bottom and left edges. You can see the issue if you consider a horizontal line segment that falls on a grid line - does its segments belong to the cell above or the cell below?

Cheers

回答4:

data = list of list of coordinates
For point_id, point_coord in enumerate(point_coord_list):
   if current point & last point stayed in same cell:
        append point's index to last list of data
   else:
        append a new empty list to data
        interpolate the two points and add a new point
        that is on the grid lines.

Data stores all trajectories. Each list within the data is a trajectory.

The cell index along x and y axes (x_cell_id, y_cell_id) can be found by dividing coordinate of point by dimension of cell, then round to integer. If the cell indices of current point are same as that of last points, then these two points are in the same cell. list is good for inserting new points but it is not as memory efficient as arrays.

Might be a good idea to create a class for trajectory. Or use a memory buffer and sparse data structure instead of list and list and an array for the x-y coordinates if the list of coordinates wastes too much memory. Inserting new points into array is slow, so we can use another array for new points.

Warning: I haven't thought too much about the things below. It probably has bugs, and someone needs to fill in the gaps.

# coord       n x 2 numpy array. 
#             columns 0, 1 are x and y coordinate. 
#             row n is for point n
# cell_size   length of one side of the square cell.
# n_ycells    number of cells along the y axis

import numpy as np
cell_id_2d = (coord / cell_size).astype(int)
x_cell_id = cell_id_2d[:,0]
y_cell_id = cell_id_2d[:,1]
cell_id_1d = x_cell_id + y_cell_id*n_x_cells

# if the trajectory exits a cell, its cell id changes
# and the delta_cell_id is not zero.
delta_cell_id = cell_id_1d[1:] - cell_id_1d[:-1]

# The nth trajectory should contains the points from
# the (crossing_id[n])th to the (crossing_id[n + 1] - 1)th
w = np.where(delta_cell_id != 0)[0]
crossing_ids = np.empty(w.size + 1)
crossing_ids[1:] = w
crossing_ids[0] = 0

# need to interpolate when the trajectory cross cell boundary.
# probably can replace this loop with numpy functions/indexing
new_points = np.empty((w.size, 2))
for i in range(1, n):
    st = coord[crossing_ids[i]]
    en = coord[crossing_ids[i+1]]
    # 1. check which boundary of the cell is crossed
    # 2. interpolate
    # 3. put points into new_points

# Each trajectory contains some points from coord array and 2 points 
# in the new_points array.

For retrieval, make a sparse array that contains the index of the starting point in the coord array.

Linear interpolation can look bad if the cell size is large.

Further explanation:

Description of the grid

For n_xcells = 4, n_ycells = 3, the grid is:

   0   1   2   3   4
0 [  ][  ][  ][  ][  ]
1 [  ][  ][  ][* ][  ]
2 [  ][  ][  ][  ][  ]

[* ] has an x_index of 3 and a y_index of 1.

There are (n_x_cells * n_y_cells) cells in the grid.

Relationship between point and cell

The cell that contains the i^th point of the trajectory has an x_index of x_cell_id[i] and a y_index of x_cell_id[i]. I get this by discretization through dividing the xy-coordinates of the points by the length of the cell and then truncate to integers.

The cell_id_1d of the cells are the number in [  ]

   0   1   2   3   4
0 [0 ][1 ][2 ][3 ][4 ]
1 [5 ][6 ][7 ][8 ][9 ]
2 [10][11][12][13][14]

cell_id_1d[i] = x_cell_id[i] + y_cell_id[i]*n_x_cells

I converted the pair of cell indices (x_cell_id[i], y_cell_id[i]) for the i^th point to a single index called cell_id_1d.

How to find if trajectory exit a cell at the i^th point

Now, the i^th and (i + 1)^th points are in same cell, if and only if (x_cell_id[i], y_cell_id[i]) == (x_cell_id[i + 1], y_cell_id[i + 1]) and also cell_id_1d[i] == cell_id[i + 1], and cell_id[i + 1] - cell_id[i] == 0. delta_cell_ids[i] = cell_id_1d[i + 1] - cell_id[i], which is zero if and only the i^th and (i + 1)^th points are in the same cell.