I have a list of unsorted points:
List = [(-50.6261, 74.3683), (-63.2489, 75.0038), (-76.0384, 75.6219), (-79.8451, 75.7855), (-30.9626, 168.085), (-27.381, 170.967), (-22.9191, 172.928), (-16.5869, 173.087), (-4.813, 172.505), (-109.056, 92.0063), (-96.0705, 91.4232), (-83.255, 90.8563), (-80.7807, 90.7498), (-54.1694, 89.5087), (-41.6419, 88.9191), (-32.527, 88.7737), (-27.6403, 91.0134), (-22.3035, 95.141), (-18.0168, 100.473), (-15.3918, 105.542), (-13.6401, 112.373), (-13.3475, 118.988), (-14.4509, 125.238), (-17.1246, 131.895), (-21.6766, 139.821), (-28.5735, 149.98), (-33.395, 156.344), (-114.702, 83.9644), (-114.964, 87.4599), (-114.328, 89.8325), (-112.314, 91.6144), (-109.546, 92.0209), (-67.9644, 90.179), (-55.2013, 89.5624), (-34.4271, 158.876), (-34.6987, 161.896), (-33.6055, 164.993), (-87.0365, 75.9683), (-99.8007, 76.0889), (-105.291, 76.5448), (-109.558, 77.3525), (-112.516, 79.2509), (-113.972, 81.3335), (2.30014, 171.635), (4.40918, 169.691), (5.07165, 166.974), (5.34843, 163.817), (5.30879, 161.798), (-29.6746, 73.5082), (-42.5876, 74.0206)]
I want to sort those points to have a continuous curve passing by every point just once, starting from start = (-29.6746, 73.5082)
and end = (5.30879, 161.798)
This is what I tried so far:
import matplotlib.pyplot as plt
import numpy as np
from sklearn.neighbors import NearestNeighbors
import networkx as nx
for el in List:
X.append(el[0])
Y.append(el[1])
x = np.array(X)
y = np.array(Y)
points = np.c_[x, y]
# find 2 nearest neighbors
clf = NearestNeighbors(2).fit(points)
G = clf.kneighbors_graph()
T = nx.from_scipy_sparse_matrix(G)
# indexes of the new order
order = list(nx.dfs_preorder_nodes(T, 0))
# sorted arrays
new_x = x[order]
new_y = y[order]
plt.plot(new_x, new_y)
plt.show()
But I still get an unsorted list, and I couldn't find a way to determine the start point and end point.
We can see the problem as a Traveling salesman problem, that we can optimize by looking for the nearest point
def distance(P1, P2):
"""
This function computes the distance between 2 points defined by
P1 = (x1,y1) and P2 = (x2,y2)
"""
return ((P1[0] - P2[0])**2 + (P1[1] - P2[1])**2) ** 0.5
def optimized_path(coords, start=None):
"""
This function finds the nearest point to a point
coords should be a list in this format coords = [ [x1, y1], [x2, y2] , ...]
"""
if start is None:
start = coords[0]
pass_by = coords
path = [start]
pass_by.remove(start)
while pass_by:
nearest = min(pass_by, key=lambda x: distance(path[-1], x))
path.append(nearest)
pass_by.remove(nearest)
return path
# define a start point
start = [x0, y0]
path = optimized_path(List,start)
Not an answer, but too much for a comment
I plotted the data points as scatter and line
I see a visually smooth (low order local derivatve spline curve) with ~10% points 'out of order'
Is this typical of the problem?, is the data mostly in order?
How general or specific does the code have to be
I don't know the "big hammer" libs, but cleaned up the surounding code and did the same plot
List = [(-50.6261, 74.3683), (-63.2489, 75.0038), (-76.0384, 75.6219), (-79.8451, 75.7855), (-30.9626, 168.085), (-27.381, 170.967), (-22.9191, 172.928), (-16.5869, 173.087), (-4.813, 172.505), (-109.056, 92.0063), (-96.0705, 91.4232), (-83.255, 90.8563), (-80.7807, 90.7498), (-54.1694, 89.5087), (-41.6419, 88.9191), (-32.527, 88.7737), (-27.6403, 91.0134), (-22.3035, 95.141), (-18.0168, 100.473), (-15.3918, 105.542), (-13.6401, 112.373), (-13.3475, 118.988), (-14.4509, 125.238), (-17.1246, 131.895), (-21.6766, 139.821), (-28.5735, 149.98), (-33.395, 156.344), (-114.702, 83.9644), (-114.964, 87.4599), (-114.328, 89.8325), (-112.314, 91.6144), (-109.546, 92.0209), (-67.9644, 90.179), (-55.2013, 89.5624), (-34.4271, 158.876), (-34.6987, 161.896), (-33.6055, 164.993), (-87.0365, 75.9683), (-99.8007, 76.0889), (-105.291, 76.5448), (-109.558, 77.3525), (-112.516, 79.2509), (-113.972, 81.3335), (2.30014, 171.635), (4.40918, 169.691), (5.07165, 166.974), (5.34843, 163.817), (5.30879, 161.798), (-29.6746, 73.5082), (-42.5876, 74.0206)]
import matplotlib.pyplot as plt
import numpy as np
from sklearn.neighbors import NearestNeighbors
import networkx as nx
points = np.asarray(List)
# find 2 nearest neighbors
clf = NearestNeighbors(2).fit(points)
G = clf.kneighbors_graph()
T = nx.from_scipy_sparse_matrix(G)
# indexes of the new order
order = list(nx.dfs_preorder_nodes(T, 0))
# sorted arrays
new_points = points[order]
plt.scatter(*zip(*points))
plt.plot(*zip(*new_points), 'r')
plt.show()