How to cluster latitude and longitude data in Python

Posted 2019-07-29 06:20

Question:

I have latitude and longitude data of size (34000, 2) in a pandas DataFrame df:

df =

Index       Latitude            Longitude
0           66.36031097267725   23.714807357485936
1           66.36030099322495   23.71479548193769
...         ...                 ...
34000       66.27918383581169   23.568631229948359

Important note: the route above has been covered twice, which means that if I covered the route only once, my latitude and longitude data would be of size (34000/2, 2), i.e. (17000, 2).
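If the two passes are stored one after the other in row order and have the same number of rows (an assumption; the question does not say how the passes are ordered), a single pass can be sliced off before any filtering. A minimal sketch with made-up coordinates:

```python
import pandas as pd

# Toy data standing in for the real df: the same 3-point route recorded twice,
# back to back (assumption: rows are in recording order)
df = pd.DataFrame({
    'Latitude': [66.360, 66.330, 66.280, 66.360, 66.330, 66.280],
    'Longitude': [23.714, 23.640, 23.569, 23.714, 23.640, 23.569],
})

# Keep only the first pass: (n, 2) -> (n // 2, 2)
one_pass = df.iloc[: len(df) // 2]
print(one_pass.shape)  # (3, 2)
```

For 34000 rows this slice would have shape (17000, 2).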

Problem

I only want the latitude and longitude data for a particular selected area, so I filtered the df using the starting and ending latitude and longitude points. However, doing that also selected another part of the area (see the picture below, after filtering).
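Filtering with minimum and maximum latitude/longitude keeps every point inside the rectangle spanned by those bounds, so any other part of the route that passes through that rectangle is kept as well. A small illustration with invented points (the coordinates, bounds and the 'part' labels are hypothetical):

```python
import pandas as pd

# Invented points: three on the wanted stretch, two on another part of the route
df = pd.DataFrame({
    'Latitude':  [66.36, 66.33, 66.29, 66.35, 66.30],
    'Longitude': [23.71, 23.64, 23.57, 23.60, 23.70],
    'part':      ['wanted', 'wanted', 'wanted', 'other', 'other'],
})

# Bounding box built from the start and end points of the wanted stretch
min_lat, max_lat = 66.28, 66.37
min_lon, max_lon = 23.56, 23.72

# between() is inclusive on both ends by default
inside = (df['Latitude'].between(min_lat, max_lat)
          & df['Longitude'].between(min_lon, max_lon))

# The 'other' points survive too, because they also lie inside the box
print(df[inside])
```

A rectangle alone cannot tell the two parts apart, which is why an extra criterion is needed.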

Requirement

How can I remove the additional area? I am sure there is an easy approach to this problem. Note: after filtering, the latitude and longitude data still covers the route twice.

[Image: Filtered]

The filtering code I used:

import pandas as pd


def apply_geofence_on_data(interpolated_data, min_latitude=66.27832887852133, max_latitude=66.37098470528755,
                           min_longitude=23.568626549485927, max_longitude=23.71481685393929):
    # Keep only the points strictly inside the latitude/longitude bounding box
    inside = (
        (interpolated_data['Latitude'] > min_latitude)
        & (interpolated_data['Latitude'] < max_latitude)
        & (interpolated_data['Longitude'] > min_longitude)
        & (interpolated_data['Longitude'] < max_longitude)
    )
    return interpolated_data[inside]

Answer 1:

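Here is a solution to test: the idea is to keep only the points that lie above a straight line drawn across the bounding box. Choose the value of p to place the line where it separates the part of the route you want from the part you don't.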
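Written out, the line used in the code is parallel to the diagonal of the bounding box and crosses lat_min at a fraction p of the way along the longitude range. The symbols below simply restate the code's variables m, z, xa and xb:

```latex
m = \frac{lat_{max} - lat_{min}}{lon_{max} - lon_{min}}, \qquad
x_a = lon_{min} + p\,(lon_{max} - lon_{min}), \qquad
z = lat_{min} - m\,x_a

y(x) = m\,x + z = lat_{min} + m\,(x - x_a)

y(x_a) = lat_{min}, \qquad
y(lon_{max}) = lat_{min} + (1 - p)\,(lat_{max} - lat_{min})

\text{keep a point if } lat > m \cdot lon + z
```

So p = 0 gives the diagonal itself, and larger values of p move the line to the right (down, at any fixed longitude), so more points end up above it and are kept.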

from random import uniform

import matplotlib.pyplot as plt
import pandas as pd


def newpoint(lon_min=-180.0, lon_max=180.0, lat_min=-90.0, lat_max=90.0):
    # Return a random (longitude, latitude) pair inside the given bounds
    return uniform(lon_min, lon_max), uniform(lat_min, lat_max)


lon_min = 23.568626549485927
lon_max = 23.71481685393929
lat_min = 66.27832887852133
lat_max = 66.37098470528755
p = 0.25  # value chosen for this sample; for your data I think a value closer to 0.75

# generate 10 random points as sample data
n = 10
points = [newpoint(lon_min, lon_max, lat_min, lat_max) for _ in range(n)]
Lon = [lon for lon, lat in points]
Lat = [lat for lon, lat in points]
df = pd.DataFrame({'Lat': Lat, 'Lon': Lon})
print(df)

# equation of the line y = m*x + z: parallel to the bounding-box diagonal,
# crossing lat_min at x = lon_min + p * (lon_max - lon_min)
m = (lat_max - lat_min) / (lon_max - lon_min)
z = lat_min - m * (lon_min + p * (lon_max - lon_min))
xa = lon_min + p * (lon_max - lon_min)
xb = lon_max

# uncomment to display the line's latitude at each point's longitude
# df['calcul'] = df['Lon'] * m + z

# keep only the points above the line
df = df[df['Lon'] * m + z < df['Lat']]
print(df)

# plot the line and the remaining points
plt.plot([xa, xb], [m * xa + z, m * xb + z])
plt.plot(df.Lon, df.Lat, 'ro')
plt.show()

Initial output:

         Lat        Lon
0  66.343486  23.674008
1  66.281614  23.678554
2  66.359215  23.637975
3  66.303976  23.659128
4  66.302640  23.589577
5  66.313877  23.634785
6  66.309733  23.683281
7  66.365582  23.667262
8  66.344611  23.688108
9  66.352028  23.673376


Final result: the points at index 1, 3 and 6 have been removed (they are below the line):

         Lat        Lon
0  66.343486  23.674008
2  66.359215  23.637975
4  66.302640  23.589577
5  66.313877  23.634785
7  66.365582  23.667262
8  66.344611  23.688108
9  66.352028  23.673376
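Because the sample points are random, rerunning the code gives different output. The sketch below wraps the same line test in a hypothetical helper, keep_above_line, and applies it to the ten points printed in the "Initial output" table, so the result can be checked deterministically:

```python
import pandas as pd


def keep_above_line(df, lon_min, lon_max, lat_min, lat_max, p):
    # Hypothetical helper: line parallel to the bounding-box diagonal, crossing
    # lat_min at x = lon_min + p * (lon_max - lon_min); keep points strictly above it
    m = (lat_max - lat_min) / (lon_max - lon_min)
    z = lat_min - m * (lon_min + p * (lon_max - lon_min))
    return df[df['Lon'] * m + z < df['Lat']]


# The ten sample points from the "Initial output" table
df = pd.DataFrame({
    'Lat': [66.343486, 66.281614, 66.359215, 66.303976, 66.302640,
            66.313877, 66.309733, 66.365582, 66.344611, 66.352028],
    'Lon': [23.674008, 23.678554, 23.637975, 23.659128, 23.589577,
            23.634785, 23.683281, 23.667262, 23.688108, 23.673376],
})

kept = keep_above_line(df, 23.568626549485927, 23.71481685393929,
                       66.27832887852133, 66.37098470528755, p=0.25)
print(list(kept.index))  # [0, 2, 4, 5, 7, 8, 9]
```

The helper expects 'Lat' and 'Lon' columns as in the answer, so the question's df (with 'Latitude' and 'Longitude') would need its columns renamed first; trying a few values of p on the real data is then a one-line change.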