How to cluster Latitude and longitude data in pyth

Posted 2019-07-29 06:12

I have latitude and longitude data of size (34000 * 2) in a pandas df

df =

Index       Latitude            Longitude
0           66.36031097267725   23.714807357485936
1           66.36030099322495   23.71479548193769
2
.
.
.
.
34000       66.27918383581169   23.568631229948359

Important note: the latitude and longitude route above was covered twice. If I had covered the route only once, my latitude and longitude data would have size (34000/2, 2), for example.

Problem

I only want the latitude and longitude data for one selected area, so I filtered my df using the starting and ending latitude and longitude points. After doing that, another part of the area was also selected (see the picture after filtering).

Requirement

How can I remove the additional area? I am sure there is an easy approach to this problem. Note: even after filtering, the latitude and longitude data still covers the route twice.

Filtered

def apply_geofence_on_data(interpolated_data, min_latitude=66.27832887852133, max_latitude=66.37098470528755, min_longitude=23.568626549485927,
                               max_longitude=23.71481685393929):

    interpolated_data = interpolated_data[interpolated_data['Latitude'] > min_latitude]
    interpolated_data = interpolated_data[interpolated_data['Latitude'] < max_latitude]
    interpolated_data = interpolated_data[interpolated_data['Longitude'] < max_longitude]
    interpolated_data = interpolated_data[interpolated_data['Longitude'] > min_longitude]

    return interpolated_data
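
The four successive filters above can also be written as one boolean mask. The following is a minimal sketch of that same bounding-box filter, not a fix for the problem described: the function name `apply_geofence` and the third sample row are made up for illustration, and the strict `>` / `<` bounds are copied from the function above.

```python
import pandas as pd

def apply_geofence(df, min_latitude, max_latitude, min_longitude, max_longitude):
    """Keep rows strictly inside the bounding box (same strict bounds as above)."""
    inside = (
        (df["Latitude"] > min_latitude)
        & (df["Latitude"] < max_latitude)
        & (df["Longitude"] > min_longitude)
        & (df["Longitude"] < max_longitude)
    )
    return df[inside]

# Rows 0 and 1 are taken from the question; row 2 is a made-up point outside the box.
sample = pd.DataFrame({
    "Latitude": [66.36031097267725, 66.36030099322495, 66.5],
    "Longitude": [23.714807357485936, 23.71479548193769, 23.6],
})
fenced = apply_geofence(
    sample,
    min_latitude=66.27832887852133,
    max_latitude=66.37098470528755,
    min_longitude=23.568626549485927,
    max_longitude=23.71481685393929,
)
print(fenced)
```

Because this is still a rectangle, it selects the same unwanted area as the original function; it only makes the filter easier to read and to combine with a further condition.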

1 answer
ら.Afraid
#2 · 2019-07-29 06:36

Here is a solution to test. The idea is to keep only the points above a line; you choose the value of p to select the right line.
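
The line can be written out explicitly from the definitions of `m`, `z`, `xa` and `xb` used in the code of this answer. The endpoint labels A and B follow the code comment; their coordinates are inferred here from the formulas and are not stated by the author.

```latex
\begin{aligned}
m   &= \frac{\text{lat}_{\max} - \text{lat}_{\min}}{\text{lon}_{\max} - \text{lon}_{\min}}, \qquad
x_a = \text{lon}_{\min} + p\,(\text{lon}_{\max} - \text{lon}_{\min}), \qquad
x_b = \text{lon}_{\max}, \\
z   &= \text{lat}_{\min} - m\,x_a
\;\Longrightarrow\;
y = m\,x + z = \text{lat}_{\min} + m\,(x - x_a), \\
A   &= (x_a,\ \text{lat}_{\min}), \qquad
B   = \bigl(x_b,\ \text{lat}_{\min} + (1 - p)\,(\text{lat}_{\max} - \text{lat}_{\min})\bigr).
\end{aligned}
```

A point is kept when its latitude satisfies lat > m * lon + z. The slope is that of the bounding-box diagonal, and increasing p moves the line toward larger longitudes without changing that slope.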


from random import uniform
import matplotlib.pyplot as plt
import pandas as pd

def newpoint(lon_min = -180.0, lon_max = 180.0, lat_min = -90.0, lat_max = 90.0 ):#long,lat
    return uniform(lon_min, lon_max), uniform(lat_min, lat_max)

lon_min = 23.568626549485927; lon_max = 23.71481685393929
lat_min = 66.27832887852133; lat_max = 66.37098470528755
p = 0.25  # sample value; for your case I think a value nearer to 0.75 would fit

# generate 10 sample points
n = 10
points = (newpoint(lon_min, lon_max, lat_min, lat_max) for x in range(n))
points = [x for x in points]
Lon = [x for x,y in points]
Lat = [x for y,x in points]
df = pd.DataFrame({'Lat': Lat, 'Lon': Lon})
print(df)

#equation of the line using points A and B -> y=m*x + z 
m = (lat_max - lat_min)/(lon_max - lon_min)
z = lat_min - m * (lon_min + p * (lon_max - lon_min))
xa = lon_min + p * (lon_max - lon_min)
xb = lon_max

#you could uncomment to display result 
#df['calcul'] = df['Lon'] * m + z

#select only points above the line
df = df[df['Lon'] * m + z < df['Lat']]
print(df)

#plot to show result
plt.plot([xa, xb] , [m * xa + z, m * xb + z])
plt.plot(df.Lon, df.Lat, 'ro')
plt.show()

Initial output:

         Lat        Lon
0  66.343486  23.674008
1  66.281614  23.678554
2  66.359215  23.637975
3  66.303976  23.659128
4  66.302640  23.589577
5  66.313877  23.634785
6  66.309733  23.683281
7  66.365582  23.667262
8  66.344611  23.688108
9  66.352028  23.673376



Final result: the points at index 1, 3 and 6 have been removed (they are below the line)

         Lat        Lon
0  66.343486  23.674008
2  66.359215  23.637975
4  66.302640  23.589577
5  66.313877  23.634785
7  66.365582  23.667262
8  66.344611  23.688108
9  66.352028  23.673376

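
To apply the same test to a DataFrame with the question's column names, the line filter can be wrapped in a function. This is only a sketch under assumptions: the function name `keep_points_above_line` and its `lat_col` / `lon_col` parameters are hypothetical, the sample rows are the first four points of the initial output above, and a suitable p for the real route still has to be found by trial.

```python
import pandas as pd

def keep_points_above_line(df, p, lon_min, lon_max, lat_min, lat_max,
                           lat_col="Latitude", lon_col="Longitude"):
    """Keep rows whose latitude lies strictly above the line y = m*x + z."""
    m = (lat_max - lat_min) / (lon_max - lon_min)   # slope of the box diagonal
    xa = lon_min + p * (lon_max - lon_min)          # where the line meets lat_min
    z = lat_min - m * xa                            # intercept of the line
    return df[df[lat_col] > m * df[lon_col] + z]

# First four points of the sample output above, renamed to the question's columns.
sample = pd.DataFrame({
    "Latitude": [66.343486, 66.281614, 66.359215, 66.303976],
    "Longitude": [23.674008, 23.678554, 23.637975, 23.659128],
})
kept = keep_points_above_line(
    sample,
    p=0.25,
    lon_min=23.568626549485927,
    lon_max=23.71481685393929,
    lat_min=66.27832887852133,
    lat_max=66.37098470528755,
)
print(kept)
```

With p = 0.25 this keeps the rows at index 0 and 2 and drops 1 and 3, matching the final result shown above for those points. The filter can be chained after the bounding-box filter from the question, since both only select rows.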
