I have Latitude and Longitude data of size (34000 * 2) in a pandas df:
df =
Index Latitude Longitude
0 66.36031097267725 23.714807357485936
1 66.36030099322495 23.71479548193769
2
.
.
.
.
34000 66.27918383581169 23.568631229948359
Important note: The Lat & Long route above has been covered twice, which means that if I covered the route only once, my Latitude and Longitude data would be of size (34000/2, 2), for example.
Problem
I only want the Lat and Long data for one selected area, so I filtered my df using the starting and ending Lat and Long points. Doing that also selected another part of the route. (See the picture below, after filtering.)
Requirement
How can I remove the additional area? I am sure there is an easy approach to this problem.
Note: After filtering, the Lat & Long data still covers the route twice.
Filtered
def apply_geofence_on_data(interpolated_data, min_latitude=66.27832887852133, max_latitude=66.37098470528755, min_longitude=23.568626549485927,
                           max_longitude=23.71481685393929):
    interpolated_data = interpolated_data[interpolated_data['Latitude'] > min_latitude]
    interpolated_data = interpolated_data[interpolated_data['Latitude'] < max_latitude]
    interpolated_data = interpolated_data[interpolated_data['Longitude'] < max_longitude]
    interpolated_data = interpolated_data[interpolated_data['Longitude'] > min_longitude]
    return interpolated_data
Here is a solution to test: the idea is to keep only the points above a line. You choose the value of p to select the right line.
from random import uniform
import matplotlib.pyplot as plt
import pandas as pd  # needed for pd.DataFrame below

def newpoint(lon_min=-180.0, lon_max=180.0, lat_min=-90.0, lat_max=90.0):
    # returns a random (lon, lat) tuple inside the given bounds
    return uniform(lon_min, lon_max), uniform(lat_min, lat_max)
lon_min = 23.568626549485927; lon_max = 23.71481685393929
lat_min = 66.27832887852133; lat_max = 66.37098470528755
p = 0.25 # I took this value for the sample; for your case I think a value closer to 0.75 would fit
# generate 10 random sample points as (lon, lat) tuples
n = 10
points = [newpoint(lon_min, lon_max, lat_min, lat_max) for _ in range(n)]
Lon = [lon for lon, lat in points]
Lat = [lat for lon, lat in points]
df = pd.DataFrame({'Lat': Lat, 'Lon': Lon})
print(df)
#equation of the line y = m*x + z: parallel to the bounding-box diagonal and passing through (xa, lat_min)
m = (lat_max - lat_min)/(lon_max - lon_min)
z = lat_min - m * (lon_min + p * (lon_max - lon_min))
xa = lon_min + p * (lon_max - lon_min)
xb = lon_max
#you could uncomment to display result
#df['calcul'] = df['Lon'] * m + z
#select only points above the line
df = df[df['Lon'] * m + z < df['Lat']]
print(df)
#plot to show result
plt.plot([xa, xb] , [m * xa + z, m * xb + z])
plt.plot(df.Lon, df.Lat, 'ro')
plt.show()
Initial output:
Lat Lon
0 66.343486 23.674008
1 66.281614 23.678554
2 66.359215 23.637975
3 66.303976 23.659128
4 66.302640 23.589577
5 66.313877 23.634785
6 66.309733 23.683281
7 66.365582 23.667262
8 66.344611 23.688108
9 66.352028 23.673376
Final result: the points at index 1, 3 and 6 have been removed (they are below the line)
Lat Lon
0 66.343486 23.674008
2 66.359215 23.637975
4 66.302640 23.589577
5 66.313877 23.634785
7 66.365582 23.667262
8 66.344611 23.688108
9 66.352028 23.673376
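To apply the same test to a DataFrame shaped like the one in the question, the line filter can be wrapped in a small function. This is only a sketch: the function name keep_above_line and its lat_col / lon_col parameters are made up for illustration, and it assumes the unwanted part of the route lies below the line, so p still has to be tuned by looking at a plot. The sample rows are the ten points from the output above.

```python
import pandas as pd

def keep_above_line(df, lon_min, lon_max, lat_min, lat_max, p=0.25,
                    lat_col="Latitude", lon_col="Longitude"):
    """Keep rows strictly above the line y = m*x + z (hypothetical helper).

    The line is parallel to the bounding-box diagonal and passes through
    (lon_min + p * (lon_max - lon_min), lat_min), as in the code above.
    """
    m = (lat_max - lat_min) / (lon_max - lon_min)   # slope of the diagonal
    xa = lon_min + p * (lon_max - lon_min)          # where the line meets lat_min
    z = lat_min - m * xa                            # intercept
    mask = df[lat_col] > m * df[lon_col] + z        # True for points above the line
    return df[mask]

# the ten sample points printed in the initial output above
sample = pd.DataFrame({
    "Latitude": [66.343486, 66.281614, 66.359215, 66.303976, 66.302640,
                 66.313877, 66.309733, 66.365582, 66.344611, 66.352028],
    "Longitude": [23.674008, 23.678554, 23.637975, 23.659128, 23.589577,
                  23.634785, 23.683281, 23.667262, 23.688108, 23.673376],
})

kept = keep_above_line(
    sample,
    lon_min=23.568626549485927, lon_max=23.71481685393929,
    lat_min=66.27832887852133, lat_max=66.37098470528755,
    p=0.25,
)
print(kept.index.tolist())
```

Because the original index is preserved, rows dropped by the line test can be identified afterwards; on the real data you would call it on the result of apply_geofence_on_data.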