So, continuing from the discussion @TheBlackCat and I were having in this answer, I would like to know the best way to pass arguments to a NumPy vectorized function. The function in question is defined as follows:
```python
from geopy.distance import vincenty  # available in geopy versions before 2.0
vect_dist_funct = np.vectorize(lambda p1, p2: vincenty(p1, p2).meters)
```
where `vincenty` comes from the geopy package.
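As a minimal sketch of how `np.vectorize` hands elements to the wrapped function: it converts each argument to an array, broadcasts the arrays against each other, and calls the Python function once per element. Here a planar distance (`planar_dist`, a hypothetical stand-in so the sketch runs without geopy) replaces `vincenty(p1, p2).meters`. Because both inputs are 1-D object arrays whose elements are whole tuples, each call receives one (x, y) pair per argument.

```python
import math

import numpy as np


# Hypothetical stand-in for vincenty(p1, p2).meters: planar distance between two (x, y) tuples.
def planar_dist(p1, p2):
    return math.hypot(p1[0] - p2[0], p1[1] - p2[1])


# otypes=[float] fixes the output dtype instead of inferring it from a trial call.
vect_planar = np.vectorize(planar_dist, otypes=[float])

# 1-D object arrays whose elements are whole (x, y) tuples, filled element by element
# so NumPy does not unpack the tuples into a second dimension.
a = np.empty(2, dtype=object)
b = np.empty(2, dtype=object)
a[0], a[1] = (0.0, 0.0), (1.0, 1.0)
b[0], b[1] = (3.0, 4.0), (1.0, 2.0)

# One Python call per element pair: planar_dist(a[0], b[0]), then planar_dist(a[1], b[1]).
print(vect_planar(a, b))  # [5. 1.]
```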
I currently call `vect_dist_funct` in this manner:
```python
def pointer(point, centroid, tree_idx):
    intersect = list(tree_idx.intersection(point))  # ids returned by the spatial index
    if len(intersect) > 0:
        points = pd.Series([point]*len(intersect)).values  # object array of tuples, shape (n,)
        polygons = centroid.loc[intersect].values
        dist = vect_dist_funct(points, polygons)
        return pd.Series(dist, index=intersect, name='Dist').sort_values()
    else:
        return pd.Series(np.nan, index=[0], name='Dist')

points['geometry'].apply(lambda x: pointer(point=x.coords[0], centroid=line['centroid'], tree_idx=tree_idx))
```
(Please refer to the question here: Labelled datatypes Python)
My question pertains to what happens inside the function `pointer`. The reason I convert `points` to a `pandas.Series` and then take its values (on the line just under the `if` statement) is to give it the same shape as `polygons`. If I instead build it as either `points = [point]*len(intersect)` or `points = itertools.repeat(point, len(intersect))`, NumPy complains that it "cannot broadcast arrays of size (n,2) and size (n,) together" (where n is the length of `intersect`).
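A small sketch of the shapes involved, assuming (as the (n,2) in the error message suggests) that the arguments are turned into object arrays before broadcasting. A list of identical tuples is a nested sequence, so NumPy unpacks it into two dimensions, while an object array filled element by element (which is what `pd.Series(...).values` produces here) keeps one tuple per element, matching `polygons`.

```python
import numpy as np

point = (-104.950752, 39.854744)
n = 3

# A list of n identical tuples is a nested sequence, so NumPy builds a 2-D array from it.
from_list = np.asarray([point] * n, dtype=object)
print(from_list.shape)  # (3, 2)

# An object array filled element by element keeps each tuple as a single element.
from_objects = np.empty(n, dtype=object)
for i in range(n):
    from_objects[i] = point
print(from_objects.shape)  # (3,)

# Right-aligned, the trailing dimensions are 2 and 3, so broadcasting fails.
try:
    np.broadcast(from_list, from_objects)
    broadcast_ok = True
except ValueError:
    broadcast_ok = False
print(broadcast_ok)  # False
```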
If I call `vect_dist_funct` like so: `dist = vect_dist_funct(itertools.repeat(points, len(intersect)), polygons)`, `vincenty` complains that I have passed it too many arguments. I am at a complete loss to understand the difference between the two.
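One possible reading of the `itertools.repeat` case, sketched below: an iterator is not a sequence, so NumPy does not unpack it at all and instead stores the iterator object itself as the single element of a 0-d object array. Broadcasting a 0-d array against `polygons` (shape (n,)) succeeds, so each call to the lambda presumably receives the `repeat` object itself as `p1` rather than a coordinate pair, which is likely what `vincenty` rejects.

```python
import itertools

import numpy as np

rep = itertools.repeat((-104.950752, 39.854744), 3)

# The iterator is not unpacked; it becomes the only element of a 0-d object array.
wrapped = np.asarray(rep, dtype=object)
print(wrapped.shape)               # ()
print(type(wrapped[()]).__name__)  # repeat
```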
Note that these are coordinates, so they always come in pairs. Here are examples of what `point` and `polygons` look like:
```python
point = (-104.950752, 39.854744)  # Passed directly to the function like this.

polygons = array([(-104.21750802451864, 37.84052458697633),
                  (-105.01017084789603, 39.82012158954065),
                  (-105.03965315742742, 40.669867471420886),
                  (-104.90353460825702, 39.837631505433706),
                  (-104.8650601872832, 39.870796282334744)], dtype=object)
# As returned by centroid.loc[intersect].values
```
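If the per-pair Python call can be avoided, the object array of tuples can first be converted into an ordinary (n, 2) float array. This also avoids keeping a separate Python tuple and two Python float objects per coordinate pair. A minimal sketch using the sample centroids above (the element-by-element fill only recreates the object array for the example):

```python
import numpy as np

coords = [(-104.21750802451864, 37.84052458697633),
          (-105.01017084789603, 39.82012158954065),
          (-105.03965315742742, 40.669867471420886),
          (-104.90353460825702, 39.837631505433706),
          (-104.8650601872832, 39.870796282334744)]

# Rebuild the 1-D object array of tuples shown above.
polygons = np.empty(len(coords), dtype=object)
for i, c in enumerate(coords):
    polygons[i] = c

# tolist() yields plain tuples, which NumPy packs into a numeric (n, 2) array.
polygons_xy = np.array(polygons.tolist(), dtype=float)
print(polygons.shape, polygons_xy.shape)  # (5,) (5, 2)
```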
What is the best way to call `vect_dist_funct` in this situation, so that the call is vectorized and neither NumPy nor `vincenty` complains about the arguments I pass? I am also looking for techniques that minimize memory consumption and increase speed. The goal is to compute the distance from the point to each polygon centroid.
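Two possible approaches, sketched under stated assumptions. First, `np.vectorize` accepts an `excluded` parameter: excluding positional argument 0 passes `point` unchanged to every call, so nothing has to be repeated or reshaped and only `polygons` is iterated. Second, the NumPy documentation notes that `vectorize` is provided for convenience and is essentially a for loop, so a real speed-up needs arithmetic on whole float arrays. Vincenty's iterative ellipsoidal formula is not reproduced here; the sketch swaps in the haversine great-circle formula on a sphere (assumed mean radius 6,371 km), which is an approximation and will not match `vincenty(...).meters` exactly. `haversine_m` and `haversine_vec_m` are hypothetical helper names. The tuples are assumed to be (longitude, latitude), which the sample data appears to be (presumably from Shapely's `coords`, which return (x, y)); geopy itself expects (latitude, longitude), so the pairs may need swapping before they reach `vincenty`.

```python
import math

import numpy as np

EARTH_RADIUS_M = 6371000.0  # assumed mean Earth radius for the spherical model


def haversine_m(p1, p2):
    """Hypothetical scalar stand-in for vincenty(p1, p2).meters; (lon, lat) tuples in degrees."""
    lon1, lat1 = map(math.radians, p1)
    lon2, lat2 = map(math.radians, p2)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Approach 1: keep np.vectorize, but exclude positional argument 0 from vectorization,
# so the same point tuple is passed to every call without being repeated.
vect_dist = np.vectorize(haversine_m, otypes=[float], excluded={0})


def haversine_vec_m(point, coords):
    """Approach 2: pure array arithmetic; coords is a float array of shape (n, 2), (lon, lat)."""
    lon1, lat1 = np.radians(point)
    lon2 = np.radians(coords[:, 0])
    lat2 = np.radians(coords[:, 1])
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))


point = (-104.950752, 39.854744)
sample = [(-104.21750802451864, 37.84052458697633),
          (-105.01017084789603, 39.82012158954065),
          (-105.03965315742742, 40.669867471420886)]

# Object array of tuples, the same layout as centroid.loc[intersect].values.
polygons = np.empty(len(sample), dtype=object)
for i, c in enumerate(sample):
    polygons[i] = c

d1 = vect_dist(point, polygons)  # one Python call per centroid
d2 = haversine_vec_m(point, np.array(polygons.tolist(), dtype=float))  # no per-pair calls
print(d1)
print(d2)
```

With `excluded`, `polygons` must stay a 1-D object array of tuples: a plain list of tuples would again be converted to an (n, 2) array, and each call would receive a single float as `p2`.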