I have code that sequentially checks whether every pair of Cartesian coordinates in my DataFrame
falls into certain enclosed geometric areas. But it is rather slow, I suspect because it is not vectorized. Here is an example:

    import pandas as pd
    from matplotlib.patches import Rectangle

    r1 = Rectangle((0,0), 10, 10)
    r2 = Rectangle((50,50), 10, 10)
    df = pd.DataFrame([[1,2],[-1,5], [51,52]], columns=['x', 'y'])
    df['location'] = float('nan')  # the column must exist before assigning into it
    for j in range(df.shape[0]):
        coordinates = df.x.iloc[j], df.y.iloc[j]
        if r1.contains_point(coordinates):
            df.loc[j, 'location'] = 0  # .loc avoids chained assignment
        elif r2.contains_point(coordinates):
            df.loc[j, 'location'] = 1

Can someone propose an approach for speed-up?
It's better to convert the rectangular patches into an array and work on that, after deducing the extent over which they are spread out.
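As a minimal sketch of the array idea (not necessarily the exact function this answer used), the rectangles can be stored as rows of `(x0, y0, width, height)` and compared against all points at once with NumPy broadcasting. The helper name `locate_points`, the `-1` sentinel for points in no rectangle, and the inclusive treatment of edges are assumptions made for illustration. For matplotlib `Rectangle` objects, the same numbers are available through `get_x()`, `get_y()`, `get_width()` and `get_height()`.

```python
import numpy as np
import pandas as pd

def locate_points(df, rects, default=-1):
    """Index of the first rectangle containing each (x, y) row, else `default`.

    `rects` is a sequence of (x0, y0, width, height) tuples (hypothetical helper).
    """
    bounds = np.asarray(rects, dtype=float)        # shape (n_rects, 4)
    x0, y0 = bounds[:, 0], bounds[:, 1]
    x1, y1 = x0 + bounds[:, 2], y0 + bounds[:, 3]

    # Column vectors so that comparisons broadcast to (n_points, n_rects)
    x = df['x'].to_numpy()[:, None]
    y = df['y'].to_numpy()[:, None]

    # Assumption: points on an edge count as inside
    inside = (x >= x0) & (x <= x1) & (y >= y0) & (y <= y1)

    # argmax returns the first True per row; rows with no True get `default`
    first = inside.argmax(axis=1)
    return np.where(inside.any(axis=1), first, default)

rects = [(0, 0, 10, 10), (50, 50, 10, 10)]         # r1 and r2 from the question
df = pd.DataFrame([[1, 2], [-1, 5], [51, 52]], columns=['x', 'y'])
df['location'] = locate_points(df, rects)
```

The intermediate boolean array has one column per rectangle, so memory grows with the number of points times the number of rectangles; for a handful of rectangles that is negligible.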
For the given sample the function outputs:
Benchmarks, testing on a DataFrame of 10K rows:
So, the vectorized approach is approximately 2200 times faster than the loopy one.
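A comparison of this kind could be reproduced roughly as follows. This is only a sketch under stated assumptions: the rectangles are plain `(x0, y0, width, height)` tuples rather than matplotlib patches, the random test data and the names `loop_version` and `vectorized_version` are made up for illustration, and the measured ratio will depend on the machine and library versions.

```python
import timeit
import numpy as np
import pandas as pd

rects = [(0, 0, 10, 10), (50, 50, 10, 10)]   # (x0, y0, width, height)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.uniform(-10, 70, size=(10_000, 2)), columns=['x', 'y'])

def loop_version(df, rects):
    # Row-by-row check, mirroring the structure of the loop in the question
    out = np.full(len(df), -1)
    for j in range(len(df)):
        px, py = df.x.iloc[j], df.y.iloc[j]
        for k, (x0, y0, w, h) in enumerate(rects):
            if x0 <= px <= x0 + w and y0 <= py <= y0 + h:
                out[j] = k
                break
    return out

def vectorized_version(df, rects):
    # All points against all rectangles in one broadcasted comparison
    b = np.asarray(rects, dtype=float)
    x = df['x'].to_numpy()[:, None]
    y = df['y'].to_numpy()[:, None]
    inside = ((x >= b[:, 0]) & (x <= b[:, 0] + b[:, 2]) &
              (y >= b[:, 1]) & (y <= b[:, 1] + b[:, 3]))
    return np.where(inside.any(axis=1), inside.argmax(axis=1), -1)

# Both versions should agree before their timings are compared
assert np.array_equal(loop_version(df, rects), vectorized_version(df, rects))

t_loop = timeit.timeit(lambda: loop_version(df, rects), number=3) / 3
t_vec = timeit.timeit(lambda: vectorized_version(df, rects), number=3) / 3
print(f"loop: {t_loop:.4f} s, vectorized: {t_vec:.6f} s, ratio: {t_loop / t_vec:.0f}x")
```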