Given
x = [5, 30, 58, 72]
y = [8, 35, 53, 60, 66, 67, 68, 73]
the goal is to iterate through each x_i and find the first value in y that is larger than x_i but not larger than x_(i+1). The last x_i has no upper bound.
Assuming that both lists are sorted and all items are unique, the desired output for this x and y is:
[(5, 8), (30, 35), (58, 60), (72, 73)]
I've tried:
def per_window(sequence, n=1):
    """
    From http://stackoverflow.com/q/42220614/610569
    >>> list(per_window([1,2,3,4], n=2))
    [(1, 2), (2, 3), (3, 4)]
    >>> list(per_window([1,2,3,4], n=3))
    [(1, 2, 3), (2, 3, 4)]
    """
    start, stop = 0, n
    seq = list(sequence)
    while stop <= len(seq):
        yield tuple(seq[start:stop])
        start += 1
        stop += 1

x = [5, 30, 58, 72]
y = [8, 35, 53, 60, 66, 67, 68, 73]
r = []
for xi, xiplus1 in per_window(x, 2):
    for j, yj in enumerate(y):
        if yj > xi and yj < xiplus1:
            r.append((xi, yj))
            break

# For the last x value.
for j, yj in enumerate(y):
    if yj > xiplus1:
        r.append((xiplus1, yj))
        break
But is there a simpler way to achieve the same with numpy, pandas or something else?
You can use numpy.searchsorted with side='right' to find the index of the first value in y that is larger than each value in x, and then extract the elements at those indices. A simple version, which assumes there is always a value in y larger than every element of x, could be:
import numpy as np

x = np.array([5, 30, 58, 72])
y = np.array([8, 35, 53, 60, 66, 67, 68, 73])
np.column_stack((x, y[np.searchsorted(y, x, side='right')]))
#array([[ 5,  8],
#       [30, 35],
#       [58, 60],
#       [72, 73]])
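The snippet above depends on its stated assumption and also ignores the upper bound x_(i+1). A sketch that handles both, assuming the same sorted inputs (match_next is only an illustrative name), drops indices that fall past the end of y and matches whose y value exceeds the next x:

```python
import numpy as np

def match_next(x, y):
    """Pair each x[i] with the first y in (x[i], x[i+1]]; the last x has no upper bound.

    Hypothetical helper; assumes x and y are both sorted ascending.
    """
    x = np.asarray(x)
    y = np.asarray(y)
    if len(x) == 0:
        return []
    # index of the first y strictly larger than each x
    idx = np.searchsorted(y, x, side='right')
    # upper bound for each x: the next x, or +inf for the last one
    upper = np.append(x[1:], np.inf)
    # an index equal to len(y) means no larger y exists
    valid = idx < len(y)
    # among in-range indices, keep only matches not larger than the upper bound
    valid[valid] = y[idx[valid]] <= upper[valid]
    return [(a.item(), y[i].item()) for a, i, ok in zip(x, idx, valid) if ok]

print(match_next([5, 30, 58, 72], [8, 35, 53, 60, 66, 67, 68, 73]))
# [(5, 8), (30, 35), (58, 60), (72, 73)]
print(match_next([5, 10, 30], [8, 35]))
# [(5, 8), (30, 35)]
```

In the second call no y lies in (10, 30], so 10 is left out, just as the loop in the question would skip it.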
Given that y is sorted,
np.searchsorted(y, x, side='right')
# array([0, 1, 3, 7])
returns the index of the first value in y that is larger than the corresponding value in x.
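Outside numpy, the standard library's bisect.bisect_right follows the same "right" convention for a single value, so the index lookup can be sketched in plain Python (first_larger_indices is a hypothetical helper name):

```python
from bisect import bisect_right

def first_larger_indices(x, y):
    """Plain-Python counterpart of np.searchsorted(y, x, side='right'); y must be sorted."""
    # bisect_right returns the insertion point after any entries equal to v,
    # which is the index of the first element of y strictly larger than v
    return [bisect_right(y, v) for v in x]

y = [8, 35, 53, 60, 66, 67, 68, 73]
print(first_larger_indices([5, 30, 58, 72], y))  # [0, 1, 3, 7]
print(first_larger_indices([8], y))              # [1]
```

As with searchsorted, an index equal to len(y) means y has no larger value.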
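With pandas, you can wrap each list in a DataFrame and use pd.merge_asof with direction='forward', which matches each x to the nearest y at or after it. Since the question asks for a y strictly larger than x_i, also pass allow_exact_matches=False (leave it out if an equal value should count as a match):
import pandas as pd

left = pd.DataFrame({'x': x})
right = pd.DataFrame({'y': y})
# merge_asof requires both key columns to be sorted
new = pd.merge_asof(left, right, left_on='x', right_on='y',
                    direction='forward', allow_exact_matches=False)
out = list(zip(new['x'].tolist(), new['y'].tolist()))
Output:
[(5, 8), (30, 35), (58, 60), (72, 73)]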
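To see what allow_exact_matches changes, here is a small sketch on made-up data where one x value also appears in y, a case the question's uniqueness assumption rules out (forward_pairs is a hypothetical helper):

```python
import pandas as pd

def forward_pairs(x, y, exact):
    """Pair each x with the next y via a forward merge_asof; x and y must be sorted."""
    left = pd.DataFrame({'x': x})
    right = pd.DataFrame({'y': y})
    new = pd.merge_asof(left, right, left_on='x', right_on='y',
                        direction='forward', allow_exact_matches=exact)
    return list(zip(new['x'].tolist(), new['y'].tolist()))

# 8 appears in both lists
print(forward_pairs([8, 30], [8, 35, 53], exact=True))   # [(8, 8), (30, 35)]
print(forward_pairs([8, 30], [8, 35, 53], exact=False))  # [(8, 35), (30, 35)]
```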
You can construct a new list by iterating over x zipped with itself offset by one index, with the last element of y appended to the shifted copy so the final x value also gets an upper bound. Then iterate over y, check the condition at each pass, and break out of the innermost loop at the first match:
out = []
for x_low, x_high in zip(x, x[1:] + y[-1:]):
    for yy in y:
        if x_low < yy <= x_high:
            out.append((x_low, yy))
            break
out
# returns:
[(5, 8), (30, 35), (58, 60), (72, 73)]
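The nested loop rescans y from the start for every x, which costs O(len(x) * len(y)) in the worst case. Because both lists are sorted, a two-pointer sweep can do the same in one pass over each list. A sketch (match_sorted is a hypothetical name) with the same (x_i, x_(i+1)] window and no upper bound for the last x:

```python
def match_sorted(x, y):
    """Pair each x[i] with the first y in (x[i], x[i+1]]; the last x has no upper bound.

    Assumes both lists are sorted ascending; runs in O(len(x) + len(y)).
    """
    out = []
    j = 0
    for i, lo in enumerate(x):
        # skip every y that is not larger than lo; j never moves back
        # because x is sorted
        while j < len(y) and y[j] <= lo:
            j += 1
        if j == len(y):
            break  # no y is larger than lo, nor than any later x
        hi = x[i + 1] if i + 1 < len(x) else None
        if hi is None or y[j] <= hi:
            out.append((lo, y[j]))
    return out

print(match_sorted([5, 30, 58, 72], [8, 35, 53, 60, 66, 67, 68, 73]))
# [(5, 8), (30, 35), (58, 60), (72, 73)]
```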
def find(list1, list2):
    final = []
    for i, current in enumerate(list1):
        is_last = i + 1 == len(list1)
        for value in list2:
            # the value must be larger than the current element and, unless
            # this is the last element of list1, smaller than the next one
            if current < value and (is_last or value < list1[i + 1]):
                final.append((current, value))
                break  # keep only the first match
    return final