Given two dataframes df_1 and df_2, how can I join them such that the datetime column of df_1 lies between start and end in dataframe df_2:
print(df_1)
            timestamp         A         B
0 2016-05-14 10:54:33  0.020228  0.026572
1 2016-05-14 10:54:34  0.057780  0.175499
2 2016-05-14 10:54:35  0.098808  0.620986
3 2016-05-14 10:54:36  0.158789  1.014819
4 2016-05-14 10:54:39  0.038129  2.384590
print(df_2)
                start                 end event
0 2016-05-14 10:54:31 2016-05-14 10:54:33    E1
1 2016-05-14 10:54:34 2016-05-14 10:54:37    E2
2 2016-05-14 10:54:38 2016-05-14 10:54:42    E3
Get the corresponding event where df_1.timestamp is between df_2.start and df_2.end:
            timestamp         A         B event
0 2016-05-14 10:54:33  0.020228  0.026572    E1
1 2016-05-14 10:54:34  0.057780  0.175499    E2
2 2016-05-14 10:54:35  0.098808  0.620986    E2
3 2016-05-14 10:54:36  0.158789  1.014819    E2
4 2016-05-14 10:54:39  0.038129  2.384590    E3
One simple solution is to create an IntervalIndex from start and end, setting closed='both', and then use get_loc to look up the event for each timestamp (this assumes all the datetimes are of Timestamp dtype).
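The answer's original code did not survive here, so the following is only a minimal sketch of the approach described above, rebuilt from the sample data in the question. It assumes the intervals in df_2 do not overlap, so that get_loc returns a single integer position.

```python
import pandas as pd

# Sample data copied from the question.
df_1 = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2016-05-14 10:54:33", "2016-05-14 10:54:34", "2016-05-14 10:54:35",
        "2016-05-14 10:54:36", "2016-05-14 10:54:39",
    ]),
    "A": [0.020228, 0.057780, 0.098808, 0.158789, 0.038129],
    "B": [0.026572, 0.175499, 0.620986, 1.014819, 2.384590],
})
df_2 = pd.DataFrame({
    "start": pd.to_datetime(
        ["2016-05-14 10:54:31", "2016-05-14 10:54:34", "2016-05-14 10:54:38"]),
    "end": pd.to_datetime(
        ["2016-05-14 10:54:33", "2016-05-14 10:54:37", "2016-05-14 10:54:42"]),
    "event": ["E1", "E2", "E3"],
})

# Index df_2 by closed intervals [start, end] so that both endpoints match.
df_2.index = pd.IntervalIndex.from_arrays(
    df_2["start"], df_2["end"], closed="both")

# For non-overlapping intervals, get_loc returns the integer position of the
# interval containing x; it raises KeyError if no interval contains x.
df_1["event"] = df_1["timestamp"].apply(
    lambda x: df_2["event"].iloc[df_2.index.get_loc(x)])

print(df_1)
```

Because apply calls get_loc once per row, this is easy to read but not the fastest option for large frames.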
Option 1
Option 2
In this method, we assume Timestamp objects are used.
Explanation of get_event:
For each timestamp in df_1, say t0 = 2016-05-14 10:54:33, the mask (t0 >= df_2.start) & (t0 <= df_2.end) will contain exactly one True (see Example 1). Then take a dot product with np.arange(event_num) to get the index of the event that t0 belongs to.
Examples:
Example 1
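The code for this example is missing here, so what follows is a sketch reconstructed from the explanation above rather than the answer's own code. It evaluates the mask for t0, takes the dot product with np.arange(event_num), and then wraps the same steps in a get_event function whose body is an assumption based on that description.

```python
import numpy as np
import pandas as pd

# Event intervals copied from the question.
df_2 = pd.DataFrame({
    "start": pd.to_datetime(
        ["2016-05-14 10:54:31", "2016-05-14 10:54:34", "2016-05-14 10:54:38"]),
    "end": pd.to_datetime(
        ["2016-05-14 10:54:33", "2016-05-14 10:54:37", "2016-05-14 10:54:42"]),
    "event": ["E1", "E2", "E3"],
})
event_num = len(df_2)

t0 = pd.Timestamp("2016-05-14 10:54:33")

# Boolean mask with one entry per event; True where t0 lies in [start, end].
mask = (df_2["start"] <= t0) & (df_2["end"] >= t0)
print(mask.tolist())  # [True, False, False]

# With exactly one True, the dot product picks out that position.
idx = int(np.dot(mask.to_numpy(), np.arange(event_num)))
print(idx, df_2["event"].iloc[idx])  # 0 E1


def get_event(t):
    """Return the event whose [start, end] interval contains t.

    Assumed body, following the description in the text. If no interval
    matches, the dot product is 0 and the first event is returned by mistake.
    """
    m = (df_2["start"] <= t) & (df_2["end"] >= t)
    return df_2["event"].iloc[int(np.dot(m.to_numpy(), np.arange(event_num)))]


timestamps = pd.Series(pd.to_datetime(
    ["2016-05-14 10:54:33", "2016-05-14 10:54:35", "2016-05-14 10:54:39"]))
events = timestamps.transform(get_event)
print(events.tolist())  # ['E1', 'E2', 'E3']
```

The dot product trick only works when every timestamp falls in exactly one interval, so overlapping or missing intervals would need a separate check.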
Take t2 = 2016-05-14 10:54:35 as another example.
Finally, we use transform to map each timestamp to an event.
A slight improvement to Dark's solution:
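The improved code is not preserved here. Given the reference to IntervalIndex.get_indexer, it was presumably a vectorised lookup along the following lines; treat this as a sketch under that assumption, using the sample data from the question.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df_1 = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2016-05-14 10:54:33", "2016-05-14 10:54:34", "2016-05-14 10:54:35",
        "2016-05-14 10:54:36", "2016-05-14 10:54:39",
    ]),
    "A": [0.020228, 0.057780, 0.098808, 0.158789, 0.038129],
    "B": [0.026572, 0.175499, 0.620986, 1.014819, 2.384590],
})
df_2 = pd.DataFrame({
    "start": pd.to_datetime(
        ["2016-05-14 10:54:31", "2016-05-14 10:54:34", "2016-05-14 10:54:38"]),
    "end": pd.to_datetime(
        ["2016-05-14 10:54:33", "2016-05-14 10:54:37", "2016-05-14 10:54:42"]),
    "event": ["E1", "E2", "E3"],
})

idx = pd.IntervalIndex.from_arrays(df_2["start"], df_2["end"], closed="both")

# One call for all timestamps: position of the containing interval,
# or -1 where no interval contains the timestamp.
pos = idx.get_indexer(df_1["timestamp"])

# Guard against -1, which would otherwise silently select the last event.
events = df_2["event"].to_numpy()
df_1["event"] = np.where(pos >= 0, events[pos], None)

print(df_1)
```

Note that get_indexer requires non-overlapping intervals; with overlapping intervals pandas raises an error instead of choosing one match.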
Reference: A question on IntervalIndex.get_indexer.
You can use the pandasql module.
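The pandasql code for this answer is not preserved. Since pandasql runs its queries on SQLite, the sketch below uses the standard library sqlite3 module together with DataFrame.to_sql and pd.read_sql_query as a stand-in, so it runs without extra packages. With pandasql installed, the same query string would be passed to its sqldf function instead. The table aliases t and e are illustrative, and timestamps are kept as ISO-formatted strings, which compare correctly under BETWEEN.

```python
import sqlite3

import pandas as pd

# Sample data from the question, with datetimes kept as ISO strings for SQLite.
df_1 = pd.DataFrame({
    "timestamp": [
        "2016-05-14 10:54:33", "2016-05-14 10:54:34", "2016-05-14 10:54:35",
        "2016-05-14 10:54:36", "2016-05-14 10:54:39",
    ],
    "A": [0.020228, 0.057780, 0.098808, 0.158789, 0.038129],
    "B": [0.026572, 0.175499, 0.620986, 1.014819, 2.384590],
})
df_2 = pd.DataFrame({
    "start": ["2016-05-14 10:54:31", "2016-05-14 10:54:34", "2016-05-14 10:54:38"],
    "end": ["2016-05-14 10:54:33", "2016-05-14 10:54:37", "2016-05-14 10:54:42"],
    "event": ["E1", "E2", "E3"],
})

con = sqlite3.connect(":memory:")
df_1.to_sql("df_1", con, index=False)
df_2.to_sql("df_2", con, index=False)

# "end" is an SQL keyword, so the interval columns are quoted.
query = """
SELECT t.timestamp, t.A, t.B, e.event
FROM df_1 AS t
JOIN df_2 AS e
  ON t.timestamp BETWEEN e."start" AND e."end"
ORDER BY t.timestamp
"""
result = pd.read_sql_query(query, con)
con.close()

# Convert the joined timestamps back to datetime64.
result["timestamp"] = pd.to_datetime(result["timestamp"])
print(result)
```

An inner join drops timestamps that fall in no interval; a LEFT JOIN would keep them with a missing event.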