I have the following table mytable in Hive:
id  radar_id  car_id  datetime
1   A21       123     2017-03-08 17:31:19.0
2   A21       555     2017-03-08 17:32:00.0
3   A21       777     2017-03-08 17:33:00.0
4   B15       123     2017-03-08 17:35:22.0
5   B15       555     2017-03-08 17:34:05.0
5   B15       777     2017-03-08 20:50:12.0
6   A21       123     2017-03-09 11:00:00.0
7   C11       123     2017-03-09 11:10:00.0
8   A21       123     2017-03-09 11:12:00.0
9   A21       555     2017-03-09 11:12:10.0
10  B15       123     2017-03-09 11:14:00.0
11  C11       555     2017-03-09 11:20:00.0
I want to get the routes of cars passing through radars A21 and B15 within the same trip. For example, if the date is different for the same car_id, then it is not the same trip. Basically, I want the maximum time difference between radars A21 and B15 for the same vehicle to be 30 minutes. If the difference is bigger, then it is not the same trip, as is the case for car_id 777.
My final goal is to compute the average number of trips per day (non-unique: if the same car takes the same route twice, it should be counted twice).
The expected result is the following one:
radar_start  radar_end  avg_tripscount_per_day
A21          B15        1.5
On the date 2017-03-08 there are 2 trips between radars A21 and B15 (car 777 is not counted because of the 30-minute limit), while on the date 2017-03-09 there is only 1 trip. The average is (2+1)/2 = 1.5 trips per day.
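The arithmetic above can be checked outside Hive with a small Python sketch over the sample rows. It is only a sanity check of the expected 1.5, not a Hive query, and it rests on two assumptions: a trip is an A21 reading immediately followed, for the same car, by a B15 reading at most 30 minutes later, and the average is taken over the dates that appear in the data. The names `rows`, `count_trips`, and `MAX_GAP` are made up for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta

MAX_GAP = timedelta(minutes=30)  # assumed trip limit from the question
FMT = "%Y-%m-%d %H:%M:%S"

# (radar_id, car_id, datetime) rows copied from the sample table
rows = [
    ("A21", 123, "2017-03-08 17:31:19"),
    ("A21", 555, "2017-03-08 17:32:00"),
    ("A21", 777, "2017-03-08 17:33:00"),
    ("B15", 123, "2017-03-08 17:35:22"),
    ("B15", 555, "2017-03-08 17:34:05"),
    ("B15", 777, "2017-03-08 20:50:12"),
    ("A21", 123, "2017-03-09 11:00:00"),
    ("C11", 123, "2017-03-09 11:10:00"),
    ("A21", 123, "2017-03-09 11:12:00"),
    ("A21", 555, "2017-03-09 11:12:10"),
    ("B15", 123, "2017-03-09 11:14:00"),
    ("C11", 555, "2017-03-09 11:20:00"),
]

def count_trips(rows, start="A21", end="B15"):
    """Return {start_date: number of start->end trips} for the given rows."""
    by_car = defaultdict(list)
    for radar, car, ts in rows:
        by_car[car].append((datetime.strptime(ts, FMT), radar))

    trips = defaultdict(int)
    for readings in by_car.values():
        readings.sort()  # chronological order per car
        # Compare each reading with the next one for the same car.
        for (t1, r1), (t2, r2) in zip(readings, readings[1:]):
            if r1 == start and r2 == end and t2 - t1 <= MAX_GAP:
                trips[t1.date()] += 1  # the trip is dated by its start
    return dict(trips)

trips_per_day = count_trips(rows)
all_days = {datetime.strptime(ts, FMT).date() for _, _, ts in rows}
average = sum(trips_per_day.values()) / len(all_days)
print(trips_per_day, average)
```

Comparing each reading only with the next one for the same car plays the role that a window function over car_id ordered by datetime would play in a query; whether a reading at another radar in between should break the trip is an assumption of this sketch.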
How can I get this result? Basically, I do not know how to introduce the 30-minute limit in the query and how to group rides by radar_start and radar_end.
Thanks.
Update:
- The trip is registered on the date it started.
- If the car was detected by radar A21 at 2017-03-08 23:55 and by radar B15 at 2017-03-09 00:15, then it should be considered the same trip, registered for the date 2017-03-08.
- In the case of ids 6 and 8, the same car 123 passed by A21 twice and then turned to B15 (id 10). The last reading, with id 8, should be used, so the trip is 8-10: the A21 reading closest before B15. The interpretation is that the car passed by A21 twice and turned to B15 the second time.
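To make the update rules concrete, here is a minimal Python sketch (again only an illustration, not Hive). It assumes that for each B15 reading the trip starts at the latest A21 reading of the same car within the preceding 30 minutes, and that the trip is dated by that A21 reading. The helper names `parse` and `trip_start` are hypothetical.

```python
from datetime import datetime, timedelta

MAX_GAP = timedelta(minutes=30)  # assumed limit, taken from the question

def parse(ts):
    """Parse a 'YYYY-MM-DD HH:MM' timestamp."""
    return datetime.strptime(ts, "%Y-%m-%d %H:%M")

def trip_start(a21_times, b15_time):
    """Return the latest A21 reading within 30 minutes before b15_time, or None."""
    candidates = [t for t in a21_times
                  if t < b15_time and b15_time - t <= MAX_GAP]
    return max(candidates) if candidates else None

# Trip crossing midnight: it is registered on the date of the A21 reading.
start = trip_start([parse("2017-03-08 23:55")], parse("2017-03-09 00:15"))
midnight_trip_date = start.date()

# ids 6 and 8 (A21) followed by id 10 (B15): the closest previous A21 is used.
start_123 = trip_start(
    [parse("2017-03-09 11:00"), parse("2017-03-09 11:12")],
    parse("2017-03-09 11:14"),
)
print(midnight_trip_date, start_123)
```

With these inputs the midnight trip is dated 2017-03-08, and the trip for car 123 starts at the 11:12 reading (id 8) rather than the 11:00 one (id 6).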