I have a Hive table with two columns of date-time values: the start and the end of a "session". Here is a sample of such a table:
+----------------------+----------------------+
| start_time           | end_time             |
+----------------------+----------------------+
| 2017-01-01 00:24:52  | 2017-01-01 00:25:20  |
| 2017-01-01 00:31:11  | 2017-01-01 10:31:15  |
| 2017-01-01 10:31:15  | 2017-01-01 20:40:53  |
| 2017-01-01 20:40:53  | 2017-01-01 20:40:53  |
| 2017-01-01 10:31:15  | 2017-01-01 10:31:15  |
| 2017-01-01 07:09:34  | 2017-01-01 07:29:00  |
| 2017-01-01 11:36:41  | 2017-01-01 15:32:00  |
| 2017-01-01 07:29:00  | 2017-01-01 07:34:30  |
| 2017-01-01 11:06:30  | 2017-01-01 11:36:41  |
| 2017-01-01 07:45:00  | 2017-01-01 07:50:00  |
+----------------------+----------------------+
There are a lot of sessions. I need to get a dataset that gives the number of sessions at each point of a half-hour time grid over some interval, as follows:
+----------------------+--------------+
| time                 | sessions_num |
+----------------------+--------------+
| 2018-07-04 00:30:00  | 85           |
| 2018-07-04 01:00:00  | 86           |
| 2018-07-04 01:30:00  | 84           |
| 2018-07-04 02:00:00  | 85           |
| 2018-07-04 02:30:00  | 84           |
| 2018-07-04 03:00:00  | 84           |
| 2018-07-04 03:30:00  | 84           |
| 2018-07-04 04:00:00  | 84           |
| 2018-07-04 04:30:00  | 84           |
| 2018-07-04 05:00:00  | 84           |
| 2018-07-04 05:30:00  | 84           |
| 2018-07-04 06:00:00  | 84           |
| 2018-07-04 06:30:00  | 85           |
| 2018-07-04 07:00:00  | 85           |
| 2018-07-04 07:30:00  | 85           |
| 2018-07-04 08:00:00  | 85           |
| 2018-07-04 08:30:00  | 85           |
| 2018-07-04 09:00:00  | 83           |
| 2018-07-04 09:30:00  | 82           |
| 2018-07-04 10:00:00  | 82           |
| 2018-07-04 10:30:00  | 83           |
| 2018-07-04 11:00:00  | 82           |
| 2018-07-04 11:30:00  | 82           |
| 2018-07-04 12:00:00  | 83           |
+----------------------+--------------+
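To make the intended counting concrete, here is a minimal plain-Python sketch run on the ten sample rows from the first table. It is only an illustration of the semantics, not a Hive or Spark solution. It assumes that a session counts at a grid point `t` when `start_time <= t <= end_time`; that rule, the function name `sessions_on_grid`, and the grid bounds are all assumptions made for the example. It is brute force (every grid point is compared against every session), so it would not scale to a large table.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"

# The ten sample rows from the first table: (start_time, end_time).
sessions = [
    ("2017-01-01 00:24:52", "2017-01-01 00:25:20"),
    ("2017-01-01 00:31:11", "2017-01-01 10:31:15"),
    ("2017-01-01 10:31:15", "2017-01-01 20:40:53"),
    ("2017-01-01 20:40:53", "2017-01-01 20:40:53"),
    ("2017-01-01 10:31:15", "2017-01-01 10:31:15"),
    ("2017-01-01 07:09:34", "2017-01-01 07:29:00"),
    ("2017-01-01 11:36:41", "2017-01-01 15:32:00"),
    ("2017-01-01 07:29:00", "2017-01-01 07:34:30"),
    ("2017-01-01 11:06:30", "2017-01-01 11:36:41"),
    ("2017-01-01 07:45:00", "2017-01-01 07:50:00"),
]


def sessions_on_grid(rows, grid_start, grid_end, step=timedelta(minutes=30)):
    """Count sessions active at each grid point (hypothetical helper).

    Assumption: a session is counted at time t when start <= t <= end.
    """
    parsed = [
        (datetime.strptime(start, FMT), datetime.strptime(end, FMT))
        for start, end in rows
    ]
    result = []
    t = grid_start
    while t <= grid_end:
        active = sum(1 for start, end in parsed if start <= t <= end)
        result.append((t.strftime(FMT), active))
        t += step
    return result


# Grid bounds chosen for the example: 00:30 to 12:00 on the sample day.
grid = sessions_on_grid(
    sessions,
    datetime(2017, 1, 1, 0, 30),
    datetime(2017, 1, 1, 12, 0),
)

for time, sessions_num in grid:
    print(time, sessions_num)
```

Each printed line has the same shape as a row of the desired table: a grid timestamp followed by the count for that timestamp. If the intended rule is instead "sessions that overlap the half-hour bucket", only the condition inside `sum(...)` would change.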
Is there a way in Apache Hive, Apache Spark, or some other tool to produce the last table from the first one?