hadoop hive count concurrency

Posted 2019-08-28 10:59

How can I implement this in Hadoop?

In Hive, I have a table with many columns, two of which are begin_time and end_time.

I need to count the number of concurrent visitors at each point in time.

A sample of the table:

begin_time                  end_time
2011.04.26 10:19:06^A2011.04.26 10:20:22
2011.04.26 10:19:08^A2011.04.26 10:21:49
2011.04.26 10:19:08^A2011.04.26 11:18:46
2011.04.26 10:19:09^A2011.04.26 12:08:36
2011.04.26 10:19:09^A2011.04.26 11:00:16
2011.04.26 10:19:11^A2011.04.26 10:19:17
2011.04.26 10:19:12^A2011.04.26 10:46:21
2011.04.26 10:19:13^A2011.04.26 10:55:43
2011.04.26 10:19:17^A2011.04.26 10:19:41
2011.04.26 10:19:18^A2011.04.26 10:34:41

The result I want is how many people are present at a specific time.

For example, at 2011.04.26 10:19:08 there are 3 visitors: one who arrived at 10:19:06 and two who arrived at 10:19:08.

At 2011.04.26 10:19:18 the count is 9: ten visitors have arrived, but one left at 2011.04.26 10:19:17.

The desired result for this sample is:

2011.04.26 10:19:06 1
2011.04.26 10:19:08 3
2011.04.26 10:19:09 5
2011.04.26 10:19:11 6
2011.04.26 10:19:12 7
2011.04.26 10:19:13 8
2011.04.26 10:19:17 9
2011.04.26 10:19:18 9

Any help is much appreciated and welcome.

Tags: java hadoop hive

1 answer

甜甜的少女心

#2 · 2019-08-28 11:46

You can try this in Hive (assuming the table is named test_log):

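-- driven: every distinct begin_time or end_time, used as the probe times
select /*+ MAPJOIN(driven) */ driven.time, count(*)
from
    (select time
     from
     (select begin_time time from test_log union all
      select end_time time from test_log) u
     group by time) driven
-- pair every probe time with every row, then keep the rows active at that time
join test_log l on true
where
    driven.time between l.begin_time and l.end_time  -- both ends inclusive
group by driven.time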

This is probably not the best solution, but at least it works. You can add a filter to the driven subquery to reduce the data set.
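
To sanity-check the logic outside Hive, here is a minimal Python sketch that applies the same inclusive "begin_time <= time <= end_time" test to the sample rows from the question. The names ROWS, parse and concurrent_counts are made up for illustration and are not part of the answer. One difference is assumed on purpose: the query above probes both begin and end times, so it would also return rows for the end times, while this sketch probes only the begin times to match the desired output in the question.

```python
from datetime import datetime

FMT = "%Y.%m.%d %H:%M:%S"  # timestamp layout used in the sample data

# Sample rows from the question: (begin_time, end_time)
ROWS = [
    ("2011.04.26 10:19:06", "2011.04.26 10:20:22"),
    ("2011.04.26 10:19:08", "2011.04.26 10:21:49"),
    ("2011.04.26 10:19:08", "2011.04.26 11:18:46"),
    ("2011.04.26 10:19:09", "2011.04.26 12:08:36"),
    ("2011.04.26 10:19:09", "2011.04.26 11:00:16"),
    ("2011.04.26 10:19:11", "2011.04.26 10:19:17"),
    ("2011.04.26 10:19:12", "2011.04.26 10:46:21"),
    ("2011.04.26 10:19:13", "2011.04.26 10:55:43"),
    ("2011.04.26 10:19:17", "2011.04.26 10:19:41"),
    ("2011.04.26 10:19:18", "2011.04.26 10:34:41"),
]


def parse(ts):
    """Parse one timestamp string from the sample into a datetime."""
    return datetime.strptime(ts, FMT)


def concurrent_counts(rows, times):
    """Count, for each distinct probe time, the rows whose interval contains it.

    Both ends are inclusive, mirroring BETWEEN in the query.
    """
    intervals = [(parse(b), parse(e)) for b, e in rows]
    result = {}
    # The zero-padded format sorts chronologically as plain strings.
    for t in sorted(set(times)):
        probe = parse(t)
        result[t] = sum(1 for b, e in intervals if b <= probe <= e)
    return result


# Probe only the begin times, as in the desired output.
counts = concurrent_counts(ROWS, [b for b, _ in ROWS])
for t, n in counts.items():
    print(t, n)
```

Like the cross join in the query, this compares every probe time with every row, so the cost grows with (number of probe times) x (number of rows). That is fine for a check on a small sample but is the reason filtering the driven subquery matters on a large table.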
