Let's say I have the following Hive table as input; let's call it connections:
userid | timestamp
--------|-------------
1 | 1433258019
1 | 1433258020
2 | 1433258080
2 | 1433258083
2 | 1433258088
2 | 1433258170
[...] | [...]
With the following query:
SELECT
userid,
timestamp,
timestamp - LAG(timestamp, 1, 0) OVER w AS timediff,
CASE
WHEN (timestamp - LAG(timestamp, 1, 0) OVER w) > 60
THEN 'new_session'
ELSE 'same_session'
END AS session_state
FROM connections
WINDOW w AS (PARTITION BY userid ORDER BY timestamp ASC);
I'm generating the following output:
userid | timestamp | timediff | session_state
--------|-------------|------------|---------------
1 | 1433258019 | 1433258019 | new_session
1 | 1433258020 | 1 | same_session
2 | 1433258080 | 1433258080 | new_session
2 | 1433258083 | 3 | same_session
2 | 1433258088 | 5 | same_session
2 | 1433258170 | 82 | new_session
[...] | [...] | [...] | [...]
How can I generate this instead:
userid | timestamp | timediff | sessionid
--------|-------------|------------|------------------
1 | 1433258019 | 1433258019 | user1-session-1
1 | 1433258020 | 1 | user1-session-1
2 | 1433258080 | 1433258080 | user2-session-1
2 | 1433258083 | 3 | user2-session-1
2 | 1433258088 | 5 | user2-session-1
2 | 1433258170 | 82 | user2-session-2
[...] | [...] | [...] | [...]
Is that possible using only HQL and "famous" UDFs (I'd rather not use custom UDFs or reducer scripts)?
This works:
OUTPUT:
You can try something like this if timediff is not required:
select userid,timestamp ,session_count+ concat('user',userid,'-','session-',cast(LAG(session_count-1,1,0) over w1 as string)) AS session_state
--LAG(session_count-1,1,0) over w1 AS session_count_new
FROM (select userid, timestamp, timediff, cast (timediff/60 as int)+1 as session_count
Interesting question. Per your comment to @Madhu, I added the line
2 1433258172
to your example. What you need is to increment a counter every time timediff > 60
is satisfied. The easiest way to do this is to flag it and then cumulatively sum over the window.
Query:
Output:
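The original query and output are not shown here, so the following is only a minimal sketch of the flag-and-cumulative-sum idea described above, not necessarily the query the answer used. The names new_session_flag, diffs and flagged are made up for illustration, and it assumes userid and timestamp are integer columns in connections.

```sql
-- Sketch only: new_session_flag, diffs and flagged are illustrative names.
-- `timestamp` is backquoted because it may be a reserved word in newer Hive versions.
SELECT
  userid,
  `timestamp`,
  timediff,
  -- Running total of the flags = session number within each user.
  CONCAT('user', CAST(userid AS STRING), '-session-',
         CAST(SUM(new_session_flag) OVER (PARTITION BY userid
                                          ORDER BY `timestamp`
                                          ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
              AS STRING)) AS sessionid
FROM (
  SELECT
    userid,
    `timestamp`,
    timediff,
    -- 1 marks the first row of a new session, 0 marks a continuation.
    CASE WHEN timediff > 60 THEN 1 ELSE 0 END AS new_session_flag
  FROM (
    SELECT
      userid,
      `timestamp`,
      -- A default of 0 makes the first row per user look like a huge gap,
      -- so it is always flagged and numbering starts at 1.
      `timestamp` - LAG(`timestamp`, 1, 0) OVER (PARTITION BY userid ORDER BY `timestamp`) AS timediff
    FROM connections
  ) diffs
) flagged;
```

On the sample rows plus the added line, the flags for user 2 would be 1, 0, 0, 1, 0, so the running sum gives user2-session-1 for the first three rows and user2-session-2 for 1433258170 and 1433258172.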
Use the following: select concat_ws('-', name, city) from employee; The first parameter of concat_ws is the separator; name and city are column names in the employee table. Make sure they are of type string.