sort within category and limit 5 for this hive tab

2020-05-04 03:16发布

问题:

I have a hive table A that has the following column

USER   ITEM    SCORE
U1      I1       S1
U1      I2       S2
...................

What I want is a table B such a format

USER    ITEMS    #ITEMS is an array
 U1     [I2,I3,...]   # items are sorted according to score in descending and limit 5

for users have less than 5 items, just put the array with items in descending order.

回答1:

should be something like this :

select USER,collect_set(ITEM) from (
    select USER, ITEM,row_number () over (partition by USER order by SCORE desc) RN 
    from A
) t1
where RN <= 5
group by USER;


标签: sql hive nosql