Hive/SQL bundling columns for few columns,rest of

Posted 2019-08-27 05:08

Question:

I have a Hive table with 5 columns, as shown below:

name     orderno  productcategory  amount  description
KJFSFKS  1        1                40      D1
KJFSFKS  2        2                50      D2
KJFSFKS  3        2                67      D3
KJFSFKS  4        2                10      D4
KJFSFKS  5        3                2       D5
KJFSFKS  6        3                5       D6
KJFSFKS  7        3                6       D7
KJFSFKS  8        4                8       D8
KJFSFKS  9        5                8       D9
KJFSFKS  10       5                10      D10

The desired output groups rows by productcategory. When several rows share the same productcategory, sum the amount field, take the description from the row with the highest orderno, and always take the lowest orderno. The output should look like this:

name     orderno  productcategory  amount  description
KJFSFKS  1        1                40      D1
KJFSFKS  2        2                127     D4
KJFSFKS  5        3                13      D7
KJFSFKS  8        4                8       D8
KJFSFKS  9        5                18      D10

As described above, some fields are picked in one order and others in a different order.

I used GROUP BY, and sum(amount) works fine, but what about the description field? It has to be chosen based on the orderno column. My real requirement also has other columns that should be picked based on the order number.
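
To make the sample easy to reproduce, here is a minimal setup sketch. The table name your_table and the column types are assumptions, since neither is stated above, and the multi-row INSERT ... VALUES form assumes a Hive version that supports it.

```sql
-- Assumed table name and column types; adjust to the real schema.
create table your_table (
    name            string,
    orderno         int,
    productcategory int,
    amount          int,
    description     string
);

-- Sample rows copied from the table shown above.
insert into table your_table values
    ('KJFSFKS', 1,  1, 40, 'D1'),
    ('KJFSFKS', 2,  2, 50, 'D2'),
    ('KJFSFKS', 3,  2, 67, 'D3'),
    ('KJFSFKS', 4,  2, 10, 'D4'),
    ('KJFSFKS', 5,  3, 2,  'D5'),
    ('KJFSFKS', 6,  3, 5,  'D6'),
    ('KJFSFKS', 7,  3, 6,  'D7'),
    ('KJFSFKS', 8,  4, 8,  'D8'),
    ('KJFSFKS', 9,  5, 8,  'D9'),
    ('KJFSFKS', 10, 5, 10, 'D10');
```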

Answer 1:

select name, orderno, productcategory, amount, description
from
(
select name, orderno, productcategory,
       -- total amount for the whole (name, productcategory) group
       sum(amount) over(partition by name, productcategory) amount,
       -- description from the row with the highest orderno in the group
       first_value(description) over(partition by name, productcategory order by orderno desc) description,
       -- rn = 1 marks the row with the lowest orderno in the group
       row_number() over (partition by name, productcategory order by orderno) rn
from  your_table
)s where rn=1; --pick lowest orderno

OK
KJFSFKS 1       1       40      D1
KJFSFKS 2       2       127     D4
KJFSFKS 5       3       13      D7
KJFSFKS 8       4       8       D8
KJFSFKS 9       5       18      D10
Time taken: 12.492 seconds, Fetched: 5 row(s)


Answer 2:

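select      name
           ,min(orderno)    as orderno
           ,productcategory
           ,sum(amount)     as amount
           -- structs compare field by field, so max() picks the struct with the highest orderno
           ,max(named_struct('orderno',orderno,'description',description)).description as description

from        mytable

group by    name
           ,productcategory
;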
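
The question also mentions other columns that should follow the order number. One way to extend the struct approach is to put every such column into the same struct and unpack it in an outer query. This is only a sketch: the column status is hypothetical and stands in for any extra column, and it assumes max() orders structs by their first field, as the query above relies on.

```sql
-- `status` is a hypothetical extra column, not part of the original table.
select      name
           ,min_orderno             as orderno
           ,productcategory
           ,amount
           ,last_row.description    as description
           ,last_row.status         as status

from       (select      name
                       ,productcategory
                       ,min(orderno)    as min_orderno
                       ,sum(amount)     as amount
                        -- orderno comes first so that it decides which struct is the maximum
                       ,max(named_struct('orderno',orderno,'description',description,'status',status)) as last_row

            from        mytable

            group by    name
                       ,productcategory
           ) t
;
```

Keeping orderno as the first struct field matters here, because the remaining fields are only carried along with the row that wins the comparison.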