Pivot a multiple rows into one row with multiple c

2019-03-04 12:24发布

问题:

I have a table that contains the data that for each product, they have certain attributes, say (limited to A,B,C for now). and I have the value corresponding to each attribute.

Product Attribute Value
p1  A  1
p1  B  2
p1  C  3
p2  A  1
p3  B  3
p3  C  2

And I want my output to be:

Product    A     B    C
1          1     2    3
2          1     NULL NULL
3          NULL  3    2

What I have tried so far is and it did gave me what I want in the end, but this query is totally a big mass, I came from R background and I am wondering if their is an easy solution or something similar like cast from reshape in R.

Thanks!

select
    product, 
    max(`A`) as A, 
    max(`B`) as B, 
    max(`C`) as C 
from
(select 
    product,
    case attribute when 'A' then `value` else null end as A,
    case attribute when 'B' then `value` else null end as B,
    case attribute when 'C' then `value` else null end as C
from test) t
group by product

回答1:

I'm not aware of a pivot function in Hive but this is possible. You will need to collect attribute and value to a map, which can be done using this UDAF

Example:

ADD JAR /root/path/to/jar/brickhouse-0.7.0.jar;
CREATE TEMPORARY FUNCTION collect AS 'brickhouse.udf.collect.CollectUDAF';

SELECT product
    , attr_map['A'] AS A
    , attr_map['B'] AS B
    , attr_map['C'] AS C
FROM (
    SELECT product
        , collect(attribute, value) AS attr_map
    FROM test
    GROUP BY product
    ) x

The caveat here is that if you have a lot of attributes, this can be quite a bit of repetitive code.



标签: mysql hive