Group by one combination and once again group it f

2019-08-25 03:36发布

问题:

Folks,

We have following data and we need following output.

 CUSTOMER_NAME PRODUCT_NAME PRICE OCCURANCE ID
 customer1,    product1,    20,       1
 customer1,    product2,    30,       2
 customer1,    product1,    25,       3
 customer1,    product1,    20,       1
 customer1,    product2,    20,       2
 customer1,    product2,    30,       2

First we need to average the price by occurance id.

 customer1,product1,20 (AVG is 20 for occurance 1), 1
 customer1,product1,25 (AVG is 25 for occurance 3) , 3

Now once again we have to average it by customername,product name (Occurance is ignored in group by)

Final Output custoemr1,product1,avg price of all occurances.

customer1,product1, 20 + 25/2 = 22.5

Basically how to do average of average in HIVE ? We are not able to write anything for this.

回答1:

Hi this can be achieved using nested queries as follows :

First step : to calculate initial averages of price by occurrence_id

SELECT customer_name, product_name,occurance_id, avg(price) as avg_of_current_occurance
FROM customer_info
GROUP BY customer_name,product_name,occurance_id ;

Second Step : calculate the avg of avgs returned in first step

hive (default)>
              > SELECT customer_name, product_name,avg(avg_of_current_occurance) as final_avg
              > FROM(
              > SELECT customer_name, product_name,occurance_id, avg(price) as avg_of_current_occurance
              > FROM customer_info
              > GROUP BY customer_name,product_name,occurance_id
              > ) W
              > GROUP BY customer_name,product_name;

Total MapReduce jobs = 1
Launching Job 1 out of 1

Execution completed successfully

customer_name   product_name    final_avg
customer1       product1        22.5
customer1       product2        26.666666666666668