Nested queries in Hive SQL

2020-07-09 11:12发布

I have a database, and I use a query to produce an intermediate table like this:

id    a      b    
xx    1      2    
yy    7      11    

and I would like to calculate the standard deviations of b for the users who have a < avg(a)

I calculate avg(a) that way and it works fine:

select avg(select a from (query to produce intermediate table)) from table;

But the query:

select stddev_pop(b) 
from (query to produce intermediate table) 
where a < (select avg(select a 
                     from (query to produce intermediate table))
          from table);

Returns me an error, and more precisely, I am told that the "a" from avg(select a from...) is not recognised. This makes me really confused, as it works in the previous query.

I would be grateful if somebody could help.

EDIT:

I stored the result of my query to generate the intermediary table into a temporary table, but still run into the same problem. The non working query becomes:

select stddev_pop(b) from temp where a < (select avg(a) from temp);

while this works:

select avg(a) from temp;

标签: sql hive
3条回答
姐就是有狂的资本
2楼-- · 2020-07-09 11:33

OK, a colleague helped me to do it. I'll post the answer in case someone runs into the same problem:

select stddev_pop(b)
from temp x
join (select avg(a) as average from temp) y
where x.a < y.average;

Basically hive doesn't do caching of a table as a variable.

查看更多
The star\"
3楼-- · 2020-07-09 11:41

ok , first of all hive doesnt support sub queries anywhere only than the from clause. so you can't use subquery in where clause you have to create a temp table in from clause and you can use that table. Now if you create a temp table and than you are using it in your where clause than to refer that temp table it has to again run the fetching query so again it will not support .

Bob I think hive will not support this select stddev_pop(b) from temp where a < ( select * from (select avg(a) from temp) x );

but yes select stddev_pop(b) from temp x join (select avg(a) as average from temp) y where x.a < y.average;

if we can create a temp table physically and put the data select avg(a) as average from temp into that then we can refer this .

查看更多
乱世女痞
4楼-- · 2020-07-09 11:51

You likely need to move your parentheses in your WHERE clause. Try this:

select stddev_pop(b) 
from (query to produce intermediate table) 
where c < ( select avg(a) 
            from (query to produce intermediate table)
          );

And, your question refers to a column c; did you mean a?

UPDATE: I saw a similar question with MySQL today; sorry I don't know Hive. See if this works:

select stddev_pop(b) 
from   temp 
where  a < ( select *
             from (select avg(a) from temp) x
          );
查看更多
登录 后发表回答