I have a database, and I use a query to produce an intermediate table like this:
id a b
xx 1 2
yy 7 11
and I would like to calculate the standard deviations of b for the users who have a < avg(a)
I calculate avg(a) that way and it works fine:
select avg(select a from (query to produce intermediate table)) from table;
But the query:
select stddev_pop(b)
from (query to produce intermediate table)
where a < (select avg(select a
from (query to produce intermediate table))
from table);
Returns me an error, and more precisely, I am told that the "a" from avg(select a from...) is not recognised. This makes me really confused, as it works in the previous query.
I would be grateful if somebody could help.
EDIT:
I stored the result of my query to generate the intermediary table into a temporary table, but still run into the same problem.
The non working query becomes:
select stddev_pop(b) from temp where a < (select avg(a) from temp);
while this works:
select avg(a) from temp;
OK, a colleague helped me to do it. I'll post the answer in case someone runs into the same problem:
select stddev_pop(b)
from temp x
join (select avg(a) as average from temp) y
where x.a < y.average;
Basically hive doesn't do caching of a table as a variable.
You likely need to move your parentheses in your WHERE
clause. Try this:
select stddev_pop(b)
from (query to produce intermediate table)
where c < ( select avg(a)
from (query to produce intermediate table)
);
And, your question refers to a column c
; did you mean a
?
UPDATE: I saw a similar question with MySQL
today; sorry I don't know Hive
. See if this works:
select stddev_pop(b)
from temp
where a < ( select *
from (select avg(a) from temp) x
);
ok , first of all hive doesnt support sub queries anywhere only than the from clause.
so you can't use subquery in where clause you have to create a temp table in from clause and you can use that table.
Now if you create a temp table and than you are using it in your where clause than to refer that temp table it has to again run the fetching query so again it will not support .
Bob I think hive will not support this
select stddev_pop(b)
from temp
where a < ( select *
from (select avg(a) from temp) x
);
but yes
select stddev_pop(b)
from temp x
join (select avg(a) as average from temp) y
where x.a < y.average;
if we can create a temp table physically and put the data select avg(a) as average from temp into that then we can refer this .