Does Hive support Select in Select?

2019-07-23 04:53发布

问题:

Is the code bellow in Hive possible, wich some change?:

insert into table webmap
select a.res reference, b.res resource, 
(select count(ip) from weblog where resource=a.res and referer=b.res) weight
from toprefres a join toprefres b;

I run that in hive-0.10.0-cdh4.5.0 and got the error: FAILED: ParseException line 3:1 cannot recognize input near 'select' 'count' '(' in expression specification

What if select in select in Hive is supported!

I've finally figured out the answer:

insert into table webmap
select refres.reso,refres.refe, count(ip) weight from 
weblog join (select a.res refe, b.res reso from toprefres a join toprefres b) refres on
trimrslash(weblog.resource)=refres.reso and trimrslash(weblog.referrer)=concat("http://dongxicheng.org",refres.refe)
group by refres.reso,refres.refe;

Which is perfect for my need!

回答1:

As far as I know such corelated subqueries are not supported in Hive. I confirmed my idea from the following post:

Hive column as a subquery select

However what I think is, you can modify your query to achieve what you are trying to get:

insert into table webmap
select a.res reference, b.res resource, 
count(weblog.ip) weight
from toprefres a 
join toprefres b
join weblog ON weblog.resource=a.res 
and weblog.referer=b.res;

Also noted that there in no ON clause in your which might result in a cross join creating a Cartesian product and taking more time. If possible try to optimize that.