To compare count between two hive table

2019-06-03 00:18发布

问题:

I am trying to do count comparision between two table . As minus operator does not work in hive , it is not happening. Could you please give some simple way to do count comparision between two tables.

select  'Call Detail - Hive T1 to HDFS Staging - Data Compare',
case when cnt>0 then 'Fail' Else 'Pass' end
from
(select count(*) cnt from (
(select 
count(*) from students1 s1)-
(select count(*) from students2 s2)
) as tbl1
) as tbl2;

It's throwing error:

FAILED: ParseException line 81:0 cannot recognize input near '(' '(' 'select' in from source

回答1:

Use cross join if you have no group by columns. In this case it will produce one row with both counts:

select s.cnt-s1.cnt diff, case when abs(s.cnt-s1.cnt) > 0 then 'Fail' Else 'Pass' end result
from
(select count(*) cnt  from students1 s1) s
cross join
(select count(*) cnt from students2 s2) s1

If you will add some group by columns to compare on more detailed grain, then use FULL JOIN on group by columns:

select s.col1 s_col1, s1.col1 s1_col1, s.cnt-s1.cnt diff, case when abs(s.cnt-s1.cnt) > 0 then 'Fail' Else 'Pass' end result
from
(select count(*) cnt, col1  from students1 s1 group by col1) s
full join
(select count(*) cnt, col1 from students2 s2 group by col1) s1 
on s.col1 = s1.col1

This query will return joined rows with difference calculated and also not joined rows from both tables.



回答2:

check out below query .. it's working fine locally at my system. Let me know if it helps.

select 'Call Detail - Hive T1 to HDFS Staging - Data Compare',
       case 
       when (sum(cnt1) - sum(cnt2)) > 0 
       then 'PASS' 
       else 'FAIL' 
       end as count_records
  from (select count(*) as cnt1, 0 as cnt2 from students1 
        union all
        select 0 as cnt1, count(*) as cnt2 from students1  ) tbl;