The join documentation for Hive encourages the use of implicit joins, i.e.
SELECT *
FROM table1 t1, table2 t2, table3 t3
WHERE t1.id = t2.id AND t2.id = t3.id AND t1.zipcode = '02535';
Is this equivalent to
SELECT t1.*, t2.*, t3.*
FROM table1 t1
INNER JOIN table2 t2 ON
t1.id = t2.id
INNER JOIN table3 t3 ON
t2.id = t3.id
WHERE t1.zipcode = '02535'
, or will the above return additional records?
Not always. Your queries are equivalent. But without
WHERE t1.id = t2.id AND t2.id = t3.id
it will beCROSS JOIN
.Update:
This is interesting question and I decided to add some demo. Let's create two tables:
A(c1 int, c2 string)
andB(c1 int, c2 string)
.Load data:
Check data:
Check cross join (implicit join without
where
transformed to cross):Check inner join (implicit join with
where
works as INNER):Try to perform left join by adding
OR b.c1 is null
to the where:As you can see we got inner join again.
or b.c1 is null
is ignoredNow
left join
withoutwhere
andON
clause (transformed to CROSS):As you can see we got cross again.
Try left join with
where
clause and withoutON
(works as INNER):We got INNER join
Try left join with
where
clause and withoutON
+ try to allow nulls:Again got INNER. or
b.c1 is null
is ignored.Left join with
on
clause:Yes, it is true left join.
Left join with
on
+where
(got INNER):We got INNER because WHERE does not allow NULLS.
Left join with where + allow nulls:
Yes, it is left join.
Conclusion: