What is the difference between an INNER JOIN and a LEFT SEMI JOIN?

In the scenario below, why am I getting two different results? The INNER JOIN result set is a lot larger. Can someone explain? I am trying to get only the names in table_1 that also appear in table_2.

    SELECT a.name
    FROM table_1 a
    INNER JOIN table_2 b ON a.name = b.name;

    SELECT a.name
    FROM table_1 a
    LEFT SEMI JOIN table_2 b ON (a.name = b.name);

An INNER JOIN can return columns from both tables. A LEFT SEMI JOIN only returns records from the left-hand table. In standard SQL, it is equivalent to filtering table_1 with a WHERE EXISTS (or IN) subquery against table_2.

If there are multiple matching rows in the right-hand table, an INNER JOIN will return one row for each match on the right table, while a LEFT SEMI JOIN returns each matching row from the left table only once, regardless of the number of matching rows on the right side. That's why you're seeing a different number of rows in your result.

Since you only want the names in table_1 that also appear in table_2, a LEFT SEMI JOIN is the appropriate query to use.

Suppose there are two tables, TableA and TableB, each with only two columns (Id, Data), and the following data:
TableA:
TableB:
Inner Join on column Id will return columns from both tables and only the matching records.

Left Join (or Left Outer Join) on column Id will return columns from both tables: the matching records, plus the unmatched records from the left table (with NULL values in the right table's columns).

Right Join (or Right Outer Join) on column Id will return columns from both tables: the matching records, plus the unmatched records from the right table (with NULL values in the left table's columns).

Full Outer Join on column Id will return columns from both tables: the matching records, plus the unmatched records from the left table (NULL values in the right table's columns) and the unmatched records from the right table (NULL values in the left table's columns).

Left Semi Join on column Id will return columns only from the left table, and only those left-table records that have a match in the right table.

Tried in Hive and got the below output:
table1
table2
Inner Join
Left Join
Left Semi Join
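
To make the row-count difference concrete, here is a minimal sketch that reproduces it with Python's built-in sqlite3 module rather than Hive. The sample rows are hypothetical (they are not the data from the question or the answers), and because SQLite has no LEFT SEMI JOIN keyword, the semi join is written in its standard-SQL WHERE EXISTS form. Treat it as an illustration under those assumptions, not as Hive output.

```python
import sqlite3


def run_join_demo():
    """Build two tiny in-memory tables and run three kinds of join."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE table_1 (name TEXT)")
    cur.execute("CREATE TABLE table_2 (name TEXT)")

    # Hypothetical sample rows (an assumption for illustration only):
    # 'alice' matches twice in table_2, 'bob' once, 'carol' never.
    cur.executemany("INSERT INTO table_1 VALUES (?)",
                    [("alice",), ("bob",), ("carol",)])
    cur.executemany("INSERT INTO table_2 VALUES (?)",
                    [("alice",), ("alice",), ("bob",), ("dave",)])

    # INNER JOIN: one output row per matching (left, right) pair.
    inner = cur.execute(
        "SELECT a.name FROM table_1 a "
        "INNER JOIN table_2 b ON a.name = b.name "
        "ORDER BY a.name").fetchall()

    # LEFT OUTER JOIN: like INNER JOIN, plus unmatched left rows with NULL.
    left_outer = cur.execute(
        "SELECT a.name, b.name FROM table_1 a "
        "LEFT JOIN table_2 b ON a.name = b.name "
        "ORDER BY a.name").fetchall()

    # Semi join written as WHERE EXISTS: each matching left row once.
    semi = cur.execute(
        "SELECT a.name FROM table_1 a "
        "WHERE EXISTS (SELECT 1 FROM table_2 b WHERE a.name = b.name) "
        "ORDER BY a.name").fetchall()

    conn.close()
    return inner, left_outer, semi


inner_rows, left_rows, semi_rows = run_join_demo()
print("inner join:   ", inner_rows)
print("left join:    ", left_rows)
print("semi (EXISTS):", semi_rows)
```

With these rows, the inner join returns alice twice because table_2 holds two matching rows, while the EXISTS form returns alice only once. That is the same effect that makes the INNER JOIN result set larger than the LEFT SEMI JOIN result set in the question.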