I have a table (MainTable
) with a bit over 600,000 records. It joins onto itself via a 2nd table (JoinTable
) in a parent/child type relationship:
SELECT Child.ID, Parent.ID
FROM MainTable
AS Child
JOIN JoinTable
ON Child.ID = JoinTable.ID
JOIN MainTable
AS Parent
ON Parent.ID = JoinTable.ParentID
AND Parent.SomeOtherData = Child.SomeOtherData
I know that every child record has a parent record and the data in JoinTable is acurate.
When I run this query it takes literally minutes to run. However if I join to Parent using a Left Join then it takes < 1 second to run:
SELECT Child.ID, Parent.ID
FROM MainTable
AS Child
JOIN JoinTable
ON Child.ID = JoinTable.ID
LEFT JOIN MainTable
AS Parent
ON Parent.ID = JoinTable.ParentID
AND Parent.SomeOtherData = Child.SomeOtherData
WHERE ...[some info to make sure we don't select parent records in the child dataset]...
I understand the difference in the results between an INNER JOIN
and a LEFT JOIN
. In this case it is returning exactly the same result as every child has a parent. If I let both queries run, I can compare the datasets and they are exactly the same.
Why is it that a LEFT JOIN
runs so much faster than an INNER JOIN
?
UPDATE Checked the query plans and when using an inner join it starts with the Parent dataset. When doing a left join it starts with the child dataset.
The indexes it uses are all the same.
Can I force it to always start with the child? Using a left join works, it just feels wrong.
Similar questions have been asked here before, but none seem to answer my question.
e.g. the selected answer in INNER JOIN vs LEFT JOIN performance in SQL Server says that Left Joins are always slower than Inner joins. The argument makes sense, but it's not what I'm seeing.