I hope it's ok to make a posting like this.
I have been using SQL for quite some time and people at work have been using 2 different ways to return the same number or rows in a database.
For example:
SELECT Name
FROM
Employees
WHERE
DepartmentID IN (SELECT DepartmentID
FROM
Departments
WHERE
Department LIKE '%Engineering')
SELECT Employees.Name
FROM
Departments
INNER JOIN Employees
ON Departments.DepartmentID = Employees.DepartmentID
WHERE
Departments.Department LIKE '%Engineering'
Both return the same data. People have been telling me that using subqueries is the best way to do it.
My question is this: Which of these 2 will execute faster? My guess would be the one with the inner join but I may be wrong.
Thanks.
These two queries are not equivalent. The equivalent to the "IN" query is the following:
In other words, the "IN" version can never return duplicate rows. The join version can return duplicate rows. In this case, I don't think it would because departmentid is probably unique in the table.
The next question is about optimization. SQL engines really should optimize the two versions equivalently. That means that the "IN" version is easier to read.
However, there is one big exception to this. MySQL is definitely suboptimal when it comes to IN. When using that database engine, it is much better to use the join version.
By the way, there are other ways to do this, such as:
Which works best . . . you need to measure on whatever database you are using.
The query optimizers in most modern RDBMS will probably internally work with them the same way, so there won't be a performance difference.
Older RDBMS might probably show differences, but this depends on the RDBMS itself.
In this case, both are equivalent. However, for the one with the INNER JOIN, if there is more than 1 department selected with the same DepartmentID, employees with that DepartmentID will be returned multiple times.
SQL is a declarative language, which means that the language is not supposed to say how the query should be performed, only what result should be found. It is up to the DMBS to work out how to perform it.
A decent SQL database will probably optimize them to both do the same or similar things.
To check if they are doing the same thing, run
EXPLAIN
on the query.If they have the same steps, obviously they will take the same amount of time to run. Otherwise, you will see what ways the database is treating the differently. How the DBMS optimizes it is implementation dependent. So the best way is to use
EXPLAIN
.Note: EXPLAIN is an SQL command, just like SELECT etc. See http://www.sql.org/sql-database/postgresql/manual/sql-explain.html.
Assuming it executes with the same steps, the way you write it will be the way you think appears clearest what the intention is.