If I have a table
CREATE TABLE users (
id int(10) unsigned NOT NULL auto_increment,
name varchar(255) NOT NULL,
profession varchar(255) NOT NULL,
employer varchar(255) NOT NULL,
PRIMARY KEY (id)
)
and I want to get all unique values of profession
field, what would be faster (or recommended):
SELECT DISTINCT u.profession FROM users u
or
SELECT u.profession FROM users u GROUP BY u.profession
?
They are essentially equivalent to each other (in fact this is how some databases implement
DISTINCT
under the hood).If one of them is faster, it's going to be
DISTINCT
. This is because, although the two are the same, a query optimizer would have to catch the fact that yourGROUP BY
is not taking advantage of any group members, just their keys.DISTINCT
makes this explicit, so you can get away with a slightly dumber optimizer.When in doubt, test!
If you don't have to do any group functions (sum, average etc in case you want to add numeric data to the table), use SELECT DISTINCT. I suspect it's faster, but i have nothing to show for it.
In any case, if you're worried about speed, create an index on the column.
This is not a rule
For each query .... try separately distinct and then group by ... compare the time to complete each query and use the faster ....
In my project sometime I use group by and others distinct
Go for the simplest and shortest if you can -- DISTINCT seems to be more what you are looking for only because it will give you EXACTLY the answer you need and only that!
It seems that the queries are not exactly the same. At least for MySQL.
Compare:
The second query gives additionally "Using filesort" in Extra.
All of the answers above are correct, for the case of DISTINCT on a single column vs GROUP BY on a single column. Every db engine has its own implementation and optimizations, and if you care about the very little difference (in most cases) then you have to test against specific server AND specific version! As implementations may change...
BUT, if you select more than one column in the query, then the DISTINCT is essentially different! Because in this case it will compare ALL columns of all rows, instead of just one column.
So if you have something like:
It is a common mistake to think that DISTINCT keyword distinguishes rows by the first column you specified, but the DISTINCT is a general keyword in this manner.
So people you have to be careful not to take the answers above as correct for all cases... You might get confused and get the wrong results while all you wanted was to optimize!