What's faster, SELECT DISTINCT or GROUP BY in

2019-01-01 12:16发布

If I have a table

CREATE TABLE users (
  id int(10) unsigned NOT NULL auto_increment,
  name varchar(255) NOT NULL,
  profession varchar(255) NOT NULL,
  employer varchar(255) NOT NULL,
  PRIMARY KEY  (id)
)

and I want to get all unique values of profession field, what would be faster (or recommended):

SELECT DISTINCT u.profession FROM users u

or

SELECT u.profession FROM users u GROUP BY u.profession

?

15条回答
笑指拈花
2楼-- · 2019-01-01 12:34

They are essentially equivalent to each other (in fact this is how some databases implement DISTINCT under the hood).

If one of them is faster, it's going to be DISTINCT. This is because, although the two are the same, a query optimizer would have to catch the fact that your GROUP BY is not taking advantage of any group members, just their keys. DISTINCT makes this explicit, so you can get away with a slightly dumber optimizer.

When in doubt, test!

查看更多
与风俱净
3楼-- · 2019-01-01 12:34

If you don't have to do any group functions (sum, average etc in case you want to add numeric data to the table), use SELECT DISTINCT. I suspect it's faster, but i have nothing to show for it.

In any case, if you're worried about speed, create an index on the column.

查看更多
ら面具成の殇う
4楼-- · 2019-01-01 12:36

This is not a rule

For each query .... try separately distinct and then group by ... compare the time to complete each query and use the faster ....

In my project sometime I use group by and others distinct

查看更多
几人难应
5楼-- · 2019-01-01 12:37

Go for the simplest and shortest if you can -- DISTINCT seems to be more what you are looking for only because it will give you EXACTLY the answer you need and only that!

查看更多
只靠听说
6楼-- · 2019-01-01 12:40

It seems that the queries are not exactly the same. At least for MySQL.

Compare:

  1. describe select distinct productname from northwind.products
  2. describe select productname from northwind.products group by productname

The second query gives additionally "Using filesort" in Extra.

查看更多
情到深处是孤独
7楼-- · 2019-01-01 12:41

All of the answers above are correct, for the case of DISTINCT on a single column vs GROUP BY on a single column. Every db engine has its own implementation and optimizations, and if you care about the very little difference (in most cases) then you have to test against specific server AND specific version! As implementations may change...

BUT, if you select more than one column in the query, then the DISTINCT is essentially different! Because in this case it will compare ALL columns of all rows, instead of just one column.

So if you have something like:

// This will NOT return unique by [id], but unique by (id,name)
SELECT DISTINCT id, name FROM some_query_with_joins

// This will select unique by [id].
SELECT id, name FROM some_query_with_joins GROUP BY id

It is a common mistake to think that DISTINCT keyword distinguishes rows by the first column you specified, but the DISTINCT is a general keyword in this manner.

So people you have to be careful not to take the answers above as correct for all cases... You might get confused and get the wrong results while all you wanted was to optimize!

查看更多
登录 后发表回答