Counting DISTINCT over multiple columns

2019-01-05 09:05发布

Is there a better way of doing a query like this:

SELECT COUNT(*) 
FROM (SELECT DISTINCT DocumentId, DocumentSessionId
      FROM DocumentOutputItems) AS internalQuery

I need to count the number of distinct items from this table but the distinct is over two columns.

My query works fine but I was wondering if I can get the final result using just one query (without using a sub-query)

17条回答
做个烂人
2楼-- · 2019-01-05 09:43

To run as a single query, concatenate the columns, then get the distinct count of instances of the concatenated string.

SELECT count(DISTINCT concat(DocumentId, DocumentSessionId)) FROM DocumentOutputItems;

In MySQL you can do the same thing without the concatenation step as follows:

SELECT count(DISTINCT DocumentId, DocumentSessionId) FROM DocumentOutputItems;

This feature is mentioned in the MySQL documentation:

http://dev.mysql.com/doc/refman/5.7/en/group-by-functions.html#function_count-distinct

查看更多
甜甜的少女心
3楼-- · 2019-01-05 09:43

if you had only one field to "DISTINCT", you could use:

SELECT COUNT(DISTINCT DocumentId) 
FROM DocumentOutputItems

and that does return the same query plan as the original, as tested with SET SHOWPLAN_ALL ON. However you are using two fields so you could try something crazy like:

    SELECT COUNT(DISTINCT convert(varchar(15),DocumentId)+'|~|'+convert(varchar(15), DocumentSessionId)) 
    FROM DocumentOutputItems

but you'll have issues if NULLs are involved. I'd just stick with the original query.

查看更多
爷的心禁止访问
4楼-- · 2019-01-05 09:44

Here's a shorter version without the subselect:

SELECT COUNT(DISTINCT DocumentId, DocumentSessionId) FROM DocumentOutputItems

It works fine in MySQL, and I think that the optimizer has an easier time understanding this one.

Edit: Apparently I misread MSSQL and MySQL - sorry about that, but maybe it helps anyway.

查看更多
Animai°情兽
5楼-- · 2019-01-05 09:45

select DISTINCT DocumentId as i, DocumentSessionId as s , count(*)

from DocumentOutputItems

group by i ,s;

This code uses distinct on 2 parameters and provides count of number of rows specific to those distinct values row count. It worked for me in mysql like a charm.

查看更多
爷、活的狠高调
6楼-- · 2019-01-05 09:49

I found this when I Googled for my own issue, found that if you count DISTINCT objects, you get the correct number returned (I'm using MySQL)

SELECT COUNT(DISTINCT DocumentID) AS Count1, 
  COUNT(DISTINCT DocumentSessionId) AS Count2
  FROM DocumentOutputItems
查看更多
Emotional °昔
7楼-- · 2019-01-05 09:56

If you are trying to improve performance, you could try creating a persisted computed column on either a hash or concatenated value of the two columns.

Once it is persisted, provided the column is deterministic and you are using "sane" database settings, it can be indexed and / or statistics can be created on it.

I believe a distinct count of the computed column would be equivalent to your query.

查看更多
登录 后发表回答