Counting DISTINCT over multiple columns

2019-01-05 09:05发布

Is there a better way of doing a query like this:

SELECT COUNT(*) 
FROM (SELECT DISTINCT DocumentId, DocumentSessionId
      FROM DocumentOutputItems) AS internalQuery

I need to count the number of distinct items from this table but the distinct is over two columns.

My query works fine but I was wondering if I can get the final result using just one query (without using a sub-query)

17条回答
何必那么认真
2楼-- · 2019-01-05 09:36

It works for me. In oracle:

SELECT SUM(DECODE(COUNT(*),1,1,1))
FROM DocumentOutputItems GROUP BY DocumentId, DocumentSessionId;

In jpql:

SELECT SUM(CASE WHEN COUNT(i)=1 THEN 1 ELSE 1 END)
FROM DocumentOutputItems i GROUP BY i.DocumentId, i.DocumentSessionId;
查看更多
3楼-- · 2019-01-05 09:38

How about something like:

select count(*)
from
  (select count(*) cnt
   from DocumentOutputItems
   group by DocumentId, DocumentSessionId) t1

Probably just does the same as you are already though but it avoids the DISTINCT.

查看更多
smile是对你的礼貌
4楼-- · 2019-01-05 09:38

I wish MS SQL could also do something like COUNT(DISTINCT A, B). But it can't.

At first JayTee's answer seemed like a solution to me bu after some tests CHECKSUM() failed to create unique values. A quick example is, both CHECKSUM(31,467,519) and CHECKSUM(69,1120,823) gives the same answer which is 55.

Then I made some research and found that Microsoft does NOT recommend using CHECKSUM for change detection purposes. In some forums some suggested using

SELECT COUNT(DISTINCT CHECKSUM(value1, value2, ..., valueN) + CHECKSUM(valueN, value(N-1), ..., value1))

but this is also not conforting.

You can use HASHBYTES() function as suggested in TSQL CHECKSUM conundrum. However this also has a small chance of not returning unique results.

I would suggest using

SELECT COUNT(DISTINCT CAST(DocumentId AS VARCHAR)+'-'+CAST(DocumentSessionId AS VARCHAR)) FROM DocumentOutputItems
查看更多
一纸荒年 Trace。
5楼-- · 2019-01-05 09:39

There's nothing wrong with your query, but you could also do it this way:

WITH internalQuery (Amount)
AS
(
    SELECT (0)
      FROM DocumentOutputItems
  GROUP BY DocumentId, DocumentSessionId
)
SELECT COUNT(*) AS NumberOfDistinctRows
  FROM internalQuery
查看更多
Explosion°爆炸
6楼-- · 2019-01-05 09:41

Hope this works i am writing on prima vista

SELECT COUNT(*) 
FROM DocumentOutputItems 
GROUP BY DocumentId, DocumentSessionId
查看更多
爷的心禁止访问
7楼-- · 2019-01-05 09:42

This was posed and answered on Quora (https://www.quora.com/In-SQL-how-to-I-count-DISTINCT-over-multiple-columns):

select col1, col2, col3, count(*)
from table
group by col1, col2, col3

I was working on this in SAS, and SAS Proc SQL does not like DISTINCT with more than one column.

查看更多
登录 后发表回答