Counting DISTINCT over multiple columns-第3页回答

Is there a better way of doing a query like this:

SELECT COUNT(*) 
FROM (SELECT DISTINCT DocumentId, DocumentSessionId
      FROM DocumentOutputItems) AS internalQuery

I need to count the number of distinct items from this table but the distinct is over two columns.

My query works fine but I was wondering if I can get the final result using just one query (without using a sub-query)

标签： sql sql-server tsql

17条回答

祖国的老花朵

2楼-- · 2019-01-05 09:57

What is it about your existing query that you don't like? If you are concerned that DISTINCT across two columns does not return just the unique permutations why not try it?

It certainly works as you might expect in Oracle.

SQL> select distinct deptno, job from emp
  2  order by deptno, job
  3  /

    DEPTNO JOB
---------- ---------
        10 CLERK
        10 MANAGER
        10 PRESIDENT
        20 ANALYST
        20 CLERK
        20 MANAGER
        30 CLERK
        30 MANAGER
        30 SALESMAN

9 rows selected.


SQL> select count(*) from (
  2  select distinct deptno, job from emp
  3  )
  4  /

  COUNT(*)
----------
         9

SQL>

edit

I went down a blind alley with analytics but the answer was depressingly obvious...

SQL> select count(distinct concat(deptno,job)) from emp
  2  /

COUNT(DISTINCTCONCAT(DEPTNO,JOB))
---------------------------------
                                9

SQL>

edit 2

Given the following data the concatenating solution provided above will miscount:

col1  col2
----  ----
A     AA
AA    A

So we to include a separator...

select col1 + '*' + col2 from t23
/

Obviously the chosen separator must be a character, or set of characters, which can never appear in either column.

0人赞添加讨论(0) 举报

虎瘦雄心在

3楼-- · 2019-01-05 09:57

Many (most?) SQL databases can work with tuples like values so you can just do: SELECT COUNT(DISTINCT (DocumentId, DocumentSessionId)) FROM DocumentOutputItems; If your database doesn't support this, it can be simulated as per @oncel-umut-turer's suggestion of CHECKSUM or other scalar function providing good uniqueness e.g. COUNT(DISTINCT CONCAT(DocumentId, ':', DocumentSessionId)).

A related use of tuples is performing IN queries such as: SELECT * FROM DocumentOutputItems WHERE (DocumentId, DocumentSessionId) in (('a', '1'), ('b', '2'));

0人赞添加讨论(0) 举报

Fickle 薄情

4楼-- · 2019-01-05 09:58

How about this,

Select DocumentId, DocumentSessionId, count(*) as c 
from DocumentOutputItems 
group by DocumentId, DocumentSessionId;

This will get us the count of all possible combinations of DocumentId, and DocumentSessionId

0人赞添加讨论(0) 举报

爷、活的狠高调

5楼-- · 2019-01-05 10:00

You can just use the Count Function Twice.

In this case, it would be:

SELECT COUNT (DISTINCT DocumentId), COUNT (DISTINCT DocumentSessionId) 
FROM DocumentOutputItems

0人赞添加讨论(0) 举报

Viruses.

6楼-- · 2019-01-05 10:02

Edit: Altered from the less-than-reliable checksum-only query I've discovered a way to do this (in SQL Server 2005) that works pretty well for me and I can use as many columns as I need (by adding them to the CHECKSUM() function). The REVERSE() function turns the ints into varchars to make the distinct more reliable

SELECT COUNT(DISTINCT (CHECKSUM(DocumentId,DocumentSessionId)) + CHECKSUM(REVERSE(DocumentId),REVERSE(DocumentSessionId)) )
FROM DocumentOutPutItems

0人赞添加讨论(0) 举报

上一页 1 2 3

Counting DISTINCT over multiple columns

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间