Count Distinct over partition by sql

Posted 2019-02-28 02:34

Question:

I have a table like

col1ID  col2String Col3ID Col4String Col5Data
  1        xxx       20      abc     14-09-2018
  1        xxx       20      xyz     14-09-2018
  2        xxx       30      abc     14-09-2018
  2        xxx       30      abc     14-09-2018 

I would like to add a column that counts how many different strings there are in Col4String for each combination of col1ID and Col3ID.

So something like

COUNT(DISTINCT (Col4String)) over (partition by col1ID, col3ID)

but it doesn't work, I receive an error

Use of DISTINCT is not allowed with the OVER clause.
Msg 102, Level 15, State 1, Line 23.

I have more columns, such as col2String and Col5Data, that shouldn't be affected, so I can't use DISTINCT at the beginning of the SELECT, and dense_rank() also doesn't seem to work in my case.

Thank you for your help.

Answer 1:

Try this:

DECLARE @DataSource TABLE
(
    [col1ID] INT
   ,[col2String] VARCHAR(12) 
   ,[Col3ID]  INT
   ,[Col4String]  VARCHAR(12)
   ,[Col5Data] DATE
);

INSERT INTO @DataSource
VALUES (1, 'xxx', 20, 'abc', '2018-09-14')
      ,(1, 'xxx', 20, 'xyz', '2018-09-14')
      ,(2, 'xxx', 30, 'abc', '2018-09-14')
      ,(2, 'xxx', 30, 'abc', '2018-09-14');

SELECT *
     ,dense_rank() over (partition by col1ID, col3ID order by [Col4String])
      + dense_rank() over (partition by col1ID, col3ID order by [Col4String] desc)
      - 1 as [Col4String_Cnt]
FROM @DataSource
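To see why the sum works, it may help to look at the two ranks side by side. The sketch below is illustrative only: `@Demo` is a hypothetical table variable holding the question's sample rows, and the `rank_asc` / `rank_desc` aliases are made up for the example.

```sql
-- Hypothetical table variable with the sample rows from the question.
DECLARE @Demo TABLE (col1ID INT, Col3ID INT, Col4String VARCHAR(12));

INSERT INTO @Demo
VALUES (1, 20, 'abc')
      ,(1, 20, 'xyz')
      ,(2, 30, 'abc')
      ,(2, 30, 'abc');

-- If a partition holds N distinct values, the value ranked r ascending
-- is ranked N - r + 1 descending, so rank_asc + rank_desc - 1 = N on every row.
SELECT col1ID
      ,Col3ID
      ,Col4String
      ,DENSE_RANK() OVER (PARTITION BY col1ID, Col3ID ORDER BY Col4String)      AS rank_asc
      ,DENSE_RANK() OVER (PARTITION BY col1ID, Col3ID ORDER BY Col4String DESC) AS rank_desc
FROM @Demo;
```

One caveat: DENSE_RANK() ranks NULL like any other value, while COUNT(DISTINCT ...) ignores NULLs, so the two approaches can differ by one when Col4String contains NULLs.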



Answer 2:

SQL Server does not support DISTINCT inside a window function, so you can use a correlated subquery instead. Something along these lines:

 select t1.*,
        (
           select COUNT(DISTINCT Col4String) 
           from your_table t2
           where t1.col1ID = t2.col1ID and t1.col3ID = t2.col3ID
        ) as Col4String_Cnt
 from your_table t1


Answer 3:

Try it this way:

select X.*, K.stringCount
from TableX X
outer apply (select count(*) as stringCount  -- one row per distinct string, so this is the distinct count
             from (select X2.Col4String
                   from TableX X2
                   where X.col1ID = X2.col1ID and X.col3ID = X2.col3ID
                   group by X2.Col4String) G
            ) K


Answer 4:

I would use APPLY:

SELECT t.*, t1.Col4String_Cnt
FROM your_table t CROSS APPLY
     (SELECT COUNT(DISTINCT t1.Col4String) AS Col4String_Cnt
      FROM your_table t1
      WHERE t1.col1ID = t.col1ID AND t1.col3ID  = t.col3ID 
     ) t1;


Answer 5:

You can do this with an additional level of window functions. One method uses dense_rank():

SELECT . . .,
       MAX(DR) OVER (PARTITION BY col1ID, col3ID)
FROM (SELECT t.*, 
             DENSE_RANK() OVER (PARTITION BY col1ID, col3ID ORDER BY Col4String) as dr
      FROM t
     ) t
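For reference, here is one way the query above might look when filled in against the question's sample data. This is only a sketch: `@Demo`, the aliases `s` and `d`, and the output column name `Col4String_Cnt` are assumptions made for the example, not part of the original answer.

```sql
-- Hypothetical table variable mirroring the table in the question.
DECLARE @Demo TABLE
(
    col1ID     INT
   ,col2String VARCHAR(12)
   ,Col3ID     INT
   ,Col4String VARCHAR(12)
   ,Col5Data   DATE
);

INSERT INTO @Demo
VALUES (1, 'xxx', 20, 'abc', '2018-09-14')
      ,(1, 'xxx', 20, 'xyz', '2018-09-14')
      ,(2, 'xxx', 30, 'abc', '2018-09-14')
      ,(2, 'xxx', 30, 'abc', '2018-09-14');

-- Inner query: rank each distinct Col4String within its (col1ID, Col3ID) group.
-- Outer query: the highest rank in the group equals the number of distinct strings.
SELECT d.col1ID
      ,d.col2String
      ,d.Col3ID
      ,d.Col4String
      ,d.Col5Data
      ,MAX(d.dr) OVER (PARTITION BY d.col1ID, d.Col3ID) AS Col4String_Cnt
FROM (SELECT s.*
            ,DENSE_RANK() OVER (PARTITION BY s.col1ID, s.Col3ID ORDER BY s.Col4String) AS dr
      FROM @Demo AS s
     ) AS d;
```

All original columns are kept, and the two rows for (1, 20) get a count of 2 while the two rows for (2, 30) get a count of 1. As with any DENSE_RANK()-based approach, a NULL in Col4String is ranked as its own value, which COUNT(DISTINCT ...) would not count.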