How can I count distinct multiple fields without r

2019-03-06 20:35发布

问题:

I have a query with several groupings that returns a count per month. Something like this:

SELECT field1, field2, year(someDate), month(someDate), count(*) as myCount
FROM myTable
WHERE field5 = 'test'
GROUP BY field1, field2, year(someDate), month(someDate)

The problem is that I want the count to be distinct per day, based on an id field + the date field (without the time). As in, I want to get the distinct count of ids each day, per month. So I want something like this:

SELECT field1, field2, year(someDate), month(someDate), 
       count(distinct someID, someDate) as myCount
FROM myTable
WHERE field5 = 'test'
GROUP BY field1, field2, year(someDate), month(someDate)

There are 2 problems with this:

  1. You can't list 2 distinct fields for a count aggregate
  2. This will include the time of the date as well, so it won't filter anything out as it will almost always have a different time

I can take care of 2. easily by converting to a varchar of just the date, but I'm not sure how to deal with the multiple distinct fields issue. I can't use this solution, as I don't want to repeat the entire where clause and group by clause. This is what I've come up with:

SELECT field1, field2, year(someDate), month(someDate), 
       count(distinct someID + CONVERT(VARCHAR, someDate, 112)) as myCount
FROM myTable
WHERE field5 = 'test'
GROUP BY field1, field2, year(someDate), month(someDate)

Instead of listing the distinct fields in a comma-delimited list, I simply concatenated them. Is there any downside to this method that I should look out for? Can I count on it to be accurate? And - is there any better way to accomplish this?

Basically, I'm grouping per month, but the "distinct" count should be based on day. As in, if I have id 31 on Jan 3 and Jan 5, I want it to count as 2 for January, but if I have id 31 twice on Jan 3, I only want it to count once.

Some basic sample data & expected output (skipping field1 and field2 for this):

*Date*              *ID*
1/3/12 00:00:09     22
1/3/12 00:13:00     22
1/4/12 12:00:00     22
1/7/12 15:00:45     27
1/15/12 15:00:00    22
2/6/12 00:00:09     50
2/8/12 00:13:00     44
2/8/12 12:00:00     45
2/22/12 15:00:45    33
2/22/12 15:00:00    33
2/22/12 15:00:00    44

*Year*  *Month* *Count*
2012    Jan     4
2012    Feb     5

回答1:

UPDATED

Based on your sample data, this gives the required result:

Declare @Tab table ([Date] datetime,ID int)
insert into @Tab([Date],ID) values
('2012-01-03T00:00:09.000', 22),
('2012-01-03T00:13:00.000', 22),
('2012-01-04T12:00:00.000', 22),
('2012-01-07T15:00:45.000', 27),
('2012-01-15T15:00:00.000', 22),
('2012-02-06T00:00:09.000', 50),
('2012-02-08T00:13:00.000', 44),
('2012-02-08T12:00:00.000', 45),
('2012-02-22T15:00:45.000', 33),
('2012-02-22T15:00:00.000', 33),
('2012-02-22T15:00:00.000', 44)

select DATEADD(month,DATEDIFF(month,0,[Date]),0) as MonthStart,SUM(distinctDayIDs)
from
(
    SELECT DATEADD(day,DATEDIFF(day,0,[Date]),0) as [Date], 
           count(distinct ID) as distinctDayIDs
    FROM @Tab
    --WHERE field5 = 'test'
    GROUP BY DATEADD(day,DATEDIFF(day,0,[Date]),0)
) t
group by DATEADD(month,DATEDIFF(month,0,[Date]),0)

I think, because we have to do a count for each day, we have to do it as two separate grouping operations.


Older Answer

It sounds like the desired output would be field1, field2, the date, and the count of distinct IDs on that date?

If so, I think you're overcomplicating things:

SELECT field1, field2, DATEADD(day,DATEDIFF(day,0,someDate),0) as Date, 
       count(distinct someID) as myCount
FROM myTable
WHERE field5 = 'test'
GROUP BY ffield1, field2, DATEADD(day,DATEDIFF(day,0,someDate),0)

(I'm using DATEADD/DATEDIFF to strip the time portion, rather than converting to varchar)



回答2:

You can try with a count 'over partition':

SELECT 
   field1, field2, someID, someDate, 
   count(*) OVER(PARTITION BY someID, someDate) as myCount
FROM myTable
WHERE field5 = 'test'
GROUP BY field1, field2, someID, someDate

Or prepare a CTE select:

;with cte as (
   select someDate, count( someID) as myCount
   from myTable
   group by someDate)
 select m.field1, m.field2, m.someID, m.someDate, cte.myCount
 from myTable m inner join cte 
   on m.someDate = cte.someDate
 where ...