I am working on query to get cumulative distinct count of uids on daily basis.
Example : Say there are 2 uids (100,200) appeared on date 2016-11-01 and they also appeared on next day with new uid 300 (100,200,300) on 2016-11-02 At this point i want store cumulative count to be 3 not 5 as (user id 100 and 200 already appeared on past day ).
Input table:
date uid
2016-11-01 100
2016-11-01 200
2016-11-01 300
2016-11-01 400
2016-11-02 100
2016-11-02 200
2016-11-03 300
2016-11-03 400
2016-11-03 500
2016-11-03 600
2016-11-04 700
Expected query result:
date daily_cumulative_count
2016-11-01 4
2016-11-02 4
2016-11-03 6
2016-11-04 7
Till now i am able to get cumulative distinct count per day but it includes previous distinct uids from previous day as well.
SELECT
date,
SUM(count) OVER (
ORDER BY date ASC
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
)
FROM (
SELECT
date,
COUNT(DISTINCT uid) AS count
FROM sample_table
GROUP by 1
)ORDER BY date DESC;
Any kind of help would be greatly appreciated.
You can use
exists
to check if an id was present on any of the previous dates. Then get the running sum and find the max value for each group which would get you the daily distinct cumulative count.Using
SELECT DISTINCT
because(date, COUNT(uid))
will be duplicated many times.Explanation: for each date
dt
, it counts uid from the earliest date up todt
, because we are specifyingORDER BY date
and it defaults toBETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
.Please try the following...
This statement starts by joining one instance of
sample_table
to another in such a way that each record inleftTable
has associated with it a copy of each record fromrightTable
that has an earlier or equaldate
value. This effectively attaches a list to eachdate
of alluid
values that have occurred up to and including thatdate
value.The resulting dataset is refined to unique
date
anduid
combinations through use ofGROUP BY
.The refined dataset from the subquery
allUIDSForDateFinder
is then grouped bydate
by the main body of the query, and aCOUNT()
ofuid
values associated with each group is performed.If you have any questions or comments, then please feel free to post a Comment accordingly.
easiest way:
or something like this