This is essentially a gaps and islands problem however it's atypical. I did cut the example down to bare minimum. I need to identify gaps that exceed a certain threshold and duplicates can't be a problem although this example removes them.
In any case the common solution of using ROW_NUMBER() doesn't help since gaps of even 1 can't be handled and the gap value is a parameter in 'real life'.
The code below actually works correctly. And it's super fast! But if you look at it you'll see why people are rather gun shy about relying upon it. The method was first published 9 years ago here http://www.sqlservercentral.com/articles/T-SQL/68467/ and I've read all 32 pages of comments. Nobody has successfully poked holes in it other than to say "it's not documented behavior". I've tried it on every version from 2005 to 2019 and it works.
The question is, beyond using a cursor or while loop to look at many millions of rows 1 by 1 - which takes, I don't know how long because I cancel after 30 min. - is there a 'supported' way to get the same results in a reasonable amount of time? Even 100x slower would complete 4M rows in 10 minutes and I can't find a way to come close to that!
CREATE TABLE #t (CreateDate date not null
,TufpID int not null
,Cnt int not null
,FuzzyGroup int null);
ALTER TABLE #t ADD CONSTRAINT PK_temp PRIMARY KEY CLUSTERED (CreateDate,TufpID);
-- Takes 40 seconds to write 4.4M rows from a source of 70M rows.
INSERT INTO #T
SELECT X.CreateDate
,X.TufpID
,Cnt = COUNT(*)
,FuzzyGroup = null
FROM SessionState SS
CROSS APPLY(VALUES (CAST(SS.CreateDate as date),SS.TestUser_Form_Part_id)) X(CreateDate,TufpID)
GROUP BY X.CreateDate
,X.TufpID
ORDER BY x.CreateDate,x.TufpID;
-- Takes 6 seconds to update 4.4M rows. They WILL update in clustered index order!
-- (Provided all the rules are followed - see the link above)
DECLARE @FuzzFactor int = 38
DECLARE @Prior int = -@FuzzFactor; -- Insure 1st row has it's own group
DECLARE @Group int;
DECLARE @CDate date;
UPDATE #T
SET @Group = FuzzyGroup = CASE WHEN t.TufpID - @PRIOR < @FuzzFactor AND t.CreateDate = @CDate
THEN @Group ELSE t.TufpID END
,@CDate = CASE WHEN @CDate = t.CreateDate THEN @CDate ELSE t.CreateDate END
,@Prior = CASE WHEN @Prior = t.TufpID-1 THEN @Prior + 1 ELSE t.TufpID END
FROM #t t WITH (TABLOCKX) OPTION(MAXDOP 1);
After the above executes the FuzzyGroup column contains the lowest value of TufpID in the group. IOW the first row (in clustered index order) contains the value of it's own TufpID column. Thereafter every row gets the same value until the date changes or a gap size (in this case 38) is exceeded. In those cases the current TufpID becomes the value put into FuzzyGroup until another change is detected. So after 6 seconds I can run queries that group by FuzzyGroup and analyze the islands.
In practice I do some running counts and totals as well in the same pass and so it takes 8 seconds not 6 but I could do those things with window functions pretty easily if I need to so I left them off.
This is the smallest table and I'll eventually need to handle 100M rows. Thus 10 minutes for 4.4M is probably not good enough but it's a place to start.