How do you identify record pattern sequences in re

2019-02-07 07:23发布

问题:

This is a fairly new exercise for me but I need to find a way to identify pattern sequences within a table. So for example, lets say I have a simple table that resembles the following:

Now what I would like to do is identify and group all the records which have the sequenced pattern of values 5, 9 and 6 presenting them in a query. How would you accomplish this task using T-SQL?

The results should look like this:

I've looked for some potential examples of how this might be accomplished but couldn't find anything that really helps.

回答1:

You can use the following query wrapped in a CTE in order to assign sequence numbers to the values contained in your sequence:

;WITH Seq AS (
    SELECT v, ROW_NUMBER() OVER(ORDER BY k) AS rn
    FROM (VALUES(1, 5), (2, 9), (3, 6)) x(k,v)
)

Output:

v   rn
-------
5   1
9   2
6   3

Using the above CTE you can identify islands, i.e. slices of sequential rows containing the whole of the sequence:

;WITH Seq AS (
    SELECT v, ROW_NUMBER() OVER(ORDER BY k) AS rn
    FROM (VALUES(1, 5), (2, 9), (3, 6)) x(k,v)
), Grp AS (
SELECT [Key], [Value], 
       ROW_NUMBER() OVER (ORDER BY [Key]) - rn AS grp            
FROM mytable AS m
LEFT JOIN Seq AS s ON m.Value = s.v
)
SELECT *
FROM Grp

Output:

    Key Value   grp
   -----------------
    1   5       0
    2   9       0
    3   6       0
    6   5       3
    7   9       3
    8   6       3

grp field helps you identify exactly these islands.

All you need to do now is to just filter out partial groups:

;WITH Seq AS (
    SELECT v, ROW_NUMBER() OVER(ORDER BY k) AS rn
    FROM (VALUES(1, 5), (2, 9), (3, 6)) x(k,v)
), Grp AS (
SELECT [Key], [Value], 
       ROW_NUMBER() OVER (ORDER BY [Key]) - rn AS grp            
FROM mytable AS m
LEFT JOIN Seq AS s ON m.Value = s.v
)
SELECT g1.[Key], g1.[Value]
FROM Grp AS g1
INNER JOIN (
   SELECT grp
   FROM Grp
   GROUP BY grp
   HAVING COUNT(*) = 3 ) AS g2
ON g1.grp = g2.grp

Demo here

Note: The initial version of this answer used an INNER JOIN to Seq. This won't work if table contains values like 5, 42, 9, 6, as 42 will be filtered out by the INNER JOIN and this sequence falsely identified as a valid one. Credit goes to @HABO for this edit.



回答2:

Not very optimized, but I think propper answer:

CREATE TABLE pattern (
    rowID INT IDENTITY(1,1) PRIMARY KEY, 
    rowValue INT NOT NULL
);

INSERT INTO pattern (rowValue) VALUES (5);
INSERT INTO pattern (rowValue) VALUES (9);
INSERT INTO pattern (rowValue) VALUES (6);

SELECT * FROM pattern;

SELECT Trg.* FROM Keys Trg 
INNER JOIN pattern Pt ON (Trg.fValue = Pt.rowValue) 
INNER JOIN (
    SELECT K.fKey - P.rowID AS X, COUNT(*) AS Xc FROM Keys K 
        LEFT JOIN pattern P ON (K.fValue = P.rowValue) 
    WHERE 
        (P.rowID IS NOT NULL) 
    GROUP BY K.fKey - P.rowID 
    HAVING COUNT(*) = (SELECT COUNT(*) FROM pattern)
) Z ON (Trg.fKey - Pt.rowID = Z.X);

I use a table for pattern joining it to the main table. I calculate difference between the Key and pattern Key and I show only that rows which difference is matching (and rows count for the difference matching rows inside pattern table).