Using recursive CTE to resolve a group, not hierar

2019-05-17 20:07发布

问题:

I'm trying to do a recursive CTE in SQL Server with the following example data

Class        Student
------       ------
English      Sally   <- Sally is what were searching for
English      Peter   <- Peter's on same Class as Sally
Swedish      Peter   <- Found because Peter's on this class
Dutch        Peter   <- Found because Peter's on this class
Finnish      Harry   <- Not found, no relation to class or student
Swedish      Tim     <- Found because Peter's on Swedish class
Spanish      Lauren  <- Not found, no relation to class or student
Spanish      Colin   <- Not found, no relation to class or student

So I need a CTE, to which I give 'Sally' as parameter, and it will find out all different classes related to Sally, then all Students related to Classes that Sally is in, then all classes related to students in the same classes as Sally, and so on until no more rows found.

But I just cannot figure out how to write the joins, this is what I tried but failed miserably:

WITH myCTE (Class, Student) AS
(
    SELECT Class, Student FROM TABLE1 WHERE TABLE1.Student= 'Sally'
    UNION ALL
    SELECT t.Class, t.Student FROM TABLE1 t
    JOIN myCTE t2 ON t2.Class = t.Class
)
SELECT * FROM myCTE

回答1:

The first problem is that you've got infinite recursion: Sally takes English with Peter, who takes English with Sally, who takes English with Peter...

Once you've sorted that out, you'll need an additional query in your recursive CTE. You're currently joining on Class to get the other students in the same class, but you also need to join on Student to get the other classes for the students.

Something like this should work:

WITH cteSource As
(
   SELECT
      Class,
      Student,
      -- Create a unique ID for each record:
      ROW_NUMBER() OVER (ORDER BY Student, Class) As ID
   FROM
      TABLE1
),
cteRecursive (Class, Student, IDPath) As
(
   SELECT
      Class,
      Student,
      -- Used to exclude records we've already visited:
      Convert(varchar(max), '/' + Convert(varchar(10), ID) + '/')
   FROM
      cteSource
   WHERE
      Student = 'Sally'

   UNION ALL

   -- Students in the same class:
   SELECT
      T.Class,
      T.Student,
      R.IDPath + Convert(varchar(10), T.ID) + '/'
   FROM
      cteSource As T
      INNER JOIN cteRecursive As R
      ON T.Class = R.Class
   WHERE
      CharIndex('/' + Convert(varchar(10), t.ID) + '/', R.IDPath) = 0

   UNION ALL

   -- Other classes for the students:
   SELECT
      T.Class,
      T.Student,
      R.IDPath + Convert(varchar(10), T.ID) + '/'
   FROM
      cteSource As T
      INNER JOIN cteRecursive As R
      ON T.Student = R.Student
   WHERE
      CharIndex('/' + Convert(varchar(10), t.ID) + '/', R.IDPath) = 0
)
SELECT
   Class,
   Student,
   IDPath
FROM
   cteRecursive
;

With your test data, you'll get the following results:

English   Sally   /7/
English   Peter   /7/5/
Dutch     Peter   /7/5/4/
Swedish   Peter   /7/5/6/
Swedish   Tim     /7/5/6/8/
Dutch     Peter   /7/5/6/4/
Swedish   Peter   /7/5/4/6/
Swedish   Tim     /7/5/4/6/8/

If you're using SQL 2008 or higher, you might get better performance if you make the IDPath a HierarchyID, but you'd need to test with your real data.

EDIT
You might need to change the final select to:

SELECT DISTINCT
   Class,
   Student
FROM
   cteRecursive

to deal with cases where there are multiple paths to the same record. For example, "Dutch/Peter", "Swedish/Peter" and "Swedish/Tim" all appear twice.