I have a table which contains Folders Paths.
I need to find all the "gaps" between those folders in the hierarchy.
I mean that, if the table contains these 3 folders:
'A'
'A\B\C'
'A\B\C\D\E\F\G'
I need to find the following missing folders in the hierarchy:
'A\B'
'A\B\C\D'
'A\B\C\D\E'
'A\B\C\D\E\F'
This table contains more than 250,000 records of folders, so we seek for the most efficient way to do so, otherwise the script will be stuck for long time, time we don't have.
Comment: I don't have list of all folders. What I have are the "root" folders and the "leafs" folders which I need to find the "gaps" between them in the hierarchy.
Second comment: The table can contains more than one hierarchy and we need to find the "gaps" in all of the hierarchies.
For that matter, There are 2 another int columns: "DirID" and "BaseDirID". The "DirID" column is the id column in our table. The "BaseDirID" contains the id of the first folder in the hierarchy. So all the folders (paths) from the same hierarchy share the same value in this column. Sample data for example:
DirID BaseDirID DisplayPath
1 1 'A'
2 1 'A\B\C'
3 1 'A\B\C\D\E'
4 4 'U'
5 4 'U\V\W'
6 4 'U\V\W\X\Y'
So we need to find the following data:
BaseDirID DisplayPath
1 'A\B'
1 'A\B\C\D'
4 'U\V'
4 'U\V\W\X'
Thanks in advance.
Here is one approach using Recursive CTE
and split string function
;WITH existing_hierachies
AS (SELECT DirID,
BaseDirID,
DisplayPath
FROM (VALUES (1,1,'A' ),
(2,1,'A\B\C' ),
(3,1,'A\B\C\D\E' ),
(4,4,'U' ),
(5,4,'U\V\W' ),
(6,4,'U\V\W\X\Y' )) tc (DirID, BaseDirID, DisplayPath) ),
folders_list
AS (SELECT ItemNumber,
item fol,
BaseDirID
FROM (SELECT row_number()over(partition by BaseDirID order by Len(DisplayPath) DESC)rn,*
FROM existing_hierachies) a
CROSS apply dbo.[Delimitedsplit8k](DisplayPath, '\')
Where Rn = 1),
rec_cte
AS (SELECT *,
Cast(fol AS VARCHAR(4000))AS hierar
FROM folders_list
WHERE ItemNumber = 1
UNION ALL
SELECT d.*,
Cast(rc.hierar + '\' + d.fol AS VARCHAR(4000))
FROM rec_cte rc
JOIN folders_list d
ON rc.BaseDirID = d.BaseDirID
AND d.ItemNumber = rc.ItemNumber + 1)
SELECT rc.BaseDirID,
rc.hierar AS Missing_Hierarchies
FROM rec_cte rc
WHERE NOT EXISTS (SELECT 1
FROM existing_hierachies eh
WHERE eh.BaseDirID = rc.BaseDirID
AND eh.DisplayPath = rc.hierar)
Order by rc.BaseDirID
Result :
+-----------+---------------------+
| BaseDirID | Missing_Hierarchies |
+-----------+---------------------+
| 1 | A\B |
| 1 | A\B\C\D |
| 4 | U\V |
| 4 | U\V\W\X |
+-----------+---------------------+
Split string function code
CREATE FUNCTION [dbo].[DelimitedSplit8K]
(@pString VARCHAR(8000), @pDelimiter CHAR(1))
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
--===== "Inline" CTE Driven "Tally Table" produces values from 0 up to 10,000...
-- enough to cover NVARCHAR(4000)
WITH E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), --10E+1 or 10 rows
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front
-- for both a performance gain and prevention of accidental "overruns"
SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
),
cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)
SELECT 1 UNION ALL
SELECT t.N+1 FROM cteTally t WHERE SUBSTRING(@pString,t.N,1) = @pDelimiter
),
cteLen(N1,L1) AS(--==== Return start and length (for use in substring)
SELECT s.N1,
ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0)-s.N1,8000)
FROM cteStart s
)
--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY l.N1),
Item = SUBSTRING(@pString, l.N1, l.L1)
FROM cteLen l
;
GO
Referred from http://www.sqlservercentral.com/articles/Tally+Table/72993/