SQL: Find missing folders paths in splitting hiera

2019-08-05 04:40发布

问题:

I have a table which contains folders paths. This table contains four columns: DirID, BaseDirID, DirLevel and DisplayPath. DirID - The folder's ID. BaseDirID - The ID of the first folder in the hierarchy. So all the folders (paths) from the same hierarchy share the same value in this column. DirLevel - The depth of the folder. DisplayPath - The folder's path.

I need to find all the "gaps" between those folders in the hierarchy.

Sample data for example:

DirID BaseDirID DirLevel DisplayPath
1   1  1  'A'
2   1  3  'A\B\C'
3   1  5  'A\B\C\D\E'
4   1  3  'A\B\F'
5   1  5  'A\B\F\G\H'
6   2  1  'U'
7   2  3  'U\V\W'
8   2  5  'U\V\W\X\Y'
9   2  3  'U\V\M'
10  2  5  'U\V\M\L\O'

So we need to find the following data:

BaseDirID DisplayPath
1   'A\B'
1   'A\B\C\D'
1   'A\B\F\G'
2   'U\V'
2   'U\V\W\X'
2   'U\V\M\L'

Comments:

a. This table contains more than 250,000 records of folders, so we seek for the most efficient way to do so, otherwise the script will be stuck for long time, time we don't have.

b. I don't have list of all folders. What I have are the "root" folders and the "leafs" folders which I need to find the "gaps" between them in the hierarchy.

c. The table can contains more than one hierarchy and we need to find the "gaps" in all of the hierarchies.

d. Each of the hierarchies can split, as you can see in the sample data the first hierarchy splits to two folders paths from the 'A\B' folder: 'A\B\C' and 'A\B\F'. And the second hierarchy splits to two folders paths from the 'U\V' folder: 'U\V\W' and 'U\V\M'. And we need to find all the "gaps" even in such cases when the hierarchies split.

e. The sql server version is: SQL 2012 SP3.

This question is a continuation question for the question which asked in the following link: SQL: Find missing hierarchy Folders (Paths) in a table Our question include also the 4th comment which appear in bold.

I saw that there is a new column type which called "hierarchyid" (start from sql server 2008), which I thought that might help us - https://docs.microsoft.com/en-us/sql/t-sql/data-types/hierarchyid-data-type-method-reference What do you think?

Thanks in advance.

回答1:

Using this added path (11,2,'U\V\Z\L\O\Q\R\S\T') to show multiple missing folders in a path:

with cte as (
select BaseDirID, DisplayPath = left(DisplayPath,len(DisplayPath)-charindex('\',reverse(DisplayPath)))
from t
where DirLevel > 1
  and not exists (
  select 1 
  from t i
  where t.BaseDirId = i.BaseDirId
    and i.DisplayPath = left(t.DisplayPath,len(t.DisplayPath)-charindex('\',reverse(t.DisplayPath)))
    )
union all 
select BaseDirID, DisplayPath = left(DisplayPath,len(DisplayPath)-charindex('\',reverse(DisplayPath)))
from cte t
where not exists (
  select 1 
  from t i
  where t.BaseDirId = i.BaseDirId
    and i.DisplayPath = left(t.DisplayPath,len(t.DisplayPath)-charindex('\',reverse(t.DisplayPath)))                                                   
    )
)
select distinct * 
from cte

rextester demo: http://rextester.com/CEVGZ96613

returns:

+-----------+-----------------+
| BaseDirID |   DisplayPath   |
+-----------+-----------------+
|         1 | A\B             |
|         1 | A\B\C\D         |
|         1 | A\B\F\G         |
|         2 | U\V             |
|         2 | U\V\M\L         |
|         2 | U\V\W\X         |
|         2 | U\V\Z           |
|         2 | U\V\Z\L         |
|         2 | U\V\Z\L\O       |
|         2 | U\V\Z\L\O\Q     |
|         2 | U\V\Z\L\O\Q\R   |
|         2 | U\V\Z\L\O\Q\R\S |
+-----------+-----------------+