I am given a list of paths whose files I need to check. If I am given both a root and one of its subdirectories, there is no need to process the subdirectory separately. For example:
c:\test // process this
c:\test\pics // do not process this
c:\test2 // process this
How can I tell, in a cross-platform way, that one path is not a subdirectory of another? I am not worried about symlinks as long as they are not cyclical (the worst case is that I end up processing the data twice).
UPDATE: here is the code I ended up using, thanks to @F.J
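    import locale
    import os
    from functools import cmp_to_key
    from os.path import normcase, normpath, realpath

    def unique_path_roots(paths):
        visited = set()
        # Normalize before de-duplicating and sorting, so that a parent
        # always sorts ahead of its subdirectories.
        paths = {normcase(normpath(realpath(path))) for path in paths}
        for path in sorted(paths, key=cmp_to_key(locale.strcoll)):
            head, tail = os.path.split(path)
            while head and tail:
                if head in visited:
                    break
                head, tail = os.path.split(head)
            else:
                # No ancestor has been seen, so this path is a root.
                yield path
                visited.add(path)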
I would maintain a set of directories you have already processed, and then for each new path check to see if any of its parent directories already exist in that set before processing:
    import os.path

    visited = set()
    for path in path_list:
        head, tail = os.path.split(path)
        while head and tail:
            if head in visited:
                break
            head, tail = os.path.split(head)
        else:
            process(path)
            visited.add(path)
Note that path_list should be sorted so that subdirectories always come after their parent directories, if those are present.
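One way to get that ordering, as a minimal sketch: normalize every path first, then sort the resulting strings. A parent's normalized string is a proper prefix of each of its subdirectories, so a plain sort places it first. The helper name `parents_first` is made up for illustration, and the sketch assumes symlinks do not need resolving (swap in `os.path.realpath` if they do).

```python
import os.path

def parents_first(paths):
    """Return normalized, de-duplicated paths with parents before children."""
    # normpath drops trailing separators and "." segments;
    # normcase folds case on Windows and is a no-op on POSIX.
    normalized = {os.path.normcase(os.path.normpath(p)) for p in paths}
    # A proper prefix always sorts before the longer string it prefixes.
    return sorted(normalized)
```

The result can be passed as path_list to the loop above.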
    import os

    def is_subdir(path, directory):
        path = os.path.realpath(path)
        directory = os.path.realpath(directory)
        relative = os.path.relpath(path, directory)
        if relative.startswith(os.pardir):
            return False
        else:
            return True
Track the directories you've already processed, in a normalized form, and skip any path that falls under one of them. Something like this should work:
    from os.path import realpath, normcase, sep

    dirs = [r"C:\test", r"C:\test\pics", r"C:\test2"]

    processed = []
    for d in dirs:
        # The trailing separator keeps C:\test from matching C:\test2.
        d = normcase(realpath(d)) + sep
        if not any(d.startswith(p) for p in processed):
            processed.append(d)
            process(d)  # your code here
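To see why the separator is appended before the prefix test, here is a small sketch. It uses `ntpath` (the Windows flavour of `os.path`, importable on any platform) so that it behaves the same everywhere, and `normpath` instead of `realpath` so that it does not touch the filesystem; both substitutions are assumptions made for the demonstration, and `with_trailing_sep` is a hypothetical helper name.

```python
import ntpath  # Windows path rules, usable on any platform

def with_trailing_sep(path):
    """Normalize a Windows-style path and append a single separator."""
    return ntpath.normcase(ntpath.normpath(path)) + ntpath.sep

parent = r"C:\test"
sibling = r"C:\test2"
child = r"C:\Test\pics"

# Without the separator, the sibling looks like a subdirectory.
naive = ntpath.normcase(sibling).startswith(ntpath.normcase(parent))

# With the separator, only the real subdirectory matches.
safe_sibling = with_trailing_sep(sibling).startswith(with_trailing_sep(parent))
safe_child = with_trailing_sep(child).startswith(with_trailing_sep(parent))
```

Here `naive` comes out true (a false positive), while `safe_sibling` is false and `safe_child` is true.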
Fixed and simplified jgoeders's version:
    import os

    def is_subdir(suspect_child, suspect_parent):
        suspect_child = os.path.realpath(suspect_child)
        suspect_parent = os.path.realpath(suspect_parent)
        relative = os.path.relpath(suspect_child, start=suspect_parent)
        return not relative.startswith(os.pardir)
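Two edge cases may matter with the relpath test: a child whose name itself begins with two dots (for example "..cache") also starts with os.pardir, and on Windows os.path.relpath raises ValueError when the two paths are on different drives. The following is a sketch of an alternative, not part of the answers here, that uses os.path.commonpath (Python 3.5+). The name `is_subdir_commonpath` is made up for illustration. Like the relpath version, it treats a path as being inside itself.

```python
import os

def is_subdir_commonpath(suspect_child, suspect_parent):
    """Return True if suspect_child is suspect_parent or lies beneath it."""
    child = os.path.normcase(os.path.realpath(suspect_child))
    parent = os.path.normcase(os.path.realpath(suspect_parent))
    try:
        # commonpath compares whole path components, so "data2" is not
        # mistaken for a child of "data", and "..cache" is an ordinary name.
        return os.path.commonpath([child, parent]) == parent
    except ValueError:
        # Raised on Windows when the paths are on different drives.
        return False
```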
Here is an is_subdir utility function I came up with.
- Python 3.x compatible (works with bytes and str, matching os.path, which also supports both).
- Normalizes paths for comparison (parent hierarchy, and case so that it works on MS Windows).
- Avoids using os.path.relpath, which raises an exception on MS Windows if the paths are on different drives (C:\foo -> D:\bar).
Code:
    def is_subdir(path, directory):
        """
        Returns true if *path* is in a subdirectory of *directory*.
        """
        from os.path import normpath, normcase, sep
        path = normpath(normcase(path))
        directory = normpath(normcase(directory))
        if len(path) > len(directory):
            # Use a separator of the same type (bytes or str) as the input.
            sep = sep.encode('ascii') if isinstance(directory, bytes) else sep
            if path.startswith(directory.rstrip(sep) + sep):
                return True
        return False