I am given a list of paths whose files I need to check. Of course, if I am given both a root and one of its subdirectories, there is no need to process the subdirectory separately. For example:
c:\test // process this
c:\test\pics // do not process this
c:\test2 // process this
How can I tell that a path is not a subdirectory of another? Preferably the solution would be cross-platform. I am not worried about symlinks as long as they are not cyclical (the worst case is that I end up processing the data twice).
UPDATE: here is the code I ended up using, thanks to @F.J
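    import locale
    import os
    from functools import cmp_to_key
    from os.path import normcase, normpath, realpath

    def unique_path_roots(paths):
        visited = set()
        # Normalize first so duplicates collapse and parents sort before children.
        paths = {normcase(normpath(realpath(path))) for path in paths}
        for path in sorted(paths, key=cmp_to_key(locale.strcoll)):
            head, tail = os.path.split(path)
            # Walk up the ancestors; stop as soon as one has already been yielded.
            while head and tail:
                if head in visited:
                    break
                head, tail = os.path.split(head)
            else:
                # No ancestor was seen, so this path is a root.
                yield path
                visited.add(path)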
Track the directories you have already processed (in a normalized form) and skip any directory you have already seen. Something like this should work:
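As a rough illustration of this idea (not the answerer's original code), the sketch below walks each given root with `os.walk` and keeps a set of normalized directory names. The names `normalized` and `walk_once` are hypothetical helpers introduced here, and the sketch assumes that `os.path.realpath` plus `os.path.normcase` is a good enough normal form.

```python
import os


def normalized(path):
    """Return a canonical form of `path` suitable for set membership tests."""
    return os.path.normcase(os.path.realpath(path))


def walk_once(roots):
    """Yield file paths under `roots`, visiting each directory at most once."""
    seen = set()
    for root in roots:
        for dirpath, dirnames, filenames in os.walk(root):
            key = normalized(dirpath)
            if key in seen:
                # Already handled via another root: prune the subtree and skip it.
                dirnames[:] = []
                continue
            seen.add(key)
            for name in filenames:
                yield os.path.join(dirpath, name)
```

Because the check happens per directory during the walk, the order of the roots should not matter: a subdirectory listed before its parent is simply skipped when the parent's walk reaches it.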
Fixed and simplified jgoeders's version:
I would maintain a set of directories you have already processed, and then for each new path check to see if any of its parent directories already exist in that set before processing:
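A minimal sketch of that approach might look like the following. The function names `has_processed_parent` and `filter_roots` are assumptions made for illustration, and the sketch sorts by string length itself so that a parent is always examined before its subdirectories.

```python
import os


def has_processed_parent(path, processed):
    """Return True if `path` or any of its ancestors is in `processed`."""
    current = path
    while True:
        if current in processed:
            return True
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem (or drive) root
            return False
        current = parent


def filter_roots(path_list):
    """Return the paths from `path_list` that are not inside another listed path."""
    processed = set()
    normalized = {os.path.normcase(os.path.abspath(p)) for p in path_list}
    # A parent is always shorter than its subdirectories, so it is seen first.
    for path in sorted(normalized, key=len):
        if not has_processed_parent(path, processed):
            processed.add(path)
    return sorted(processed)
```

Walking up with `os.path.dirname` compares whole path components, so a sibling such as `test2` is not mistaken for a child of `test`.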
Note that path_list should be sorted so that subdirectories always come after their parent directories, if they exist.
Here is an is_subdir utility function I came up with.
It works with both bytes and str, matching os.path (which also supports both).
It normalizes paths (parent hierarchy and case, to work on MS Windows).
It avoids os.path.relpath, which raises an exception on MS Windows if the paths are on different drives (C:\foo -> D:\bar).
Code:
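The sketch below is one way such a function could be written with the properties described above. It is an illustration rather than the answerer's original code, and it assumes both arguments are absolute paths of the same type (both `str` or both `bytes`).

```python
import os


def is_subdir(path, directory):
    """Return True if `path` lies strictly inside `directory`.

    Assumes both arguments are absolute and of the same type (str or bytes).
    """
    path = os.path.normcase(os.path.normpath(path))
    directory = os.path.normcase(os.path.normpath(directory))
    sep = os.sep.encode("ascii") if isinstance(directory, bytes) else os.sep
    # Compare against "directory + separator" so "/test2" is not treated
    # as being inside "/test".
    prefix = directory.rstrip(sep) + sep
    return path != directory and path.startswith(prefix)
```

For example, `is_subdir("/test/pics", "/test")` should be true, while `is_subdir("/test2", "/test")` should be false. Because the comparison is a plain prefix test on normalized strings, paths on different drives simply compare as unrelated instead of raising.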