How to determine if a path is a subdirectory of an

Posted 2020-06-18 03:28

I am given a list of paths whose files I need to check. If the list contains both a root and one of its subdirectories, there is no need to process the subdirectory separately. For example:

c:\test  // process this
c:\test\pics // do not process this
c:\test2 // process this

How can I tell, in a cross-platform way, that a path is not a subdirectory of another? I am not worried about symlinks as long as they are not cyclical (worst case, I end up processing the data twice).

UPDATE: here is the code I ended up using, thanks to @F.J

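import os
from os.path import normcase, normpath, realpath

def unique_path_roots(paths):
    # Normalize first, so that sorting and de-duplication see canonical paths.
    paths = {normcase(normpath(realpath(path))) for path in paths}
    visited = set()

    # A parent is a strict prefix of its children, so it sorts before them.
    for path in sorted(paths):
        head, tail = os.path.split(path)
        while head and tail:
            if head in visited:
                break  # an ancestor was already yielded; skip this path
            head, tail = os.path.split(head)
        else:
            yield path
            visited.add(path)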

Tags: python file
5 answers
【Aperson】
#2 · 2020-06-18 04:02

Track the directories you've already processed (in a normalized form) and don't process them again if you've already seen them. Something like this should work:

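from os.path import realpath, normcase, sep

dirs = [r"C:\test", r"C:\test\pics", r"C:\test2"]

processed = []

# Sort the normalized paths so a parent is always handled before its subdirectories.
for d in sorted(normcase(realpath(p)) for p in dirs):
    d = d.rstrip(sep) + sep     # trailing separator keeps C:\test from matching C:\test2
    if not any(d.startswith(p) for p in processed):
        processed.append(d)
        process(d)              # your code here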
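The trailing separator in the snippet above is what stops sibling directories that share a name prefix, such as C:\test and C:\test2, from being mistaken for parent and child. A minimal sketch of the difference follows. It uses `ntpath` (the Windows flavour of `os.path`, importable on any platform) so the result does not depend on where it runs; the helper name `starts_with_dir` and its `add_sep` parameter are made up for illustration.

```python
import ntpath

def starts_with_dir(path, parent, add_sep=True):
    """Prefix test on case-normalized Windows-style paths."""
    path = ntpath.normcase(path)
    parent = ntpath.normcase(parent)
    if add_sep:
        # Terminate both paths so that only whole components can match.
        path += ntpath.sep
        parent += ntpath.sep
    return path.startswith(parent)

naive = starts_with_dir(r"C:\test2", r"C:\test", add_sep=False)  # bare string prefix
strict = starts_with_dir(r"C:\test2", r"C:\test")                # sibling directory
nested = starts_with_dir(r"C:\Test\pics", r"C:\test")            # real subdirectory
print(naive, strict, nested)
```

Without the separator the sibling C:\test2 is wrongly treated as lying inside C:\test; with it, only the genuine subdirectory matches.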
▲ chillily
#3 · 2020-06-18 04:14

Fixed and simplified jgoeders's version:

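import os

def is_subdir(suspect_child, suspect_parent):
    suspect_child = os.path.realpath(suspect_child)
    suspect_parent = os.path.realpath(suspect_parent)

    # Note: on Windows, relpath raises ValueError if the paths are on different drives.
    relative = os.path.relpath(suspect_child, start=suspect_parent)

    # Test the first component only, so a directory named e.g. "..data" is not rejected.
    return relative != os.pardir and not relative.startswith(os.pardir + os.sep)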
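To make the idea behind the `relpath` check concrete, here is a small illustrative sketch of what `relpath` produces for a nested path, for a sibling, and for two Windows paths on different drives. It calls `posixpath` and `ntpath` directly (the two flavours behind `os.path`) so the results are the same on any platform; the example paths are the ones from the question and from the last answer.

```python
import ntpath
import posixpath

# A path inside the start directory gives a relative path with no leading "..".
inside = posixpath.relpath("/test/pics", start="/test")

# A sibling has to climb out of the start directory first.
sibling = posixpath.relpath("/test2", start="/test")

# Under Windows rules there is no relative path between two drives.
try:
    ntpath.relpath(r"D:\bar", start=r"C:\foo")
    cross_drive_error = False
except ValueError:
    cross_drive_error = True

print(inside, sibling, cross_drive_error)
```

The cross-drive case is why code built on `relpath` may want to catch `ValueError` and treat it as "not a subdirectory".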
Anthone
#4 · 2020-06-18 04:15

I would maintain a set of directories you have already processed, and then for each new path check to see if any of its parent directories already exist in that set before processing:

import os.path

visited = set()
for path in path_list:
    head, tail = os.path.split(path)
    while head and tail:
        if head in visited:
            break
        head, tail = os.path.split(head)
    else:
        process(path)
        visited.add(path)

Note that path_list should be sorted so that subdirectories are always after their parent directories if they exist.
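One way to meet the sorting requirement is sketched below, under the assumption that purely lexical normalization (`normpath` plus `normcase`) is enough; symlinks are not resolved. The function name `top_level_paths` is hypothetical, and it returns the kept paths rather than calling `process`.

```python
import os.path

def top_level_paths(path_list):
    """Return the paths that are not inside any other path in the list."""
    # Normalize so that equivalent spellings compare equal, and de-duplicate.
    normalized = {os.path.normcase(os.path.normpath(p)) for p in path_list}

    visited = set()
    # A parent is a strict string prefix of its children, so plain sorting
    # always places it before them.
    for path in sorted(normalized):
        head, tail = os.path.split(path)
        while head and tail:
            if head in visited:
                break  # an ancestor was already kept, so skip this path
            head, tail = os.path.split(head)
        else:
            visited.add(path)
    return sorted(visited)
```

Normalizing before sorting matters: a trailing separator or a difference in case on Windows would otherwise make the same directory appear twice in the list.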

smile是对你的礼貌
#5 · 2020-06-18 04:17
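import os

def is_subdir(path, directory):
    path = os.path.realpath(path)
    directory = os.path.realpath(directory)

    relative = os.path.relpath(path, directory)

    # Test the first component only, so a name such as "..data" is not misread as "..".
    if relative == os.pardir or relative.startswith(os.pardir + os.sep):
        return False
    else:
        return True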
ら.Afraid
#6 · 2020-06-18 04:18

Here is an is_subdir utility function I came up with.

  • Python 3.x compatible (works with both bytes and str, like os.path itself).
  • Normalizes paths for comparison
    (parent hierarchy, plus case so that it works on MS Windows).
  • Avoids os.path.relpath, which raises an exception on MS Windows if the paths are on different drives (C:\foo -> D:\bar).

Code:

def is_subdir(path, directory):
    """
    Returns True if *path* is in a subdirectory of *directory*.
    """
    import os
    from os.path import normpath, normcase, sep
    path = normpath(normcase(path))
    directory = normpath(normcase(directory))
    if len(path) > len(directory):
        sep = sep.encode('ascii') if isinstance(directory, bytes) else sep
        if path.startswith(directory.rstrip(sep) + sep):
            return True
    return False
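For comparison, a similar lexical check can be sketched with `pathlib`, which also avoids `relpath` and its cross-drive exception. This is only an illustration, not part of the answer above: the function name `is_subdir_pure` and the `flavour` parameter are made up here, and pure paths neither touch the file system nor collapse ".." components.

```python
from pathlib import PurePosixPath, PureWindowsPath

def is_subdir_pure(path, directory, flavour=PureWindowsPath):
    """True if *path* lies strictly below *directory* (purely lexical check)."""
    path = flavour(path)
    directory = flavour(directory)
    # .parents lists every ancestor of path. Equality follows the flavour's
    # rules, so PureWindowsPath ignores case, and paths on different drives
    # simply never match.
    return directory in path.parents

print(is_subdir_pure(r"C:\Test\pics", r"c:\test"))
print(is_subdir_pure(r"D:\bar", r"C:\foo"))
```

Passing `PurePosixPath` as `flavour` applies POSIX rules instead, where comparison is case-sensitive.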