How can I test if a string refers to a file or dir

Posted 2019-08-07 05:09

Question:

I'm writing a generic backup application with the os module and pickle. So far I've tried the code below to see whether something is a file or a directory (based on its string input and not its physical contents).

import os, re

def test(path):
    # Treat the string as a file if it exists as one or merely looks like "name.ext".
    prog = re.compile(r"^[-\w,\s]+.[A-Za-z]{3}$")
    result = prog.match(path)
    if os.path.isfile(path) or result:
        print("is file")
    elif os.path.isdir(path):
        print("is directory")
    else:
        print("I dont know")

Problems

test("C:/treeOfFunFiles/")
is directory
test("/beach.jpg")
I dont know
test("beach.jpg")
I dont know
test("/directory/")
I dont know

Desired Output

test("C:/treeOfFunFiles/")
is directory
test("/beach.jpg")
is file
test("beach.jpg")
is file
test("/directory/")
is directory

What regular expression should I use to tell the difference between what might be a file and what might be a directory? Or is there a different way to go about this?

Answer 1:

In a character class, a hyphen that is meant literally needs to be either the first or last character, or escaped as \-. So change "^[\w-,\s]+\.[A-Za-z]{3}$" to "^[-\w,\s]+\.[A-Za-z]{3}$", for instance.

Otherwise, I think using regexes to determine whether something looks like a filename or a directory is pointless...

  • /dev/fd0 is a device node, neither a regular file nor a directory, for instance
  • ~/comm.pipe could look like a file but is a named pipe
  • ~/images/test is a symbolic link to a file called '~/images/holiday/photo1.jpg'

Have a look at the os.path module, which has functions that ask the OS what something is.
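
The cases in the list above (device nodes, named pipes, symbolic links) can be told apart by asking the OS for the path's mode bits. The following is only a minimal sketch: the function name classify and its return labels are made up for illustration, and it assumes you want to report on a symlink itself rather than on its target, which is why it uses os.lstat instead of os.stat.

```python
import os
import stat

def classify(path):
    """Return a short label for what the OS says `path` is (hypothetical helper)."""
    try:
        # lstat does not follow a symlink in the final path component.
        mode = os.lstat(path).st_mode
    except OSError:
        return "does not exist"
    if stat.S_ISLNK(mode):
        return "symbolic link"
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISREG(mode):
        return "regular file"
    if stat.S_ISFIFO(mode):
        return "named pipe"
    if stat.S_ISCHR(mode) or stat.S_ISBLK(mode):
        return "device"
    return "other"
```

Note that this only works for paths that actually exist; a string such as "beach.jpg" that names nothing on disk comes back as "does not exist".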



Answer 2:

The os.path module provides functions to check whether a path is a file or a directory. It is advisable to use this module rather than regular expressions.

>>> import os
>>> print(os.path.isfile(r'/Users'))
False
>>> print(os.path.isdir(r'/Users'))
True
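
Both functions return False for a path that does not exist, which is the situation in the question's examples. One possible way to cover that case is to ask the OS first and fall back to the spelling of the string only when the OS has no answer. This is a sketch under stated assumptions, not a definitive implementation: guess_kind is a hypothetical name, and the fallback rules (trailing slash means directory, an extension means file) are illustrative guesses that will misjudge extensionless files and directory names containing a dot.

```python
import os

def guess_kind(path):
    """Guess whether `path` names a file or a directory (hypothetical helper)."""
    # Prefer the OS's answer whenever the path actually exists.
    if os.path.isdir(path):
        return "is directory"
    if os.path.isfile(path):
        return "is file"
    # Assumed fallback: judge by the spelling of the string alone.
    if path.endswith(("/", os.sep)):
        return "is directory"
    if os.path.splitext(path)[1]:  # non-empty extension such as ".jpg"
        return "is file"
    return "I dont know"

for candidate in ["C:/treeOfFunFiles/", "/beach.jpg", "beach.jpg", "/directory/"]:
    print(candidate, "->", guess_kind(candidate))
```

The labels reuse the strings printed by the question's test function so the results can be compared with its desired output.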


Answer 3:

This might help someone. I had the exact same need and used the following regular expressions to test whether an input string looks like a directory, a file, or neither. For a generic file:

^(\/+\w{0,}){0,}\.\w{1,}$

For a generic directory:

^(\/+\w{0,}){0,}$

So the resulting Python function looks like:

import re

def check_input(path):
    # Raw strings keep the backslashes intact for the regex engine.
    check_file = re.compile(r"^(\/+\w{0,}){0,}\.\w{1,}$")
    check_directory = re.compile(r"^(\/+\w{0,}){0,}$")
    if check_file.match(path):
        print("It is a file.")
    elif check_directory.match(path):
        print("It is a directory")
    else:
        print("It is neither")

Example:

  • check_input("/foo/bar/file.xyz") prints -> It is a file.
  • check_input("/foo/bar/directory") prints -> It is a directory
  • check_input("Random gibberish") prints -> It is neither

This input check can be reinforced later with the os.path.isfile() and os.path.isdir() functions, as Mr.Squig kindly showed, but I'd bet this preliminary test may save you a few microseconds and boost your script's performance.

PS: While using this piece of code, I noticed I had missed a huge use case: paths that contain special characters such as the dash "-", which is widely used. To handle this I replaced \w{0,}, which only allows word characters (letters, digits and underscore), with .{0,}, which matches any character. This is more of a workaround than a solution, but that's all I have for now.
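
The substitution described in the PS might look like the sketch below. It is one reading of the PS, not the author's exact code: the names CHECK_FILE, CHECK_DIRECTORY and classify are made up for illustration, and the function returns a label instead of printing so the result is easier to reuse.

```python
import re

# Assumed reading of the PS: \w{0,} swapped for .{0,} in both patterns.
CHECK_FILE = re.compile(r"^(/+.{0,}){0,}\.\w{1,}$")
CHECK_DIRECTORY = re.compile(r"^(/+.{0,}){0,}$")

def classify(path):
    """Label a path string as 'file', 'directory' or 'neither' by shape alone."""
    if CHECK_FILE.match(path):
        return "file"
    if CHECK_DIRECTORY.match(path):
        return "directory"
    return "neither"

print(classify("/foo/my-file.txt"))
print(classify("/foo/my-dir"))
print(classify("Random gibberish"))
```

Because "." matches any character, these patterns also accept spaces and other unusual characters after a leading slash, which fits the PS's own description of the change as a workaround.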