I'm writing a generic backup application with the os and pickle modules.
So far I've tried the code below to see if something is a file or a directory (based on its string input and not its physical contents).
import os, re

def test(path):
    # Filename-like string: word chars, hyphens, commas or spaces, then a 3-letter extension
    prog = re.compile(r"^[-\w,\s]+\.[A-Za-z]{3}$")
    result = prog.match(path)
    if os.path.isfile(path) or result:
        print("is file")
    elif os.path.isdir(path):
        print("is directory")
    else:
        print("I dont know")
Problems
test("C:/treeOfFunFiles/")
is directory
test("/beach.jpg")
I dont know
test("beach.jpg")
I dont know
test("/directory/")
I dont know
Desired Output
test("C:/treeOfFunFiles/")
is directory
test("/beach.jpg")
is file
test("beach.jpg")
is file
test("/directory/")
is directory
Resources
- Test filename with regular expression
- Python RE library
- Validating file types by regular expression
What regular expression should I be using to tell the difference between what might be a file and what might be a directory? Or is there a different way to go about this?
In a character class, a - that is meant as a literal hyphen needs to be either the first or last character, or escaped as \-.
So change "^[\w-,\s]+\.[A-Za-z]{3}$" to "^[-\w,\s]+\.[A-Za-z]{3}$", for instance.
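To make the hyphen rule concrete, here is a small sketch showing the three placements that treat - as a literal inside a character class. The helper name matches_all and the sample strings are made up for illustration.

```python
import re

# Three ways to include a literal hyphen in a character class.
leading = re.compile(r"^[-\w,\s]+\.[A-Za-z]{3}$")    # hyphen first
trailing = re.compile(r"^[\w,\s-]+\.[A-Za-z]{3}$")   # hyphen last
escaped = re.compile(r"^[\w\-,\s]+\.[A-Za-z]{3}$")   # hyphen escaped


def matches_all(name):
    """Return True if every variant of the pattern matches `name`."""
    return all(p.match(name) is not None for p in (leading, trailing, escaped))


if __name__ == "__main__":
    print(matches_all("my-file.txt"))  # hyphenated name, 3-letter extension
    print(matches_all("/beach.jpg"))   # "/" is not in the class
```

Note that none of these variants allow a path separator, which is why a string such as "/beach.jpg" is not recognised by the pattern.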
Otherwise, I think using regexes to determine whether something looks like a filename or a directory is pointless. For instance:
- /dev/fd0 isn't a file or a directory (it is a device)
- ~/comm.pipe could look like a file but is a named pipe
- ~/images/test is a symbolic link to a file called '~/images/holiday/photo1.jpg'
Have a look at the os.path module, which has functions that ask the OS what something is, such as os.path.isfile(), os.path.isdir() and os.path.islink().
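As a rough sketch of what asking the OS can look like, the following uses os.lstat together with the stat module to tell apart the kinds of entries mentioned above. The function name describe and its return labels are illustrative assumptions, not part of any library.

```python
import os
import stat


def describe(path):
    """Return a short label for what the OS reports `path` to be."""
    try:
        # lstat does not follow a symbolic link in the final path component
        mode = os.lstat(path).st_mode
    except OSError:
        return "does not exist"
    if stat.S_ISLNK(mode):
        return "symbolic link"
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISREG(mode):
        return "regular file"
    if stat.S_ISFIFO(mode):
        return "named pipe"
    if stat.S_ISBLK(mode) or stat.S_ISCHR(mode):
        return "device"
    return "other"


if __name__ == "__main__":
    print(describe(os.getcwd()))
```

This only works for paths that exist on the machine running the script; it says nothing about a string that merely looks like a path.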
The os module (through os.path) provides functions to check whether a path is a file or a directory. It is advisable to use this module rather than regular expressions.
>>> import os
>>> print(os.path.isfile(r'/Users'))
False
>>> print(os.path.isdir(r'/Users'))
True
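Since os.path.isfile() and os.path.isdir() both return False for paths that do not exist, one possible approach for the question's string-only case is to ask the OS first and fall back to a guess based on the shape of the string. This is only a sketch: the function name classify and the fallback rules (a trailing slash suggests a directory, an extension suggests a file) are assumptions, not something the os module decides for you.

```python
import os


def classify(path):
    """Guess whether `path` names a file or a directory."""
    # Existing paths: let the OS answer.
    if os.path.isdir(path):
        return "is directory"
    if os.path.isfile(path):
        return "is file"
    # Non-existent paths: guess from the string alone (heuristic).
    if path.endswith(("/", os.sep)):
        return "is directory"
    if os.path.splitext(path)[1]:  # non-empty extension such as ".jpg"
        return "is file"
    return "unknown"


if __name__ == "__main__":
    for candidate in ("C:/treeOfFunFiles/", "/beach.jpg", "beach.jpg", "/directory/"):
        print(candidate, "->", classify(candidate))
```

The fallback is easy to fool (a directory can be named "photos.old" and a file can have no extension), so it is best treated as a hint rather than an answer.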
This might help someone: I had the exact same need, and I used the following regular expressions to test whether an input string is a directory, a file, or neither.
For a generic file:
^(\/+\w{0,}){0,}\.\w{1,}$
For a generic directory:
^(\/+\w{0,}){0,}$
So the resulting Python function looks like this:
import re

def check_input(path):
    # Slash-separated words ending in ".extension"
    check_file = re.compile(r"^(\/+\w{0,}){0,}\.\w{1,}$")
    # Slash-separated words with no extension
    check_directory = re.compile(r"^(\/+\w{0,}){0,}$")
    if check_file.match(path):
        print("It is a file.")
    elif check_directory.match(path):
        print("It is a directory")
    else:
        print("It is neither")
Example:
- check_input("/foo/bar/file.xyz") prints -> It is a file.
- check_input("/foo/bar/directory") prints -> It is a directory
- check_input("Random gibberish") prints -> It is neither
This layer of input validation may be reinforced later by the os.path.isfile() and os.path.isdir() built-in functions, as Mr.Squig kindly showed, but I'd bet this preliminary test may save you a few microseconds and boost your script's performance.
PS: While using this piece of code, I noticed I had missed a huge use case: paths that contain special characters such as the dash "-", which is widely used. To solve this, I replaced the \w{0,}, which only allows word characters (letters, digits and the underscore), with .{0,}, which allows any character. This is more of a workaround than a solution, but that's all I have for now.
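For reference, the workaround described in the PS could look roughly like the sketch below. The names CHECK_FILE, CHECK_DIRECTORY and classify_input are illustrative, and the function returns a label instead of printing so the result is easier to reuse.

```python
import re

# The two patterns from above with \w{0,} replaced by .{0,}, as the PS describes.
CHECK_FILE = re.compile(r"^(\/+.{0,}){0,}\.\w{1,}$")
CHECK_DIRECTORY = re.compile(r"^(\/+.{0,}){0,}$")


def classify_input(path):
    """Label `path` as 'file', 'directory' or 'neither' from its text alone."""
    if CHECK_FILE.match(path):
        return "file"
    if CHECK_DIRECTORY.match(path):
        return "directory"
    return "neither"


if __name__ == "__main__":
    for candidate in ("/foo/bar/my-file.xyz", "/foo/my-dir", "Random gibberish"):
        print(candidate, "->", classify_input(candidate))
```

Because . also matches "/" and spaces, these patterns accept almost any string that starts with a slash, and the nested quantifiers can backtrack heavily on long inputs containing many slashes, so the check is a loose one.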