I'm creating some classes for dealing with filenames in various types of file shares (nfs, afp, s3, local disk) etc. I get as user input a string that identifies the data source (i.e. "nfs://192.168.1.3"
or "s3://mybucket/data"
) etc.
I'm subclassing the specific filesystems from a base class that has common code. Where I'm confused is in the object creation. What I have is the following:
import os
class FileSystem(object):
class NoAccess(Exception):
pass
def __new__(cls,path):
if cls is FileSystem:
if path.upper().startswith('NFS://'):
return super(FileSystem,cls).__new__(Nfs)
else:
return super(FileSystem,cls).__new__(LocalDrive)
else:
return super(FileSystem,cls).__new__(cls,path)
def count_files(self):
raise NotImplementedError
class Nfs(FileSystem):
def __init__ (self,path):
pass
def count_files(self):
pass
class LocalDrive(FileSystem):
def __init__(self,path):
if not os.access(path, os.R_OK):
raise FileSystem.NoAccess('Cannot read directory')
self.path = path
def count_files(self):
return len([x for x in os.listdir(self.path) if os.path.isfile(os.path.join(self.path, x))])
data1 = FileSystem('nfs://192.168.1.18')
data2 = FileSystem('/var/log')
print type(data1)
print type(data2)
print data2.count_files()
I thought this would be a good use of __new__
but most posts I read about it's use discourage it. Is there a more accepted way to approach this problem?
I don't think using __new__()
to do what you want is improper. In other words, I disagree with the accepted answer to this question that Factory functions are always the "best way to do it".
If you really want to avoid using it, then the only options are metaclasses or a separate factory function/method. Given the choices available, making the __new__()
method one — since it's static by default — is a perfectly sensible approach.
That said, below is what I think is an improved version of your code. I've added a couple of class methods to assist in automatically finding all the subclasses. These support the most important way in which it's better — which is now adding subclasses doesn't require modifying the __new__()
method. This means it's now easily extensible since it effectively supports what you could call virtual constructors.
A similar implementation could also be used to move the creation of instances out of __new__
method into a separate (static) factory method — so in one sense the technique shown is just a relatively simple way of coding an extensible generic factory function regardless of what name it's given.
import os
import re
class FileSystem(object):
class NoAccess(Exception): pass
class Unknown(Exception): pass
# Pattern for matching "xxx://" where x is any character except for ":".
_PATH_PREFIX_PATTERN = re.compile(r'\s*([^:]+)://')
@classmethod
def _get_all_subclasses(cls):
""" Recursive generator of all class' subclasses. """
for subclass in cls.__subclasses__():
yield subclass
for subclass in subclass._get_all_subclasses():
yield subclass
@classmethod
def _get_prefix(cls, s):
""" Extract any file system prefix at beginning of string s and
return a lowercase version of it or None when there isn't one.
"""
match = cls._PATH_PREFIX_PATTERN.match(s)
return match.group(1).lower() if match else None
def __new__(cls, path):
""" Create instance of appropriate subclass using path prefix. """
path_prefix = cls._get_prefix(path)
for subclass in cls._get_all_subclasses():
if subclass.prefix == path_prefix:
# Using "object" base class method avoids recursion here.
return object.__new__(subclass)
else: # no subclass with matching prefix found (and no default defined)
raise FileSystem.Unknown(
'path "{}" has no known file system prefix'.format(path))
def count_files(self):
raise NotImplementedError
class Nfs(FileSystem):
prefix = 'nfs'
def __init__ (self, path):
pass
def count_files(self):
pass
class LocalDrive(FileSystem):
prefix = None # Default when no file system prefix is found.
def __init__(self, path):
if not os.access(path, os.R_OK):
raise FileSystem.NoAccess('Cannot read directory')
self.path = path
def count_files(self):
return sum(os.path.isfile(os.path.join(self.path, filename))
for filename in os.listdir(self.path))
if __name__ == '__main__':
data1 = FileSystem('nfs://192.168.1.18')
data2 = FileSystem('c:/') # Change as necessary for testing.
print(type(data1)) # -> <class '__main__.Nfs'>
print(type(data2)) # -> <class '__main__.LocalDrive'>
print(data2.count_files()) # -> <some number>
In my opinion, using __new__
in such a way is really confusing for other people who might read your code. Also it requires somewhat hackish code to distinguish guessing file system from user input and creating Nfs
and LocalDrive
with their corresponding classes.
Why not make a separate function with this behaviour? It can even be a static method of FileSystem
class:
class FileSystem(object):
# other code ...
@staticmethod
def from_path(path):
if path.upper().startswith('NFS://'):
return Nfs(path)
else:
return LocalDrive(path)
And you call it like this:
data1 = FileSystem.from_path('nfs://192.168.1.18')
data2 = FileSystem.from_path('/var/log')