I have a urllib2 caching module, which sporadically crashes because of the following code:
if not os.path.exists(self.cache_location):
    os.mkdir(self.cache_location)
The problem is that by the time the second line executes, the folder may already exist, and os.mkdir fails with:
  File ".../cache.py", line 103, in __init__
    os.mkdir(self.cache_location)
OSError: [Errno 17] File exists: '/tmp/examplecachedir/'
This happens because third-party code, which I have no control over, launches the script many times simultaneously.
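The failure is a classic check-then-act race. Below is a minimal, deterministic sketch of the window, assuming Python 3. The hypothetical `other_process` callback stands in for a second process that creates the directory between the existence check and `os.mkdir`, and `racy_create` is a hypothetical name for the original pattern.

```python
import errno
import os
import tempfile


def racy_create(path, other_process=None):
    """Check-then-create, mirroring the original code (hypothetical helper)."""
    if not os.path.exists(path):
        # The race window: another process may create `path` right here.
        if other_process is not None:
            other_process(path)
        os.mkdir(path)


base = tempfile.mkdtemp()
target = os.path.join(base, "examplecachedir")

lost_race = False
try:
    # Simulate the "other process" winning by calling os.mkdir inside the window.
    racy_create(target, other_process=os.mkdir)
except FileExistsError as e:
    # FileExistsError is Python 3's OSError subclass for errno EEXIST (17).
    lost_race = e.errno == errno.EEXIST

print("lost the race:", lost_race)
```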
The code (before I attempted to fix the bug) can be found here, on GitHub.
I can't use tempfile.mkstemp (or its directory counterpart, tempfile.mkdtemp), as they avoid the race condition by using randomly named paths (tempfile.py source here), which would defeat the purpose of the cache.
I don't want to simply discard the error, because the same Errno 17 is raised when the path already exists as a regular file (a genuinely different problem), for example:
$ touch blah
$ python
>>> import os
>>> os.mkdir("blah")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OSError: [Errno 17] File exists: 'blah'
>>>
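To make the ambiguity concrete, here is a small sketch, assuming Python 3, showing that both "already a directory" and "already a regular file" produce the same errno, so only a follow-up check such as `os.path.isdir` can tell them apart. `mkdir_errno` is a hypothetical helper for illustration.

```python
import errno
import os
import tempfile


def mkdir_errno(path):
    """Return the errno from os.mkdir(path), or None if it succeeded."""
    try:
        os.mkdir(path)
    except OSError as e:
        return e.errno
    return None


base = tempfile.mkdtemp()
as_dir = os.path.join(base, "cache")
as_file = os.path.join(base, "blah")
os.mkdir(as_dir)              # already exists as a directory
open(as_file, "w").close()    # already exists as a regular file (like `touch blah`)

# Both failures carry the same errno (EEXIST) ...
print(mkdir_errno(as_dir) == errno.EEXIST, mkdir_errno(as_file) == errno.EEXIST)
# ... so a second check is needed to distinguish them.
print(os.path.isdir(as_dir), os.path.isdir(as_file))
```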
I cannot use threading.RLock, as the code is called from multiple processes.
So I tried writing a simple file-based lock (that version can be found here), but it has a problem: it creates the lock file one level up, so /tmp/example.lock for /tmp/example/, which breaks if you use /tmp/ as the cache dir (it then tries to create /tmp.lock).
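One way around the "one level up" problem is to derive the lock path from a hash of the cache directory and place it in the system temp directory, so it never depends on the cache directory's parent. A minimal sketch, assuming Python 3; `lock_path_for`, `try_acquire` and `release` are hypothetical names. The lock is acquired atomically with `O_CREAT | O_EXCL`, which works on Linux, OS X and Windows, but this sketch does not handle stale lock files left behind by a crashed process.

```python
import hashlib
import os
import tempfile


def lock_path_for(cache_dir):
    """Hypothetical helper: a lock path independent of cache_dir's parent."""
    key = hashlib.sha1(os.path.abspath(cache_dir).encode("utf-8")).hexdigest()
    return os.path.join(tempfile.gettempdir(), "cache-%s.lock" % key)


def try_acquire(lock_path):
    """Atomically create the lock file; return True if we now hold the lock.

    NOTE: no stale-lock recovery; a crashed holder leaves the file behind.
    """
    try:
        # O_CREAT | O_EXCL fails if the file already exists, in one atomic step.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.close(fd)
    return True


def release(lock_path):
    """Give up the lock by deleting the lock file."""
    os.remove(lock_path)
```

With this layout, `lock_path_for("/tmp/")` yields a file inside the temp directory rather than `/tmp.lock`.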
In short, I need to cache urllib2 responses to disk. To do this, I need to access a known directory (creating it, if required) in a multiprocess-safe way. It needs to work on OS X, Linux and Windows.
Thoughts? The only alternative solution I can think of is to rewrite the cache module using SQLite3 storage, rather than files.
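The question targets Python 2 (urllib2), where os.makedirs has no exist_ok parameter. On Python 3.2 and later, the requirement above (create a known directory if needed, safely across processes) maps closely onto `os.makedirs(path, exist_ok=True)`. A minimal sketch, assuming Python 3; `ensure_cache_dir` is a hypothetical helper name:

```python
import os


def ensure_cache_dir(path):
    """Create `path` (and any parents) if needed; tolerant of racing processes.

    With exist_ok=True, os.makedirs ignores the error only if `path`
    already exists *as a directory*; if it exists as a regular file,
    FileExistsError is still raised.
    """
    os.makedirs(path, exist_ok=True)
    return path
```

In current CPython, makedirs only swallows the error when the path turns out to be a directory afterwards, which is the same check the EAFP answers below perform by hand.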
Instead of
if not os.path.exists(self.cache_location):
    os.mkdir(self.cache_location)
you could do
try:
    os.makedirs(self.cache_location)
except OSError:
    pass
You would end up with the same functionality, minus the race between the existence check and the directory creation.
DISCLAIMER: I don't know how Pythonic this might be.
Using SQLite3 might be a bit of overkill, but it would add a lot of functionality and flexibility to your use case.
If you have to do a lot of "selecting", concurrent inserting and filtering, SQLite3 is a great idea, as it won't add much complexity over simple files (it could be argued that it removes complexity).
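For the SQLite3 route, here is a minimal sketch of a URL-to-response cache, assuming Python 3's built-in sqlite3 module; the table layout and the `open_cache`/`cache_put`/`cache_get` names are hypothetical. SQLite coordinates writers from separate processes through its own file locking, and the `timeout` argument to `sqlite3.connect` makes a blocked writer wait rather than fail immediately with "database is locked".

```python
import sqlite3


def open_cache(db_path):
    """Open (or create) a hypothetical url -> body cache table."""
    # timeout: seconds to wait for another process's write lock.
    conn = sqlite3.connect(db_path, timeout=30)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cache (url TEXT PRIMARY KEY, body BLOB)"
    )
    conn.commit()
    return conn


def cache_put(conn, url, body):
    """Store (or overwrite) the response body for `url`."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO cache (url, body) VALUES (?, ?)", (url, body)
        )


def cache_get(conn, url):
    """Return the cached body for `url`, or None if it is not cached."""
    row = conn.execute("SELECT body FROM cache WHERE url = ?", (url,)).fetchone()
    return row[0] if row else None
```

Note that the directory holding the database file must still exist, so this does not remove the need to create that directory safely.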
Rereading your question (and comments), I can better understand your problem.
How likely is it that a regular file could occupy that path and trigger the same error?
If it is small enough, then I'd do something like:
if not os.path.isfile(self.cache_location):
    try:
        os.makedirs(self.cache_location)
    except OSError:
        pass
Also, reading your code, I'd change
else:
    # Our target dir is already a file, or different error,
    # relay the error!
    raise OSError(e)
to
else:
    # Our target dir is already a file, or different error,
    # relay the error!
    raise
since what you really want is for Python to re-raise the exact same exception (just nitpicking): raise OSError(e) wraps it in a new OSError that no longer carries the original errno.
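A small sketch, assuming Python 3, of the difference between the two forms: wrapping with `OSError(e)` builds a new exception whose only argument is the old one, so `errno` becomes None and the `FileExistsError` subclass is lost, while a bare `raise` re-raises the original object. The `relay_wrapped`/`relay_bare` names are hypothetical.

```python
import errno
import os
import tempfile


def relay_wrapped(path):
    try:
        os.mkdir(path)
    except OSError as e:
        raise OSError(e)  # a *new* OSError whose single argument is `e`


def relay_bare(path):
    try:
        os.mkdir(path)
    except OSError:
        raise  # re-raises the original exception object unchanged


existing = tempfile.mkdtemp()  # mkdir on this path always fails with EEXIST

results = {}
for relay in (relay_wrapped, relay_bare):
    try:
        relay(existing)
    except OSError as e:
        # Record the exception type and errno each variant exposes to callers.
        results[relay.__name__] = (type(e).__name__, e.errno)

print(results)
```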
One more thing: this may be of use to you (Unix-like only).
The code I ended up with was:
import os
import errno
folder_location = "/tmp/example_dir"
try:
    os.mkdir(folder_location)
except OSError as e:
    if e.errno == errno.EEXIST and os.path.isdir(folder_location):
        # File exists, and it's a directory,
        # another process beat us to creating this dir, that's OK.
        pass
    else:
        # Our target dir exists as a file, or different error,
        # reraise the error!
        raise
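A quick way to exercise that pattern is to wrap it in a function and hit the same path from many workers at once. This sketch, assuming Python 3, uses threads in one process rather than separate processes (a simplification: the mkdir race is the same, but it is not a full multi-process test); `ensure_dir` is a hypothetical name.

```python
import errno
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def ensure_dir(folder_location):
    """The EAFP pattern above, wrapped in a hypothetical reusable helper."""
    try:
        os.mkdir(folder_location)
    except OSError as e:
        if e.errno == errno.EEXIST and os.path.isdir(folder_location):
            pass  # someone else created it first: that's fine
        else:
            raise  # exists as a regular file, or a different error


target = os.path.join(tempfile.mkdtemp(), "example_dir")

# 64 concurrent calls on the same path; list() re-raises any worker exception.
with ThreadPoolExecutor(max_workers=16) as pool:
    results = list(pool.map(ensure_dir, [target] * 64))

print(os.path.isdir(target), len(results))
```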
Could you catch the exception and then test whether the file exists as a directory or not?
When you have race conditions, EAFP (easier to ask forgiveness than permission) often works better than LBYL (look before you leap).
See: Error checking strategies