I am writing a script that is required to perform safe-writes to any given file i.e. append a file if no other process is known to be writing into it. My understanding of the theory was that concurrent writes were prevented using write locks on the file system but it seems not to be the case in practice.
Here's how I set up my test case:
I am redirecting the output of a ping command:
ping 127.0.0.1 > fileForSafeWrites.txt
On the other end, I have the following python code attempting to write to the file:
handle = open('fileForSafeWrites.txt', 'w')
handle.write("Probing for opportunity to write")
handle.close()
Running concurrently both processes gracefully complete. I see that fileForSafeWrites.txt has turned into a file with binary content, instead of a write lock issued by the first process that protects it from being written into by the Python code.
How do I force either or both of my concurrent processes not to interfere with each other? I have read people advise the ability to get a write file handle as evidence for the file being write to safe, such as in https://stackoverflow.com/a/3070749/1309045
Is this behavior specific to my Operating System and Python. I use Python2.7 in an Ubuntu 12.04 environment.
Inspired from a solution described for concurrency checks, I came up with the following snippet of code. It works if one is able to appropriately predict the frequency at which the file in question is written. The solution is through the use of file-modification times.
import os
import time
'''Find if a file was modified in the last x seconds given by writeFrequency.'''
def isFileBeingWrittenInto(filename,
writeFrequency = 180, overheadTimePercentage = 20):
overhead = 1+float(overheadTimePercentage)/100 # Add some buffer time
maxWriteFrequency = writeFrequency * overhead
modifiedTimeStart = os.stat(filename).st_mtime # Time file last modified
time.sleep(writeFrequency) # wait writeFrequency # of secs
modifiedTimeEnd = os.stat(filename).st_mtime # File modification time again
if 0 < (modifiedTimeEnd - modifiedTimeStart) <= maxWriteFrequency:
return True
else:
return False
if not isFileBeingWrittenInto('fileForSafeWrites.txt'):
handle = open('fileForSafeWrites.txt', 'a')
handle.write("Text written safely when no one else is writing to the file")
handle.close()
This does not do true concurrency checks but can be combined with a variety of other methods for practical purposes to safely write into a file without having to worry about garbled text. Hope it helps the next person searching for a way to do this.
EDIT UPDATE:
Upon further testing, I encountered a high-frequency write process that required the conditional logic to be modified from
if 0 < (modifiedTimeEnd - modifiedTimeStart) < maxWriteFrequency
to
if 0 < (modifiedTimeEnd - modifiedTimeStart) <= maxWriteFrequency
That makes a better answer, in theory and in practice.
Use the lockfile module as shown in Locking a file in Python