I'm writing scripts that will run in parallel and take their input data from the same file. Each script opens the input file, reads the first line, stores it for further processing, and finally erases that line from the input file.
The problem is that multiple scripts accessing the file can lead to a situation where two of them read the input file simultaneously and get the same line, which produces the unacceptable result of a line being processed twice.
One solution is to write a lock file (.lock_input) before accessing the input file, then erase it when releasing the input file, but this is not appealing in my case because NFS sometimes slows down network communication randomly and may not provide reliable locking.
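A rough sketch of that idea, using bash's noclobber option so that creating the lock file is atomic (at least on a local filesystem, which is exactly the guarantee NFS may not give):
#!/bin/bash
# Try to create the lock file; with noclobber, ">" fails if the file already exists.
while ! (set -o noclobber; echo "$$" > .lock_input) 2>/dev/null
do
  sleep 1                 # another script holds the lock, wait and retry
done
# ... read and erase the first line of input_file here ...
rm -f .lock_input         # release the lock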
Another solution is to use a process lock instead of writing a file: the first script to access the input file launches a process called lock_input, and the other scripts run ps -elf | grep lock_input. If it is present in the process list, they wait. This may be faster than writing to NFS, but it is still not a perfect solution ...
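A rough sketch of that variant, assuming lock_input is a tiny helper script that just sleeps (note that the gap between checking ps and launching the marker process is itself a race, which is part of why this is not perfect):
#!/bin/bash
# The [l] in the pattern stops grep from matching its own command line.
while ps -elf | grep '[l]ock_input' > /dev/null
do
  sleep 1                     # another script holds the lock, wait
done
./lock_input & LOCK_PID=$!    # our marker process, visible to the other scripts
# ... read and erase the first line of input_file here ...
kill "$LOCK_PID"              # release the lock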
So my question is: is there any bash command (or other script interpreter), or a service, that I can use that behaves like the semaphore or mutex locks used for synchronization in threaded programming?
Thank you.
Small rough example:
Let's say we have input_file as follows:
Monday
Tuesday
Wednesday
Thursday
Friday
Saturday
Sunday
Treatment script: TrScript.sh
#!/bin/bash
NbLines=$(wc -l < input_file)
while [ "$NbLines" -ne 0 ]
do
  FirstLine=$(head -1 input_file)             # read the first line
  echo "Hello World today is $FirstLine"
  RemainingLines=$((NbLines - 1))
  tail -n "$RemainingLines" input_file > tmp  # keep everything but the first line
  mv tmp input_file                           # erase the read line from the file
  NbLines=$(wc -l < input_file)
done
Main script:
#! /bin/bash
./TrScript.sh &
./TrScript.sh &
./TrScript.sh &
wait
The result should be:
Hello World today is Monday
Hello World today is Tuesday
Hello World today is Wednesday
Hello World today is Thursday
Hello World today is Friday
Hello World today is Saturday
Hello World today is Sunday
Using the FLOM (Free LOck Manager) tool, your main script can become very simple if you are running everything inside a single host, and only slightly different if you want to distribute your script over many hosts. Some usage examples are available at this URL: http://sourceforge.net/p/flom/wiki/FLOM%20by%20examples/
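For the single-host case, a sketch of the main script using flom's basic "flom -- command" invocation (the multi-host variants are described at the URL above):
#!/bin/bash
# flom serializes the wrapped commands on a default lock resource,
# so only one TrScript.sh touches input_file at a time.
flom -- ./TrScript.sh &
flom -- ./TrScript.sh &
flom -- ./TrScript.sh &
wait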
I have always liked the lockfile program (see its man page) from the procmail set of tools (it should be available on most systems, though it may not be installed by default).
It was designed to lock mail spool files, which are (were?) commonly mounted via NFS, so it does work properly over NFS (as much as anything can).
Also, as long as you are assuming that all your ‘workers’ are on the same machine (which you already do by assuming you can check for PIDs, something that may not work properly once PIDs wrap around), you could put your lock file in some other, local directory (e.g. /tmp) while processing files hosted on an NFS server. As long as all the workers use the same lock file location (and a one-to-one mapping of lock file names to locked path names), it will work fine.
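A minimal sketch of how lockfile might fit here (the lock path /tmp/input_file.lock is just an illustration; as noted above, a local lock path only works if all the workers run on the same machine):
#!/bin/bash
lockfile /tmp/input_file.lock    # blocks, retrying, until it can create the lock
# ... read and erase the first line of input_file here ...
rm -f /tmp/input_file.lock       # release the lock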
Use flock for accessing the file you want to read from. This relies on file locks, though. While the lock is held, head -1 prints the first line of the input, and that line can then be removed from the file before the lock is released.
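A sketch of that approach, assuming util-linux flock(1) and GNU sed (the lock file path is just an illustration, and note that flock's locks may not be reliable over NFS):
#!/bin/bash
# Atomically pop the first line of input_file: everything inside the subshell
# runs while holding an exclusive lock on file descriptor 9.
pop_line() {
  (
    flock -x 9                # block until we own the lock
    head -n 1 input_file      # print the first line (empty if none are left)
    sed -i '1d' input_file    # remove it so no other worker reads it again
  ) 9> /tmp/input_file.lock
}

while line=$(pop_line) && [ -n "$line" ]
do
  echo "Hello World today is $line"
done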