I have 4 shell script to generate a file(let's say param.txt) which is used by another tool(informatica) and as the tool is done with processing, it deletes param.txt.
The intent here is all four scripts can get invoked at different time lets say 12:10 am, 12:13 am, 12:16 am, 12:17 am. First script runs at 12:10am and creates param.txt and trigger informatica process which uses param.txt. Informatica process takes another 5-10 minutes to complete and deletes the param.txt. The 2nd script invokes at 12:13 am and waits for unavailability of param.txt and as informatica process deletes it, script 2 creates new param.txt and triggers same informatica again. The same happen for another 2 scripts.
I am using Until and sleep commands in all 4 shell script to check the unavailability of param.txt like below:
until [ ! -f "$paramfile" ]
do
Sleep 10
done
<create param.txt file>
Issue here is, sometimes when all 4 scripts begin, the first one succeeds and generates param.txt(as there was no param.txt before) and other waits but when informatica process completes and deletes param.txt, remaining 3 scripts or 2 of them checks the unavailability at same time and one of them creates it but all succeed. I have checked different combinations of sleep interval between four scripts but this situation is occurring almost every time.
Thanks for your valuable suggestions. It did help me to think from other dimension. However I missed to mention that I am using Solaris UNIX where I couldn't find equivalent of flock or similar function. I could have asked team to install one utility but in mean time I found a workaround for this issue.
I read about the mkdir function being atomic in nature where as 'touch' command to create a file is not(still don't have complete explanation on how it works). That means at a time only 1 script can create/delete directory 'lockdir' out of 4 and other 3 has to wait.
You are experiencing a classical race condition. To solve this issue, you need a shared "lock" (or similar) between your 4 scripts.
There are several ways to implement this. One way to do this in bash is by using the
flock
command, and an agreed-upon filename to use as a lock. The flock man page has some usage examples which resemble this:You can also ask
flock
to fail right away if the lock can't be acquired, or to fail after a timeout (e.g. to print "still trying to acquire the lock" and restart).Depending on your use case, you could also put the lock on the 'informatica' binary (be sure to use
200<
in that case, to open the file for reading instead of (over)writing)You can use GNU Parallel as a counting semaphore or a mutex, by invoking it as
sem
instead of asparallel
. Scroll down toMutex
on this page.So, you could use:
Note I have specified a global
id
in case you run the jobs from different terminals or cron. This is not necessary if you are starting all jobs from one terminal.