I have a bash script that loops through a folder and processes all *.hql files. Sometimes one of the hive script fails (syntax, resource constraint, etc), instead of the script failing it will continue onto the next .hql file.
Anyway I can stop the bash from processing the remaining? Below is my sample bash:
for i in `ls ${layer}/*.hql`; do
echo "Processing $i ..."
hive ${hiveconf_all} -hiveconf DATE=${date} -f ${i} &
if [ $j -le 5 ]; then
j=$(( j+1 ))
else
wait
j=0
fi
done
I would check the process completion state of the previous command and invoke the exit command to come out the loop
(( $? == 0 )) && exit 1
Introduce the above line after the hive command and should do the trick.
add
set -e
to the top of your script
Use this template for running parallel processes and wait for their completion. Add your date
, layer
, hiveconf_all
and other variables:
#!/bin/bash
set -e
#Run parallel processes and write their logs
log_dir=/tmp/my_script_logs
for i in `ls ${layer}/*.hql`; do
echo "Processing $i ..."
#Run hive in parallel and redirect to the log file
hive ${hiveconf_all} -hiveconf DATE=${date} -f ${i} 2>&1 | tee "log_dir/${i}".log &
done
#Now wait for all processes to complete
FAILED=0
for job in `jobs -p`
do
echo "job=$job"
wait $job || let "FAILED+=1"
done
if [ "$FAILED" != "0" ]; then
echo "Execution FAILED! ($FAILED)"
#Do something here, log or send message, etc
exit 1
fi
#All processes are completed successfully!
#Do something here
echo "Done successfully"
Then you will be able to inspect each process log individually.