I am trying to run a job on a cluster via Torque PBS with the command
qsub -o a.txt a.sh
The file a.sh contains a single line:
hostname
Right after qsub, I run qstat, which gives the following output:
Job ID Name User Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
302937.voms a.sh user 00:00:00 E long
After 5 seconds, qstat returns empty output (no jobs in the queue). The command
qsub --version
gives the output: version: 2.5.13
The command
which qsub
outputs: /usr/bin/qsub
The problem is that the file a.txt (from the command qsub -o a.txt a.sh) is never created! The terminal shows only the job ID; there are no errors at all. The command
qsub a.sh
behaves the same way. How can I fix this? Where are the qsub log files with errors?
If I use the command
qsub -l nodes=node36:ppn=1 -o a.txt a.sh
then I can find the output files in the folder
/var/spool/pbs/undelivered
on node36 (after logging in to it via ssh). The output file contains the string "node36", and the error file is empty. Why are my files "undelivered"?
The output log and error log files are kept on the execution node in a spool directory and copied back to the head node after the job has completed. The location of the spool directory may vary, but you should look for it under
/var/torque/spool
on the first node in the list of nodes allocated to the job. There are multiple reasons that might cause Torque to fail to deliver the output files.
This list is by no means complete. A number of questions here on Stack Overflow already deal with this kind of failure. Check whether any of the above applies to your case.
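If you have shell access to the execution node, a small helper can list whatever spooled or undelivered files were left behind for a job. This is a minimal sketch: the function name find_spooled_output is hypothetical (it is not part of Torque), and it assumes the files are named after the full job ID (for example 302937.voms.OU for stdout and 302937.voms.ER for stderr), which may differ on your installation. The directories you pass in are up to you; the two mentioned in this thread are /var/torque/spool and /var/spool/pbs/undelivered.

```shell
#!/usr/bin/env bash
# Minimal sketch: list files left behind for one job in the given directories.
# Assumption: Torque names spooled/undelivered files after the job ID
# (e.g. 302937.voms.OU and 302937.voms.ER); check this on your installation.

# Hypothetical helper, not part of Torque.
# Usage: find_spooled_output JOBID DIR [DIR ...]
# Pass the full job ID (e.g. 302937.voms) so that other jobs are not matched.
find_spooled_output() {
  local jobid="$1"
  shift
  local dir
  for dir in "$@"; do
    # Skip directories that do not exist on this node.
    [ -d "$dir" ] || continue
    # Look only at the top level of each directory, matching on the job ID prefix.
    find "$dir" -maxdepth 1 -type f -name "${jobid}*" -print
  done
}
```

For example, after logging in to node36 you might run find_spooled_output 302937.voms /var/torque/spool /var/spool/pbs/undelivered and then copy any files it prints back to your submission directory by hand.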
You (or anyone else finding this thread) should also check out the solution given here: PBS, refresh stdout
If you have admin access, you can set
which causes the output to be written directly to the final destination.