This page documents known issues encountered when migrating jobs from Torque to Slurm.
$PBS_NODEFILE and $SLURM_JOB_NODELIST
Please be aware that $PBS_NODEFILE is a file while $SLURM_JOB_NODELIST is a string variable.
The Slurm analog of cat $PBS_NODEFILE is srun hostname | sort -n
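If an existing script still expects a node file, one can be generated inside the job. A minimal sketch, assuming a writable per-job directory such as $TMPDIR (the file names here are illustrative):
# One line per allocated task, similar to $PBS_NODEFILE:
srun hostname | sort -n > $TMPDIR/hostfile
# One line per allocated node, expanded from the compressed $SLURM_JOB_NODELIST string:
scontrol show hostnames "$SLURM_JOB_NODELIST" > $TMPDIR/nodelist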
Environment variables are not evaluated in job script directives
Environment variables do not work in a Slurm directive inside a job script.
For example, a job script job.txt containing #SBATCH --output=$HOME/jobtest.out will not produce the intended output file. Please use the following instead:
sbatch --output=$HOME/jobtest.out job.txt
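Note that Slurm's own filename patterns (unlike environment variables) are expanded in directives, so a pattern-based path can sometimes serve as an alternative. A sketch, assuming a relative path is acceptable (relative output paths resolve against the directory from which sbatch was run):
#SBATCH --output=jobtest.%j.out
Here %j expands to the numeric job ID.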
Using mpiexec with Intel MPI
Intel MPI (all versions through 2019.x) is configured to support the PMI and Hydra process managers. It is recommended to use srun as the MPI program launcher. Using mpiexec/mpirun instead can produce warnings such as:
MPI startup(): Warning: I_MPI_PMI_LIBRARY will be ignored since the hydra process manager was found
If you prefer using mpiexec/mpirun with Slurm, please add the following lines to the batch script before running any MPI executable:
unset I_MPI_PMI_LIBRARY
export I_MPI_JOB_RESPECT_PROCESS_PLACEMENT=0   # the option -ppn only works if you set this first
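As a minimal sketch of where these lines belong in a batch script (the module name, node counts, and program name are illustrative assumptions, not part of the original example):
#!/bin/bash
#SBATCH --job-name=impi_mpiexec
#SBATCH --nodes=2 --ntasks-per-node=4
#SBATCH --time=10:00
module load intel
unset I_MPI_PMI_LIBRARY
export I_MPI_JOB_RESPECT_PROCESS_PLACEMENT=0
# -ppn is honored only because I_MPI_JOB_RESPECT_PROCESS_PLACEMENT=0 is set above
mpiexec -ppn 4 ./my_mpi_program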
Executables with a certain MPI library using SLURM PMI2 interface
e.g.
Stopping mpi4py python processes during an interactive job session only from a login node:
pbsdcp with Slurm
pbsdcp with the gather option sometimes does not work correctly. It is suggested to use sbcast in place of pbsdcp's scatter option and sgather in place of its gather option. Please be aware that sbcast and sgather do not accept wildcards (*), sbcast has no recursive option, and the destination file/directory must already exist.
Here are some simple examples:
sbcast <src_file> <nodelocaldir>/<dest_file>
sgather <src_file> <shareddir>/<dest_file>
sgather -r --keep <src_dir> <shareddir>/<dest_dir>
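For example, a batch script might scatter an input file to node-local storage and gather per-node results back to a shared directory. A sketch, assuming $TMPDIR is node-local storage and the file and program names are placeholders:
# Copy the input file from the shared submit directory to $TMPDIR on every allocated node
sbcast $SLURM_SUBMIT_DIR/input.dat $TMPDIR/input.dat
# Run the application (placeholder command), writing per-node output into $TMPDIR
srun ./my_program $TMPDIR/input.dat $TMPDIR/output.dat
# Collect each node's output back to the shared directory;
# sgather appends the source hostname to the destination file name
sgather $TMPDIR/output.dat $SLURM_SUBMIT_DIR/output.dat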
Signal handling in Slurm
The script below needs a wait command so that the user-defined signal USR1 can be received by the process.
The sleep command is run in the background with &, and the shell then waits on it; this lets the bash shell receive signals and execute the trap handlers instead of ignoring them while sleep runs in the foreground.
#!/bin/bash
#SBATCH --job-name=minimal_trap
#SBATCH --time=2:00
#SBATCH --nodes=1 --ntasks-per-node=1
#SBATCH --output=%x.%A.log
#SBATCH --signal=B:USR1@60
function my_handler() {
echo "Catching signal"
touch $SLURM_SUBMIT_DIR/job_${SLURM_JOB_ID}_caught_signal
exit
}
trap my_handler USR1
trap my_handler TERM
sleep 3600 &
wait
reference: https://bugs.schedmd.com/show_bug.cgi?id=9715
'mail' does not work; use 'sendmail'
The 'mail' command does not work in a batch job; use 'sendmail' instead, as in:
sendmail user@example.com <<EOF
subject: Output path from $SLURM_JOB_ID
from: user@example.com
...
EOF
'srun' with no arguments allocates a single task when using 'sinteractive'
When using sinteractive to request an interactive job, srun with no arguments allocates only a single task, even if you requested more than one. Please pass the needed arguments to srun:
[xwang@owens-login04 ~]$ sinteractive -n 2 -A PZS0712
...
[xwang@o0019 ~]$ srun hostname
o0019.ten.osc.edu
[xwang@o0019 ~]$ srun -n 2 hostname
o0019.ten.osc.edu
o0019.ten.osc.edu
Be careful not to overwrite a Slurm batch output file for a running job
Unlike a PBS batch output file, which lived in a user-non-writeable directory while the job was running, a Slurm batch output file resides under the user's home directory while the job is running. File operations such as editing and copying are therefore possible, but please avoid them while the job is running. In particular, this batch script idiom is no longer correct (e.g., for the default job output file of name $SLURM_SUBMIT_DIR/slurm-jobid.out):
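The idiom itself is not reproduced on this page; the following sketch only illustrates the kind of pattern meant, assuming the common copy-everything-to-$TMPDIR-and-back approach:
cd $SLURM_SUBMIT_DIR
# The wildcard also copies the partially written slurm-<jobid>.out into $TMPDIR
cp -p * $TMPDIR
cd $TMPDIR
# ... run the job's work here ...
# Copying everything back overwrites the live slurm-<jobid>.out with the stale copy
cp -p * $SLURM_SUBMIT_DIR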