This page documents the known issues for migrating jobs from Torque to Slurm.
Please be aware that $PBS_NODEFILE
is a file while $SLURM_JOB_NODELIST
is a string variable.
The analog on Slurm to cat $PBS_NODEFILE
is srun hostname | sort -n
Environement variables do not work in a slurm directive inside a job script.
The job script job.txt
including #SBATCH --output=$HOME/jobtest.out
won't work in Slurm. Please use the following instead:
sbatch --output=$HOME/jobtest.out job.txt
Intel MPI (all versions through 2019.x) is configured to support PMI and Hydra process managers. It is recommended to use srun
as the MPI program launcher. This is a possible symptom of using mpiexec
/mpirun
:
as well as:
MPI startup(): Warning: I_MPI_PMI_LIBRARY will be ignored since the hydra process manager was found
If you prefer using mpiexec
/mpirun
with SLURM, please add the following code to the batch script before running any MPI executable:
unset I_MPI_PMI_LIBRARY export I_MPI_JOB_RESPECT_PROCESS_PLACEMENT=0 # the option -ppn only works if you set this before
e.g.
Stopping mpi4py python processes during an interactive job session only from a login node:
pbsdcp
with gather option sometimes does not work correctly. It is suggested to use sbcast
for scatter option and sgather
for gather option instead of pbsdcp
. Please be aware that there is no wildcard (*) option for sbcast
/ sgather
. And there is no recursive option for sbcast
.In addition, the destination file/directory must exist.
Here are some simple examples:
sbcast <src_file> <nodelocaldir>/<dest_file> sgather <src_file> <shareddir>/<dest_file> sgather -r --keep <src_dir> <sharedir>/dest_dir>
The below script needs to use a wait command for the user-defined signal USR1 to be received by the process.
The sleep process is backgrounded using & wait
so that the bash shell can receive signals and execute the trap commands instead of ignoring the signals while the sleep process is running.
#!/bin/bash #SBATCH --job-name=minimal_trap #SBATCH --time=2:00 #SBATCH --nodes=1 --ntasks-per-node=1 #SBATCH --output=%x.%A.log #SBATCH --signal=B:USR1@60 function my_handler() { echo "Catching signal" touch $SLURM_SUBMIT_DIR/job_${SLURM_JOB_ID}_caught_signal exit } trap my_handler USR1 trap my_handler TERM sleep 3600 & wait
reference: https://bugs.schedmd.com/show_bug.cgi?id=9715
The 'mail' does not work in a batch job; use 'sendmail' instead as:
sendmail user@example.com <<EOF subject: Output path from $SLURM_JOB_ID from: user@example.com ... EOF
srun
with no arguments is to allocate a single task when using sinteractive
to request an interactive job, even you request more than one task. Please pass the needed arguments to srun
:
[xwang@owens-login04 ~]$ sinteractive -n 2 -A PZS0712 ... [xwang@o0019 ~]$ srun hostname o0019.ten.osc.edu [xwang@o0019 ~]$ srun -n 2 hostname o0019.ten.osc.edu o0019.ten.osc.edu