This document serves as a knowledge base for properly managing and diagnosing threading issues in user jobs. It focuses on OpenMP, Intel Math Kernel Library (MKL), and common thread-related misuse at OSC.
Intel MKL is widely used in HPC for linear algebra, FFTs, and statistical routines. MKL is multithreaded by default, which can significantly improve performance but only when correctly configured.
| Variable | Applies To | Description |
|---|---|---|
| OMP_NUM_THREADS | All OpenMP programs | Sets the number of threads for OpenMP. Recognized by all compilers. |
| MKL_NUM_THREADS | Intel MKL libraries | Sets the number of threads for MKL. Takes precedence over OMP_NUM_THREADS. |

- MKL uses MKL_NUM_THREADS for its internal operations, even if OMP_NUM_THREADS is higher.
- ... KMP_NUM_THREADS, etc.).

Users often run programs in parallel using MPI or other approaches without realizing that the program was built with MKL threading or OpenMP enabled. They may request sufficient resources for their primary parallelization method, but MKL threading can still be activated automatically (as described above), leading to CPU oversubscription and performance degradation.
Commonly affected applications at OSC include R, LAMMPS, and GROMACS.
Consider an MPI job that requests 8 CPUs:

    #!/bin/bash
    #SBATCH --ntasks-per-node=8

    srun /path/to/mpi/program

Without properly setting OMP_NUM_THREADS or MKL_NUM_THREADS, each MPI rank may spawn 8 threads by default. This results in a total of 64 threads (8 threads × 8 ranks), which exceeds the allocated CPU resources. Such oversubscription can severely degrade performance, interfere with other users' jobs on the same node, and in extreme cases, even crash the node.
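The arithmetic above can be sketched as a small shell helper. The function name `check_oversubscription` and its three positional arguments are hypothetical, introduced only for illustration; the helper multiplies ranks by threads per rank and compares the result with the allocated CPU count.

```shell
#!/bin/bash
# Hypothetical helper (not an OSC or Slurm tool): report whether
# ranks x threads-per-rank exceeds the number of allocated CPUs.
# Usage: check_oversubscription <ranks> <threads_per_rank> <allocated_cpus>
check_oversubscription() {
    local ranks="$1"
    local threads="$2"
    local cpus="$3"
    local total=$(( ranks * threads ))
    if [ "$total" -gt "$cpus" ]; then
        echo "oversubscribed: ${total} threads on ${cpus} CPUs"
    else
        echo "ok: ${total} threads on ${cpus} CPUs"
    fi
}

# The scenario from the text: 8 ranks, 8 threads each, 8 CPUs allocated.
check_oversubscription 8 8 8    # prints "oversubscribed: 64 threads on 8 CPUs"
# The same job with one thread per rank.
check_oversubscription 8 1 8    # prints "ok: 8 threads on 8 CPUs"
```
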
- Set MKL_NUM_THREADS=1 unless performance tuning suggests otherwise.
- If the program uses OpenMP threading, request --cpus-per-task=N and set OMP_NUM_THREADS=N accordingly.
- Otherwise, set OMP_NUM_THREADS=1 to disable threading safely.

Hybrid MPI + OpenMP job:

    #!/bin/bash
    #SBATCH --ntasks-per-node=8
    #SBATCH --cpus-per-task=8

    export MKL_NUM_THREADS=1
    export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
    srun /path/to/mpi/program

MPI-only job:

    #!/bin/bash
    #SBATCH --ntasks-per-node=8

    export MKL_NUM_THREADS=1
    export OMP_NUM_THREADS=1
    srun /path/to/mpi/program

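One detail worth guarding against: Slurm defines SLURM_CPUS_PER_TASK only when --cpus-per-task is requested, so a script that reuses `export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK` without that option would export an empty value. A minimal sketch of a fallback follows; the function name `set_thread_env` is hypothetical and not part of Slurm or OSC tooling.

```shell
#!/bin/bash
# Hypothetical helper: set thread-related variables from the Slurm allocation.
# Falls back to 1 thread when SLURM_CPUS_PER_TASK is unset or empty,
# which is the case when --cpus-per-task was not requested.
set_thread_env() {
    export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"
    export MKL_NUM_THREADS=1
}

set_thread_env
echo "OMP_NUM_THREADS=${OMP_NUM_THREADS} MKL_NUM_THREADS=${MKL_NUM_THREADS}"
```

With this fallback the same lines could serve both the hybrid and the MPI-only script, although setting the values explicitly, as in the examples above, remains the clearer choice.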
There are several cases where the main program is not explicitly built with MKL threading or OpenMP enabled, but its dependent libraries are. A common example is a Python program that uses NumPy. Certain NumPy operations, such as np.dot, can leverage MKL or OpenMP internally and spawn multiple threads.
In such cases, if you are unsure whether threading is needed, it is safest to follow the example above and explicitly set:

    export OMP_NUM_THREADS=1
    export MKL_NUM_THREADS=1

This ensures controlled thread usage and prevents unexpected oversubscription.
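To check whether a running program actually honors these settings, one option is to look at its thread count on the compute node. The sketch below assumes a Linux node, where the kernel reports a `Threads:` field in `/proc/<pid>/status`; the function name `thread_count` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the current number of threads of a process.
# Linux only: reads the "Threads:" field from /proc/<pid>/status.
# Usage: thread_count <pid>
thread_count() {
    awk '/^Threads:/ { print $2 }' "/proc/$1/status"
}

# Example: thread count of this shell itself.
thread_count "$$"
```

Running `thread_count` against the PID of, for example, a Python process that uses NumPy gives a rough indication: a count far above the number of allocated CPUs suggests that threading was not limited as intended. Some threads may be idle helpers, so treat the number as a hint rather than proof.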
Some programs not designed for HPC environments may spawn multiple subprocesses or determine the number of threads by directly reading system information from /proc/cpuinfo, ignoring Slurm-imposed resource limits. In such cases, the standard thread control methods described above may not work, as the internal settings override user-defined environment variables.
Sometimes, these programs offer command-line options or configuration parameters to control threading. Users should consult the program's documentation and explicitly set the number of threads as appropriate for their job's allocated resources.
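The mismatch described above can be made visible from inside a job. The sketch below assumes a Linux node with GNU coreutils and assumes the site confines jobs through CPU affinity or cgroups; the variable names are illustrative. It compares the CPU count listed in /proc/cpuinfo with the number of CPUs the current process may actually use.

```shell
#!/bin/bash
# CPUs listed in /proc/cpuinfo: this normally reflects the whole node,
# regardless of how many CPUs the job was allocated.
node_cpus=$(grep -c '^processor' /proc/cpuinfo)

# CPUs available to the current process: nproc honors the CPU affinity mask.
usable_cpus=$(nproc)

echo "node: ${node_cpus} CPUs, usable by this process: ${usable_cpus} CPUs"
```

If the first number is larger than the second, a program that sizes its thread pool from /proc/cpuinfo will start more threads than the job can run. The usable count is then a reasonable value to pass to the program's own thread option. Note that GNU `nproc` also takes OMP_NUM_THREADS into account when it is set, so run the check before exporting that variable.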