OSC offers GPU computing on all its systems. While GPUs can provide a significant performance boost for some applications, their computing model is very different from that of the CPU. This page discusses some of the ways you can use GPU computing at OSC.
To request nodes with a GPU, add a --gpus-per-node=x directive to your batch script, for example, on Pitzer:
#SBATCH --gpus-per-node=1
In most cases you'll need to load the cuda module (module load cuda) to make the necessary Nvidia libraries available.
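For a quick check that the GPU and CUDA toolkit are visible in your job, you can run something like the following (a minimal sketch; nvidia-smi is NVIDIA's standard device-query utility):

# Load the CUDA toolkit and confirm the GPU is visible
module load cuda
nvidia-smi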
The GPUs on any cluster can be set to different compute modes, as listed here. The mode is set by adding one of the following options to the GPU specification when using the srun command; by default it is set to shared.
srun --gpu_cmode=exclusive
or
srun --gpu_cmode=shared
The compute mode shared is the default on GPU nodes if a compute mode is not specified. With this compute mode, multiple CUDA processes are allowed on the same GPU device.
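As a sketch, a single task that requests one GPU in exclusive compute mode could be launched as follows (the program name is illustrative):

# Run one task on one GPU set to exclusive compute mode; ./my_gpu_program is a hypothetical executable
srun --gpus-per-node=1 --gpu_cmode=exclusive ./my_gpu_program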
An example PyTorch job on a single node:

#!/bin/bash
#SBATCH --account <Project-ID>
#SBATCH --job-name Pytorch_Example
#SBATCH --nodes=1
#SBATCH --time=00:10:00
#SBATCH --gpus-per-node=4

ml miniconda3/4.10.3-p37 cuda/11.8.0
source activate pytorch
python example.py

The same job on two nodes:

#!/bin/bash
#SBATCH --account <Project-ID>
#SBATCH --job-name Pytorch_Example
#SBATCH --nodes=2
#SBATCH --time=00:10:00
#SBATCH --gpus-per-node=4

ml miniconda3/4.10.3-p37 cuda/11.8.0
source activate pytorch
python example.py
If you are using the Nsight GPU profiler, you may experience an error like the following:
==ERROR== Profiling failed because a driver resource was unavailable. Ensure that no other tool (like DCGM) is concurrently collecting profiling data. See https://docs.nvidia.com/nsight-compute/ProfilingGuide/index.html#faq for more details.
This is caused by the GPU monitoring service (DCGM) that we run on the nodes by default. You can disable it and use Nsight by adding the Slurm option --gres=nsight.
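For example, a profiling job might request the nsight resource and then invoke the Nsight Compute command-line profiler (ncu); the application name below is hypothetical:

#!/bin/bash
#SBATCH --gpus-per-node=1
#SBATCH --gres=nsight       # disables the DCGM monitoring service for this job
#SBATCH --time=00:30:00

module load cuda
# Profile a hypothetical application with Nsight Compute and write the report to profile_report
ncu -o profile_report ./my_gpu_app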
We have several supported applications that can use GPUs; please see the software pages for each application. They have different levels of support for multi-node jobs, CPU/GPU work sharing, and environment setup.
There are a few libraries that provide GPU implementations of commonly used routines. While they mostly hide the details of using a GPU, there are still some GPU specifics you'll need to be aware of, e.g., device initialization, threading, and memory allocation. These are available at OSC:
MAGMA is an implementation of BLAS and LAPACK with multi-core (SMP) and GPU support. There are some differences in the API of standard BLAS and LAPACK.
cuBLAS is a highly optimized BLAS from NVIDIA. There are a few versions of this library, from very GPU-specific to nearly transparent. cuSPARSE is a BLAS-like library for sparse matrices.
The MAGMA library is built on cuBLAS.
cuFFT is NVIDIA's Fourier transform library with an API similar to FFTW.
cuDNN is NVIDIA's Deep Neural Network machine learning library. Many ML applications are built on cuDNN.
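When building against these libraries, a typical workflow is to load the cuda module and link the needed library at compile time. A minimal sketch, assuming a CUDA source file named my_solver.cu (hypothetical):

# Load the CUDA toolkit, then compile and link against cuBLAS and cuFFT
module load cuda
nvcc my_solver.cu -o my_solver -lcublas -lcufft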
GPUs present a different programming model from CPUs so there is a significant time investment in going this route.
OpenACC is a directives-based model similar to OpenMP. Currently this is only supported by the Portland Group C/C++ and Fortran compilers.
OpenCL is a set of libraries and C/C++ compiler extensions supporting GPUs (NVIDIA and AMD) and other hardware accelerators. The CUDA module provides an OpenCL library.
CUDA is the standard NVIDIA development environment. In this model explicit GPU code is written in the CUDA C/C++ dialect, compiled with the CUDA compiler NVCC, and linked with a native driver program.
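As an illustration, a batch job that compiles a CUDA source file with nvcc and runs the resulting binary on a GPU node might look like this (the file and executable names are hypothetical):

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --gpus-per-node=1
#SBATCH --time=00:10:00

module load cuda
# Compile a hypothetical CUDA source file and run it on the allocated GPU
nvcc kernel_example.cu -o kernel_example
./kernel_example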
If your job has low GPU utilization, consider running multiple GPU tasks within the same job using the --overlap option, as demonstrated in the sample script below.
#!/bin/bash
#SBATCH --job-name=shared-gpu
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4
#SBATCH --gpus-per-node=1
#SBATCH --gpu_cmode=shared
#SBATCH --time=1:00:00

# Running 4 tasks on a shared GPU
srun --overlap --gpus=1 -n 1 ./my-gpu-task1 &
srun --overlap --gpus=1 -n 1 ./my-gpu-task2 &
srun --overlap --gpus=1 -n 1 ./my-gpu-task3 &
srun --overlap --gpus=1 -n 1 ./my-gpu-task4 &
wait
OSC currently operates three HPC systems: the Cardinal, Ascend, and Pitzer clusters. These systems provide a mix of x86 CPU cores and NVIDIA GPU devices in a range of configurations. You can find details on available compute resources on the Cluster Computing page.
A summary of available GPU configurations is provided below: