With the addition of 160 NVIDIA P100 GPUs to the Owens cluster, OSC now offers GPU computing on all its systems. While GPUs can provide a significant performance boost for some applications, their computing model is very different from that of the CPU. This page discusses some of the ways you can use GPU computing at OSC.
Accessing GPU Resources
To request nodes with a GPU, add the gpus=# attribute to the PBS nodes directive in your batch script. For example, on Owens:
#PBS -l nodes=2:ppn=28:gpus=1
On Oakley you can request 1 or 2 GPUs.
In most cases you'll need to load the cuda module (module load cuda) to make the necessary NVIDIA libraries available.
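Putting the pieces above together, a minimal GPU job script might look like the following sketch (the job name, walltime, and executable name are placeholders):

```shell
#PBS -N gpu-example
#PBS -l nodes=1:ppn=28:gpus=1
#PBS -l walltime=00:30:00

# Make the NVIDIA libraries available
module load cuda

# Run from the directory the job was submitted from
cd $PBS_O_WORKDIR

# Placeholder for your GPU-enabled executable
./my_gpu_program
```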
There is no additional RU charge for GPUs.
Using GPU-enabled Applications
Several of our supported applications can use GPUs, including:
- Machine learning / Neural networks
- Molecular mechanics / dynamics
- General mathematics
Engineering and quantum chemistry applications are expected to follow.
Please see the software page for each application; applications have different levels of support for multi-node jobs, CPU/GPU work sharing, and environment setup.
Libraries with GPU Support
There are a few libraries that provide GPU implementations of commonly used routines. While they mostly hide the details of using a GPU, there are still some GPU specifics you'll need to be aware of, e.g. device initialization, threading, and memory allocation.
MAGMA is an implementation of BLAS and LAPACK with multi-core (SMP) and GPU support. Its API differs in places from standard BLAS and LAPACK.
cuBLAS is a highly optimized BLAS from NVIDIA. There are a few versions of this library, from very GPU-specific to nearly transparent. cuSPARSE is a BLAS-like library for sparse matrices.
The MAGMA library is built on cuBLAS.
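To illustrate the GPU specifics mentioned above (device initialization and memory allocation), here is a minimal cuBLAS sketch computing y = a*x + y. It is illustrative only: it must be compiled and run on a GPU node with the cuda module loaded (e.g. nvcc saxpy_cublas.c -lcublas), and error checking is omitted for brevity.

```c
#include <stdio.h>
#include <cuda_runtime.h>
#include <cublas_v2.h>

int main(void)
{
    const int n = 1024;
    const float alpha = 2.0f;
    float x[1024], y[1024];
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 1.0f; }

    /* GPU specifics the library does not hide: device memory must be
       allocated and data copied to it explicitly. */
    float *d_x, *d_y;
    cudaMalloc(&d_x, n * sizeof(float));
    cudaMalloc(&d_y, n * sizeof(float));
    cudaMemcpy(d_x, x, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_y, y, n * sizeof(float), cudaMemcpyHostToDevice);

    /* Device initialization: create a cuBLAS handle. */
    cublasHandle_t handle;
    cublasCreate(&handle);

    /* y = alpha * x + y, computed on the GPU */
    cublasSaxpy(handle, n, &alpha, d_x, 1, d_y, 1);

    /* Copy the result back to the host */
    cudaMemcpy(y, d_y, n * sizeof(float), cudaMemcpyDeviceToHost);
    printf("y[0] = %.1f\n", y[0]);   /* 2*1 + 1 = 3.0 */

    cublasDestroy(handle);
    cudaFree(d_x);
    cudaFree(d_y);
    return 0;
}
```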
cuDNN is NVIDIA's Deep Neural Network machine learning library. Many ML applications are built on cuDNN.
Direct GPU Programming
GPUs present a different programming model from CPUs, so going this route requires a significant investment of time.
OpenACC is a directive-based model similar to OpenMP. Currently it is supported only by the Portland Group C/C++ and Fortran compilers.
OpenCL is a set of libraries and C/C++ compiler extensions supporting GPUs (NVIDIA and AMD) and other hardware accelerators. The CUDA module provides an OpenCL library.
About OSC GPU Hardware
Our GPUs span several generations with different capabilities and ease of use. Many of the differences won't be visible when using applications or libraries, but some features and applications may not be supported on the older models.
The M2070 is now a legacy product with a CUDA compute capability of 2.0; it is not supported by the latest CUDA 8 drivers and development environment. Each M2070 has 5.5 GB of memory, but these GPUs still provide a significant speed-up over the CPU.
The K40 has a compute capability of 3.5, which is supported by most applications.
Each K40 has 12 GB of memory.
The P100 is NVIDIA's flagship GPU, with a compute capability of 6.0. The 6.0 capability includes unified CPU/GPU memory: the GPU now has its own virtual memory capability and can map CPU memory into its address space.
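In CUDA terms, unified memory means a single managed allocation is visible to both the CPU and the GPU, with no explicit copies. Below is a minimal sketch (illustrative only; it requires a GPU node and nvcc, e.g. nvcc -arch=sm_60 add.cu):

```cuda
#include <cstdio>

/* Add 1.0 to each element of x on the GPU */
__global__ void add_one(int n, float *x)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] += 1.0f;
}

int main(void)
{
    const int n = 1 << 20;
    float *x;
    cudaMallocManaged(&x, n * sizeof(float));  /* visible to CPU and GPU */

    for (int i = 0; i < n; ++i) x[i] = 1.0f;   /* initialized on the CPU */

    add_one<<<(n + 255) / 256, 256>>>(n, x);   /* updated on the GPU */
    cudaDeviceSynchronize();

    printf("x[0] = %.1f\n", x[0]);             /* read back on the CPU */
    cudaFree(x);
    return 0;
}
```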
Each P100 has 16 GB of on-board memory.
There are example jobs and code on GitHub.