CUDA™ (Compute Unified Device Architecture) is a parallel computing platform and programming model developed by Nvidia that enables dramatic increases in computing performance by harnessing the power of the graphics processing unit (GPU).
Availability and Restrictions
Versions
CUDA is available on the clusters supporting GPUs. The versions currently available at OSC are:
Version | Owens | Pitzer | Ascend | Cardinal | cuDNN library |
---|---|---|---|---|---|
8.0.44 | X | | | | 5.1.5 |
8.0.61 | X | | | | 6.0.21 |
9.0.176 | X | X | | | 7.3.0 |
9.1.85 | X | X | | | 6.0.21 and 7.0.5 |
9.2.88 | X | X | | | 7.1.4 |
10.0.130 | X | X | | | 7.2.4 |
10.1.168 | X | X | | | 7.6.5 |
10.2.89 | X* | X* | | | 7.6.5 |
11.0.3 | X | X | X | | 8.0.5 |
11.1.1 | X | X | | | 8.0.5 |
11.2.2 | X | X | | | 8.1.1 |
11.5.2 | X | X | | | 8.3.2 |
11.6.1 | X | X | X | | 8.3.2 |
11.6.2 | | | X | | |
11.7.1 | | | X* | | |
11.8.0 | X | X | X | | 8.8.1 |
12.1.1 | | | | X | |
12.2.2 | | | | X | |
12.3.2 | | | | X | |
12.4.1 | | | | X* | |

An asterisk (*) marks the default version on that cluster.
You can use `module spider cuda` to view available modules for a given machine. Feel free to contact OSC Help if you need other versions for your work.
Access
CUDA is available for use by all OSC users.
Publisher/Vendor/Repository and License Type
Nvidia, Freeware
Usage
Usage on Owens
Set-up on Owens
To load the default version of the CUDA module, use `module load cuda`. To select a particular software version, use `module load cuda/version`.
GPU Computing SDK
The NVIDIA GPU Computing SDK provides hundreds of code samples and covers a wide range of applications/techniques to help you get started on the path of writing software with CUDA C/C++ or DirectCompute.
Programming in CUDA
Please visit the following link to learn programming in CUDA, http://developer.nvidia.com/cuda-education-training. The link also contains tutorials on optimizing CUDA codes to obtain greater speedups.
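For a quick sense of the programming model, a minimal vector-addition kernel (a standard introductory CUDA example, not specific to OSC systems) looks like this; the names and sizes are illustrative:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread adds one pair of elements.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);
    float *a, *b, *c;
    // Managed (unified) memory keeps the example short; explicit
    // cudaMalloc/cudaMemcpy would work the same way.
    cudaMallocManaged(&a, bytes);
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    int threads = 256;
    int blocks = (n + threads - 1) / threads;   // enough blocks to cover n
    vecAdd<<<blocks, threads>>>(a, b, c, n);
    cudaDeviceSynchronize();

    printf("c[0] = %f\n", c[0]);
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

After loading a CUDA module, this compiles with `nvcc -o vecAdd vecAdd.cu`.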
Compiling CUDA Code
Many of the tools loaded with the CUDA module can be used regardless of which compiler modules are loaded. However, CUDA code is compiled with `nvcc`, which depends on the GNU compilers. In particular, if you are trying to compile CUDA code and encounter a compiler error such as

#error -- unsupported GNU version! gcc versions later than X are not supported!

then you need to load an older GNU compiler with the `module load gnu/version` command (if compiling standard C code with GNU compilers) or the `module load gcc-compatibility/version` command (if compiling standard C code with Intel or PGI compilers).
One can type `module show cuda/version` to view the list of environment variables the module sets.
To compile CUDA code contained in a file, say `mycudaApp.cu`, load the appropriate CUDA module and run `nvcc -o mycudaApp mycudaApp.cu`. This creates an executable named `mycudaApp`.
The environment variable `OSC_CUDA_ARCH` defined in the module can be used to specify the `CUDA_ARCH`, to compile with `nvcc -o mycudaApp -arch=$OSC_CUDA_ARCH mycudaApp.cu`.
Important: The devices are configured in exclusive mode. This means that `cudaSetDevice` should NOT be used if you request a single GPU; once the first CUDA call executes, the system determines which device is in use. If both GPUs on a node are in use by a single application, please use `cudaSetDevice`.
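The note above can be sketched in code. This is an illustrative example only (the device-count check and loop are not OSC-specific); it shows explicit device selection with `cudaSetDevice`, which you would use only when your job actually requests both GPUs on a node:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Sketch: explicit device selection for a multi-GPU job.
// With a single-GPU request, skip cudaSetDevice and let the
// runtime bind to the device assigned by the scheduler.
int main() {
    int count = 0;
    cudaGetDeviceCount(&count);          // GPUs visible to this job
    printf("visible devices: %d\n", count);
    for (int d = 0; d < count; ++d) {
        cudaSetDevice(d);                // subsequent CUDA calls target device d
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, d);
        printf("device %d: %s\n", d, prop.name);
        // ... launch work on device d here ...
    }
    return 0;
}
```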
Debugging CUDA code
cuda-gdb can be used to debug CUDA codes; `module load cuda` will make it available to you. For more information on how to use CUDA-GDB, please visit http://developer.nvidia.com/cuda-gdb.
Detecting memory access errors
CUDA-MEMCHECK can be used to detect the source and cause of memory access errors in your program. For more information on how to use CUDA-MEMCHECK, please visit http://developer.nvidia.com/cuda-memcheck.
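As a contrived sketch of the kind of bug CUDA-MEMCHECK reports, the kernel below (hypothetical example code, not from this page) writes past the end of its allocation because it launches more threads than there are elements and omits the bounds check; running it under `cuda-memcheck ./oob` would flag the invalid writes:

```cuda
#include <cuda_runtime.h>

__global__ void oob(float *a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    a[i] = 1.0f;   // bug: missing "if (i < n)" guard, so threads with
                   // i >= n write beyond the allocated buffer
}

int main() {
    const int n = 100;
    float *a;
    cudaMalloc(&a, n * sizeof(float));
    oob<<<1, 128>>>(a, n);   // 128 threads, only 100 valid elements
    cudaDeviceSynchronize();
    cudaFree(a);
    return 0;
}
```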
Setting the GPU compute mode on Owens
The GPUs on Owens can be set to different compute modes as listed here.
The `default` compute mode is the default setting on our GPU nodes (`--gpu_cmode=shared`), so you don't need to specify it if you require this mode. With this mode, multiple CUDA processes across GPU nodes are allowed, e.g. CUDA processes via MPI. So, if you need to run an MPI-CUDA job, just keep the default compute mode. Should you need to use another compute mode, use `--gpu_cmode` to specify the mode setting. For example:
--nodes=1 --ntasks-per-node=28 --gpus-per-node=1 --gpu_cmode=exclusive
Batch Usage on Owens
When you log into owens.osc.edu you are actually logged into a Linux box referred to as the login node. To gain access to the multiple processors in the computing environment, you must submit your job to the batch system for execution. Batch jobs can request multiple nodes/cores and compute time up to the limits of the OSC systems. Refer to Queues and Reservations and Batch Limit Rules for more info.
Interactive Batch Session
For an interactive batch session one can run the following command:
sinteractive -A <project-account> -N 1 -n 28 -g 1 -t 00:20:00
which requests one whole node with 28 cores (`-N 1 -n 28`), one GPU (`-g 1`), and a walltime of 20 minutes (`-t 00:20:00`). You may adjust the numbers per your need.
Non-interactive Batch Job (Serial Run)
A batch script can be created and submitted for a serial or parallel run. You can create the batch script using any text editor you like in a working directory on the system of your choice. Below is the example batch script (`job.txt`) for a serial run:
#!/bin/bash
#SBATCH --time=01:00:00
#SBATCH --nodes=1 --ntasks-per-node=1 --gpus-per-node=1
#SBATCH --job-name compute
#SBATCH --account=<project-account>

module load cuda
cd $HOME/cuda
cp mycudaApp $TMPDIR
cd $TMPDIR
./mycudaApp
Usage on Pitzer
Set-up on Pitzer
To load the default version of the CUDA module, use `module load cuda`.
GPU Computing SDK
The NVIDIA GPU Computing SDK provides hundreds of code samples and covers a wide range of applications/techniques to help you get started on the path of writing software with CUDA C/C++ or DirectCompute.
Programming in CUDA
Please visit the following link to learn programming in CUDA, http://developer.nvidia.com/cuda-education-training. The link also contains tutorials on optimizing CUDA codes to obtain greater speedups.
Compiling CUDA Code
Many of the tools loaded with the CUDA module can be used regardless of which compiler modules are loaded. However, CUDA code is compiled with `nvcc`, which depends on the GNU compilers. In particular, if you are trying to compile CUDA code and encounter a compiler error such as

#error -- unsupported GNU version! gcc versions later than X are not supported!

then you need to load an older GNU compiler with the `module load gnu/version` command (if compiling standard C code with GNU compilers) or the `module load gcc-compatibility/version` command (if compiling standard C code with Intel or PGI compilers).
One can type `module show cuda/version` to view the list of environment variables the module sets.
To compile CUDA code contained in a file, say `mycudaApp.cu`, load the appropriate CUDA module and run `nvcc -o mycudaApp mycudaApp.cu`. This creates an executable named `mycudaApp`.
The environment variable `OSC_CUDA_ARCH` defined in the module can be used to specify the `CUDA_ARCH`, to compile with `nvcc -o mycudaApp -arch=$OSC_CUDA_ARCH mycudaApp.cu`.
Important: The devices are configured in exclusive mode. This means that `cudaSetDevice` should NOT be used if you request a single GPU; once the first CUDA call executes, the system determines which device is in use. If both GPUs on a node are in use by a single application, please use `cudaSetDevice`.
Debugging CUDA code
cuda-gdb can be used to debug CUDA codes; `module load cuda` will make it available to you. For more information on how to use CUDA-GDB, please visit http://developer.nvidia.com/cuda-gdb.
Detecting memory access errors
CUDA-MEMCHECK can be used to detect the source and cause of memory access errors in your program. For more information on how to use CUDA-MEMCHECK, please visit http://developer.nvidia.com/cuda-memcheck.
Setting the GPU compute mode on Pitzer
The GPUs on Pitzer can be set to different compute modes as listed here.
The `default` compute mode is the default setting on our GPU nodes (`--gpu_cmode=shared`), so you don't need to specify it if you require this mode. With this mode, multiple CUDA processes across GPU nodes are allowed, e.g. CUDA processes via MPI. So, if you need to run an MPI-CUDA job, just keep the default compute mode. Should you need to use another compute mode, use `--gpu_cmode` to specify the mode setting. For example:
--nodes=1 --ntasks-per-node=40 --gpus-per-node=1 --gpu_cmode=exclusive
Batch Usage on Pitzer
When you log into pitzer.osc.edu you are actually logged into a Linux box referred to as the login node. To gain access to the multiple processors in the computing environment, you must submit your job to the batch system for execution. Batch jobs can request multiple nodes/cores and compute time up to the limits of the OSC systems. Refer to Queues and Reservations and Batch Limit Rules for more info.
Interactive Batch Session
For an interactive batch session one can run the following command:
sinteractive -A <project-account> -N 1 -n 40 -g 2 -t 00:20:00
which requests one whole node (-N 1), 40 cores (-n 40), 2 gpus (-g 2), and a walltime of 20 minutes (-t 00:20:00). You may adjust the numbers per your need.
Non-interactive Batch Job (Serial Run)
A batch script can be created and submitted for a serial or parallel run. You can create the batch script using any text editor you like in a working directory on the system of your choice. Below is the example batch script (`job.txt`) for a serial run:
#!/bin/bash
#SBATCH --time=01:00:00
#SBATCH --nodes=1 --ntasks-per-node=1 --gpus-per-node=1
#SBATCH --job-name Compute
#SBATCH --account=<project-account>

module load cuda
cd $HOME/cuda
cp mycudaApp $TMPDIR
cd $TMPDIR
./mycudaApp
CUDA Architecture
As mentioned in the previous Usage sections, to ensure that the application you build runs regardless of changes to CUDA drivers, make sure you specify the target architecture at build time. You can use the helper `OSC_CUDA_ARCH` environment variable defined in the `cuda` module to build your applications: `nvcc -o mycudaApp -arch=$OSC_CUDA_ARCH mycudaApp.cu`.
GNU Compiler Support for NVCC
CUDA Version | Max supported GCC version |
---|---|
9.2.88 - 10.0.130 | 7 |
10.1.168 - 10.2.89 | 8 |
11.0 | 9 |
11.1 - 11.4.0 | 10 |
11.4.1 - 11.8 | 11 |
12.0 | 12.1 |
Further Reading
Online documentation is available on the CUDA homepage.
Compiler support for the latest version of CUDA is available here.
CUDA optimization techniques.