CUDA

CUDA™ (Compute Unified Device Architecture) is a parallel computing platform and programming model developed by Nvidia that enables dramatic increases in computing performance by harnessing the power of the graphics processing unit (GPU).

Availability and Restrictions

CUDA is available on the Oakley and Ruby clusters. The versions currently available at OSC are:

Version   Oakley   Ruby   Notes
5.0.35    X        X      Default version on Oakley prior to 09/15/2015
5.5       X        X
6.0.37    X        X
6.5.14    X*       X*
7.0.28    X        X
7.5.18             X
*: Current default version

You can use module avail cuda to view available modules for a given machine. Feel free to contact OSC Help if you need other versions for your work.

Access

CUDA is available for use by all OSC users.

Usage

Usage on Oakley

Set-up on Oakley

To load the default version of the CUDA module, use module load cuda. To select a particular version, use module load cuda/version. For example, use module load cuda/7.0.28 to load CUDA version 7.0.28 on Oakley.

GPU Computing SDK

The NVIDIA GPU Computing SDK provides hundreds of code samples and covers a wide range of applications/techniques to help you get started on the path of writing software with CUDA C/C++ or DirectCompute. On Oakley, the SDK binaries are located in $CUDA_HOME/bin/linux/release ( $CUDA_HOME is an environment variable set when you load the module).

Programming in CUDA

To learn CUDA programming, visit http://developer.nvidia.com/cuda-education-training, which also includes tutorials on optimizing CUDA code for greater speedup. For additional CUDA optimization techniques, see http://www.cs.berkeley.edu/~volkov/

Compiling CUDA Code

You can type module show cuda/version to view the environment variables set by a module.
To compile a CUDA source file, say mycudaApp.cu, load the appropriate CUDA module and run:

nvcc -o mycudaApp mycudaApp.cu

This creates an executable named mycudaApp.
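As a minimal illustration of a source file that could be compiled this way, the sketch below adds two vectors on the GPU (the file name mycudaApp.cu and all variable names are placeholders, not code from OSC):

```cuda
// mycudaApp.cu -- illustrative vector-add sketch; compile with:
//   nvcc -o mycudaApp mycudaApp.cu
#include <stdio.h>
#include <cuda_runtime.h>

__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main(void) {
    const int n = 1024;
    size_t bytes = n * sizeof(float);
    float ha[1024], hb[1024], hc[1024];
    for (int i = 0; i < n; ++i) { ha[i] = 1.0f; hb[i] = 2.0f; }

    // Allocate device memory and copy the inputs over.
    float *da, *db, *dc;
    cudaMalloc((void **)&da, bytes);
    cudaMalloc((void **)&db, bytes);
    cudaMalloc((void **)&dc, bytes);
    cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

    // Launch one thread per element, 256 threads per block.
    vecAdd<<<(n + 255) / 256, 256>>>(da, db, dc, n);
    cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);

    printf("c[0] = %f\n", hc[0]);  /* expect 3.0 */
    cudaFree(da); cudaFree(db); cudaFree(dc);
    return 0;
}
```

Note that no cudaSetDevice call appears here; as described below, the runtime selects the device automatically when one GPU is requested.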

Important: The devices are configured in exclusive mode. This means that 'cudaSetDevice' should NOT be used when requesting a single GPU resource; once the first CUDA call is executed, the system determines which device to use. If both GPUs on a node are in use by a single application, use 'cudaSetDevice' to select each device explicitly.
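For the two-GPU case, explicit device selection might be sketched as follows (this is an outline, not OSC-supplied code; the loop body is left to your application):

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

int main(void) {
    int count = 0;
    cudaGetDeviceCount(&count);
    printf("visible devices: %d\n", count);

    // Only select devices explicitly when the job requested both GPUs.
    for (int d = 0; d < count; ++d) {
        cudaSetDevice(d);  // subsequent CUDA calls target device d
        // ... allocate memory and launch kernels on device d here ...
    }
    return 0;
}
```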

Debugging CUDA code

cuda-gdb can be used to debug CUDA code; module load cuda makes it available to you. For more information on how to use CUDA-GDB, please visit http://developer.nvidia.com/cuda-gdb.

Detecting memory access errors

CUDA-MEMCHECK can be used to detect the source and cause of memory access errors in your program, e.g. cuda-memcheck ./mycudaApp. For more information on how to use CUDA-MEMCHECK, please visit http://developer.nvidia.com/cuda-memcheck.

Batch Usage on Oakley

When you log into oakley.osc.edu you are actually logged into a Linux box referred to as the login node. To gain access to the multiple processors in the computing environment, you must submit your job to the batch system for execution. Batch jobs can request multiple nodes/cores and compute time up to the limits of the OSC systems. Refer to Queues and Reservations and Batch Limit Rules for more info. Batch jobs run on the compute nodes of the system, not on the login node, and are preferable for large problems since more resources can be used.

Interactive Batch Session

For an interactive batch session one can run the following command:

qsub -I -l nodes=1:ppn=1:gpus=1 -l walltime=00:20:00

which requests one core and one GPU (-l nodes=1:ppn=1:gpus=1) for a walltime of 20 minutes (-l walltime=00:20:00). You may adjust the numbers per your need.

Please note that on Oakley, you can request any mix of ppn and gpus you need; please see the Batch Limit Rules and Job Scripts page in our batch guide for more information.
Non-interactive Batch Job (Serial Run)

A batch script can be created and submitted for a serial or parallel run. You can create the batch script using any text editor you like in a working directory on the system of your choice. Below is an example batch script (job.txt) for a serial run; submit it with qsub job.txt:

#PBS -l walltime=01:00:00
#PBS -l nodes=1:ppn=1:gpus=1
#PBS -N compute
#PBS -j oe
module load cuda
cd $HOME/cuda
cp mycudaApp $TMPDIR
cd $TMPDIR
./mycudaApp

Usage on Ruby

Set-up on Ruby

To load the default version of the CUDA module, use module load cuda. To select a particular version, use module load cuda/version. For example, use module load cuda/7.0.28 to load CUDA version 7.0.28 on Ruby.

GPU Computing SDK

The NVIDIA GPU Computing SDK provides hundreds of code samples and covers a wide range of applications/techniques to help you get started on the path of writing software with CUDA C/C++ or DirectCompute. 

Programming in CUDA

To learn CUDA programming, visit http://developer.nvidia.com/cuda-education-training, which also includes tutorials on optimizing CUDA code for greater speedup. For additional CUDA optimization techniques, see http://www.cs.berkeley.edu/~volkov/

Compiling CUDA Code

You can type module show cuda/version to view the environment variables set by a module.
To compile a CUDA source file, say mycudaApp.cu, load the appropriate CUDA module and run:

nvcc -o mycudaApp mycudaApp.cu

This creates an executable named mycudaApp.

Important: The devices are configured in exclusive mode. This means that 'cudaSetDevice' should NOT be used when requesting a single GPU resource; once the first CUDA call is executed, the system determines which device to use. If both GPUs on a node are in use by a single application, use 'cudaSetDevice' to select each device explicitly.

Debugging CUDA code

cuda-gdb can be used to debug CUDA code; module load cuda makes it available to you. For more information on how to use CUDA-GDB, please visit http://developer.nvidia.com/cuda-gdb.

Detecting memory access errors

CUDA-MEMCHECK can be used to detect the source and cause of memory access errors in your program, e.g. cuda-memcheck ./mycudaApp. For more information on how to use CUDA-MEMCHECK, please visit http://developer.nvidia.com/cuda-memcheck.

Setting the GPU compute mode on Ruby

The GPUs on Ruby can be set to different compute modes, as listed below.

They can be set by adding the following to the GPU specification:

-l nodes=1:ppn=20:gpus=1:default
-l nodes=1:ppn=20:gpus=1:exclusive
-l nodes=1:ppn=20:gpus=1:exclusive_process

Note that the prohibited mode is not available.
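One way to confirm which mode your job actually received is to query it from the CUDA runtime. A sketch (not OSC-supplied code):

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

int main(void) {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    // computeMode is 0 (default), 1 (exclusive thread), 2 (prohibited),
    // or 3 (exclusive process), matching the modes requestable above.
    printf("GPU 0 compute mode: %d\n", prop.computeMode);
    return 0;
}
```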

Batch Usage on Ruby

When you log into ruby.osc.edu you are actually logged into a Linux box referred to as the login node. To gain access to the multiple processors in the computing environment, you must submit your job to the batch system for execution. Batch jobs can request multiple nodes/cores and compute time up to the limits of the OSC systems. Refer to Queues and Reservations and Batch Limit Rules for more info.

Interactive Batch Session

For an interactive batch session one can run the following command:

qsub -I -l nodes=1:ppn=20:gpus=1 -l walltime=00:20:00 

which requests one whole node with 20 cores (-l nodes=1:ppn=20) and one GPU (gpus=1) for a walltime of 20 minutes (-l walltime=00:20:00). You may adjust the numbers per your need.

Non-interactive Batch Job (Serial Run)

A batch script can be created and submitted for a serial or parallel run. You can create the batch script using any text editor you like in a working directory on the system of your choice. Below is an example batch script (job.txt) for a serial run; submit it with qsub job.txt:

#PBS -l walltime=01:00:00
#PBS -l nodes=1:ppn=20:gpus=1
#PBS -N compute
#PBS -j oe
module load cuda
cd $HOME/cuda
cp mycudaApp $TMPDIR
cd $TMPDIR
./mycudaApp

Further Reading

Online documentation is available at http://developer.nvidia.com/nvidia-gpu-computing-documentation
