CUDA

CUDA™ (Compute Unified Device Architecture) is a parallel computing platform and programming model developed by Nvidia that enables dramatic increases in computing performance by harnessing the power of the graphics processing unit (GPU).

Availability and Restrictions

Versions

CUDA is available on the clusters supporting GPUs. The versions currently available at OSC are:

Version     Ruby    Owens   Pitzer
5.0.35      X
5.5.22      X
6.0.37      X
6.5.14      X
7.0.28      X
7.5.18      X
8.0.44      X       X
8.0.61      X       X
9.0.176                     X
9.1.85      X       X
9.2.88      X*      X*      X*
10.0.130    X       X       X
* Current default version

You can use module spider cuda to view available modules for a given machine. Feel free to contact OSC Help if you need other versions for your work.

Access

CUDA is available for use by all OSC users.

Publisher/Vendor/Repository and License Type

Nvidia, Freeware 

Usage

Usage on Ruby

Set-up on Ruby

To load the default version of the CUDA module, use module load cuda. To select a particular software version, use module load cuda/version. For example, use module load cuda/7.0.28 to load CUDA version 7.0.28 on Ruby.

GPU Computing SDK

The NVIDIA GPU Computing SDK provides hundreds of code samples and covers a wide range of applications/techniques to help you get started on the path of writing software with CUDA C/C++ or DirectCompute. 

Programming in CUDA

To learn CUDA programming, visit http://developer.nvidia.com/cuda-education-training, which also includes tutorials on optimizing CUDA code for greater speedup. For additional CUDA optimization techniques, see http://www.cs.berkeley.edu/~volkov/

Compiling CUDA Code

Type module show cuda/version to view the environment variables that the module sets.
To compile CUDA code contained in a file, say mycudaApp.cu, run the following after loading the appropriate CUDA module: nvcc -o mycudaApp mycudaApp.cu. This creates an executable named mycudaApp.

The environment variable OSC_CUDA_ARCH, defined by the module, can be used to specify the CUDA architecture when compiling: nvcc -o mycudaApp -arch=$OSC_CUDA_ARCH mycudaApp.cu.
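
As a minimal sketch of the compile step above, the helper below prints the nvcc command line instead of running it, so the command can be checked before use. The function name nvcc_command is hypothetical and not part of the cuda module; the value of OSC_CUDA_ARCH is assumed to come from the loaded module.

```shell
#!/bin/bash
# Sketch only: print the nvcc command for a CUDA source file.
# nvcc_command is a hypothetical helper name, not something the cuda module provides.
nvcc_command() {
    local src
    src="$1"
    if [ -z "${OSC_CUDA_ARCH:-}" ]; then
        # OSC_CUDA_ARCH is defined by the cuda module; without it we stop here.
        echo "OSC_CUDA_ARCH is not set; run 'module load cuda' first" >&2
        return 1
    fi
    # Strip the .cu suffix to get the executable name.
    echo "nvcc -o ${src%.cu} -arch=${OSC_CUDA_ARCH} ${src}"
}

# Print the command; nothing is compiled here.
nvcc_command mycudaApp.cu || true
```

After module load cuda, the printed line can be copied and run as-is. If the module has not been loaded, the helper prints a reminder and returns a non-zero status.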

Important: The devices are configured in exclusive mode. This means that 'cudaSetDevice' should NOT be used when requesting a single GPU resource. Once the first call to CUDA is executed, the system determines which device it is using. If both cards on a node are in use by a single application, please use 'cudaSetDevice'.

Debugging CUDA code

cuda-gdb can be used to debug CUDA code. Running module load cuda makes it available. For more information on how to use CUDA-GDB, visit http://developer.nvidia.com/cuda-gdb.

Detecting memory access errors

CUDA-MEMCHECK can be used to detect the source and cause of memory access errors in your program. For more information on how to use CUDA-MEMCHECK, visit http://developer.nvidia.com/cuda-memcheck.

Setting the GPU compute mode on Ruby

The GPUs on Ruby can be set to different compute modes as listed here.

They can be set by adding the following to the GPU specification:

-l nodes=1:ppn=20:gpus=1:default
-l nodes=1:ppn=20:gpus=1:exclusive_process

The compute mode exclusive_process is the default on GPU nodes if a compute mode is not specified. With this mode, multiple CUDA processes across GPU nodes are not allowed, e.g., CUDA processes launched via MPI. If you need to run an MPI-CUDA job, set the compute mode to default.

Batch Usage on Ruby

When you log into ruby.osc.edu you are actually logged into a Linux box referred to as the login node. To gain access to the multiple processors in the computing environment, you must submit your job to the batch system for execution. Batch jobs can request multiple nodes/cores and compute time up to the limits of the OSC systems. Refer to Queues and Reservations and Batch Limit Rules for more information.

Interactive Batch Session

For an interactive batch session one can run the following command:

qsub -I -l nodes=1:ppn=20:gpus=1 -l walltime=00:20:00 

which requests one whole node with 20 cores and one GPU (-l nodes=1:ppn=20:gpus=1) for a walltime of 20 minutes (-l walltime=00:20:00). You may adjust the numbers to suit your needs.

Non-interactive Batch Job (Serial Run)

A batch script can be created and submitted for a serial or parallel run. You can create the batch script using any text editor in a working directory on the system of your choice. Below is an example batch script (job.txt) for a serial run:

#PBS -l walltime=01:00:00
#PBS -l nodes=1:ppn=1:gpus=1
#PBS -N compute
#PBS -j oe
module load cuda
cd $HOME/cuda
cp mycudaApp $TMPDIR
cd $TMPDIR
./mycudaApp
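
The script above could be saved and submitted along the following lines. This is a sketch: it assumes the same PBS qsub command used for the interactive session, and it only submits when qsub is actually found on the PATH.

```shell
#!/bin/bash
# Sketch: save the serial example as job.txt, then submit it if qsub is available.
# The quoted 'EOF' keeps $HOME and $TMPDIR unexpanded so the job expands them at run time.
cat > job.txt <<'EOF'
#PBS -l walltime=01:00:00
#PBS -l nodes=1:ppn=1:gpus=1
#PBS -N compute
#PBS -j oe
module load cuda
cd $HOME/cuda
cp mycudaApp $TMPDIR
cd $TMPDIR
./mycudaApp
EOF

if command -v qsub >/dev/null 2>&1; then
    qsub job.txt    # on success, qsub prints the ID of the queued job
else
    echo "qsub not found; run this on a cluster login node" >&2
fi
```

Because of the -j oe directive, standard output and standard error from the job end up in a single output file.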

Usage on Owens

Set-up on Owens

To load the default version of the CUDA module, use module load cuda. To select a particular software version, use module load cuda/version.

GPU Computing SDK

The NVIDIA GPU Computing SDK provides hundreds of code samples and covers a wide range of applications/techniques to help you get started on the path of writing software with CUDA C/C++ or DirectCompute. 

Programming in CUDA

To learn CUDA programming, visit http://developer.nvidia.com/cuda-education-training, which also includes tutorials on optimizing CUDA code for greater speedup. For additional CUDA optimization techniques, see http://www.cs.berkeley.edu/~volkov/

Compiling CUDA Code

Type module show cuda/version to view the environment variables that the module sets.
To compile CUDA code contained in a file, say mycudaApp.cu, run the following after loading the appropriate CUDA module: nvcc -o mycudaApp mycudaApp.cu. This creates an executable named mycudaApp.

The environment variable OSC_CUDA_ARCH, defined by the module, can be used to specify the CUDA architecture when compiling: nvcc -o mycudaApp -arch=$OSC_CUDA_ARCH mycudaApp.cu.

Important: The devices are configured in exclusive mode. This means that 'cudaSetDevice' should NOT be used when requesting a single GPU resource. Once the first call to CUDA is executed, the system determines which device it is using. If both cards on a node are in use by a single application, please use 'cudaSetDevice'.

Debugging CUDA code

cuda-gdb can be used to debug CUDA code. Running module load cuda makes it available. For more information on how to use CUDA-GDB, visit http://developer.nvidia.com/cuda-gdb.

Detecting memory access errors

CUDA-MEMCHECK can be used to detect the source and cause of memory access errors in your program. For more information on how to use CUDA-MEMCHECK, visit http://developer.nvidia.com/cuda-memcheck.

Setting the GPU compute mode on Owens

The GPUs on Owens can be set to different compute modes as listed here.

They can be set by adding the following to the GPU specification:

-l nodes=1:ppn=28:gpus=1:default
-l nodes=1:ppn=28:gpus=1:exclusive_process

The compute mode exclusive_process is the default on GPU nodes if a compute mode is not specified. With this mode, multiple CUDA processes across GPU nodes are not allowed, e.g., CUDA processes launched via MPI. If you need to run an MPI-CUDA job, set the compute mode to default.
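
To illustrate how the compute-mode suffix attaches to the GPU specification, the hypothetical helper below assembles the resource request string and rejects anything other than the two modes shown above. The function name gpu_request and the two-node example are assumptions made for illustration only.

```shell
#!/bin/bash
# Sketch: build a "-l nodes=...:ppn=...:gpus=...:mode" request string.
# gpu_request is a hypothetical helper; it accepts only the two modes shown on this page.
gpu_request() {
    local nodes ppn gpus mode
    nodes="$1"; ppn="$2"; gpus="$3"; mode="$4"
    case "$mode" in
        default|exclusive_process) ;;
        *) echo "unknown compute mode: $mode" >&2; return 1 ;;
    esac
    echo "-l nodes=${nodes}:ppn=${ppn}:gpus=${gpus}:${mode}"
}

# An MPI-CUDA job spanning two Owens nodes would ask for the default mode:
gpu_request 2 28 1 default
```

The printed string can be passed to qsub on the command line or placed after #PBS in a batch script.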

Batch Usage on Owens

When you log into owens.osc.edu you are actually logged into a Linux box referred to as the login node. To gain access to the multiple processors in the computing environment, you must submit your job to the batch system for execution. Batch jobs can request multiple nodes/cores and compute time up to the limits of the OSC systems. Refer to Queues and Reservations and Batch Limit Rules for more information.

Interactive Batch Session

For an interactive batch session one can run the following command:

qsub -I -l nodes=1:ppn=28:gpus=1 -l walltime=00:20:00 

which requests one whole node with 28 cores and one GPU (-l nodes=1:ppn=28:gpus=1) for a walltime of 20 minutes (-l walltime=00:20:00). You may adjust the numbers to suit your needs.

Non-interactive Batch Job (Serial Run)

A batch script can be created and submitted for a serial or parallel run. You can create the batch script using any text editor in a working directory on the system of your choice. Below is an example batch script (job.txt) for a serial run:

#PBS -l walltime=01:00:00
#PBS -l nodes=1:ppn=1:gpus=1
#PBS -N compute
#PBS -j oe
module load cuda
cd $HOME/cuda
cp mycudaApp $TMPDIR
cd $TMPDIR
./mycudaApp

Usage on Pitzer

Set-up on Pitzer

To load the default version of the CUDA module, use module load cuda.

GPU Computing SDK

The NVIDIA GPU Computing SDK provides hundreds of code samples and covers a wide range of applications/techniques to help you get started on the path of writing software with CUDA C/C++ or DirectCompute. 

Programming in CUDA

To learn CUDA programming, visit http://developer.nvidia.com/cuda-education-training, which also includes tutorials on optimizing CUDA code for greater speedup. For additional CUDA optimization techniques, see http://www.cs.berkeley.edu/~volkov/

Compiling CUDA Code

Type module show cuda/version to view the environment variables that the module sets.
To compile CUDA code contained in a file, say mycudaApp.cu, run the following after loading the appropriate CUDA module: nvcc -o mycudaApp mycudaApp.cu. This creates an executable named mycudaApp.

The environment variable OSC_CUDA_ARCH, defined by the module, can be used to specify the CUDA architecture when compiling: nvcc -o mycudaApp -arch=$OSC_CUDA_ARCH mycudaApp.cu.

Important: The devices are configured in exclusive mode. This means that 'cudaSetDevice' should NOT be used when requesting a single GPU resource. Once the first call to CUDA is executed, the system determines which device it is using. If both cards on a node are in use by a single application, please use 'cudaSetDevice'.

Debugging CUDA code

cuda-gdb can be used to debug CUDA code. Running module load cuda makes it available. For more information on how to use CUDA-GDB, visit http://developer.nvidia.com/cuda-gdb.

Detecting memory access errors

CUDA-MEMCHECK can be used to detect the source and cause of memory access errors in your program. For more information on how to use CUDA-MEMCHECK, visit http://developer.nvidia.com/cuda-memcheck.
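
One possible workflow for the two tools above is to build with debug information and then run the binary under cuda-memcheck. The flags -g (host debug information) and -G (device debug information) are standard nvcc options; the executable name mycudaApp_debug is a hypothetical choice. The sketch does nothing unless nvcc and the source file are both present.

```shell
#!/bin/bash
# Sketch: debug build followed by a cuda-memcheck run.
# -g adds host debug info; -G adds device debug info and turns off device optimizations.
SRC=mycudaApp.cu
EXE=mycudaApp_debug   # hypothetical name for the debug build

if command -v nvcc >/dev/null 2>&1 && [ -f "$SRC" ]; then
    # Run the memory checker only if the build succeeds.
    nvcc -g -G -o "$EXE" "$SRC" && cuda-memcheck "./$EXE"
else
    echo "nvcc or $SRC not found; load the cuda module and run from the source directory" >&2
fi
```

The same debug build can be opened in the debugger with cuda-gdb ./mycudaApp_debug. Both tools need a GPU, so run them inside a batch or interactive job, not on the login node.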

Setting the GPU compute mode on Pitzer

The GPUs on Pitzer can be set to different compute modes as listed here.

They can be set by adding the following to the GPU specification:

-l nodes=1:ppn=40:gpus=2:default
-l nodes=1:ppn=40:gpus=2:exclusive_process

The compute mode exclusive_process is the default on GPU nodes if a compute mode is not specified. With this mode, multiple CUDA processes across GPU nodes are not allowed, e.g., CUDA processes launched via MPI. If you need to run an MPI-CUDA job, set the compute mode to default.

Batch Usage on Pitzer

When you log into pitzer.osc.edu you are actually logged into a Linux box referred to as the login node. To gain access to the multiple processors in the computing environment, you must submit your job to the batch system for execution. Batch jobs can request multiple nodes/cores and compute time up to the limits of the OSC systems. Refer to Queues and Reservations and Batch Limit Rules for more information.

Interactive Batch Session

For an interactive batch session one can run the following command:

qsub -I -l nodes=1:ppn=40:gpus=2 -l walltime=00:20:00 

which requests one whole node with 40 cores and two GPUs (-l nodes=1:ppn=40:gpus=2) for a walltime of 20 minutes (-l walltime=00:20:00). You may adjust the numbers to suit your needs.

Non-interactive Batch Job (Serial Run)

A batch script can be created and submitted for a serial or parallel run. You can create the batch script using any text editor in a working directory on the system of your choice. Below is an example batch script (job.txt) for a serial run:

#PBS -l walltime=01:00:00
#PBS -l nodes=1:ppn=1:gpus=1
#PBS -N compute
#PBS -j oe
module load cuda
cd $HOME/cuda
cp mycudaApp $TMPDIR
cd $TMPDIR
./mycudaApp

Further Reading

Online documentation is available on the CUDA homepage.

Compiler support for the latest version of CUDA is available here.
