CUDA™ (Compute Unified Device Architecture) is a parallel computing platform and programming model developed by NVIDIA that enables dramatic increases in computing performance by harnessing the power of the graphics processing unit (GPU).
Availability and Compatibility
CUDA is available on Oakley and Glenn Clusters. The versions currently available at OSC are
CUDA is available for use by all OSC users.
Use module avail to view the modules available on a given machine. To load the appropriate CUDA module, type:
module load software-name
For example, to select CUDA version 4.1.28 on Oakley, type:
module load cuda/4.1.28
GPU Computing SDK
The NVIDIA GPU Computing SDK provides hundreds of code samples and covers a wide range of applications/techniques to help you get started on the path of writing software with CUDA C/C++ or DirectCompute. On Oakley, the SDK has been installed in
$CUDA_HOME (an environment variable set when you load the module).
Programming in CUDA
To learn CUDA programming, please visit http://developer.nvidia.com/cuda-education-training. That page also links to tutorials on optimizing CUDA code for greater speedup. Further CUDA optimization techniques are described at http://www.cs.berkeley.edu/~volkov/
Compiling CUDA Code
Type
module show cuda/version-number to view the list of environment variables that the module sets.
To compile CUDA code contained in a file, say
mycudaApp.cu, load the appropriate CUDA module and then run:
nvcc -o mycudaApp mycudaApp.cu
This creates an executable named mycudaApp.
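For readers without a source file at hand, the following is a minimal sketch of what mycudaApp.cu might contain. It is not OSC-provided code: the kernel vecAdd, the helper runVecAdd, and the CUDA_CHECK macro are hypothetical names chosen for illustration. The program adds two vectors on the GPU and verifies the result on the host.

```cuda
// mycudaApp.cu: minimal vector addition (illustrative sketch only)
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper macro: abort with a readable message if a CUDA
// runtime call fails.
#define CUDA_CHECK(call)                                                  \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            std::fprintf(stderr, "CUDA error at %s:%d: %s\n", __FILE__,   \
                         __LINE__, cudaGetErrorString(err_));             \
            std::exit(EXIT_FAILURE);                                      \
        }                                                                 \
    } while (0)

// Each thread adds one pair of elements; the guard skips surplus threads.
__global__ void vecAdd(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

// Runs vecAdd on n elements and returns true if every result matches
// the sum computed on the host.
bool runVecAdd(int n)
{
    std::vector<float> ha(n), hb(n), hc(n);
    for (int i = 0; i < n; ++i) {
        ha[i] = float(i);
        hb[i] = 2.0f * float(i);
    }

    size_t bytes = size_t(n) * sizeof(float);
    float *da = NULL, *db = NULL, *dc = NULL;
    CUDA_CHECK(cudaMalloc((void **)&da, bytes));
    CUDA_CHECK(cudaMalloc((void **)&db, bytes));
    CUDA_CHECK(cudaMalloc((void **)&dc, bytes));
    CUDA_CHECK(cudaMemcpy(da, ha.data(), bytes, cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemcpy(db, hb.data(), bytes, cudaMemcpyHostToDevice));

    int threads = 256;
    int blocks = (n + threads - 1) / threads;   // round up to cover all n
    vecAdd<<<blocks, threads>>>(da, db, dc, n);
    CUDA_CHECK(cudaGetLastError());             // catches launch errors

    // This blocking copy also waits for the kernel to finish.
    CUDA_CHECK(cudaMemcpy(hc.data(), dc, bytes, cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(da));
    CUDA_CHECK(cudaFree(db));
    CUDA_CHECK(cudaFree(dc));

    bool ok = true;
    for (int i = 0; i < n; ++i)
        if (hc[i] != ha[i] + hb[i]) ok = false;
    return ok;
}

int main()
{
    bool ok = runVecAdd(1 << 20);
    std::printf("%s\n", ok ? "PASSED" : "FAILED");
    return ok ? 0 : 1;
}
```

Built with the nvcc command shown above, the program returns exit status 0 when the host-side check passes, which makes it convenient as a quick test inside a batch job.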
Important: The devices are configured in exclusive mode. This means that 'cudaSetDevice' should NOT be used when requesting a single GPU. Once the first CUDA call is executed, the system determines which device the application is using. If a single application uses both cards on a node, please use 'cudaSetDevice'.
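The two cases in this note can be sketched as follows. This is an illustration under stated assumptions, not an OSC-supplied program: the function names useImplicitDevice and useAllDevices and the "all" command-line switch are hypothetical, and cudaFree(0) is used only as a common idiom for triggering the first CUDA call.

```cuda
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

// Single-GPU case: cudaSetDevice is never called. The first runtime call
// that needs a context (cudaFree(0) here) lets the system choose a device.
// Returns the ordinal of the device in use, or -1 on failure.
int useImplicitDevice()
{
    if (cudaFree(0) != cudaSuccess) return -1;
    int dev = -1;
    if (cudaGetDevice(&dev) != cudaSuccess) return -1;
    return dev;
}

// Two-GPU case: a single application drives every visible device, so each
// one is selected explicitly before work is issued to it.
// Returns the number of devices that were initialized successfully.
int useAllDevices()
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) return 0;

    int ready = 0;
    for (int d = 0; d < count; ++d) {
        if (cudaSetDevice(d) != cudaSuccess) continue;
        if (cudaFree(0) != cudaSuccess) continue;  // create context on device d
        ++ready;
        // ... allocate memory and launch kernels for device d here ...
    }
    return ready;
}

int main(int argc, char **argv)
{
    // Hypothetical switch: pass "all" when the job requested both GPUs.
    if (argc > 1 && std::strcmp(argv[1], "all") == 0) {
        std::printf("initialized %d device(s)\n", useAllDevices());
    } else {
        std::printf("running on device %d\n", useImplicitDevice());
    }
    return 0;
}
```

Keeping the two paths in separate functions makes it harder to call cudaSetDevice by accident in a job that requested only one GPU.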
Debugging CUDA code
cuda-gdb can be used to debug CUDA code.
module load cuda makes it available to you. For more information on using CUDA-GDB, please visit http://developer.nvidia.com/cuda-gdb.
Detecting memory access errors
CUDA-MEMCHECK can be used to detect the source and cause of memory access errors in your program. For more information on using CUDA-MEMCHECK, please visit http://developer.nvidia.com/cuda-memcheck.
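As a rough illustration of the kind of error this tool targets, the sketch below launches more threads than there are array elements. The kernel name fillIndex and the SKIP_BOUNDS_CHECK macro are hypothetical. As written, the guard keeps every access in bounds; compiling with -DSKIP_BOUNDS_CHECK removes the guard, so the surplus threads write past the end of the allocation. Such a small overrun may go unnoticed in a normal run, which is why running the executable under cuda-memcheck is useful.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Writes each thread's global index into out[i]. With SKIP_BOUNDS_CHECK
// defined, threads with i >= n write outside the allocation.
__global__ void fillIndex(int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
#ifndef SKIP_BOUNDS_CHECK
    if (i >= n) return;   // guard against surplus threads
#endif
    out[i] = i;
}

int main()
{
    const int n = 1000;                               // not a multiple of 256
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;   // 4 blocks, 1024 threads

    int *d_out = NULL;
    if (cudaMalloc((void **)&d_out, n * sizeof(int)) != cudaSuccess) return 1;

    fillIndex<<<blocks, threads>>>(d_out, n);
    cudaError_t err = cudaDeviceSynchronize();        // wait for the kernel
    std::printf("kernel status: %s\n", cudaGetErrorString(err));

    cudaFree(d_out);
    return err == cudaSuccess ? 0 : 1;
}
```

To check a build, run the executable under the tool, for example cuda-memcheck ./mycudaApp.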
The following sample batch scripts request GPU nodes on Glenn and Oakley. Note that only the second line differs between the two scripts. On Oakley, one can specify the number of GPUs required.
Sample Batch Script (Glenn)
#PBS -l walltime=01:00:00
#PBS -l nodes=1:ppn=8:gpu
#PBS -N compute
#PBS -j oe
module load cuda
cd $HOME/cuda
cp mycudaApp $TMPDIR
cd $TMPDIR
./mycudaApp
Sample Batch Script (Oakley)
#PBS -l walltime=01:00:00
#PBS -l nodes=1:ppn=1:gpus=1
#PBS -N compute
#PBS -j oe
module load cuda
cd $HOME/cuda
cp mycudaApp $TMPDIR
cd $TMPDIR
./mycudaApp
For an interactive batch session, one can run one of the following commands (the first uses the Glenn resource request, the second the Oakley one):
qsub -I -l nodes=1:ppn=8:gpu -l walltime=00:20:00
qsub -I -l nodes=1:ppn=1:gpus=1 -l walltime=00:20:00
Please note that on Oakley, you can request any mix of ppn and gpus you need; please see the Job Scripts page in our batch guide for more information.
Online documentation is available at http://developer.nvidia.com/nvidia-gpu-computing-documentation