Supercomputing Environments
GPGPU Hardware Information
The Ohio Supercomputer Center (OSC) provides supercomputing services to Ohio colleges, universities, and companies. There are 36 GPU-capable nodes on Glenn, connected to 18 Quadro Plex S4s for a total of 72 CUDA-enabled graphics devices. Each node has access to two Quadro FX 5800-level graphics cards.

IMPORTANT NOTE:
Update: 7/11/2011
- Updated information on how to request GPU resources. Please read the section on batch requests for more information.
Update: 10/25/2010
- Updated information on how to connect to the compute nodes. Please read the section on batch requests for more information.
- Updated the default cuda module to use 3.1.
Please see the hardware section for current system specifications.

Getting started
This page is meant to help users become familiar with CUDA development in the Glenn environment. Many topics are not covered, so some knowledge of our systems is assumed. If you are not familiar with the Glenn system, this is a good starting place.

Note: If you are already familiar with CUDA development and don't plan to read the whole page, please read the important section on exclusive mode under Coding in CUDA. It is specific to our machines and strongly affects how CUDA applications run on the Glenn cluster.

The CUDA toolkit contains the CUDA Runtime API and CUDA Driver API libraries needed to run a CUDA application. Most of the examples use the CUDA Runtime API; the Driver API is a lower-level interface.

The CUDA SDK is a collection of example programs illustrating various aspects of CUDA and GPGPU usage. It includes some utilities intended primarily for use with the examples. Using the SDK in production code is not recommended. Users who want to use the SDK should install it in their home directories (see the instructions below).

Documentation for CUDA can be found here.

Batch requests
GPU resources can be requested in both interactive and batch sessions. To request a node with GPU capabilities, add the following option to your node options in your PBS submission:
Here is what an example PBS file looks like to request an entire node:
#PBS -l walltime=40:00:00

For an interactive session, use this command to request an entire node:
qsub -I -l nodes=1:ppn=8:gpu -l walltime=12:00:00

Programming environment
Glenn currently supports CUDA for GPGPU computation. Please see the CUDA section below for how to load the module.

CUDA
The newest version of the CUDA toolkit is currently supported. Use the command
module load cuda
to load CUDA v3.1 into your path.

SDK(s)
Users can download the CUDA SDK themselves or use the version currently available under the module. To set up the SDK from the module, simply execute:
> sh /usr/local/cuda-3.1/gpucomputingsdk_3.1_linux.run
Press Enter for both options. The module has already added the correct path to find the toolkit.

To build the examples:
> cd ~/NVIDIA_GPU_Computing_SDK/C/
> make
Binaries will be placed in 'bin/linux/release'. Most of the demos will not work because no X display is running.
The examples that will work: deviceQuery, matrixMul, ...
The examples that will not work: oceanFFT, simpleGL, ...

Coding in CUDA
To include CUDA directories, you can use a direct path to the include and lib directories under /usr/local/cuda-3.1/cuda, or you can use the environment variable CUDA_INSTALL_PATH. Here is a sample Makefile to help:
$(CUDA_CC) $(CUDA_FLAGS) $(CUDA_INC) -o [cuda.cu.obj] [cuda.cu]

Using other compilers is not recommended. More options (for example, PGI's compiler) will become available once those compilers provide proper CUDA support.

Important: The devices are configured in exclusive mode. This means that 'cudaSetDevice' should NOT be used if you request only one GPU resource. Once the first CUDA call is executed, the system determines which device the program will use. If a single application uses both cards on a node, please use 'cudaSetDevice'.

Debugging
There are currently no graphical tools to aid in debugging a CUDA kernel. The current method for debugging a CUDA kernel is device emulation, in which the CUDA threads run on the CPU. For application debugging, please take a look here. cuda-gdb is available to anyone who loads the cuda module. Please see here for information on how to use the debugger.

Performance
cudaprof is available for performance analysis of a CUDA kernel. For more information on CPU performance, please take a look at our Performance Analysis page.

CUDA development can also be combined with other parallel programming techniques. The examples below run on Glenn and show how CUDA can be combined with OpenMP and MPI. Please download the examples as multiLevel.tar or multiLevel.zip. To compile them, you will need to install the CUDA SDK and modify the Makefile to point to the correct SDK path; the variable to change is SDKDIR, near the top of the file. Also, please make sure that the correct MPI module is loaded. Use this command to switch the module:
module switch mpi mvapich2-1.2p1-gnu
Now you can type 'make' to build the examples.
CUDA + OpenMP
This example shows how to call CUDA from OpenMP. With this setup, users run multiple OpenMP threads on a node. Since the GPUs are in exclusive mode, at most two threads per node can access a graphics card at a time.

CUDA + MPI + OpenMP
This example combines MPI, CUDA, and OpenMP in a single application. The recommended pattern for a hybrid approach like this is to use MPI to distribute work across the nodes and OpenMP to create threads within each node. As with the previous models, only two threads per node can use the GPUs.
