Supercomputing Environments
Access
Before anyone can access a high-performance computing system at OSC, they must have a valid account for that system. To apply for an account, see Accounts. Once you have an account, you can connect to our systems using any SSH client. For more information on getting an SSH client for your computer, consult our FAQ on the subject. Once you have installed a client, Table 1 will help you configure it to connect to our HPC systems.
Table 1. Hostnames and Operating Systems
Environment
All high-performance computing systems at OSC run a flavor of UNIX; see Table 1 for details. Each user is assigned a permanent file area known as their home filesystem. All of the machines share the same home filesystem, which is mounted from our Mass Storage System, so your home area is the same whichever system you log in to. In addition to your home area, each system has a temporary area known as /tmp. This is typically a very large area where users may run codes that produce large intermediate files. On the cluster systems, /tmp is not shared between nodes. A few items to note about /tmp:
When using /tmp, either create a directory under /tmp with the same name as your userid, or use the TMPDIR environment variable, which is automatically set to a unique directory for the duration of an interactive or batch session.
The Center provides a few basic skeleton files to help you get started. These are often referred to as dot files because their names begin with a period ("."), so they do not appear in an ordinary directory listing. To display them, use the ls -a command.
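A minimal Python sketch of the /tmp recommendation above: prefer the session's $TMPDIR, and otherwise fall back to a directory under /tmp named after your userid. The function name scratch_dir is hypothetical, and the sketch assumes only that TMPDIR, when set, names a directory you may use. Remember that on the cluster systems /tmp is local to each node, so files written there are not visible from other nodes.

```python
import getpass
import os
from pathlib import Path


def scratch_dir() -> Path:
    """Return a per-user scratch directory for large intermediate files.

    Prefers $TMPDIR, which is set to a unique directory for each
    interactive or batch session; otherwise falls back to /tmp/<userid>.
    """
    session_dir = os.environ.get("TMPDIR")
    if session_dir:
        path = Path(session_dir)
    else:
        # Fallback: a directory under /tmp named after the current userid.
        path = Path("/tmp") / getpass.getuser()
    # Create it if missing; mode 0o700 (owner-only) applies only on creation.
    path.mkdir(mode=0o700, parents=True, exist_ok=True)
    return path
```

A job script could then write intermediate output to, for example, scratch_dir() / "intermediate.dat" (a hypothetical file name). Preferring TMPDIR means the directory is unique to the session, which avoids collisions between concurrent jobs that share a userid.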
For most systems, the default shell (command processor) is the Korn shell. To change your default shell, contact oschelp@osc.edu.
PGI, GNU, and Intel compilers are available on all OSC systems.
Table 2. Compiling Systems and Commands
Parallel Environments
Table 3 provides a summary of the parallel environments and types of memory available on the high-performance computers at OSC.
Table 3. Parallel Environments
Scheduling Policies
Scheduling of the cluster's computing resources is handled by software called Moab, which is configured with a number of scheduling policies to keep in mind:
* Limits: By default, an individual user can have up to 128 concurrently running jobs and/or up to 2048 processor cores in use, and all the users in a particular group/project can together have up to 192 concurrently running jobs and/or up to 2048 processor cores in use. Serial jobs (that is, jobs that request only one node) can run for up to 168 hours, while parallel jobs may run for up to 96 hours. In addition, a user may have no more than 1000 jobs submitted to the batch system at once. Exceptions to these limits can be made under certain circumstances; please contact oschelp@osc.edu for details.
* Priority: The priority of a job is influenced by a large number of factors, including the number of processors requested, the length of time the job has been waiting, and how much other computing the user and their group have done over the last several days. However, having the highest priority does not necessarily mean that a job will run immediately, as enough processors and memory must also be available to run it.
* Backfill: During each scheduling iteration, the scheduler identifies the highest-priority job that cannot currently run and reserves a future start time for it. The scheduler then tries to backfill as many lower-priority jobs as it can without delaying that job's reserved start time. This keeps overall system utilization high while still allowing reasonable turnaround for high-priority jobs.
* Debugging: A small number of nodes are set aside during the day for jobs with a walltime limit of 1 hour or less.
* Preemption: Serial jobs may be preempted in favor of higher-priority parallel jobs in certain circumstances.
  Preempted jobs are effectively suspended in memory and should resume execution once the job that preempted them completes.
Batch Processing
The login nodes of the HPC clusters at OSC are reserved for interactive use and very short-running tasks. Typically, many users are logged in to the login nodes at the same time, and extensive calculations there would severely degrade performance on those nodes. For this reason, CPU time and memory are limited on the login nodes. Use the 'limit' command to view the interactive limits on CPU time, memory size, disk size, etc.
Running in batch mode has several advantages:
* The batch system is the only way to access multiple processors.
* Batch processing increases the resources available to HPC users and ensures that all users get equal access to those resources.
* By enforcing scheduling policies, the batch system improves system efficiency by weighing user requirements against the system load.
* A log file is generated for each batch request.
For a discussion of the batch systems on OSC's HPC systems, see the batch systems page.
Performance Analysis and Optimization
Performance analysis and tuning are an important part of code development, particularly for large, resource-limited applications. Optimization lets you get results faster, reduce resource consumption, or both. For general information on measuring code performance, including basic optimization strategies, see Basic Optimization Strategies.
The links in Table 4 provide basic information on the most useful performance analysis tools available on the OSC systems, including ways of determining standard performance metrics such as the MFLOP rating.
Table 4. Performance Analysis Tools
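As a worked example of the MFLOP rating mentioned above: the rating is the number of floating-point operations performed, divided by the elapsed time in seconds, divided by 10^6. The Python sketch below applies this to a naive n x n matrix multiply, which performs 2n^3 floating-point operations (n^3 multiplies and n^3 adds). The function names are hypothetical, and this hand-rolled timer is an illustration only, not one of the tools listed in Table 4; pure-Python loops will report far lower rates than compiled code.

```python
import time


def mflop_rate(flop_count: float, seconds: float) -> float:
    """MFLOP rating: floating-point operations per second, in millions."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return flop_count / seconds / 1e6


def naive_matmul(a, b):
    """Multiply two n x n matrices (lists of lists) with the textbook triple loop."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(n):
            aik = a[i][k]
            row_b = b[k]
            row_c = c[i]
            for j in range(n):
                row_c[j] += aik * row_b[j]  # one multiply + one add
    return c


def measure_matmul_mflops(n: int = 120) -> float:
    """Time one n x n multiply and convert the elapsed time to an MFLOP rating."""
    a = [[1.0] * n for _ in range(n)]
    b = [[2.0] * n for _ in range(n)]
    start = time.perf_counter()
    naive_matmul(a, b)
    elapsed = time.perf_counter() - start
    # The inner statement runs n**3 times, each doing one multiply and one add.
    return mflop_rate(2 * n**3, elapsed)


if __name__ == "__main__":
    print(f"naive matmul: {measure_matmul_mflops():.1f} MFLOP/s")
```

The same formula applies to any kernel whose operation count you can derive analytically; the dedicated tools in Table 4 can instead count operations from hardware counters.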
Usage by System

