
Supercomputing Environments

Much of this information is outdated. It is left here as reference material while we prepare updated guides. Please see the guide to batch processing at OSC and the guide to storage systems for the most up-to-date information currently online.
Current hardware can be found here.

Before anyone can access a high performance computing system at OSC, they must have a valid account for the given system. To apply for an account, see Accounts.

Once you have an account, you can connect to our systems using any SSH client. For more information on getting an SSH client for your computer, consult our FAQ on the subject. Once you have installed a client, the following table will help you configure it to connect to our HPC systems.

Table 1. Hostnames and Operating Systems

System          Hostname                        Operating System
Oakley          oakley.osc.edu                  LINUX
Glenn           glenn.osc.edu                   LINUX
GPGPU systems   (available on Glenn and Oakley) LINUX
BALE Cluster    bale-login.ovl.osc.edu          LINUX
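
For example, to connect to Oakley from a command-line SSH client (replace username with your OSC userid):

```shell
# Open an SSH session on the Oakley login node:
ssh username@oakley.osc.edu
```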




All high performance computing systems at OSC run a flavor of UNIX. For more information, see Table 1.

File Systems

Each user is assigned a permanent file area known as their home filesystem.

On all of the machines, users share the same home filesystem, which is mounted from our Mass Storage System. Your home area is the same whenever you log in to any of these systems.

In addition to your home area, each system has a temporary area known as /tmp. On the cluster systems, /tmp is not shared between nodes. This is typically a very large area where users may run codes that produce large intermediate files. A few items to note about /tmp:

  • the system removes files last accessed more than 24 hours ago
  • there are no disk charges associated with /tmp
  • files on /tmp are not backed up by the system
  • files on /tmp cannot be migrated to tape

When using /tmp, either create a directory under /tmp with the same name as your userid, or use the TMPDIR environment variable, which is automatically set to a unique directory name for the duration of an interactive or batch session.
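
For example, a minimal sketch of staging work in the temporary area (the fallback path is the userid-named directory described above; everything else is illustrative):

```shell
# Use the per-session scratch directory if TMPDIR is set;
# otherwise fall back to a /tmp directory named after your userid.
WORKDIR=${TMPDIR:-/tmp/$USER}
mkdir -p "$WORKDIR"
cd "$WORKDIR"
# ... run code that writes large intermediate files here ...
```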

Dot Files

The Center provides a few basic skeleton files to help you get started. These files are often referred to as dot files because their names begin with a "."; they normally do not appear in a directory listing. To display them, use the ls -a command. The files are:

.forward
File containing your local e-mail address.
Note: the system and the Center often rely on communication via e-mail. If your local e-mail address changes, please change the contents of the .forward file.

.profile
Start-up shell script for Korn, POSIX, and Bourne shell users. Users may modify this file to add or override any environment variables or conventions established by the system. For a list of current environment variables on a given system, enter the env command.

.login
Start-up shell script for C shell users, executed at login. Users may modify this file to add or override any environment variables or conventions established by the system.

.cshrc
Start-up shell script for C shell users, executed each time a new C shell is invoked. Users may modify this file to establish variables and aliases.
Note: a similar file for Korn shell users is identified by the ENV environment variable set in the .profile script.
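
To see these files in your home directory:

```shell
# List all files, including dot files, in your home directory:
ls -a $HOME
```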
TIP: Do not redefine your PATH environment variable without including ${PATH}. If you hard-code your PATH, it will break the modules software, which all of the OSC systems use to make software packages available; as a result you may not be able to compile or submit batch jobs. The following is a better way to modify your PATH variable:

Korn shell (.profile)
export PATH=${PATH}:${HOME}/bin

C shell (.cshrc)
setenv PATH ${PATH}:${HOME}/bin

For most systems, the default shell (command processor) is the Korn shell. To change the default shell, contact oschelp@osc.edu.

Compiling Systems

PGI, GNU, and Intel Compilers are available on all OSC Systems.

Table 2. Compiling Systems and Commands

System          Default Compilers
BALE Cluster

Parallel Environments

Table 3 provides a summary of the parallel environments and types of memory available on the high-performance computers at OSC.

Table 3. Parallel Environments

System         Programming Models                                       Memory
Oakley         Automatic (Portland Group: -Mconcur; Intel: -parallel)   distributed between nodes; shared between two processors in a node
Glenn          Automatic (Portland Group: -Mconcur; Intel: -parallel)   distributed between nodes; shared between two processors in a node
BALE Cluster   Automatic (Portland Group: -Mconcur; Intel: -parallel)   distributed between nodes; shared between two processors in a node


Scheduling Policies


Scheduling of the clusters' computing resources is handled by software called Moab, which is configured with a number of scheduling policies to keep in mind:

* Limits: By default, an individual user can have up to 128 concurrently running jobs and/or up to 2048 (2040 for Oakley) processor cores in use, and all the users in a particular group/project can, between them, have up to 192 concurrently running jobs and/or up to 2048 (2040 for Oakley) processor cores in use. Serial jobs (that is, jobs which request only one node) can run for up to 168 hours, while parallel jobs may run for up to 96 hours. In addition, a user may have no more than 1000 jobs submitted to the batch system at once. Exceptions to these limits can be made under certain circumstances; please contact oschelp@osc.edu for details.

* Priority: The priority of a job is influenced by a large number of factors, including the processor count requested, the length of time the job has been waiting, and how much other computing has been done by the user and their group over the last several days. However, having the highest priority does not necessarily mean that a job will run immediately, as there must also be enough processors and memory available to run it.

* Backfill: During each scheduling iteration, the scheduler will identify the highest priority job that cannot currently be run and find a time in the future to reserve for it. Once that is done, the scheduler will then try to backfill in as many lower priority jobs as it can without affecting the highest priority job's start time. This keeps the overall utilization of the system high while still allowing reasonable turnaround time for high priority jobs.

* Debugging: A small number of nodes are set aside during the day for jobs with a walltime limit of 1 hour or less.

* Preemption: Serial jobs may be preempted in favor of higher priority parallel jobs in certain circumstances. Jobs which are preempted are effectively suspended in memory and should resume execution once the job that preempted them completes.

Batch Processing


The login nodes of the HPC clusters at OSC are reserved for interactive use and very short execution times. There are typically many users logged onto the login nodes at one time, and extensive calculations would severely degrade the resources of those nodes, so time and memory are limited there. Use the limit command to view the interactive limits on CPU time, memory size, disk size, etc.

There are many advantages to running in batch mode. The batch system is the only way to access multiple processors. Batch processing increases the resources available to HPC users and ensures that all users get equal access to those resources. Enforcing scheduling policies improves system efficiency by weighing user requirements against the system load. A log file is generated for each batch request. See the discussion of the batch systems on OSC's HPC systems.
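
As an illustration, a minimal sketch of a PBS batch script; the job name, resource requests, and program/file names here are illustrative assumptions, not site defaults:

```shell
#PBS -N example_job
#PBS -l nodes=2:ppn=4
#PBS -l walltime=01:00:00
#PBS -j oe

# Run in the per-job scratch area (TMPDIR is set automatically for
# batch sessions), then copy the results back to the home area.
cd $TMPDIR
cp $HOME/input.dat .
$HOME/bin/myprog < input.dat > output.dat
cp output.dat $HOME
```

Submit the script with qsub; the scheduling policies described above then determine when it runs.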

Performance Analysis and Optimization


Performance analysis and tuning is an important part of code development, particularly for large, resource-limited applications. Optimization allows you to get results more quickly and/or minimize resource consumption. For general information on measuring code performance, including basic optimization strategies, see Basic Optimization Strategies.

The links in Table 4 provide basic information on the most useful performance analysis tools available on the OSC systems, including ways of determining standard performance metrics, e.g., the MFLOP rating.

Table 4. Performance Analysis Tools

System    Tools
General   time; Job Accounting (ja); Profiling; C and Fortran intrinsics
Glenn     Profiling with #PBS -l nodes=#:ppn=4:perfmon
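
For example, the time command reports the elapsed (wall-clock), user, and system time for a run; the program name here is illustrative:

```shell
# Measure wall-clock, user, and system time for one run of a program:
time ./myprog
```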