Knowledge Base

This knowledge base is a collection of important, useful information about OSC systems that does not fit into a guide or tutorial, and is too long to be answered in a simple FAQ.

Compilation Guide

We recommend selecting the newest compilers available when you start a new project. For repeatability, avoid changing compilers in the middle of an experiment.

Oakley

We recommend the Intel compilers on Oakley.

Intel

Compilers

Language  non-MPI  MPI
Fortran   ifort    mpif90
C         icc      mpicc
C++       icpc     mpicxx

Recommended optimization options – not portable (run on Oakley only)

Sequential, not numerically sensitive  -fast
Sequential, numerically sensitive      -ipo -O2 -static -xHost
MPI, not numerically sensitive         -ipo -O3 -no-prec-div -xHost
MPI, numerically sensitive             -ipo -O2 -xHost
Note:  The -fast flag is equivalent to -ipo -O3 -no-prec-div -static -xHost
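
As a rough illustration of how the non-portable Oakley options above might be selected in a build script, the sketch below maps a code type to its flags. The function name oakley_intel_flags and its argument keywords are hypothetical, chosen only for this example; the flags themselves are the ones listed above.

```shell
# Hypothetical helper: print the recommended non-portable Intel flags
# for Oakley, as listed in the table above.
# Usage: oakley_intel_flags <seq|mpi> <fast|sensitive>
oakley_intel_flags() {
    case "$1:$2" in
        seq:fast)      echo "-fast" ;;
        seq:sensitive) echo "-ipo -O2 -static -xHost" ;;
        mpi:fast)      echo "-ipo -O3 -no-prec-div -xHost" ;;
        mpi:sensitive) echo "-ipo -O2 -xHost" ;;
        *) echo "usage: oakley_intel_flags <seq|mpi> <fast|sensitive>" >&2
           return 1 ;;
    esac
}

# Example (mycode.c is a placeholder file name):
#   icc $(oakley_intel_flags seq sensitive) -o mycode mycode.c
```

Keeping the flags in one place makes it easier to record exactly which options were used for a given experiment.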

Recommended optimization options – portable (run on Oakley or Glenn)

Sequential, not numerically sensitive  -ipo -O3 -no-prec-div -static -axSSE4.2
Sequential, numerically sensitive      -ipo -O2 -static -axSSE4.2
Note:  MPI code must be built on the system where it will be run.
Note:  Other options are available for code with extreme numerical sensitivity; their description is beyond the scope of this guide.
Note:  Intel 14.0.0.080 has a bug related to generation of portable code. Add the flag -msse3 to get around it.

OpenMP

Add this flag to any of the above:

-openmp

PGI

Compilers

Language          non-MPI             MPI
Fortran 90 or 95  pgfortran or pgf90  mpif90
Fortran 77        pgf77               mpif77
C                 pgcc                mpicc
C++               pgCC                mpicxx

Recommended optimization options

The -fast option is appropriate with all PGI compilers:

-fast
Note:  The PGI compilers can generate code for accelerators such as GPUs. Description of these capabilities is beyond the scope of this guide.
Note:  Many other options are available. See the “man” pages for the compilers.

OpenMP

Add this flag to any of the above:

-mp

GNU

Compilers

Language          non-MPI   MPI
Fortran 90 or 95  gfortran  mpif90
Fortran 77        g77       mpif77
C                 gcc       mpicc
C++               g++       mpicxx

Recommended optimization options

The following option is recommended with the GNU compilers:

-O3
Note:  Many other options are available. See the “man” pages.

OpenMP

Add this flag to any of the above (except g77 and mpif77):

-fopenmp

Glenn

PGI

Compilers

Language          non-MPI             MPI
Fortran 90 or 95  pgfortran or pgf90  mpif90
Fortran 77        pgf77               mpif77
C                 pgcc                mpicc
C++               pgCC                mpicxx

Recommended optimization options

The -fast option is appropriate with all PGI compilers:

-fast
Note:  The PGI compilers can generate code for accelerators such as GPUs. Description of these capabilities is beyond the scope of this guide.
Note:  Many other options are available. See the “man” pages for the compilers.

OpenMP

Add this flag to any of the above:

-mp

Intel

Compilers

Language  non-MPI  MPI
Fortran   ifort    mpif90
C         icc      mpicc
C++       icpc     mpicxx

Recommended optimization options

The following option is recommended with the Intel compilers on Glenn:

-O2

OpenMP

Add this flag to any of the above:

-openmp

GNU

Compilers

Language          non-MPI   MPI
Fortran 90 or 95  gfortran  mpif90
Fortran 77        g77       mpif77
C                 gcc       mpicc
C++               g++       mpicxx

Recommended optimization options

The following option is recommended with the GNU compilers:

-O3
Note:  Many other options are available. See the “man” pages.

OpenMP

Add this flag to any of the above (except g77 and mpif77):

-fopenmp

Out-of-Memory (OOM) or Excessive Memory Usage

Problem description

A common problem on our systems is for a user job to run a node out of memory or to use more than its allocated share of memory if the node is shared with other jobs.

If a job exhausts both the physical memory and the swap space on a node, it causes the node to crash. With a parallel job, there may be many nodes that crash. When a node crashes, the systems staff has to manually reboot and clean up the node. If other jobs were running on the same node, the users have to be notified that their jobs failed.

If your job requests less than a full node, for example, -l nodes=1:ppn=1, it may be scheduled on a node with other running jobs. In this case, your job is entitled to a memory allocation proportional to the number of cores requested. For example, if a system has 4GB per core and you request one core, it is your responsibility to make sure your job uses no more than 4GB. Otherwise your job will interfere with the execution of other jobs.
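
The proportional rule above amounts to a single multiplication. The following sketch shows the arithmetic; the function name mem_share_gb is hypothetical and the 4GB-per-core figure is the example value from the paragraph above.

```shell
# Memory share of a partial-node job: cores requested times memory per core.
# Arguments: ppn (cores requested), GB of memory per core on the system.
mem_share_gb() {
    echo $(( $1 * $2 ))
}

mem_share_gb 1 4    # nodes=1:ppn=1 on a 4GB-per-core system -> 4
mem_share_gb 3 4    # nodes=1:ppn=3 on the same system       -> 12
```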

The memory limit you set in PBS does not work the way one might expect it to. The only thing the -l mem=xxx flag is good for is requesting a large-memory node. It does not cause your job to be allocated the requested amount of memory, nor does it limit your job’s memory usage.
Note that even if your job isn’t causing problems, swapping is extremely inefficient. Your job will run orders of magnitude slower than it would with effective memory management.

Background

Each node has a fixed amount of physical memory and a fixed amount of disk space designated as swap space. If your program and data don’t fit in physical memory, the virtual memory system writes pages from physical memory to disk as necessary and reads in the pages it needs. This is called swapping. If you use up all the memory and all the swap space, the node crashes with an out-of-memory error.

This explanation really applies to the total memory usage of all programs running on the system. If someone else’s program is using too much memory, it may be pages from your program that get swapped out, and vice versa. This is the reason we aggressively terminate programs using more than their share of memory when there are other jobs on the node.

In the world of high performance computing, swapping is almost always undesirable. If your program does a lot of swapping, it will spend most of its time doing disk I/O and won’t get much computation done. You should consider the suggestions below.

You can find the amount of memory on our systems by following the links on our Supercomputers page. You can see the memory and swap values for a node by running the Linux command free on the node. As shown below, a standard node on Oakley has 48GB physical memory and 46GB swap space.

[n0123]$ free -mo
             total       used       free     shared    buffers     cached
Mem:         48386       2782      45603          0        161       1395
Swap:        46874          0      46874
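
To see how much memory a job could consume before the node runs out entirely, the Mem and Swap totals can be added together. The sketch below does this for output in the format shown above; the function name total_mem_plus_swap_mb is hypothetical.

```shell
# Sum physical memory and swap (in MB) from `free -m` style output on stdin.
total_mem_plus_swap_mb() {
    awk '/^Mem:/ { mem = $2 } /^Swap:/ { swap = $2 } END { print mem + swap }'
}

# Feed in the sample output shown above; prints 95260.
total_mem_plus_swap_mb <<'EOF'
             total       used       free     shared    buffers     cached
Mem:         48386       2782      45603          0        161       1395
Swap:        46874          0      46874
EOF
```

On a live node the same function could be fed directly, for example free -m | total_mem_plus_swap_mb.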

Suggested solutions

Here are some suggestions for fixing jobs that use too much memory. Feel free to contact OSC Help for assistance with any of these options.

Some of these remedies involve requesting more processors (cores) for your job. As a general rule we require you to request a number of processors proportional to the amount of memory you require. You need to think in terms of using some fraction of a node rather than treating processors and memory separately. If some of the processors remain idle, that’s not a problem. Memory is just as valuable a resource as processors.

Request whole node or more processors

Jobs requesting less than a whole node are those that have nodes=1 with ppn<12 on Oakley or ppn<8 on Glenn, for example nodes=1:ppn=1. These jobs can be problematic for two reasons. First, they are entitled to use an amount of memory proportional to the ppn value requested; if they use more they interfere with other jobs. Second, if they cause a node to crash, it typically affects multiple jobs and multiple users.

If you’re sure about your memory usage, it’s fine to request just the number of processors you need, as long as it’s enough to cover the amount of memory you need. If you’re not sure, play it safe and request all the processors on the node.

Standard Oakley nodes have 4GB per core; standard Glenn nodes have 3GB per core.
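
Given these per-core figures, the smallest ppn that covers a known memory requirement can be worked out by rounding up. This is a minimal sketch; the function name ppn_for_memory is hypothetical, and the core counts (12 on Oakley, 8 on Glenn) are those given above.

```shell
# Smallest ppn whose memory share covers need_gb, capped at the node size.
# Arguments: memory needed (GB), GB per core, cores per node.
ppn_for_memory() {
    need_gb=$1; gb_per_core=$2; cores_per_node=$3
    ppn=$(( (need_gb + gb_per_core - 1) / gb_per_core ))   # round up
    if [ "$ppn" -gt "$cores_per_node" ]; then
        ppn=$cores_per_node    # needs the whole node, or a larger-memory node
    fi
    echo "$ppn"
}

ppn_for_memory 10 4 12   # Oakley: 10GB needs ppn=3
ppn_for_memory 10 3 8    # Glenn:  10GB needs ppn=4
```

If the result equals the full core count, check whether the requirement actually fits on a standard node at all.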

Reduce memory usage

Consider whether your job’s memory usage is reasonable in light of the work it’s doing. The code itself typically doesn’t require much memory, so you need to look mostly at the data size.

If you’re developing the code yourself, look for memory leaks. In MATLAB look for large arrays that can be cleared.

An out-of-core algorithm will typically use disk more efficiently than an in-memory algorithm that relies on swapping. Some third-party software gives you a choice of algorithms or allows you to set a limit on the memory the algorithm will use.

Use more nodes for a parallel job

If you have a parallel job you can get more total memory by requesting more nodes. Depending on the characteristics of your code you may also need to run fewer processes per node.

Here’s an example. Suppose your job on Oakley includes the following lines:

#PBS -l nodes=5:ppn=12
…
mpiexec mycode

This job uses 5 nodes, so it has 5*48=240GB total memory available to it. The mpiexec command by default runs one process per core, which in this case is 5*12=60 copies of mycode.

If this job uses too much memory you can spread those 60 processes over more nodes. The following lines request 10 nodes, giving you a total of 10*48=480GB total memory. The -npernode 6 option on the mpiexec command says to run 6 processes per node instead of 12, for a total of 60 as before.

#PBS -l nodes=10:ppn=12
…
mpiexec -npernode 6 mycode

Since parallel jobs are always assigned whole nodes, the following lines will also run 6 processes per node on 10 nodes.

#PBS -l nodes=10:ppn=6
…
mpiexec mycode
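
The arithmetic behind these examples can be sketched as follows. The function name job_memory_summary is hypothetical; 48GB is the standard Oakley node size used above, and the per-process figure is an average that assumes processes use roughly equal amounts of memory.

```shell
# Print total job memory and the average memory available per MPI process.
# Arguments: nodes, GB of memory per node, processes per node.
job_memory_summary() {
    nodes=$1; gb_per_node=$2; procs_per_node=$3
    total_gb=$(( nodes * gb_per_node ))
    total_procs=$(( nodes * procs_per_node ))
    per_proc_gb=$(( gb_per_node / procs_per_node ))   # integer division
    echo "total=${total_gb}GB procs=${total_procs} per_proc=${per_proc_gb}GB"
}

job_memory_summary 5 48 12    # nodes=5:ppn=12
job_memory_summary 10 48 6    # 10 nodes, 6 processes per node
```

Both layouts run 60 processes, but the second gives each process twice the memory.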

Request large-memory nodes

Oakley has eight nodes with 192GB each, four times the memory of a standard node. Oakley also has one huge-memory node with 1TB of memory; it has 32 cores.

Since there are so few of these nodes, compared to hundreds of standard nodes, jobs requesting them will often have a long wait in the queue. The wait will be worthwhile, though, if these nodes solve your memory problem.

To use the large-memory nodes on Oakley, request between 48gb and 192gb memory and 1 to 12 processors per node. Remember to request a number of processors per node proportional to your memory requirements. In most cases you’ll want to request the whole node (ppn=12). You can request up to 8 nodes but the more you request the longer your queue wait is likely to be.

Example:

#PBS -l nodes=1:ppn=12
#PBS -l mem=192gb
…

To use the huge-memory node on Oakley you must request the whole node (ppn=32). Let the memory default.

#PBS -l nodes=1:ppn=32
…

Put a virtual memory limit on your job

The sections above are intended to help you get your job running correctly. This section is about forcing your job to fail gracefully if it consumes too much memory. If your memory usage is unpredictable, it is preferable to terminate the job when it exceeds a memory usage limit rather than allow it to crowd other jobs or crash a node.

The memory limit enforced by PBS is ineffective because it only limits physical memory usage (resident set size or RSS). When your job reaches its memory limit it simply starts using virtual memory, or swap. PBS allows you to put a limit on virtual memory, but that has its own problems.

We will use Linux terminology. Each process has several virtual memory values associated with it. VmSize is virtual memory size; VmRSS is resident set size, or physical memory used; VmSwap is swap space used. The number we care about is the total memory used by the process, which is VmRSS + VmSwap. What PBS allows a job to limit is VmRSS (using -l mem=xxx) or VmSize (using -l vmem=xxx).

The relationship among VmSize, VmRSS, and VmSwap is:  VmSize >= VmRSS+VmSwap. For many programs this bound is fairly tight; for others VmSize can be much larger than the memory actually used.
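
To judge how tight this bound is for a particular program, the three values can be read from /proc/<pid>/status on a Linux node while the process is running. The sketch below assumes a kernel that reports VmSwap; the function name vm_summary is hypothetical.

```shell
# Print VmSize, VmRSS and VmSwap for a process, as reported by the kernel.
# Defaults to the current shell if no PID is given.
vm_summary() {
    pid=${1:-$$}
    grep -E '^Vm(Size|RSS|Swap):' "/proc/${pid}/status"
}

vm_summary          # values for this shell, in kB
```

If VmSize is far larger than VmRSS plus VmSwap for your program, a vmem limit is likely to be too restrictive.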

If the bound is reasonably tight, -l vmem=4gb provides an effective mechanism for limiting memory usage to 4gb (for example). If the bound is not tight, VmSize may prevent the program from starting even if VmRSS+VmSwap would have been perfectly reasonable. Java and some FORTRAN 77 programs in particular have this problem.

The vmem limit in PBS is for the entire job, not just one node, so it isn’t useful with parallel (multinode) jobs. PBS also has a per-process virtual memory limit, pvmem. This limit is trickier to use, but it can be useful in some cases.

Here are suggestions for some specific cases.

Serial (single-node) job using program written in C/C++

This case applies to programs written in any language if VmSize is not much larger than VmRSS+VmSwap. If your program doesn’t use any swap space, this means that vmem as reported by qstat -f or the ja command (see below) is not much larger than mem as reported by the same tools.

Set the vmem limit equal to, or slightly larger than, the number of processors requested (ppn) times the memory available per processor. Example for Oakley:

#PBS -l nodes=1:ppn=1
#PBS -l vmem=4gb

Parallel (multinode) job using program written in C/C++

This suggestion applies if your processes use approximately equal amounts of memory. See also the comments about other languages under the previous case.

Set the pvmem limit equal to, or slightly larger than, the amount of physical memory on the node divided by the number of processes per node. Example for Oakley, running 12 processes per node:

#PBS -l nodes=5:ppn=12
#PBS -l pvmem=4gb
…
mpiexec mycode
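
The pvmem value is simply node memory divided by processes per node. A minimal sketch of that calculation follows; the function name pvmem_gb is hypothetical and 48GB is the standard Oakley node size.

```shell
# Per-process virtual memory limit: node memory divided by processes per node.
# Arguments: GB of memory per node, processes per node.
pvmem_gb() {
    node_gb=$1; procs_per_node=$2
    echo $(( node_gb / procs_per_node ))   # integer division, rounds down
}

echo "#PBS -l pvmem=$(pvmem_gb 48 12)gb"   # 12 processes per node
echo "#PBS -l pvmem=$(pvmem_gb 48 6)gb"    # 6 processes per node
```

Running fewer processes per node raises the limit each process can be given.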

Serial (single-node) job using program written in Java

This suggestion has been only lightly tested so far, so please send feedback to judithg@osc.edu.

Start Java with a virtual memory limit equal to, or slightly larger than, the number of processors requested (ppn) times the memory available per processor. Example for Oakley:

#PBS -l nodes=1:ppn=1
#PBS -l vmem=4gb
…
java -Xms4096m -Xmx4096m MyJavaCode

Other situations

If you have other situations that aren’t covered here, please share them. Contact judithg@osc.edu.

How to monitor your memory usage

qstat -f

While your job is running the command qstat -f jobid will tell you the peak physical and virtual memory usage of the job so far. For a parallel job, these numbers are the aggregate usage across all nodes of the job. The values reported by qstat may lag the true values by a couple of minutes.
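
To pull just the memory figures out of that report, the output can be filtered. The sketch below assumes the values appear on lines named resources_used.mem and resources_used.vmem, as in the job-end email sample later in this document; the function name job_mem_usage is hypothetical.

```shell
# Extract peak memory figures from `qstat -f` style output on stdin.
# Accepts both "name = value" and "name=value" forms.
job_mem_usage() {
    awk -F'=' '/resources_used\.(mem|vmem)/ {
        gsub(/[ \t]/, "", $1); gsub(/[ \t]/, "", $2)   # trim whitespace
        sub(/^resources_used\./, "", $1)               # keep "mem" or "vmem"
        print $1, $2
    }'
}

# Sample input in the format of the job-end email. On a live system one
# would run:  qstat -f jobid | job_mem_usage
job_mem_usage <<'EOF'
resources_used.cput=00:00:00
resources_used.mem=2228kb
resources_used.vmem=211324kb
resources_used.walltime=00:01:00
EOF
```

Comparing the two numbers gives a quick indication of whether a vmem limit is a reasonable proxy for actual memory use.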

free

For parallel (multinode) jobs you can check your per-node memory usage while your job is running by using pdsh -j jobid free -mo on Oakley or all -j jobid free -mo on Glenn.

ja

You can put the command ja (job accounting) at the end of your batch script to capture the resource usage reported by qstat -f. The information will be written to your job output log, job_name.o123456.

OnDemand

You can also view node status graphically in the OSC OnDemand Portal (ondemand.osc.edu).  Under "Jobs" select "Active Jobs". Click on "Job Status" and scroll down to see memory usage. This shows the total memory usage for the node; if your job is not the only one running there, it may be hard to interpret.

Below is a typical graph for jobs using too much memory. It shows two jobs that ran back-to-back on the same node. The first peak is a job that used all the available physical memory (blue) and a large amount of swap (purple). It completed successfully without crashing the node. The second job followed the same pattern but actually crashed the node.

Notes

If it appears that your job is close to crashing a node, we may preemptively delete the job.

If your job is interfering with other jobs by using more memory than it should, we may delete the job.

In extreme cases OSC staff may restrict your ability to submit jobs. If you crash a large number of nodes or continue to submit problem jobs after we have notified you of the situation, this may be the only way to protect the system and our other users. If this happens, we will restore your privileges as soon as you demonstrate that you have resolved the problem.

For assistance

OSC has staff available to help you resolve your memory issues. See our Support Services page for contact information.

System Email

Occasionally, jobs that experience problems may generate emails from staff or automated systems at the center with some information about the nature of the problem. These pages provide additional information about the various emails sent, and steps that can be taken to address the problem.

Batch job aborted

Purpose

Notify you when your job terminates abnormally.

Sample subject line

PBS JOB 944666.oak-batch.osc.edu

Apparent sender

  • root <adm@oak-batch.osc.edu> (Oakley)
  • root <pbs-opt@hpc.osc.edu> (Glenn)

Sample contents

PBS Job Id: 935619.oak-batch.osc.edu
Job Name:   mailtest.job
Exec host:  n0587/5
Aborted by PBS Server
Job exceeded some resource limit (walltime, mem, etc.). Job was aborted See Administrator for help

Sent under these circumstances

These are fully automated emails sent by the batch system.

Some reasons a job might terminate abnormally:

  • The job exceeded its allotted walltime, memory, virtual memory, or other limited resource. More information is available in your job log file, e.g., jobname.o123456.
  • An unexpected system problem caused your job to fail.

To turn off the emails

There is no way to turn them off at this time.

To prevent these problems

For advice on monitoring and controlling resource usage, see Monitoring and Managing Your Job.

There’s not much you can do about system failures, which fortunately are rare.

Notes

Under some circumstances you can retrieve your job output log if your job aborts due to a system failure. Contact oschelp@osc.edu for assistance.

For assistance

Contact OSC Help. See our Support Services page for more contact information.

Batch job begin or end

Purpose

Notify you when your job begins or ends.

Sample subject line

PBS JOB 944666.oak-batch.osc.edu

Apparent sender

  • root <adm@oak-batch.osc.edu> (Oakley)
  • root <pbs-opt@hpc.osc.edu> (Glenn)

Sample contents

PBS Job Id: 944666.oak-batch.osc.edu
Job Name:   mailtest.job
Exec host:  n0587/1
Begun execution
 
PBS Job Id: 944666.oak-batch.osc.edu
Job Name:   mailtest.job
Exec host:  n0587/1
Execution terminated
Exit_status=0
resources_used.cput=00:00:00
resources_used.mem=2228kb
resources_used.vmem=211324kb
resources_used.walltime=00:01:00

Sent under these circumstances

These are fully automated emails sent by the batch system. You control them through the headers in your job script. The following line requests emails at the beginning, ending, and abnormal termination of your job.

#PBS -m abe

To turn off the emails

Remove the -m option from your script and/or command line or use -m n. See PBS Directives Summary.

Notes

You can add the following command at the end of your script to have resource information written to your job output log:

ja

For more information

See PBS Directives Summary.

For assistance

Contact OSC Help. See our Support Services page for more contact information.

Batch job deleted by an administrator

Purpose

Notify you when your job is deleted by an administrator.

Sample subject line

PBS JOB 9657213.opt-batch.osc.edu

Apparent sender

  • root <adm@oak-batch.osc.edu> (Oakley)
  • root <pbs-opt@hpc.osc.edu> (Glenn)

Sample contents

PBS Job Id: 9657213.opt-batch.osc.edu
Job Name:   mailtest.job
job deleted
Job deleted at request of staff@opt-login04.osc.edu Job using too much memory. Contact oschelp@osc.edu.

Sent under these circumstances

These emails are sent automatically, but the administrator can add a note with the reason.

Some reasons a running job might be deleted:

  • The job is using so much memory that it threatens to crash the node it is running on.
  • The job is using more resources than it requested and is interfering with other jobs running on the same node.
  • The job is causing excessive load on some part of the system, typically a network file server.
  • The job is still running at the start of a scheduled downtime.

Some reasons a queued job might be deleted:

  • The job requests non-existent resources.
  • A job apparently intended for Oakley (ppn=12) was submitted on Glenn.
  • The job can never run because it requests combinations of resources that are disallowed by policy.
  • The user’s credentials are blocked on the system the job was submitted on.

To turn off the emails

There is no way to turn them off at this time.

To prevent these problems

See the Supercomputing FAQ for suggestions on dealing with specific problems.

For assistance

We will work with you to get your jobs running within the constraints of the system. Contact OSC Help for assistance. See our Support Services page for more contact information.

Emails exceeded the expected volume

Purpose

Notify you that we have placed a hold on emails sent to you from the HPC system.

Sample subject line

Emails sent to email address student@buckeyemail.osu.edu in the last hour exceeded the expected volume

Apparent sender

OSC Help <OSCHelp@osc.edu>

Explanation

When a job fails or is deleted by an administrator, the system sends you an email. If this happens with a large number of jobs, it generates a volume of email that may be viewed as spam by your email provider. To avoid having OSC blacklisted, and to avoid overloading your email account, we hold your emails from OSC.

Please note that these held emails will eventually be deleted if you do not contact us.

Sent under these circumstances

These emails are sent automatically when your email usage from OSC is deferred.

To turn off the emails

Turn off emails related to your batch jobs to reduce your overall email volume from OSC. See the -m option on the PBS Directives Summary page.

Notes

To re-enable email you must contact OSC Help.

For assistance

Contact OSC Help. See our Support Services page for more contact information.

File system load problem

Purpose

Notify you that one or more of your jobs caused excessive load on one of the network file system directory servers.

Sample subject line

Your jobs on Oakley are causing excessive load on fs14

Apparent sender

OSC Help <OSCHelp@osc.edu> or an individual staff member

Explanation

Your jobs are causing problems with one of the network file servers. This is usually caused by submitting a large number of jobs that start at the same time and execute in lockstep.

Sent under these circumstances

These emails are sent by a staff member when the high load is traced to your jobs. Often the jobs have to be stopped or deleted.

To turn off the emails

You cannot turn off these emails. Please don’t ignore them because they report a problem that you must correct.

To prevent these problems

See the Knowledge Base article (coming soon) for suggestions on dealing with file system load problems.

For information on the different file systems available at OSC, see Available File Systems.

Notes

If you continue to submit jobs that cause these problems, your HPC account may be blocked.

For assistance

We will work with you to get your jobs running within the constraints of the system. Contact OSC Help for assistance. See our Support Services page for more contact information.

Job failure due to a system hardware problem

Purpose

Notify you that one or more of your jobs was running on a compute node that crashed due to a hardware problem.

Sample subject line

Failure of job(s) 919137 due to a hardware problem at OSC

Apparent sender

OSC Help <OSCHelp@osc.edu>

Explanation

Your job failed and was not at fault. You should resubmit the job.

Sent under these circumstances

These emails are sent by a systems administrator after a node crashes.

To turn off the emails

We don’t have a mechanism to turn off these emails. If they really bother you, contact OSC Help and we’ll try to accommodate you.

To prevent these problems

Hardware crashes are quite rare and in most cases there’s nothing you can do to prevent them. Certain types of bus errors on Glenn correlate strongly with certain applications (suggesting that they’re not really hardware errors). If you encounter this type of error you may be advised to use Oakley rather than Glenn.

For assistance

Contact OSC Help. See our Support Services page for more contact information.

Job failure due to a system software problem

Purpose

Notify you that one or more of your jobs was running on a compute node that crashed due to a system software problem.

Sample subject line

Failure of job(s) 919137 due to a system software problem at OSC

Apparent sender

OSC Help <OSCHelp@osc.edu>

Explanation

Your job failed and was not at fault. You should resubmit the job. Usually the problems are caused by another job running on the node.

Sent under these circumstances

These emails are sent by a systems administrator as part of the node cleanup process.

To turn off the emails

We don’t have a mechanism to turn off these emails. If they really bother you, contact OSC Help and we’ll try to accommodate you.

To prevent these problems

If you request a whole node (nodes=1:ppn=12 on Oakley or nodes=1:ppn=8 on Glenn) your jobs will be less susceptible to problems caused by other jobs. Other than that, be assured that we work hard to keep jobs from interfering with each other.

For assistance

Contact OSC Help. See our Support Services page for more contact information.

Job failure due to exhaustion of physical memory

Purpose

Notify you that one or more of your jobs caused compute nodes to crash with an out-of-memory error.

Sample subject line

Failure of job(s) 933014,933174 at OSC due to exhaustion of physical memory

Apparent sender

OSC Help <oschelp@osc.edu>

Explanation

Your job(s) exhausted both physical memory and swap space during job execution. This failure caused the compute node(s) used by the job(s) to crash, requiring a reboot.

Sent under these circumstances

These emails are sent by a systems administrator as part of the node cleanup process.

To turn off the emails

You cannot turn off these emails. Please don’t ignore them because they report a problem that you must correct.

To prevent these problems

See the Knowledge Base article "Out-of-Memory (OOM) or Excessive Memory Usage" for suggestions on dealing with out-of-memory problems.

For information on the memory available on the various systems, see our Supercomputing page.

Notes

If you continue to submit jobs that cause these problems, your HPC account may be blocked.

For assistance

We will work with you to get your jobs running within the constraints of the system. Contact OSC Help for assistance. See our Support Services page for more contact information.