Technical Support


OSC Help provides technical support and consulting services for OSC's high performance computing resources, and is staffed by members of OSC's HPC Client Services group.

Before contacting OSC Help, please check to see if your question is answered in either the FAQ or the Knowledge Base. Many of the questions asked by both new and experienced OSC users are answered in these web pages.

If you still cannot solve your problem, please do not hesitate to contact OSC Help:

Toll Free: (800) 686-6472
Local: (614) 292-1800
Email: oschelp@osc.edu
Submit your issue online

OSC Help hours of operation:

Level 1 support is available 24x7x365
Level 2 advanced support is available Monday through Friday, 9 am - 5 pm, except OSU holidays

OSC users can also directly influence OSC operational decisions by participating in the Statewide Users Group, whose activities include managing the allocation process and advising on software licensing and hardware acquisition.

We recommend following HPCNotices on Twitter to get up-to-the-minute information on system outages and important operations-related updates.

HPC Changelog

Changes to HPC systems are listed below, optionally filtered by system.

Known Issues

Abaqus does not run in parallel on Owens

Abaqus does not run correctly in parallel (multiple nodes) on Owens with input files in $TMPDIR. You need to use scratch file system ($PFSDIR) instead. For more information, see: https://www.osc.edu/resources/available_software/software_list/abaqus
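As a hedged sketch (the input file name and the Abaqus command line here are illustrative only, not an OSC-verified recipe), a batch-script fragment that stages a parallel run in $PFSDIR rather than $TMPDIR might look like:

```shell
# Stage the model on the scratch file system, which is visible to all
# nodes of a parallel job (unlike node-local $TMPDIR)
cd $PFSDIR
cp $HOME/abaqus_work/model.inp .        # "model.inp" is a placeholder name

# Run Abaqus from the scratch directory
abaqus job=model input=model.inp cpus=24 interactive

# Copy results back before the job ends, since scratch may be purged
cp model.odb $HOME/abaqus_work/
```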

 


Search Documentation

Search our client documentation below, optionally filtered by one or more systems.

Supercomputers

We currently operate three major systems:

  • Owens Cluster, a 23,000+ core Dell Intel Xeon machine available to clients later in 2016
  • Ruby Cluster, a 4,800-core HP Intel Xeon machine
    • 20 nodes have Nvidia Tesla K40 GPUs
    • One node has 1 TB of RAM and 32 cores, for large SMP style jobs.
  • Oakley Cluster, an 8,300+ core HP Intel Xeon machine
    • One in every 10 nodes has 2 Nvidia Tesla GPU accelerators
    • One node has 1 TB of RAM and 32 cores, for large SMP style jobs

Our clusters share a common environment, and we have several guides available.

OSC also provides more than 5 PB of storage, and another 5.5 PB of tape backup.

  • Learn how that space is made available to users, and how to best utilize the resources, in our storage environment guide.

Finally, you can keep up to date with any known issues on our systems (and the available workarounds). An archive of resolved issues can be found here.


Oakley

TIP: Remember to check the menu to the right of the page for related pages with more information about Oakley's specifics.

Oakley is an HP-built, Intel® Xeon® processor-based supercomputer, featuring more cores (8,328) on half as many nodes (694) as the center’s former flagship system, the IBM Opteron 1350 Glenn Cluster. The Oakley Cluster can achieve 88 teraflops, tech-speak for performing 88 trillion floating point operations per second, or, with acceleration from 128 NVIDIA® Tesla graphics processing units (GPUs), a total peak performance of just over 154 teraflops.


Hardware

Detailed system specifications:

  • 8,328 total cores
  • Compute Node:
    • HP SL390 G7 two-socket servers with Intel Xeon x5650 (Westmere-EP, 6 cores, 2.67GHz) processors
    • 12 cores/node  & 48 gigabytes of memory/node
  • GPU Node:
    • 128 NVIDIA Tesla M2070 GPUs
  • 873 GB of local disk space in '/tmp'
  • QDR IB Interconnect (40Gbps)
    • Low latency
    • High throughput
    • High quality-of-service.
  • Theoretical system peak performance
    • 88.6 teraflops
  • GPU acceleration
    • Additional 65.5 teraflops
  • Total peak performance
    • 154.1 teraflops
  • Memory Increase
    • Increases memory from 2.5 gigabytes per core on the Glenn system to 4.0 gigabytes per core.
  • System Efficiency
    • 1.5x the performance of the former Glenn system at just 60 percent of the power consumption.

How to Connect

  • SSH Method

To login to Oakley at OSC, ssh to the following hostname:

oakley.osc.edu

You can either use an ssh client application or execute ssh on the command line in a terminal window as follows:

ssh <username>@oakley.osc.edu

You may see a warning message that includes an SSH key fingerprint. Verify that the fingerprint in the message matches one of the SSH key fingerprints listed here, then type yes.

From there, you have access to the compilers and other software development tools. You can run programs interactively or through batch requests. See the following sections for details.

  • OnDemand Method

You can also login to Oakley at OSC with our OnDemand tool. The first step is to log in to OnDemand. Once logged in, you can access Oakley by clicking on "Clusters", and then selecting ">_Oakley Shell Access".

Instructions on how to connect to OnDemand can be found at the OnDemand documentation page.

Batch Specifics

We have recently updated qsub to provide more information to clients about the job they just submitted, including both informational (NOTE) and ERROR messages. To better understand these messages, please visit the messages from qsub page.

Refer to the documentation for our batch environment to understand how to use PBS on OSC hardware. Some specifics you will need to know to create well-formed batch scripts:

  • Compute nodes on Oakley are 12 cores/processors per node (ppn). Parallel jobs must use ppn=12 .
  • If you need more than 48 GB of RAM per node, you may run on the 8 large memory (192 GB) nodes  on Oakley ("bigmem"). You can request a large memory node on Oakley by using the following directive in your batch script: nodes=XX:ppn=12:bigmem , where XX can be 1-8.

  • We have a single huge memory node ("hugemem"), with 1 TB of RAM and 32 cores. You can schedule this node by adding the following directive to your batch script: #PBS -l nodes=1:ppn=32 . This node is only for serial jobs, and can only have one job running on it at a time, so you must request the entire node to be scheduled on it. In addition, there is a walltime limit of 48 hours for jobs on this node.
Requesting fewer than 32 cores with a memory requirement greater than 192 GB will not schedule the 1 TB node! Just request nodes=1:ppn=32 with a walltime of 48 hours or less, and the scheduler will put you on the 1 TB node.
  • GPU jobs may request any number of cores and either 1 or 2 GPUs. Request 2 GPUs per node by adding the following directive to your batch script: #PBS -l nodes=1:ppn=12:gpus=2
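Putting the directives above together, a minimal sketch of a large-memory batch script (the program name is a placeholder):

```shell
#PBS -l walltime=2:00:00
#PBS -l nodes=1:ppn=12:bigmem
#PBS -N bigmem_example
#PBS -j oe

# Run from the directory the job was submitted from
cd $PBS_O_WORKDIR
./my_program        # placeholder for your executable
```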

Using OSC Resources

For more information about how to use OSC resources, please see our guide on batch processing at OSC. For specific information about modules and file storage, please see the Batch Execution Environment page.

Service: 

Technical Specifications

The following are technical specifications for Oakley.  We hope these may be of use to the advanced user.

  Oakley System (2012)
Number of Nodes                670 nodes
Number of CPU Cores            8,328 (12 cores/node)
Cores per Node                 12 cores/node
Local Disk Space per Node      ~810GB in /tmp, SATA
Compute CPU Specifications     Intel Xeon x5650 (Westmere-EP) CPUs
                                 • 2.67 GHz
                                 • 6 cores per processor
Compute Server Specifications  HP SL390 G7
Accelerator Specifications     NVIDIA Tesla M2070
Number of Accelerators         128 GPUs
Memory per Node                48GB
Memory per Core                4GB
Interconnect                   QDR IB Interconnect (40Gbps)
Login Specifications           2 Intel Xeon x5650
                                 • 2.67 GHz
                                 • 12 cores
                                 • 118GB memory
Special Nodes                  Large Memory (8)
                                 • 2 Intel Xeon x5650 CPUs (2.67 GHz)
                                 • 12 cores (6 cores/CPU)
                                 • 192GB memory
                               Huge Memory (1)
                                 • 4 Intel Xeon E7-8837 CPUs (2.67 GHz)
                                 • 32 cores (8 cores/CPU)
                                 • 1 TB memory

 

Batch Limit Rules

Memory Limit:

When requesting OSC resources for your jobs, we strongly suggest comparing your expected memory use to the available per-core memory. On Oakley, that equates to 4GB/core and 48GB/node.

If your job requests less than a full node (ppn<12), it may be scheduled on a node with other running jobs. In this case, your job is entitled to a memory allocation proportional to the number of cores requested (4GB/core). For example, without any memory request (mem=XX), a job that requests nodes=1:ppn=1 will be assigned one core and should use no more than 4GB of RAM, a job that requests nodes=1:ppn=3 will be assigned 3 cores and should use no more than 12GB of RAM, and a job that requests nodes=1:ppn=12 will be assigned the whole node (12 cores) with 48GB of RAM. However, a job that requests nodes=1:ppn=1,mem=12GB will be assigned one core but have access to 12GB of RAM, and charged for 3 cores' worth of Resource Units (RU). See Charging for memory use for more details.

A multi-node job (nodes>1) will be assigned entire nodes with 48GB/node and charged for the entire nodes regardless of the ppn request. For example, a job that requests nodes=10:ppn=1 will be charged for 10 whole nodes (12 cores/node * 10 nodes, which is 120 cores' worth of RU). A job that requests a large-memory node (nodes=XX:ppn=12:bigmem, where XX can be 1-8) will be allocated the entire large-memory node with 192GB of RAM and charged for the whole node (12 cores' worth of RU). A job that requests the huge-memory node (nodes=1:ppn=32) will be allocated the entire huge-memory node with 1TB of RAM and charged for the whole node (32 cores' worth of RU).
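The single-node charging rule can be sketched as a small shell helper (this is our own illustration, not an OSC tool): the charged cores are the larger of the ppn request and the memory request divided by 4GB/core, rounded up.

```shell
# Charged cores for a single-node Oakley job:
# max(ppn, ceil(mem_gb / 4)) -- 4GB of RAM per core
charged_cores() {
  ppn=$1
  mem_gb=$2
  mem_cores=$(( (mem_gb + 3) / 4 ))   # ceil(mem_gb / 4)
  if [ "$mem_cores" -gt "$ppn" ]; then
    echo "$mem_cores"
  else
    echo "$ppn"
  fi
}

charged_cores 1 12   # nodes=1:ppn=1,mem=12GB -> charged 3 cores' worth of RU
charged_cores 3 0    # nodes=1:ppn=3, no mem request -> 3 cores
```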

To manage and monitor your memory usage, please refer to Out-of-Memory (OOM) or Excessive Memory Usage.

GPU Limit:

On Oakley, GPU jobs may request any number of cores and either 1 or 2 GPUs (nodes=XX:ppn=XX:gpus=1 or gpus=2). The memory limit depends on the ppn request and follows the rules in Memory Limit.

Walltime Limit

Here are the queues available on Oakley:

NAME          MAX WALLTIME   MAX JOB SIZE   NOTES
Serial        168 hours      1 node
Longserial    336 hours      1 node         Restricted access
Parallel      96 hours       125 nodes
Longparallel  250 hours      230 nodes      Restricted access
Hugemem       48 hours       1 node         32 cores with 1 TB RAM; nodes=1:ppn=32
Debug         1 hour         12 nodes

Job Limit

An individual user can have up to 128 concurrently running jobs and/or up to 2040 processors/cores in use. All the users in a particular group/project can among them have up to 192 concurrently running jobs and/or up to 2040 processors/cores in use. Jobs submitted in excess of these limits are queued but blocked by the scheduler until other jobs exit and free up resources.

A user may have no more than 1,000 jobs submitted to each of the parallel and serial job queues. Jobs submitted in excess of this limit will be rejected.

Citation

For more information about citations of OSC, visit https://www.osc.edu/citation.

To cite Oakley, please use the following Archival Resource Key:

ark:/19495/hpc0cvqn

Please adjust this citation to fit your required citation style guidelines.

Ohio Supercomputer Center. 2012. Oakley Supercomputer. Columbus, OH: Ohio Supercomputer Center. http://osc.edu/ark:/19495/hpc0cvqn

Here is the citation in BibTeX format:

@misc{Oakley2012,
ark = {ark:/19495/hpc0cvqn},
howpublished = {\url{http://osc.edu/ark:/19495/hpc0cvqn}},
year  = {2012},
author = {Ohio Supercomputer Center},
title = {Oakley Supercomputer}
}

And in EndNote format:

%0 Generic
%T Oakley Supercomputer
%A Ohio Supercomputer Center
%R ark:/19495/hpc0cvqn
%U http://osc.edu/ark:/19495/hpc0cvqn
%D 2012

An .ris file is also attached below; change your reference manager's import option to .ris to use it.


Oakley SSH key fingerprints

These are the public key fingerprints for Oakley (in hexadecimal format):
oakley: ssh_host_key.pub = 01:21:16:c4:cd:43:d3:87:6d:fe:da:d1:ab:20:ba:4a
oakley: ssh_host_rsa_key.pub = eb:83:d9:ca:88:ba:e1:70:c9:a2:12:4b:61:ce:02:72
oakley: ssh_host_dsa_key.pub = ef:4c:f6:cd:83:88:d1:ad:13:50:f2:af:90:33:e9:70


These are the SHA256 hashes (in base64 format):​
oakley: ssh_host_key.pub = SHA256:685FBToLX5PCXfUoCkDrxosNg7w6L08lDTVsjLiyLQU
oakley: ssh_host_rsa_key.pub = SHA256:D7HjrL4rsYDGagmihFRqy284kAcscqhthYdzT4w0aUo
oakley: ssh_host_dsa_key.pub = SHA256:XplFCsSu7+RDFC6V/1DGt+XXfBjDLk78DNP0crf341U

Queues and Reservations

Here are the queues available on Oakley. Please note that you will be routed to the appropriate queue based on your walltime and job size request.

NAME          NODES AVAILABLE                MAX WALLTIME   MAX JOB SIZE   NOTES
Serial        Available minus reservations   168 hours      1 node
Longserial    Available minus reservations   336 hours      1 node         Restricted access
Parallel      Available minus reservations   96 hours       125 nodes
Longparallel  Available minus reservations   250 hours      230 nodes      Restricted access
Hugemem       1                              48 hours       1 node

"Available minus reservations" means all nodes in the cluster currently operational (this will fluctuate slightly), less the reservations listed below. To access one of the restricted queues, please contact OSC Help. Generally, access will only be granted to these queues if performance of the job cannot be improved, and job size cannot be reduced by splitting or checkpointing the job.

In addition, there are a few standing reservations.

NAME   TIMES             NODES AVAILABLE   MAX WALLTIME   MAX JOB SIZE   NOTES
Debug  8AM-6PM weekdays  12                1 hour         12 nodes       For small interactive and test jobs.
GPU    ALL               62                336 hours      62 nodes       Small jobs not requiring GPUs from the serial and parallel queues will backfill on this reservation.
OneTB  ALL               1                 48 hours       1 node         Holds the 32-core, 1 TB RAM node aside for the hugemem queue.

 

Occasionally, reservations will be created for specific projects that will not be reflected in these tables.


Ruby

Ruby is unavailable for general access. Please follow this link to request access.
TIP: Remember to check the menu to the right of the page for related pages with more information about Ruby's specifics.
On 10/13/2016, Intel Xeon Phi coprocessors on Ruby were removed from service. Please contact OSC Help if you have any questions or want help getting access to alternative resources.

Ruby is named after the Ohio native actress Ruby Dee. An HP-built, Intel® Xeon® processor-based supercomputer, Ruby provides almost the same amount of total computing power (~144 TF) as our former flagship system Oakley on less than half the number of nodes (240 nodes). Ruby now has 20 nodes outfitted with NVIDIA® Tesla K40 accelerators. (Ruby used to feature two distinct sets of hardware accelerators: 20 nodes outfitted with NVIDIA® Tesla K40 GPUs and another 20 nodes featuring Intel® Xeon® Phi coprocessors.)


Hardware

Detailed system specifications:

  • 4800 total cores
    • 20 cores/node  & 64 gigabytes of memory/node
  • Intel Xeon E5 2670 V2 (Ivy Bridge) CPUs
  • HP SL250 Nodes
  • 20 Intel Xeon Phi 5110p coprocessors (removed from service on 10/13/2016)
  • 20 NVIDIA Tesla K40 GPUs
  • 2 NVIDIA Tesla K20X GPUs
    • Both installed in a single "debug" queue node
  • 850 GB of local disk space in '/tmp'
  • FDR IB Interconnect
    • Low latency
    • High throughput
    • High quality-of-service.
  • Theoretical system peak performance
    • 96 teraflops
  • NVIDIA GPU performance
    • 28.6 additional teraflops
  • Intel Xeon Phi performance
    • 20 additional teraflops
  • Total peak performance
    • ~125 teraflops

Ruby has one huge memory node.

  • 32 cores (Intel Xeon E5 4640 CPUs)
  • 1 TB of memory
  • 483 GB of local disk space in '/tmp'

Ruby is configured with two login nodes.

  • Intel Xeon E5-2670 (Sandy Bridge) CPUs
  • 16 cores/node & 128 gigabytes of memory/node

How to Connect

  • SSH Method

To login to Ruby at OSC, ssh to the following hostname:

ruby.osc.edu 

You can either use an ssh client application or execute ssh on the command line in a terminal window as follows:

ssh <username>@ruby.osc.edu

You may see a warning message that includes an SSH key fingerprint. Verify that the fingerprint in the message matches one of the SSH key fingerprints listed here, then type yes.

From there, you have access to the compilers and other software development tools. You can run programs interactively or through batch requests. See the following sections for details.

  • OnDemand Method

You can also login to Ruby at OSC with our OnDemand tool. The first step is to log in to OnDemand. Once logged in, you can access Ruby by clicking on "Clusters", and then selecting ">_Ruby Shell Access".

Instructions on how to connect to OnDemand can be found at the OnDemand documentation page.

File Systems

Ruby accesses the same OSC mass storage environment as our other clusters. Therefore, users have the same home directory as on the Oakley cluster. Full details of the storage environment are available in our storage environment guide.

Software Environment

The module system on Ruby is the same as on the Oakley system. Use module load <package> to add a software package to your environment. Use module list to see what modules are currently loaded and module avail to see the modules that are available to load. To search for modules that may not be visible due to dependencies or conflicts, use module spider. By default, you will have the batch scheduling software modules, the Intel compiler, and an appropriate version of mvapich2 loaded.
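For example, a typical module session might look like the following (fftw3 is just an illustrative package name; these commands work only on the cluster itself):

```shell
module list            # show currently loaded modules
module avail           # show modules available to load
module spider fftw3    # search, including modules hidden by dependencies
module load fftw3      # add the package to your environment
```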

You can keep up to date on the software packages that have been made available on Ruby by viewing the Software by System page and selecting the Ruby system.

Understanding the Xeon Phi

Guidance on what the Phis are, how they can be utilized, and other general information can be found on our Ruby Phi FAQ.

Compiling for the Xeon Phis

For information on compiling for and running software on our Phi coprocessors, see our Phi Compiling Guide.

Batch Specifics

We have recently updated qsub to provide more information to clients about the job they just submitted, including both informational (NOTE) and ERROR messages. To better understand these messages, please visit the messages from qsub page.

Refer to the documentation for our batch environment to understand how to use PBS on OSC hardware. Some specifics you will need to know to create well-formed batch scripts:

  • Compute nodes on Ruby have 20 cores/processors per node (ppn).  
  • If you need more than 64 GB of RAM per node you may run on Ruby's huge memory node ("hugemem").  This node has four Intel Xeon E5-4640 CPUs (8 cores/CPU) for a total of 32 cores.  The node also has 1TB of RAM.  You can schedule this node by adding the following directive to your batch script: #PBS -l nodes=1:ppn=32 .  This node is only for serial jobs, and can only have one job running on it at a time, so you must request the entire node to be scheduled on it.  In addition, there is a walltime limit of 48 hours for jobs on this node.
  • 20 nodes on Ruby are equipped with a single NVIDIA Tesla K40 GPU each. These nodes can be requested by adding gpus=1 to your nodes request, like so: #PBS -l nodes=1:ppn=20:gpus=1 .
    • By default a GPU is set to the Exclusive Process and Thread compute mode at the beginning of each job. To request the GPU be set to Default compute mode, add default to your nodes request, like so: #PBS -l nodes=1:ppn=20:gpus=1:default .
  • Ruby has 5 debug nodes which are specifically configured for short (< 1 hour) debugging type work. These nodes have a walltime limit of 1 hour and are equipped with E5-2670 V1 CPUs with 16 cores per node.
    • To schedule a debug node:
      #PBS -l nodes=1:ppn=16 -q debug
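Combining the options above, a sketch of a Ruby GPU job script (the executable name is a placeholder):

```shell
#PBS -l walltime=1:00:00
#PBS -l nodes=1:ppn=20:gpus=1:default
#PBS -N gpu_example
#PBS -j oe

cd $PBS_O_WORKDIR
./my_gpu_program      # placeholder; a program built for the GPU
```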

Using OSC Resources

For more information about how to use OSC resources, please see our guide on batch processing at OSC. For specific information about modules and file storage, please see the Batch Execution Environment page.

 


Technical Specifications

The following are technical specifications for Ruby. 

  Ruby System (2014)
Number of Nodes                240 nodes
Number of CPU Sockets          480 (2 sockets/node)
Number of CPU Cores            4800 (20 cores/node)
Cores per Node                 20 cores/node (32 cores/node for huge mem node)
Local Disk Space per Node      ~850GB in /tmp, SATA
Compute CPU Specifications     Intel Xeon E5-2670 V2 (Ivy Bridge) for compute
                                 • 2.5 GHz
                                 • 10 cores per processor
Compute Server Specifications  200 HP SL230
                               40 HP SL250 (for accelerator nodes)
Accelerator Specifications     20 NVIDIA Tesla K40
                                 • 1.43 TF peak double-precision performance
                                 • 1 GK110B GPU
                                 • 2880 CUDA cores
                                 • 12GB memory
Number of Accelerator Nodes    20 total (20 NVIDIA Tesla K40 equipped nodes)
Total Memory                   ~16TB
Memory per Node                64GB
Memory per Core                3.2GB
Interconnect                   FDR/EN InfiniBand (56 Gbps)
Login Specifications           2 Intel Xeon E5-2670 (Sandy Bridge) CPUs
                                 • 2.6 GHz
                                 • 16 cores/node
                                 • 128GB of memory/node
Special Nodes                  1 Huge Memory Node
                                 • Dell PowerEdge R820 server
                                 • 4 Intel Xeon E5-4640 CPUs (2.4 GHz)
                                 • 32 cores (8 cores/CPU)
                                 • 1 TB memory

 

Programming Environment

Compilers

C, C++ and Fortran are supported on the Ruby cluster. Intel, PGI and GNU compiler suites are available. The Intel development tool chain is loaded by default. Compiler commands and recommended options for serial programs are listed in the table below. See also our compilation guide.

LANGUAGE    INTEL EXAMPLE                PGI EXAMPLE             GNU EXAMPLE
C           icc -O2 -xHost hello.c       pgcc -fast hello.c      gcc -O2 -march=native hello.c
Fortran 90  ifort -O2 -xHost hello.f90   pgf90 -fast hello.f90   gfortran -O2 -march=native hello.f90

Parallel Programming

MPI

The system uses the MVAPICH2 implementation of the Message Passing Interface (MPI), optimized for the high-speed Infiniband interconnect. MPI is a standard library for performing parallel processing using a distributed-memory model. For more information on building your MPI codes, please visit the MPI Library documentation.

Ruby uses a different version of mpiexec than Oakley. This is necessary because of changes in Torque. All OSC systems use the mpiexec command, but the underlying code on Ruby is mpiexec.hydra while the code on Oakley was developed at OSC. They are largely compatible, but a few differences should be noted.

Caution: There are many variations on mpiexec and mpiexec.hydra. Information found on non-OSC websites may not be applicable to our installation.
Note: Oakley has been updated to use the same mpiexec as Ruby.

The table below shows some commonly used options. Use mpiexec -help for more information.

OAKLEY (old)                    RUBY                            COMMENT
mpiexec                         mpiexec                         Same command on both systems
mpiexec a.out                   mpiexec ./a.out                 Program must be in path on Ruby, not necessary on Oakley.
-pernode                        -ppn 1                          One process per node
-npernode procs                 -ppn procs                      procs processes per node
-n totalprocs / -np totalprocs  -n totalprocs / -np totalprocs  At most totalprocs processes per node (same on both systems)
-comm none                      (omit)                          Omit for simple cases. If using $MPIEXEC_RANK, consider using pbsdsh with $PBS_VNODENUM.
-comm anything_else             (omit)                          Omit. Ignored on Oakley, will fail on Ruby.
(none)                          -prepend-rank                   Prepend rank to output
-help                           -help                           Get a list of available options

mpiexec will normally spawn one MPI process per CPU core requested in a batch job. The -pernode option is not supported by mpiexec on Ruby; instead use -ppn 1 as mentioned in the table above.
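As an illustration, the Ruby forms of some common invocations inside a batch script (a.out stands for your own MPI executable):

```shell
mpiexec ./a.out                       # one process per requested core
mpiexec -ppn 1 ./a.out                # one process per node (old -pernode)
mpiexec -ppn 4 -prepend-rank ./a.out  # 4 per node, ranks prepended to output
```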

OpenMP

The Intel, PGI and GNU compilers understand the OpenMP set of directives, which give the programmer finer control over the parallelization. For more information on building OpenMP codes on OSC systems, please visit the OpenMP documentation.

GPU Programming

To request the GPU node on Ruby, use nodes=1:ppn=20:gpus=1. For GPU programming with CUDA, please refer to CUDA documentation. Also refer to the page of each software to check whether it is GPU enabled.
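A hedged sketch of a CUDA batch script for Ruby (the cuda module name and the source file are assumptions; verify the actual module with module avail):

```shell
#PBS -l walltime=0:30:00
#PBS -l nodes=1:ppn=20:gpus=1
#PBS -N cuda_example
#PBS -j oe

module load cuda      # module name assumed; verify with "module avail"
cd $PBS_O_WORKDIR
nvcc -O2 my_kernel.cu -o my_kernel   # my_kernel.cu is a placeholder
./my_kernel
```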


Executing Programs

Batch Requests

Batch requests are handled by the TORQUE resource manager and Moab Scheduler as on the Oakley system. Use the qsub command to submit a batch request, qstat to view the status of your requests, and qdel to delete unwanted requests. For more information, see the manual pages for each command.

There are some changes for Ruby; they are listed here:

  • Ruby nodes have 20 cores per node, and 64 GB of memory per node. This is less memory per core than on Oakley.
  • Ruby will be allocated on the basis of whole nodes even for jobs using less than 20 cores.
  • The amount of local disk space available on a node is approximately 800 GB.
  • MPI Parallel Programs should be run with mpiexec, as on Oakley, but the underlying program is mpiexec.hydra instead of OSC's mpiexec. Type mpiexec --help for information on the command line options.

Example Serial Job

This particular example uses OpenMP.

  #PBS -l walltime=1:00:00
  #PBS -l nodes=1:ppn=20
  #PBS -N my_job
  #PBS -j oe

  cd $TMPDIR
  cp $HOME/science/my_program.f .
  ifort -O2 -openmp my_program.f
  export OMP_NUM_THREADS=20
  ./a.out > my_results
  cp my_results $HOME/science

Please remember that jobs on Ruby must use a complete node.

Example Parallel Job

    #PBS -l walltime=1:00:00
    #PBS -l nodes=4:ppn=20
    #PBS -N my_job
    #PBS -j oe

    cd $HOME/science
    mpif90 -O3 mpiprogram.f
    cp a.out $TMPDIR
    cd $TMPDIR
    mpiexec ./a.out > my_results
    cp my_results $HOME/science

For more information about how to use OSC resources, please see our guide on batch processing at OSC. For specific information about modules and file storage, please see the Batch Execution Environment page.


Queues and Reservations

Here are the queues available on Ruby. Please note that you will be routed to the appropriate queue based on your walltime and job size request.

NAME      NODES AVAILABLE                MAX WALLTIME   MAX JOB SIZE   NOTES
Serial    Available minus reservations   168 hours      1 node
Parallel  Available minus reservations   96 hours       40 nodes
Hugemem   1                              48 hours       1 node         32 cores with 1 TB RAM
Debug     5                              1 hour         2 nodes        16 cores with 128GB RAM; for small interactive and test jobs; use "-q debug" to request it

"Available minus reservations" means all nodes in the cluster currently operational (this will fluctuate slightly), less the reservations. To access one of the restricted queues, please contact OSC Help. Generally, access will only be granted to these queues if performance of the job cannot be improved, and job size cannot be reduced by splitting or checkpointing the job.

Occasionally, reservations will be created for specific projects.

Approximately half of the Ruby nodes are a part of client condo reservations. Only jobs of short duration are eligible to run on these nodes, and only when they are not in use by the condo clients. As a result, your job(s) may have to wait for eligible resources to come available while it appears that much of the cluster is idle.

Batch Limit Rules

Full Node Charging Policy

On Ruby, we always allocate whole nodes to jobs and charge for the whole node. If a job requests less than a full node (nodes=1:ppn<20), the job execution environment is what is requested (the job only has access to the number of cores given by the ppn request) with 64GB of RAM; however, the job will be allocated a whole node and charged for the whole node. A job that requests nodes>1 will be assigned entire nodes with 64GB/node and charged for the entire nodes regardless of the ppn request. A job that requests the huge-memory node (nodes=1:ppn=32) will be allocated the entire huge-memory node with 1TB of RAM and charged for the whole node (32 cores' worth of RU).
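The whole-node charging policy can be sketched as a small shell helper (our own illustration, not an OSC tool): regular Ruby nodes are charged at 20 cores per node regardless of ppn, and the huge-memory node at 32 cores.

```shell
# Charged cores for a Ruby job: whole nodes only.
ruby_charged_cores() {
  nodes=$1
  ppn=$2
  if [ "$ppn" -eq 32 ]; then
    echo 32                   # huge-memory node: all 32 cores
  else
    echo $(( nodes * 20 ))    # whole 20-core nodes regardless of ppn
  fi
}

ruby_charged_cores 1 1    # nodes=1:ppn=1  -> still charged 20 cores
ruby_charged_cores 4 20   # nodes=4:ppn=20 -> 80 cores
```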

To manage and monitor your memory usage, please refer to Out-of-Memory (OOM) or Excessive Memory Usage.

Queue Default

Please keep in mind that if you submit a job with no node specification, the default is nodes=1:ppn=20, while if you submit a job with no ppn specified, the default is nodes=N:ppn=1.

Debug Node

Ruby has 5 debug nodes which are specifically configured for short (< 1 hour) debugging type work. These nodes have a walltime limit of 1 hour and are equipped with E5-2670 V1 CPUs with 16 cores per node. To schedule a debug node, use nodes=1:ppn=16 -q debug

GPU Node

On Ruby, 20 nodes are equipped with NVIDIA Tesla K40 GPUs (one GPU with each node).  These nodes can be requested by adding gpus=1 to your nodes request (nodes=1:ppn=20:gpus=1). 

Walltime Limit

Here are the queues available on Ruby:

NAME      MAX WALLTIME   MAX JOB SIZE   NOTES
Serial    168 hours      1 node
Parallel  96 hours       40 nodes
Hugemem   48 hours       1 node         32 cores with 1 TB RAM
Debug     1 hour         5 nodes        16 cores with 128GB RAM

Job Limit

An individual user can have up to 40 concurrently running jobs and/or up to 800 processors/cores in use. All the users in a particular group/project can among them have up to 80 concurrently running jobs and/or up to 1600 processors/cores in use if the system is busy. The debug queue allows 1 job at a time per user. For condo users, please contact OSC Help for more instructions.

Citation

For more information about citations of OSC, visit https://www.osc.edu/citation.

To cite Ruby, please use the following Archival Resource Key:

ark:/19495/hpc93fc8

Please adjust this citation to fit your required citation style guidelines.

Ohio Supercomputer Center. 2015. Ruby Supercomputer. Columbus, OH: Ohio Supercomputer Center. http://osc.edu/ark:/19495/hpc93fc8

Here is the citation in BibTeX format:

@misc{Ruby2015,
ark = {ark:/19495/hpc93fc8},
howpublished = {\url{http://osc.edu/ark:/19495/hpc93fc8}},
year  = {2015},
author = {Ohio Supercomputer Center},
title = {Ruby Supercomputer}
}

And in EndNote format:

%0 Generic
%T Ruby Supercomputer
%A Ohio Supercomputer Center
%R ark:/19495/hpc93fc8
%U http://osc.edu/ark:/19495/hpc93fc8
%D 2015

An .ris file is also attached below; change your reference manager's import option to .ris to use it.


Request Access

Projects that would like to use the Ruby cluster will need to request access. This is because of the particulars of the Ruby environment, which include its size, GPUs, and scheduling policies.

Motivation

Access to Ruby is done on a case by case basis because:

  • It is a smaller machine than Oakley, and thus has limited space for users
    • Oakley has 694 nodes, while Ruby only has 240 nodes.
  • Its CPUs are less general-purpose, and therefore more consideration is required to get optimal performance
  • Scheduling is done on a per-node basis, and therefore jobs must scale to this level at a bare minimum 
  • Additional consideration is required to get full performance out of its GPUs

Good Ruby Workload Characteristics

Those interested in using Ruby should check that their work is well suited for it by using the following list.  Ideal workloads will exhibit one or more of the following characteristics:

  • Work scales well to large core counts
    • No single core jobs
    • Scales well past 2 nodes on Oakley
  • Needs access to Ruby specific hardware (GPUs)
  • Memory bound work
  • Software:
    • Supports GPUs
    • Takes advantage of:
      • Long vector length
      • Higher core count
      • Improved Memory Bandwidth

Applying for Access

Those who would like to be considered for Ruby access should send the following in a email to OSC Help:

  • Name
  • Project ID
  • Plan for using Ruby
  • Evidence of workload being well suited for Ruby

Ruby SSH key fingerprints

These are the public key fingerprints for Ruby (in hexadecimal format):
ruby: ssh_host_key.pub = 01:21:16:c4:cd:43:d3:87:6d:fe:da:d1:ab:20:ba:4a
ruby: ssh_host_rsa_key.pub = eb:83:d9:ca:88:ba:e1:70:c9:a2:12:4b:61:ce:02:72
ruby: ssh_host_dsa_key.pub = ef:4c:f6:cd:83:88:d1:ad:13:50:f2:af:90:33:e9:70


These are the SHA256 hashes (in base64 format):​
ruby: ssh_host_key.pub = SHA256:685FBToLX5PCXfUoCkDrxosNg7w6L08lDTVsjLiyLQU
ruby: ssh_host_rsa_key.pub = SHA256:D7HjrL4rsYDGagmihFRqy284kAcscqhthYdzT4w0aUo
ruby: ssh_host_dsa_key.pub = SHA256:XplFCsSu7+RDFC6V/1DGt+XXfBjDLk78DNP0crf341U

Owens

TIP: Remember to check the menu to the right of the page for related pages with more information about Owens' specifics.

OSC's Owens cluster, installed in 2016, is a Dell-built, Intel® Xeon® processor-based supercomputer.


Hardware

Detailed system specifications:

  • 824 Dell Nodes
  • Dense Compute
    • 648 compute nodes (Dell PowerEdge C6320 two-socket servers with Intel Xeon E5-2680 v4 (Broadwell, 14 cores, 2.40GHz) processors, 128GB memory)
  • GPU Compute
    • 160 ‘GPU ready’ compute nodes -- Dell PowerEdge R730 two-socket servers with Intel Xeon E5-2680 v4 (Broadwell, 14 cores, 2.40GHz) processors, 128GB memory
    • NVIDIA Tesla P100 (Pascal) GPUs -- 5.3TF peak (double precision), 16GB memory
  • Analytics
    • 16 huge memory nodes (Dell PowerEdge R930 four-socket server with Intel Xeon E5-4830 v3 (Haswell 12 core, 2.10GHz) processors, 1,536GB memory, 12 x 2TB drives)

  • 23,392 total cores
    • 28 cores/node  & 128GB of memory/node
  • Mellanox EDR (100Gbps) Infiniband networking
  • Theoretical system peak performance
    • ~750 teraflops (CPU only)
  • 4 login nodes:
    • Intel Xeon E5-2680 (Broadwell) CPUs
    • 28 cores/node and 256GB of memory/node

How to Connect

  • SSH Method

To login to Owens at OSC, ssh to the following hostname:

owens.osc.edu 

You can either use an ssh client application or execute ssh on the command line in a terminal window as follows:

ssh <username>@owens.osc.edu

You may see a warning message that includes an SSH key fingerprint. Verify that the fingerprint in the message matches one of the SSH key fingerprints listed here, then type yes.

From there, you have access to the compilers and other software development tools. You can run programs interactively or through batch requests. See the following sections for details.
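
If you want to inspect a host key fingerprint yourself, the standard OpenSSH ssh-keygen tool can print one. A minimal sketch (the throwaway key generated here is only for demonstration; in practice you would fingerprint the key reported by your SSH client or fetched with ssh-keyscan):

```shell
# Generate a throwaway key pair so we have a public key to fingerprint.
ssh-keygen -t ed25519 -N '' -f demo_key -q
# Print its SHA256 fingerprint in the same base64 format listed on this site.
ssh-keygen -lf demo_key.pub
```

Compare the SHA256 value printed in your SSH client's warning against the fingerprints listed in the SSH key fingerprints sections on this site.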

  • OnDemand Method

You can also login to Owens at OSC with our OnDemand tool. The first step is to login to OnDemand. Then once logged in you can access Owens by clicking on "Clusters", and then selecting ">_Owens Shell Access".

Instructions on how to connect to OnDemand can be found at the OnDemand documentation page.

File Systems

Owens accesses the same OSC mass storage environment as our other clusters. Therefore, users have the same home directory as on the Oakley and Ruby clusters. Full details of the storage environment are available in our storage environment guide.

Home directories should be accessed through either the $HOME environment variable or the tilde notation ( ~username ). Project directories are located at /fs/project . Scratch storage is located at /fs/scratch .

Owens will not have symlinks allowing use of the old file system paths. This is in contrast to Oakley and Ruby, which will have the symlinks.

Software Environment

The module system on Owens is the same as on the Oakley and Ruby systems. Use  module load <package>  to add a software package to your environment. Use  module list  to see what modules are currently loaded and  module avail  to see the modules that are available to load. To search for modules that may not be visible due to dependencies or conflicts, use  module spider . By default, you will have the batch scheduling software modules, the Intel compiler and an appropriate version of mvapich2 loaded.
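
For example, a typical sequence in a job script or login session might look like the following (the package name fftw3 is illustrative only; use module avail to see what is actually installed):

```
module list          # modules loaded by default
module avail         # modules available to load
module spider fftw3  # search for a package hidden by dependencies or conflicts
module load fftw3    # add the package to your environment
```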

You can keep up to date on the software packages that have been made available on Owens by viewing the Software by System page and selecting the Owens system.

Compiling Code to Use Advanced Vector Extensions (AVX2)

The Haswell and Broadwell processors that make up Owens support the Advanced Vector Extensions (AVX2) instruction set, but you must set the correct compiler flags to take advantage of it. AVX2 has the potential to speed up your code by a factor of 4 or more, depending on the compiler and options you would otherwise use.

In our experience, the Intel and PGI compilers do a much better job than the gnu compilers at optimizing HPC code.

With the Intel compilers, use -xHost and -O2 or higher. With the gnu compilers, use -march=native and -O3 . The PGI compilers by default use the highest available instruction set, so no additional flags are necessary.

This advice assumes that you are building and running your code on Owens. The executables will not be portable.

See the Owens Programming Environment page for details.

Batch Specifics

Refer to the documentation for our batch environment to understand how to use PBS on OSC hardware. Some specifics you will need to know to create well-formed batch scripts:

  • The qsub syntax for node requests is the same on Owens as on Ruby and Oakley
  • Most compute nodes on Owens have 28 cores/processors per node (ppn).  Huge-memory (analytics) nodes have 48 cores/processors per node.
  • Jobs on Owens may request partial nodes.  This is in contrast to Ruby but similar to Oakley.
  • Owens has 6 debug nodes which are specifically configured for short (< 1 hour) debugging type work.  These nodes have a walltime limit of 1 hour.
    • To schedule a debug node:
      #PBS -l nodes=1:ppn=28 -q debug
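
Putting these specifics together, a minimal Owens batch script might look like the following sketch (the job name, walltime, and program name myprog are placeholders):

```
#PBS -N example_job
#PBS -l nodes=2:ppn=28
#PBS -l walltime=1:00:00
#PBS -j oe

cd $PBS_O_WORKDIR
mpiexec ./myprog
```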

Using OSC Resources

For more information about how to use OSC resources, please see our guide on batch processing at OSC. For specific information about modules and file storage, please see the Batch Execution Environment page.

Service: 

Owens Programming Environment

Compilers

C, C++ and Fortran are supported on the Owens cluster. Intel, PGI and GNU compiler suites are available. The Intel development tool chain is loaded by default. Compiler commands and recommended options for serial programs are listed in the table below. See also our compilation guide.

The Haswell and Broadwell processors that make up Owens support the Advanced Vector Extensions (AVX2) instruction set, but you must set the correct compiler flags to take advantage of it. AVX2 has the potential to speed up your code by a factor of 4 or more, depending on the compiler and options you would otherwise use.

In our experience, the Intel and PGI compilers do a much better job than the GNU compilers at optimizing HPC code.

With the Intel compilers, use -xHost and -O2 or higher. With the GNU compilers, use -march=native and -O3. The PGI compilers by default use the highest available instruction set, so no additional flags are necessary.

This advice assumes that you are building and running your code on Owens. The executables will not be portable.

LANGUAGE    INTEL EXAMPLE               PGI EXAMPLE            GNU EXAMPLE
C           icc -O2 -xHost hello.c      pgcc -fast hello.c     gcc -O3 -march=native hello.c
Fortran 90  ifort -O2 -xHost hello.f90  pgf90 -fast hello.f90  gfortran -O3 -march=native hello.f90
C++         icpc -O2 -xHost hello.cpp   pgc++ -fast hello.cpp  g++ -O3 -march=native hello.cpp

Parallel Programming

MPI

OSC systems use the MVAPICH2 implementation of the Message Passing Interface (MPI), optimized for the high-speed Infiniband interconnect. MPI is a standard library for performing parallel processing using a distributed-memory model. For more information on building your MPI codes, please visit the MPI Library documentation.

Parallel programs are started with the mpiexec command. For example,

mpiexec ./myprog

The program to be run must either be in your path or have its path specified.

The mpiexec command will normally spawn one MPI process per CPU core requested in a batch job. Use the -n and/or -ppn option to change that behavior.

The table below shows some commonly used options. Use mpiexec -help for more information.

MPIEXEC OPTION   COMMENT
-ppn 1           One process per node
-ppn procs       procs processes per node
-n totalprocs    totalprocs processes in total
-np totalprocs   Same as -n totalprocs
-prepend-rank    Prepend rank to output
-help            Get a list of available options
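
For example, with a hypothetical executable myprog, these options would be used as follows:

```
mpiexec ./myprog                 # one process per requested core (default)
mpiexec -ppn 1 ./myprog          # one process per node
mpiexec -n 4 ./myprog            # four processes in total
mpiexec -prepend-rank ./myprog   # tag each output line with its MPI rank
```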

 

Caution: There are many variations on mpiexec and mpiexec.hydra. Information found on non-OSC websites may not be applicable to our installation.
The information above applies to the MVAPICH2 and IntelMPI installations at OSC. See the OpenMPI software page for mpiexec usage with OpenMPI.

OpenMP

The Intel, PGI and GNU compilers understand the OpenMP set of directives, which support multithreaded programming. For more information on building OpenMP codes on OSC systems, please visit the OpenMP documentation.

GPU Programming

160 Nvidia P100 GPUs are available on Owens.  Please visit our GPU documentation.

Service: 
Technologies: 

Technical Specifications

The following are technical specifications for Owens.  

  Owens SYSTEM (2016)
NUMBER OF NODES 824 nodes
NUMBER OF CPU SOCKETS 1648 (2 sockets/node)
NUMBER OF CPU CORES 23,392 (28 cores/node)
CORES PER NODE 28 cores/node (48 cores/node for Huge Mem Nodes)
LOCAL DISK SPACE PER NODE

~1500GB in /tmp

COMPUTE CPU SPECIFICATIONS

Intel Xeon E5-2680 v4 (Broadwell) for compute

  • 2.4 GHz 
  • 14 cores per processor
COMPUTE SERVER SPECIFICATIONS

648 Dell PowerEdge C6320

160 Dell PowerEdge R730 (for accelerator nodes)

ACCELERATOR SPECIFICATIONS

NVIDIA P100 "Pascal" GPUs 16GB memory

NUMBER OF ACCELERATOR NODES

160 total

TOTAL MEMORY ~ 127 TB
MEMORY PER NODE

128 GB (1.5 TB for Huge Mem Nodes)

MEMORY PER CORE 4.5 GB (31 GB for Huge Mem)
INTERCONNECT  Mellanox EDR Infiniband Networking (100Gbps)
LOGIN SPECIFICATIONS

4 Intel Xeon E5-2680 (Broadwell) CPUs

  • 28 cores/node and 256GB of memory/node
SPECIAL NODES

16 Huge Memory Nodes

  • Dell PowerEdge R930 
  • 4 Intel Xeon E5-4830 v3 (Haswell)
    • 12 Cores
    • 2.1 GHz
  • 48 cores (12 cores/CPU)
  • 1.5 TB Memory
  • 12 x 2 TB Drive (20TB usable)

 

Service: 

Batch Limit Rules

Memory Limit:

It is strongly suggested that users consider their memory use relative to the available per-core memory when requesting OSC resources for their jobs. On Owens, this equates to 4GB/core or 124GB/node.

If your job requests less than a full node ( ppn< 28 ), it may be scheduled on a node with other running jobs. In this case, your job is entitled to a memory allocation proportional to the number of cores requested (4GB/core).  For example, without any memory request ( mem=XX ), a job that requests  nodes=1:ppn=1  will be assigned one core and should use no more than 4GB of RAM, a job that requests  nodes=1:ppn=3  will be assigned 3 cores and should use no more than 12GB of RAM, and a job that requests  nodes=1:ppn=28  will be assigned the whole node (28 cores) with 124GB of RAM.  

Please be careful if you include a memory request ( mem=XX ) in your job. A job that requests  nodes=1:ppn=1,mem=12GB  will be assigned one core and have access to 12GB of RAM, but charged for 3 cores worth of Resource Units (RU). However, a job that requests  nodes=1:ppn=5,mem=12GB  will be assigned 5 cores but have access to only 12GB of RAM, and charged for 5 cores worth of Resource Units (RU). See Charging for memory use for more details.

A multi-node job ( nodes>1 ) will be assigned entire nodes with 124GB/node and charged for the entire nodes regardless of the ppn request. For example, a job that requests  nodes=10:ppn=1  will be charged for 10 whole nodes (28 cores/node * 10 nodes, which is 280 cores worth of RU).

A job that requests a huge-memory node ( nodes=1:ppn=48 ) will be allocated the entire huge-memory node with 1.5TB of RAM and charged for the whole node (48 cores worth of RU).
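
The proportional-charging rule for partial-node jobs can be sketched as a small calculation (an illustrative sketch of the 4GB/core rule described above, not an official OSC accounting tool):

```shell
# Cores charged for a partial-node Owens job: the larger of the ppn request
# and the number of 4GB memory units requested (rounded up).
charged_cores() {
  ppn=$1
  mem_gb=$2
  mem_cores=$(( (mem_gb + 3) / 4 ))   # ceiling of mem_gb / 4
  if [ "$ppn" -gt "$mem_cores" ]; then echo "$ppn"; else echo "$mem_cores"; fi
}

charged_cores 1 12   # nodes=1:ppn=1,mem=12GB -> charged for 3 cores
charged_cores 5 12   # nodes=1:ppn=5,mem=12GB -> charged for 5 cores
```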

To manage and monitor your memory usage, please refer to Out-of-Memory (OOM) or Excessive Memory Usage.

Walltime Limit

Here are the queues available on Owens:

NAME              MAX WALLTIME  MAX JOB SIZE  NOTES
Serial            168 hours     1 node
Parallel          96 hours      27 nodes      Jobs are scheduled to run within a single IB leaf switch
Largeparallel     96 hours      81 nodes
Hugemem           168 hours     1 node        16 nodes in this class
Parallel hugemem  96 hours      16 nodes      Restricted access (contact OSC Help if you need access); use "-q parhugemem" to access it
Debug             1 hour        2 nodes       6 nodes in this class; use "-q debug" to request it

Job Limit

An individual user can have up to 256 concurrently running jobs and/or up to 3080 processors/cores in use. All the users in a particular group/project can among them have up to 384 concurrently running jobs and/or up to 4620 processors/cores in use. Jobs submitted in excess of these limits are queued but blocked by the scheduler until other jobs exit and free up resources.

A user may have no more than 1000 jobs submitted to each of the parallel and serial job queues.

Service: 

Citation

For more information about citations of OSC, visit https://www.osc.edu/citation.

To cite Owens, please use the following Archival Resource Key:

ark:/19495/hpc6h5b1

Please adjust this citation to fit the citation style guidelines required.

Ohio Supercomputer Center. 2016. Owens Supercomputer. Columbus, OH: Ohio Supercomputer Center. http://osc.edu/ark:/19495/hpc6h5b1

Here is the citation in BibTeX format:

@article{Owens2016,
ark = {ark:/19495/hpc6h5b1},
url = {http://osc.edu/ark:/19495/hpc6h5b1},
year  = {2016},
author = {Ohio Supercomputer Center},
title = {Owens supercomputer}
}

And in EndNote format:

%0 Generic
%T Owens supercomputer
%A Ohio Supercomputer Center
%R ark:/19495/hpc6h5b1
%U http://osc.edu/ark:/19495/hpc6h5b1
%D 2016

Here is an .ris file to better suit your needs. Please change the import option to .ris.

Documentation Attachment: 
Service: 

Migrating jobs from Oakley or Ruby to Owens

This page includes a summary of differences to keep in mind when migrating jobs from Oakley or Ruby to Owens

Guidance for Oakley Users

Hardware Specifications

                    OWENS (PER NODE)                                                      OAKLEY (PER NODE)
Most compute nodes  28 cores and 125GB of RAM                                             12 cores and 48GB of RAM
Large memory node   N/A                                                                   12 cores and 192GB of RAM (8 nodes in this class)
Huge memory node    48 cores and 1.5TB of RAM, 12 x 2TB drives (16 nodes in this class)   32 cores and 1TB of RAM (1 node in this class)

File Systems

Owens accesses the same OSC mass storage environment as our other clusters. Therefore, users have the same home directory as on the Oakley cluster.

  • Home directories: accessed through either the  $HOME  environment variable or the tilde notation ( ~username ) on both clusters.
    • Owens does NOT have symlinks allowing use of the old file system paths. Please modify your scripts with the new paths before you submit jobs to the Owens cluster.
    • Oakley has the symlinks, so no action is required on your part to continue using your existing job scripts on the Oakley cluster.
  • Project directories: located at  /fs/project  on both clusters.
  • Scratch storage: located at  /fs/scratch  on both clusters.

See the 2016 Storage Service Upgrades page for details. 

Software Environment

Owens uses the same module system as Oakley.

Use  module load <package>  to add a software package to your environment. Use  module list  to see what modules are currently loaded and  module avail  to see the modules that are available to load. To search for modules that may not be visible due to dependencies or conflicts, use  module spider .

You can keep up to on the software packages that have been made available on Owens by viewing the Software by System page and selecting the Owens system.

Programming Environment

Like Oakley, Owens supports three compilers: Intel, PGI, and GNU. The default is Intel. To switch to a different compiler, use  module swap intel gnu  or  module swap intel pgi .

Owens also uses the MVAPICH2 implementation of the Message Passing Interface (MPI), optimized for the high-speed Infiniband interconnect.

In addition, Owens supports the Advanced Vector Extensions (AVX2) instruction set, but you must set the correct compiler flags to take advantage of it. In our experience, the Intel and PGI compilers do a much better job than the GNU compilers at optimizing HPC code.

See the Owens Programming Environment page for details.

PBS Batch-Related Command

The  qpeek  command is not needed on Owens.

On Oakley, a job’s stdout and stderr data streams, which normally show up on the screen, are written to log files. These log files are stored on a server until the job ends, so you can’t look at them directly. The  qpeek  command allows you to peek at their contents. If you used the PBS header line to join the stdout and stderr streams ( #PBS -j oe ), the two streams are combined in the output log.

On Owens, a job’s stdout and stderr data streams are written to log files stored in the current working directory, i.e.  $PBS_O_WORKDIR . You will see the log files immediately after your job starts.

Accounting

The Owens cluster will be charged at a rate of 1 RU per 10 core-hours.

The Oakley cluster will be charged at a rate of 1 RU per 20 core-hours.

Like Oakley, Owens will accept partial-node jobs and charge you for the number of cores proportional to the amount of memory your job requests.
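
As a worked example of the rates above (arithmetic sketch only; see your usage reports for authoritative numbers), a job consuming 40 core-hours costs 4.0 RU on Owens but 2.0 RU on Oakley:

```shell
# RU cost = core-hours / rate (10 core-hours/RU on Owens, 20 on Oakley).
awk 'BEGIN {
  core_hours = 40
  printf "Owens:  %.1f RU\n", core_hours / 10
  printf "Oakley: %.1f RU\n", core_hours / 20
}'
```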

Below is a comparison of job limits between Owens and Oakley:

  owens oakley
Per User Up to 256 concurrently running jobs and/or up to 3080 processors/cores in use  Up to 128 concurrently running jobs and/or up to 1500 processors/cores in use
Per group Up to 384 concurrently running jobs and/or up to 4620 processors/cores in use Up to 192 concurrently running jobs and/or up to 1500 processors/cores in use

 

Please see Queues and Reservations for Owens for more details.

Guidance for Ruby Users

Hardware Specifications

                    OWENS (PER NODE)                                                      RUBY (PER NODE)
Most compute nodes  28 cores and 125GB of RAM                                             20 cores and 64GB of RAM
Huge memory node    48 cores and 1.5TB of RAM, 12 x 2TB drives (16 nodes in this class)   32 cores and 1TB of RAM (1 node in this class)

File Systems

Owens accesses the same OSC mass storage environment as our other clusters. Therefore, users have the same home directory as on the Ruby cluster.

  • Home directories: accessed through either the  $HOME  environment variable or the tilde notation ( ~username ) on both clusters.
    • Owens does NOT have symlinks allowing use of the old file system paths. Please modify your scripts with the new paths before you submit jobs to the Owens cluster.
    • Ruby has the symlinks, so no action is required on your part to continue using your existing job scripts on the Ruby cluster.
  • Project directories: located at  /fs/project  on both clusters.
  • Scratch storage: located at  /fs/scratch  on both clusters.

See the 2016 Storage Service Upgrades page for details. 

Software Environment

Owens uses the same module system as Ruby.

Use  module load <package>  to add a software package to your environment. Use  module list  to see what modules are currently loaded and  module avail  to see the modules that are available to load. To search for modules that may not be visible due to dependencies or conflicts, use  module spider .

You can keep up to on the software packages that have been made available on Owens by viewing the Software by System page and selecting the Owens system.

Programming Environment

Like Ruby, Owens supports three compilers: Intel, PGI, and GNU. The default is Intel. To switch to a different compiler, use  module swap intel gnu  or  module swap intel pgi .

Owens also uses the MVAPICH2 implementation of the Message Passing Interface (MPI), optimized for the high-speed Infiniband interconnect.

In addition, Owens supports the Advanced Vector Extensions (AVX2) instruction set, but you must set the correct compiler flags to take advantage of it. In our experience, the Intel and PGI compilers do a much better job than the GNU compilers at optimizing HPC code.

See the Owens Programming Environment page for details.

PBS Batch-Related Command

The  qpeek  command is not needed on Owens.

On Ruby, a job’s stdout and stderr data streams, which normally show up on the screen, are written to log files. These log files are stored on a server until the job ends, so you can’t look at them directly. The  qpeek  command allows you to peek at their contents. If you used the PBS header line to join the stdout and stderr streams ( #PBS -j oe ), the two streams are combined in the output log.

On Owens, a job’s stdout and stderr data streams are written to log files stored in the current working directory, i.e.  $PBS_O_WORKDIR . You will see the log files immediately after your job starts.

Accounting

The Owens cluster will be charged at a rate of 1 RU per 10 core-hours.

The Ruby cluster will be charged at a rate of 1 RU per 20 core-hours.

However, Owens will accept partial-node jobs and charge you for the number of cores proportional to the amount of memory your job requests. By contrast, Ruby only accepts full-node jobs and charges for the whole node.

Below is a comparison of job limits between Owens and Ruby:

  OWENS Ruby
Per User Up to 256 concurrently running jobs and/or up to 3080 processors/cores in use  Up to 40 concurrently running jobs and/or up to 800 processors/cores in use
Per group Up to 384 concurrently running jobs and/or up to 4620 processors/cores in use Up to 80 concurrently running jobs and/or up to 1600 processors/cores in use

 

Please see Queues and Reservations for Owens for more details.

 

Service: 

Owens SSH key fingerprints

These are the public key fingerprints for Owens (in hexadecimal format):
owens: ssh_host_rsa_key.pub = 18:68:d4:b0:44:a8:e2:74:59:cc:c8:e3:3a:fa:a5:3f
owens: ssh_host_ed25519_key.pub = 1c:3d:f9:99:79:06:ac:6e:3a:4b:26:81:69:1a:ce:83
owens: ssh_host_ecdsa_key.pub = d6:92:d1:b0:eb:bc:18:86:0c:df:c5:48:29:71:24:af


These are the SHA256 hashes (in base64 format):
owens: ssh_host_rsa_key.pub = SHA256:vYIOstM2e8xp7WDy5Dua1pt/FxmMJEsHtubqEowOaxo
owens: ssh_host_ed25519_key.pub = SHA256:FSb9ZxUoj5biXhAX85tcJ/+OmTnyFenaSy5ynkRIgV8
owens: ssh_host_ecdsa_key.pub = SHA256:+fqAIqaMW/DUJDB0v/FTxMT9rkbvi/qVdMKVROHmAP4

Queues and Reservations

Here are the queues available on Owens. Please note that you will be routed to the appropriate queue based on your walltime and job size request.

NAME           NODES AVAILABLE               MAX WALLTIME  MAX JOB SIZE  NOTES
Serial         Available minus reservations  168 hours     1 node
Parallel       Available minus reservations  96 hours      27 nodes
Largeparallel  Available minus reservations  96 hours      81 nodes
Hugemem        16                            96 hours      1 node
Parhugemem     16                            96 hours      16 nodes      Restricted access; use "-q parhugemem" to request it
Debug          8                             1 hour        2 nodes       For small interactive and test jobs; use "-q debug" to request it

"Available minus reservations" means all nodes in the cluster currently operational (this will fluctuate slightly), less the reservations listed below. To access one of the restricted queues, please contact OSC Help. Generally, access will only be granted to these queues if performance of the job cannot be improved, and job size cannot be reduced by splitting or checkpointing the job.

 

Occasionally, reservations will be created for specific projects that will not be reflected in these tables.

Service: 

GPU Computing

With the addition of 160 Nvidia P100 GPUs to the Owens cluster, OSC now offers GPU computing on all its systems. While GPUs can provide a significant boost in performance for some applications, the computing model is very different from that of the CPU. This page discusses some of the ways you can use GPU computing at OSC.

Accessing GPU Resources

To request nodes with a GPU add the gpus=# attribute to the PBS nodes directive in your batch script, for example, on Owens,

#PBS -l nodes=2:ppn=28:gpus=1

On Oakley you can request 1 or 2 GPUs.

In most cases you'll need to load the cuda module (module load cuda) to make the necessary Nvidia libraries available.

There is no additional RU charge for GPUs.
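
Putting this together, a minimal GPU job script for Owens might look like the following sketch (the walltime and the program name my_gpu_program are placeholders):

```
#PBS -l nodes=1:ppn=28:gpus=1
#PBS -l walltime=1:00:00

module load cuda
cd $PBS_O_WORKDIR
./my_gpu_program
```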

Using GPU-enabled Applications

We have several supported applications that can use GPUs. Please see the software pages for each application. They have different levels of support for multi-node jobs, cpu/gpu work sharing, and environment set-up.

Libraries with GPU Support

There are a few libraries that provide GPU implementations of commonly used routines. While they mostly hide the details of using a GPU there are still some GPU specifics you'll need to be aware of, e.g. device initialization, threading, and memory allocation.

MAGMA

MAGMA is an implementation of BLAS and LAPACK with multi-core (SMP) and GPU support. There are some differences from the APIs of standard BLAS and LAPACK.

cuBLAS and cuSPARSE

cuBLAS is a highly optimized BLAS from NVIDIA. There are a few versions of this library, from very GPU-specific to nearly transparent. cuSPARSE is a BLAS-like library for sparse matrices.

The MAGMA library is built on cuBLAS.

cuFFT

cuFFT is NVIDIA's Fourier transform library with an API similar to FFTW.

cuDNN

cuDNN is NVIDIA's Deep Neural Network machine learning library. Many ML applications are built on cuDNN.

Direct GPU Programming

GPUs present a different programming model from CPUs so there is a significant time investment in going this route.

OpenACC

OpenACC is a directives-based model similar to OpenMP. Currently this is only supported by the Portland Group C/C++ and Fortran compilers.

OpenCL

OpenCL is a set of libraries and C/C++ compiler extensions supporting GPUs (NVIDIA and AMD) and other hardware accelerators. The CUDA module provides an OpenCL library.

CUDA

CUDA is the standard NVIDIA development environment. In this model, explicit GPU code is written in the CUDA C/C++ dialect, compiled with the CUDA compiler NVCC, and linked with a native driver program.

About OSC GPU Hardware

Our GPUs span several generations with different capabilities and ease-of-use. Many of the differences won't be visible when using applications or libraries, but some features and applications may not be supported on the older models.

Oakley M2070

The M2070 is now a legacy product. It has a CUDA compute capability of 2.0. It is not supported by the latest CUDA 8 drivers and development environment.

Each M2070 has 5.5GB of memory, and they still provide a significant speed-up over the CPU.

Ruby K40

The K40 has a compute capability of 3.5, which is supported by most applications.

Each K40 has 12GB of memory.

Owens P100

The P100 is NVIDIA's flagship GPU with a compute capability of 6.0. The 6.0 capability includes unified shared CPU/GPU memory -- the GPU now has its own virtual memory capability and can map CPU memory into its address space.

Each P100 has 16GB of on-board memory.

Examples

There are example jobs and code at GitHub.

Tutorials & Training

Training is an important part of our services. We are working to expand our portfolio; we currently provide the following:

  • Training classes. OSC provides training classes, at our facility, on-site and remotely.
  • HOWTOs. Step-by-step guides to accomplish certain tasks on our systems.
  • Tutorials. Online content designed for self-paced learning.

Other good sources for information:

  • Knowledge Base.  Useful information that does not fit our existing documentation.
  • FAQ.  List of commonly asked questions.

Knowledge Base

This knowledge base is a collection of important, useful information about OSC systems that does not fit into a guide or tutorial, and is too long to be answered in a simple FAQ.

Changes of Default Memory Limits

Problem Description

Our current GPFS file system is a distributed system with significant interactions between clients. Because the compute nodes are GPFS file system clients, a certain amount of memory on each node must be reserved for these interactions. As a result, the maximum physical memory on each node available to users' jobs is reduced in order to keep the file system performing well. In addition, using swap memory is no longer allowed.

The table below summarizes the maximum physical memory allowed for each type of nodes on our systems:

Oakley Cluster

Node type physical memory per node Maximum memory allowed per Node 
Regular node 48GB 45GB
Big memory node 192GB 187GB
Huge memory node 1024GB (1TB) 1008GB

Ruby Cluster

NODE TYPE PHYSICAL MEMORY per node MAXIMUM MEMORY ALLOWED per node
Regular node 64GB 61GB
Debug node 128GB 124GB
Huge memory node 1024GB (1TB) 1008GB

Owens Cluster

NODE TYPE PHYSICAL MEMORY per node MAXIMUM MEMORY ALLOWED per node
Regular node 128GB 124GB
Huge memory node 1536GB 1510GB

Solutions When You Need Regular Nodes

Starting from October 27, 2016, we'll implement a new scheduling policy on all of our clusters, reflecting the reduced default memory limits. 

If you do not request memory explicitly in your job (no -l mem request):

Your job can be submitted and scheduled as before, and resources will be allocated according to your requests of cores/nodes ( nodes=XX:ppn=XX ). If you request a partial node, the memory allocated to your job is proportional to the number of cores requested (4GB/core on Oakley and Owens); if you request the whole node, the memory allocated to your job is decreased, following the information summarized in the above tables. Some examples are provided below.

A request of partial node:

On Oakley, a request of nodes=1:ppn=1  will be allocated with 4GB memory, and charged for 1 core.  A request of  nodes=1:ppn=4  will be allocated with 16GB memory, and charged for 4 cores. A request of  nodes=1:ppn=11  will be allocated with 44GB memory, and charged for 11 cores. 

On Ruby, we always allocate whole nodes to jobs and charge for the whole node, with 61GB memory allocated to your job.  

On Owens, a request of  nodes=1:ppn=1   will be allocated with 4GB memory, and charged for 1 core. A request of  nodes=1:ppn=4  will be allocated with 16GB memory, and charged for 4 cores.

A request of the whole node:

A request of the whole regular node will be allocated with maximum memory allowed per node and charged for the whole node, as summarized below:

  Request memory allocated charged for
Oakley nodes=1:ppn=12  45GB 12 cores
Ruby nodes=1:ppn=20  61GB 20 cores
Owens nodes=1:ppn=28 124GB 28 cores

A request of multiple nodes:

If you have a multi-node job (  nodes>1  ), your job will be assigned the entire nodes with maximum memory allowed per node (45GB on Oakley, 61GB for Ruby, and 124GB for Owens) and charged for the entire nodes regardless of ppn request.

If you do request memory explicitly in your job (with -l mem request):

If you request memory explicitly in your script, please revisit your script according to the following information.

A request of partial node:

On Oakley, a request of  nodes=1:ppn=1,mem=4gb   will be allocated with 4GB memory, and charged for 1 core; a request of  nodes=1:ppn=2,mem=8gb   will be allocated with 8GB memory, and charged for 2 cores; a request of  nodes=1:ppn=1,mem=40gb   will be allocated with 40GB memory, and charged for 10 cores.

On Owens, a request of  nodes=1:ppn=1, mem=4gb   will be allocated with 4GB memory, and charged for 1 core.

On Ruby, we always allocate whole nodes to jobs and charge for the whole node, with 61GB memory allocated to your job. 

 A request of the whole node:

On Oakley, the maximum value you can use for -l mem is 45gb, i.e. -l mem=45gb. A request of  nodes=1:ppn=12,mem=45gb  will be allocated with 45GB memory, and charged for the whole node. If you need more than 45GB memory for the job, please submit your job to the big/huge memory nodes on Oakley, or switch to the Owens cluster. Any request of mem>45gb may be re-scheduled on a big memory node on Oakley, or may not be scheduled, depending on what you put in the request.

On Ruby, the maximum value you can use for -l mem is 61gb, i.e. -l mem=61gb. A request of  nodes=1:ppn=20,mem=61gb  will be allocated with 61GB memory, and charged for the whole node. If you need more than 61GB memory for the job, please submit your job to the huge memory nodes on Ruby, or switch to the Owens cluster. Any request of mem>61gb will not be scheduled.

On Owens, the maximum value you can use for -l mem is 125gb, i.e. -l mem=125gb. A request of  nodes=1:ppn=28,mem=124gb  will be allocated with 124GB memory, and charged for the whole node. If you need more than 124GB memory for the job, please submit your job to the huge memory nodes. Any request of mem>125gb will not be scheduled.

A request of multiple nodes:

If you have a multi-node job (   nodes>1), your job will be assigned the entire nodes with maximum memory allowed per node (45GB on Oakley, 61GB for Ruby, and 124GB for Owens) and charged for the entire nodes.

Solutions When You Need Special Nodes

If you need any special resources, it is highly recommended that you omit the memory request and instead follow the syntax below.

Oakley Cluster:

NODE TYPE HOW TO REQUEST MEMORY ALLOCATED CHARGED FOR
Big memory node nodes=XX:ppn=12:bigmem (XX can be 1-8) 187GB 12 cores
Huge memory node nodes=1:ppn=32 1008GB 32 cores

Ruby Cluster:

NODE TYPE HOW TO REQUEST MEMORY ALLOCATED CHARGED FOR
Debug node nodes=1:ppn=16 -q debug 124GB 16 cores
Huge memory node nodes=1:ppn=32 1008GB 32 cores

Owens Cluster:

NODE TYPE HOW TO REQUEST MEMORY ALLOCATED CHARGED FOR
Huge memory node nodes=1:ppn=48 1510GB 48 cores
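For example, a job script header requesting one Oakley big-memory node might begin like this (the job name and walltime are placeholders):

```shell
#PBS -N bigmem_example          # hypothetical job name
#PBS -l nodes=1:ppn=12:bigmem   # one big-memory node: 187GB, charged for 12 cores
#PBS -l walltime=1:00:00        # placeholder walltime
# Note: no -l mem request, per the recommendation above.
```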

 

Compilation Guide

As a general recommendation, we suggest selecting the newest compilers available for a new project. For repeatability, you may not want to change compilers in the middle of an experiment.

Owens Compilers

The Haswell and Broadwell processors that make up Owens support the Advanced Vector Extensions (AVX2) instruction set, but you must set the correct compiler flags to take advantage of it. AVX2 has the potential to speed up your code by a factor of 4 or more, depending on the compiler and options you would otherwise use.

With the Intel compilers, use -xHost and -O2 or higher. With the GNU compilers, use -march=native and -O3. The PGI compilers use the highest available instruction set by default, so no additional flags are necessary.

This advice assumes that you are building and running your code on Owens. The executables will not be portable.

Intel (recommended)

  NON-MPI MPI
FORTRAN 90 ifort mpif90
C icc mpicc
C++ icpc mpicxx

Recommended Optimization Options

The -O2 -xHost options are recommended with the Intel compilers. (For more options, see the "man" pages for the compilers.)

OpenMP

Add this flag to any of the above:  -qopenmp  or  -openmp

PGI

  NON-MPI MPI
FORTRAN 90 pgfortran   or   pgf90 mpif90
C pgcc mpicc
C++ pgc++ mpicxx

Recommended Optimization Options

The   -fast  option is appropriate with all PGI compilers. (For more options, see the "man" pages for the compilers)

Note: The PGI compilers can generate code for accelerators such as GPUs. Description of these capabilities is beyond the scope of this guide.

OpenMP

Add this flag to any of the above:  -mp

GNU

  NON-MPI MPI
FORTRAN 90 gfortran mpif90
C gcc mpicc
C++ g++ mpicxx

Recommended Optimization Options

The  -O2 -march=native  options are recommended with the GNU compilers. (For more options, see the "man" pages for the compilers)

OpenMP

Add this flag to any of the above:  -fopenmp

 

Ruby Compilers

Intel (recommended)

  NON-MPI MPI
FORTRAN 90 ifort mpif90
C icc mpicc
C++ icpc mpicxx

Recommended Optimization Options

The -O2 -xHost options are recommended with the Intel compilers. (For more options, see the "man" pages for the compilers.)

OpenMP

Add this flag to any of the above: -qopenmp or -openmp

PGI

  NON-MPI MPI
FORTRAN 90 pgfortran  or  pgf90 mpif90
C pgcc mpicc
C++ pgc++ mpicxx
NOTE: The C++ compiler used to be pgCC, but newer versions of PGI do not support this name.

Recommended Optimization Options

The  -fast  option is appropriate with all PGI compilers. (For more options, see the "man" pages for the compilers)

Note: The PGI compilers can generate code for accelerators such as GPUs. Description of these capabilities is beyond the scope of this guide.

OpenMP

Add this flag to any of the above: -mp

GNU

  NON-MPI MPI
FORTRAN 90 gfortran mpif90
C gcc mpicc
C++ g++ mpicxx

Recommended Optimization Options

The -O2 -march=native  options are recommended with the GNU compilers. (For more options, see the "man" pages for the compilers)

OpenMP

Add this flag to any of the above: -fopenmp

 

Oakley Compilers

Intel (Recommended)

  non-MPI MPI
Fortran ifort mpif90
C icc mpicc
C++ icpc mpicxx

Recommended Optimization Options

Sequential (not numerically sensitive) -fast
Sequential (numerically sensitive) -ipo -O2 -static -xHost
MPI (not numerically sensitive) -ipo -O3 -no-prec-div -xHost
MPI (numerically sensitive) -ipo -O2 -xHost
Note:  The -fast flag is equivalent to -ipo -O3 -no-prec-div -static -xHost .
Note:  Other options are available for code with extreme numerical sensitivity; their description is beyond the scope of this guide.
Note:  Intel 14.0.0.080 has a bug related to generation of portable code. Add the flag -msse3  to get around it.

OpenMP

Add this flag to any of the above: -qopenmp or -openmp

PGI

  non-MPI MPI
Fortran 90 or 95 pgfortran or pgf90 mpif90
Fortran 77 pgf77 mpif77
C pgcc mpicc
C++ pgc++ mpicxx

NOTE: The C++ compiler used to be pgCC, but newer versions of PGI do not support this name.

Recommended Optimization Options

The -fast  option is appropriate with all PGI compilers.  (For more options, see the "man" pages for the compilers)

Note: The PGI compilers can generate code for accelerators such as GPUs. Description of these capabilities is beyond the scope of this guide.

OpenMP

Add this flag to any of the above: -mp

GNU

  non-MPI MPI
Fortran 90 or 95 gfortran mpif90
Fortran 77 g77 mpif77
C gcc mpicc
C++ g++ mpicxx

Recommended Optimization Options

The -O3 -march=native options are recommended with the GNU compilers.  (For more options, see the "man" pages for the compilers)

OpenMP

Add this flag to any of the above (except g77 and mpif77): -fopenmp

Further Reading:

Intel Compiler Page

PGI Compiler Page

GNU Compiler Page


Firewall and Proxy Settings

Connections to OSC

In order for users to access OSC resources through the web, your firewall rules should allow connections to the following publicly-facing IP ranges. Otherwise, users may be blocked or denied access to our services.

  • 192.148.248.0/24
  • 192.148.247.0/24
  • 192.157.5.0/25

The following TCP ports should be opened:

  • 80 (HTTP)
  • 443 (HTTPS)
  • 22 (SSH)

The following domain should be allowed:

  • *.osc.edu

Follow the instructions under "Test your configuration" below to ensure that your system is not blocked from accessing our services. If you are still unsure whether your network is blocking these hosts or ports, contact your local IT administrator.

Test your configuration

[Windows] Test your connection using PuTTY

  1. Open the PuTTY application.
  2. Enter an IP address listed in "Connections to OSC" in the "Host Name" field.
  3. Enter 22 in the "Port" field.
  4. Click the 'Telnet' radio button under "Connection Type".
  5. Click "Open" to test the connection.
  6. Confirm the response. If the connection is successful, you will see a message that says "SSH-2.0-OpenSSH_5.3", as shown below. If you receive a PuTTY error, consult your system administrator for network access troubleshooting.


[OSX/Linux] Test your configuration using telnet

  1. Open a terminal.
  2. Type telnet IPaddress 22 (where IPaddress is an IP address listed in "Connections to OSC").
  3. Confirm the connection. 
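The check above can also be scripted. A minimal sketch using bash's /dev/tcp pseudo-device (the function name and 5-second timeout are arbitrary choices):

```shell
# Succeeds (exit 0) if a TCP connection to host $1, port $2
# can be opened within 5 seconds; fails otherwise.
port_open() {
    timeout 5 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Example usage, against an OSC host from the list below:
# port_open nat.osc.edu 22 && echo "port 22 reachable"
```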

Connections from OSC

All outbound network traffic from OSC's compute nodes is routed through a network address translation (NAT) host or one of two backup servers:

  • nat.osc.edu (192.157.5.13)
  • 192.148.248.35
  • 192.148.248.186

IT and Network Administrators

Please use the above information to assist users in accessing our resources.

Occasionally, new services may be stood up using hosts and ports not described here. If you believe our list needs correcting, please let us know at oschelp@osc.edu.


Messages from qsub

We have been adding some output from qsub that should aid you in creating better job scripts. We've documented the various messages here.

NOTE

A "NOTE" message is informational; your job has been submitted, but qsub made some assumptions about your job that you may not have intended.

No account/project specified

Your job did not specify a project to charge against, but qsub was able to select one for you. Typically this is because your username can only charge against one project, but it may be because you specified a preference by setting the OSC_DEFAULT_ACCOUNT environment variable. The output should indicate which project was assumed to be the correct one; if it was not, delete the job and resubmit after setting the correct project in the job script using the -A flag. For example:

#PBS -A PZS0530

Replace PZS0530 with the correct project code. Explicitly setting the -A flag will cause this informational message to not appear.

No memory limit set

Your job did not specify an explicit memory limit. Since we limit access to memory based on the number of cores set, qsub set this limit on your behalf, and will have mentioned in the message what the memory limit was set to.

You can suppress this informational message by explicitly setting the memory limit. For example:

#PBS -l mem=4gb

Please remember that the memory to core ratios are different on each cluster we operate. Please review the main documentation page for the cluster you are using for more information.

ERROR

An "ERROR" message indicates that your job was not submitted to the queue. Typically, this is because qsub is unsure how to resolve an ambiguous setting in your job parameters. You will need to fix the problem in your job script and resubmit.

You have not specified an account and have more than one available

Your username has the ability to charge jobs to more than one project, and qsub is unable to determine which one this job should be charged against. You can fix this by specifying the project using the -A flag. For example, you should add this line to your job script:

#PBS -A PZS0530

If you get this error, qsub will inform you of which projects you can charge against. Select the appropriate project, and replace "PZS0530" in the example above with the correct code.

You can tell qsub which project should be charged when no charge code is specified in the job script by setting the OSC_DEFAULT_ACCOUNT environment variable. For example, if you use the "bash" shell, you could put the line export OSC_DEFAULT_ACCOUNT=PZS0530 in your shell startup file (e.g., .bashrc), again replacing PZS0530 with the correct project code.


Migrating jobs from Glenn to Oakley or Ruby

This page includes a summary of differences to keep in mind when migrating jobs from Glenn to one of our other clusters.

Hardware

Most Oakley nodes have 12 cores and 48GB memory. There are eight large-memory nodes with 12 cores and 192GB memory, and one huge-memory node with 32 cores and 1TB of memory. Most Ruby nodes have 20 cores and 64GB of memory. There is one huge-memory node with 32 cores and 1TB of memory. By contrast, most Glenn nodes have 8 cores and 24GB memory, with eight nodes having 16 cores and 64GB memory.

Module System

Oakley and Ruby use a different module system than Glenn. It looks very similar, but it enforces module dependencies, and thus may prevent certain module combinations from being loaded that were permitted on Glenn. For example, only one compiler may be loaded at a time.

module avail will show only modules compatible with your currently loaded modules, not all modules installed on the system. To see all modules on the cluster, use the command module spider. Both module avail and module spider can take a partial module name as a search parameter, such as module spider dyna.

Version numbers are indicated with a slash “/” rather than a dash “-” and need not be specified if you want the default version.

Compilers

Like Glenn, Oakley and Ruby support three compilers: Intel, PGI, and gnu. Unlike Glenn, Oakley and Ruby only let you have one compiler module loaded at any one time. The default is Intel. To switch to a different compiler, use module swap intel gnu or module swap intel pgi.

Important note: The gnu compilers are part of the Linux distribution, so they’re always available. It’s important to use the gnu module, however, to link with the correct libraries for MVAPICH, MKL, etc.

MPI

MPI-2 is available on Oakley and Ruby through the MVAPICH2 modules. The MVAPICH2 libraries are linked differently than on Glenn, requiring you to have the correct compiler and MVAPICH2 modules loaded at execution time as well as at compile time. (This doesn’t apply if you’re using a software package that was installed by OSC.)

Software you build and/or install

If your software uses any libraries installed by OSC, including MVAPICH, you will have to rebuild it. If you link to certain libraries, including MVAPICH, MKL, and others, you must have the same compiler module loaded at run time that you do at build time. Please refer to the compilation guide in our Knowledge Base for guidance on optimizing your compilations for our hardware.

OSC installed software

Most of the software installed on Glenn is also installed on Oakley or Ruby, although old versions may no longer be available. We recommend migrating to a newer version of the application if at all possible. Please review the software documentation to see what versions are available, and examine sample batch scripts.

Accounting

All OSC clusters currently use the same core-hour to RU conversion factor. Oakley will charge you for the number of cores proportional to the amount of memory your job requests, while Ruby only accepts full-node jobs. Please review the system documentation for each cluster.

“all” replaced by “pdsh”

The “all” command is not available on Oakley or Ruby; “pdsh” is available on all clusters.

pdsh -j jobid command

pdsh -g feature command

pdsh -w nodelist command


Out-of-Memory (OOM) or Excessive Memory Usage

Problem description

A common problem on our systems is for a user job to run a node out of memory or to use more than its allocated share of memory if the node is shared with other jobs.

If a job exhausts both the physical memory and the swap space on a node, it causes the node to crash. With a parallel job, there may be many nodes that crash. When a node crashes, the systems staff has to manually reboot and clean up the node. If other jobs were running on the same node, the users have to be notified that their jobs failed.

If your job requests less than a full node, for example, -l nodes=1:ppn=1, it may be scheduled on a node with other running jobs. In this case, your job is entitled to a memory allocation proportional to the number of cores requested. For example, if a system has 4GB per core and you request one core, it is your responsibility to make sure your job uses no more than 4GB. Otherwise your job will interfere with the execution of other jobs.

The memory limit you set in PBS does not work the way one might expect it to. The only thing the -l mem=xxx flag is good for is requesting a large-memory node. It does not cause your job to be allocated the requested amount of memory, nor does it limit your job’s memory usage.

Note that even if your job isn’t causing problems, swapping is extremely inefficient. Your job will run orders of magnitude slower than it would with effective memory management.

Background

Each node has a fixed amount of physical memory and a fixed amount of disk space designated as swap space. If your program and data don’t fit in physical memory, the virtual memory system writes pages from physical memory to disk as necessary and reads in the pages it needs. This is called swapping. If you use up all the memory and all the swap space, the node crashes with an out-of-memory error.

This explanation really applies to the total memory usage of all programs running on the system. If someone else’s program is using too much memory, it may be pages from your program that get swapped out, and vice versa. This is the reason we aggressively terminate programs using more than their share of memory when there are other jobs on the node.

In the world of high performance computing, swapping is almost always undesirable. If your program does a lot of swapping, it will spend most of its time doing disk I/O and won’t get much computation done. You should consider the suggestions below.

You can find the amount of memory on our systems by following the links on our Supercomputers page. You can see the memory and swap values for a node by running the Linux command free on the node. As shown below, a standard node on Oakley has 48GB physical memory and 46GB swap space.

[n0123]$ free -mo
             total       used       free     shared    buffers     cached
Mem:         48386       2782      45603          0        161       1395
Swap:        46874          0      46874

Suggested solutions

Here are some suggestions for fixing jobs that use too much memory. Feel free to contact OSC Help for assistance with any of these options.

Some of these remedies involve requesting more processors (cores) for your job. As a general rule we require you to request a number of processors proportional to the amount of memory you require. You need to think in terms of using some fraction of a node rather than treating processors and memory separately. If some of the processors remain idle, that’s not a problem. Memory is just as valuable a resource as processors.

Request whole node or more processors

Jobs requesting less than a whole node are those that have nodes=1 with ppn<12 on Oakley, for example nodes=1:ppn=1. These jobs can be problematic for two reasons. First, they are entitled to use an amount of memory proportional to the ppn value requested; if they use more they interfere with other jobs. Second, if they cause a node to crash, it typically affects multiple jobs and multiple users.

If you’re sure about your memory usage, it’s fine to request just the number of processors you need, as long as it’s enough to cover the amount of memory you need. If you’re not sure, play it safe and request all the processors on the node.

Standard Oakley nodes have 4GB per core.

Reduce memory usage

Consider whether your job’s memory usage is reasonable in light of the work it’s doing. The code itself typically doesn’t require much memory, so you need to look mostly at the data size.

If you’re developing the code yourself, look for memory leaks. In MATLAB look for large arrays that can be cleared.

An out-of-core algorithm will typically use disk more efficiently than an in-memory algorithm that relies on swapping. Some third-party software gives you a choice of algorithms or allows you to set a limit on the memory the algorithm will use.

Use more nodes for a parallel job

If you have a parallel job you can get more total memory by requesting more nodes. Depending on the characteristics of your code you may also need to run fewer processes per node.

Here’s an example. Suppose your job on Oakley includes the following lines:

#PBS -l nodes=5:ppn=12
…
mpiexec mycode

This job uses 5 nodes, so it has 5*48=240GB total memory available to it. The mpiexec command by default runs one process per core, which in this case is 5*12=60 copies of mycode.

If this job uses too much memory you can spread those 60 processes over more nodes. The following lines request 10 nodes, giving you a total of 10*48=480GB total memory. The -ppn 6 option on the mpiexec command says to run 6 processes per node instead of 12, for a total of 60 as before.

#PBS -l nodes=10:ppn=12
…
mpiexec -ppn 6 mycode

Since parallel jobs are always assigned whole nodes, the following lines will also run 6 processes per node on 10 nodes.

#PBS -l nodes=10:ppn=6
…
mpiexec mycode

Request large-memory nodes

Oakley has eight nodes with 192GB each, four times the memory of a standard node. Oakley also has one huge-memory node with 1TB of memory; it has 32 cores.

Since there are so few of these nodes, compared to hundreds of standard nodes, jobs requesting them will often have a long wait in the queue. The wait will be worthwhile, though, if these nodes solve your memory problem.

To use the large-memory nodes on Oakley, request between 48gb and 192gb memory and 1 to 12 processors per node. Remember to request a number of processors per node proportional to your memory requirements. In most cases you’ll want to request the whole node (ppn=12). You can request up to 8 nodes but the more you request the longer your queue wait is likely to be.

Example:

#PBS -l nodes=1:ppn=12
#PBS -l mem=192gb
…

To use the huge-memory node on Oakley you must request the whole node (ppn=32). Let the memory default.

#PBS -l nodes=1:ppn=32
…

Put a virtual memory limit on your job

The sections above are intended to help you get your job running correctly. This section is about forcing your job to fail gracefully if it consumes too much memory. If your memory usage is unpredictable, it is preferable to terminate the job when it exceeds a memory usage limit rather than allow it to crowd other jobs or crash a node.

The memory limit enforced by PBS is ineffective because it only limits physical memory usage (resident set size or RSS). When your job reaches its memory limit it simply starts using virtual memory, or swap. PBS allows you to put a limit on virtual memory, but that has problems also.

We will use Linux terminology. Each process has several virtual memory values associated with it. VmSize is virtual memory size; VmRSS is resident set size, or physical memory used; VmSwap is swap space used. The number we care about is the total memory used by the process, which is VmRSS + VmSwap. What PBS allows a job to limit is VmRSS (using -l mem=xxx) or VmSize (using -l vmem=xxx).

The relationship among VmSize, VmRSS, and VmSwap is:  VmSize >= VmRSS+VmSwap. For many programs this bound is fairly tight; for others VmSize can be much larger than the memory actually used.
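On Linux you can inspect these values directly from /proc. A small sketch, reading the current shell's own status file (any PID works in place of $$):

```shell
# Show the three memory values discussed above for the current shell.
# The bound VmSize >= VmRSS + VmSwap should hold for this process.
grep -E '^Vm(Size|RSS|Swap):' /proc/$$/status
```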

If the bound is reasonably tight, -l vmem=4gb provides an effective mechanism for limiting memory usage to 4gb (for example). If the bound is not tight, VmSize may prevent the program from starting even if VmRSS+VmSwap would have been perfectly reasonable. Java and some FORTRAN 77 programs in particular have this problem.

The vmem limit in PBS applies to the entire job, not just one node, so it isn’t useful with parallel (multinode) jobs. PBS also has a per-process virtual memory limit, pvmem. This limit is trickier to use, but it can be useful in some cases.

Here are suggestions for some specific cases.

Serial (single-node) job using program written in C/C++

This case applies to programs written in any language if VmSize is not much larger than VmRSS+VmSwap. If your program doesn’t use any swap space, this means that vmem as reported by qstat -f or the ja command (see below) is not much larger than mem as reported by the same tools.

Set the vmem limit equal to, or slightly larger than, the number of processors requested (ppn) times the memory available per processor. Example for Oakley:

#PBS -l nodes=1:ppn=1
#PBS -l vmem=4gb

Parallel (multinode) job using program written in C/C++

This suggestion applies if your processes use approximately equal amounts of memory. See also the comments about other languages under the previous case.

Set the pvmem limit equal to, or slightly larger than, the amount of physical memory on the node divided by the number of processes per node. Example for Oakley, running 12 processes per node:

#PBS -l nodes=5:ppn=12
#PBS -l pvmem=4gb
…
mpiexec mycode

Serial (single-node) job using program written in Java

I’ve only slightly tested this suggestion so far, so please provide feedback to judithg@osc.edu.

Start Java with a virtual memory limit equal to, or slightly larger than, the number of processors requested (ppn) times the memory available per processor. Example for Oakley:

#PBS -l nodes=1:ppn=1
#PBS -l vmem=4gb
…
java -Xms4096m -Xmx4096m MyJavaCode

Other situations

If you have other situations that aren’t covered here, please share them. Contact judithg@osc.edu.

How to monitor your memory usage

qstat -f

While your job is running the command qstat -f jobid will tell you the peak physical and virtual memory usage of the job so far. For a parallel job, these numbers are the aggregate usage across all nodes of the job. The values reported by qstat may lag the true values by a couple of minutes.

free

For parallel (multinode) jobs you can check your per-node memory usage while your job is running by using pdsh -j jobid free -mo on Oakley.

ja

You can put the command ja (job accounting) at the end of your batch script to capture the resource usage reported by qstat -f. The information will be written to your job output log, job_name.o123456.

OnDemand

You can also view node status graphically via the OSC OnDemand Portal (ondemand.osc.edu). Under "Jobs" select "Active Jobs". Click on "Job Status" and scroll down to see memory usage. This shows the total memory usage for the node; if your job is not the only one running there, it may be hard to interpret.

A typical memory-usage graph for jobs using too much memory shows a characteristic pattern. Consider two jobs that ran back-to-back on the same node: the first used all the available physical memory plus a large amount of swap, yet completed successfully without crashing the node; the second followed the same pattern but actually crashed the node.

Notes

If it appears that your job is close to crashing a node, we may preemptively delete the job.

If your job is interfering with other jobs by using more memory than it should be, we may delete the job.

In extreme cases OSC staff may restrict your ability to submit jobs. If you crash a large number of nodes or continue to submit problem jobs after we have notified you of the situation, this may be the only way to protect the system and our other users. If this happens, we will restore your privileges as soon as you demonstrate that you have resolved the problem.

For details on retrieving files from unexpectedly terminated jobs see this FAQ.

For assistance

OSC has staff available to help you resolve your memory issues. See our Support Services page for contact information.

System Email

Occasionally, jobs that experience problems may generate emails from staff or automated systems at the center with some information about the nature of the problem. These pages provide additional information about the various emails sent, and steps that can be taken to address the problem.

Batch job aborted

Purpose

Notify you when your job terminates abnormally.

Sample subject line

PBS JOB 944666.oak-batch.osc.edu

Apparent sender

  • root <adm@oak-batch.osc.edu> (Oakley)
  • root <pbs-opt@hpc.osc.edu> (Glenn)

Sample contents

PBS Job Id: 935619.oak-batch.osc.edu
Job Name:   mailtest.job
Exec host:  n0587/5
Aborted by PBS Server
Job exceeded some resource limit (walltime, mem, etc.). Job was aborted See Administrator for help

Sent under these circumstances

These are fully automated emails sent by the batch system.

Some reasons a job might terminate abnormally:

  • The job exceeded its allotted walltime, memory, virtual memory, or other limited resource. More information is available in your job log file, e.g., jobname.o123456.
  • An unexpected system problem caused your job to fail.

To turn off the emails

There is no way to turn them off at this time.

To prevent these problems

For advice on monitoring and controlling resource usage, see Monitoring and Managing Your Job.

There’s not much you can do about system failures, which fortunately are rare.

Notes

Under some circumstances you can retrieve your job output log if your job aborts due to a system failure. Contact oschelp@osc.edu for assistance.

For assistance

Contact OSC Help. See our Support Services page for more contact information.

Batch job begin or end

Purpose

Notify you when your job begins or ends.

Sample subject line

PBS JOB 944666.oak-batch.osc.edu

Apparent sender

  • root <adm@oak-batch.osc.edu> (Oakley)
  • root <pbs-opt@hpc.osc.edu> (Glenn)

Sample contents

PBS Job Id: 944666.oak-batch.osc.edu
Job Name:   mailtest.job
Exec host:  n0587/1
Begun execution
 
PBS Job Id: 944666.oak-batch.osc.edu
Job Name:   mailtest.job
Exec host:  n0587/1
Execution terminated
Exit_status=0
resources_used.cput=00:00:00
resources_used.mem=2228kb
resources_used.vmem=211324kb
resources_used.walltime=00:01:00

Sent under these circumstances

These are fully automated emails sent by the batch system. You control them through the headers in your job script. The following line requests emails at the beginning, ending, and abnormal termination of your job.

#PBS -m abe

To turn off the emails

Remove the -m option from your script and/or command line or use -m n. See PBS Directives Summary.

Notes

You can add the following command at the end of your script to have resource information written to your job output log:

ja

For more information

See PBS Directives Summary.

For assistance

Contact OSC Help. See our Support Services page for more contact information.

Batch job deleted by an administrator

Purpose

Notify you when your job is deleted by an administrator.

Sample subject line

PBS JOB 9657213.opt-batch.osc.edu

Apparent sender

  • root adm@oak-batch.osc.edu (Oakley)
  • root pbs-opt@hpc.osc.edu (Glenn)

Sample contents

PBS Job Id: 9657213.opt-batch.osc.edu
Job Name:   mailtest.job
job deleted
Job deleted at request of staff@opt-login04.osc.edu Job using too much memory. Contact oschelp@osc.edu.

Sent under these circumstances

These emails are sent automatically, but the administrator can add a note with the reason.

Some reasons a running job might be deleted:

  • The job is using so much memory that it threatens to crash the node it is running on.
  • The job is using more resources than it requested and is interfering with other jobs running on the same node.
  • The job is causing excessive load on some part of the system, typically a network file server.
  • The job is still running at the start of a scheduled downtime.

Some reasons a queued job might be deleted:

  • The job requests non-existent resources.
  • A job apparently intended for Oakley (ppn=12) was submitted on Glenn.
  • The job can never run because it requests combinations of resources that are disallowed by policy.
  • The user’s credentials are blocked on the system the job was submitted on.

To turn off the emails

There is no way to turn them off at this time.

To prevent these problems

See the Supercomputing FAQ for suggestions on dealing with specific problems.

For assistance

We will work with you to get your jobs running within the constraints of the system. Contact OSC Help for assistance. See our Support Services page for more contact information.

Emails exceeded the expected volume

Purpose

Notify you that we have placed a hold on emails sent to you from the HPC system.

Sample subject line

Emails sent to email address student@buckeyemail.osu.edu in the last hour exceeded the expected volume

Apparent sender

OSC Help <OSCHelp@osc.edu>

Explanation

When a job fails or is deleted by an administrator, the system sends you an email. If this happens with a large number of jobs, it generates a volume of email that may be viewed as spam by your email provider. To avoid having OSC blacklisted, and to avoid overloading your email account, we hold your emails from OSC.

Please note that these held emails will eventually be deleted if you do not contact us.

Sent under these circumstances

These emails are sent automatically when your email usage from OSC is deferred.

To turn off the emails

Turn off emails related to your batch jobs to reduce your overall email volume from OSC. See the -m option on the PBS Directives Summary page.

Notes

To re-enable email you must contact OSC Help.

For assistance

Contact OSC Help. See our Support Services page for more contact information.

 

 

File system load problem

Purpose

Notify you that one or more of your jobs caused excessive load on one of the network file system directory servers.

Sample subject line

Your jobs on Oakley are causing excessive load on fs14

Apparent sender

OSC Help <OSCHelp@osc.edu> or an individual staff member

Explanation

Your jobs are causing problems with one of the network file servers. This is usually caused by submitting a large number of jobs that start at the same time and execute in lockstep.

Sent under these circumstances

These emails are sent by a staff member when the high load is traced to your jobs. Often the jobs have to be stopped or deleted.

To turn off the emails

You cannot turn off these emails. Please don’t ignore them because they report a problem that you must correct.

To prevent these problems

See the Knowledge Base article (coming soon) for suggestions on dealing with file system load problems.

For information on the different file systems available at OSC, see Available File Systems.
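One common mitigation, shown here as a sketch rather than an OSC-mandated technique, is to stagger job startup with a short random delay and to work from node-local storage, so that many jobs submitted together do not hit the same directory server in lockstep:

```shell
#!/bin/bash
#PBS -l nodes=1:ppn=12
#PBS -l walltime=2:00:00

# Sleep 0-59 seconds so jobs that start together don't all open
# files on the same file server at the same instant.
sleep $((RANDOM % 60))

# Copy input to node-local $TMPDIR and run there, reducing
# repeated small I/O against the shared file system.
cp $PBS_O_WORKDIR/input.dat $TMPDIR
cd $TMPDIR
./my_solver input.dat    # hypothetical executable
cp output.dat $PBS_O_WORKDIR
```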

Notes

If you continue to submit jobs that cause these problems, your HPC account may be blocked.

For assistance

We will work with you to get your jobs running within the constraints of the system. Contact OSC Help for assistance. See our Support Services page for more contact information.

Job failure due to a system hardware problem

Purpose

Notify you that one or more of your jobs was running on a compute node that crashed due to a hardware problem.

Sample subject line

Failure of job(s) 919137 due to a hardware problem at OSC

Apparent sender

OSC Help <OSCHelp@osc.edu>

Explanation

Your job failed and was not at fault. You should resubmit the job.

Sent under these circumstances

These emails are sent by a systems administrator after a node crashes.

To turn off the emails

We don’t have a mechanism to turn off these emails. If they really bother you, contact OSC Help and we’ll try to accommodate you.

To prevent these problems

Hardware crashes are quite rare and in most cases there’s nothing you can do to prevent them. Certain types of bus errors on Glenn correlate strongly with certain applications (suggesting that they’re not really hardware errors). If you encounter this type of error you may be advised to use Oakley rather than Glenn.

For assistance

Contact OSC Help. See our Support Services page for more contact information.

Job failure due to a system software problem

Purpose

Notify you that one or more of your jobs was running on a compute node that crashed due to a system software problem.

Sample subject line

Failure of job(s) 919137 due to a system software problem at OSC

Apparent sender

OSC Help <OSCHelp@osc.edu>

Explanation

Your job failed and was not at fault. You should resubmit the job. Usually the problems are caused by another job running on the node.

Sent under these circumstances

These emails are sent by a systems administrator as part of the node cleanup process.

To turn off the emails

We don’t have a mechanism to turn off these emails. If they really bother you, contact OSC Help and we’ll try to accommodate you.

To prevent these problems

If you request a whole node (nodes=1:ppn=12 on Oakley or nodes=1:ppn=8 on Glenn) your jobs will be less susceptible to problems caused by other jobs. Other than that, be assured that we work hard to keep jobs from interfering with each other.
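A whole-node request on Oakley looks like the following sketch (the directive values come from the paragraph above; the program name is a placeholder):

```shell
#!/bin/bash
# Request one entire Oakley node (all 12 cores) so no other
# user's job can share the node with this one.
#PBS -l nodes=1:ppn=12
#PBS -l walltime=1:00:00

cd $PBS_O_WORKDIR
./my_app    # hypothetical executable
```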

For assistance

Contact OSC Help. See our Support Services page for more contact information.

Job failure due to exhaustion of physical memory

Purpose

Notify you that one or more of your jobs caused compute nodes to crash with an out-of-memory error.

Sample subject line

Failure of job(s) 933014,933174 at OSC due to exhaustion of physical memory

Apparent sender

OSC Help <oschelp@osc.edu>

Explanation

Your job(s) exhausted both physical memory and swap space during job execution. This failure caused the compute node(s) used by the job(s) to crash, requiring a reboot.

Sent under these circumstances

These emails are sent by a systems administrator as part of the node cleanup process.

To turn off the emails

You cannot turn off these emails. Please don’t ignore them because they report a problem that you must correct.

To prevent these problems

See the Knowledge Base article "Out-of-Memory (OOM) or Excessive Memory Usage" for suggestions on dealing with out-of-memory problems.

For information on the memory available on the various systems, see our Supercomputing page.
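One way to diagnose and avoid these crashes, sketched below rather than prescribed, is to request a whole node (so the job has the node's full physical memory instead of a per-core share) and log memory use while the job runs, so a failing run leaves evidence of its peak footprint:

```shell
#!/bin/bash
# Full node: all cores, and therefore all of the node's memory.
#PBS -l nodes=1:ppn=12
#PBS -l walltime=1:00:00

cd $PBS_O_WORKDIR

# Append a memory snapshot every 30 seconds in the background.
( while true; do free -m >> memory.log; sleep 30; done ) &
MONITOR=$!

./my_app    # hypothetical executable
kill $MONITOR
```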

Notes

If you continue to submit jobs that cause these problems, your HPC account may be blocked.

For assistance

We will work with you to get your jobs running within the constraints of the system. Contact OSC Help for assistance. See our Support Services page for more contact information.

Supercomputing Policies

OSC-1, OSC Data Lifecycle Management Policy - Data storage space is a limited resource, and in an effort to keep the resource available to the largest number of active users, the following policy and accompanying procedures and resources have been developed to reduce system management overhead and the impact on other users of OSC systems.

OSC-2, OSC Media Inventory Management - The purpose of this policy and accompanying procedures and resources is to help ensure the protection of the media containing the data from accidental or intentional unauthorized access, damage, alteration or disclosure while preserving the ability of authorized users to access and use the data.

OSC-3, OSC Information Security Framework - This policy and its supporting sub-policies provide a foundation for the security of OSC information technology systems. The requirements put forth in this policy and its supporting sub-policies are designed to ensure that due diligence is exercised in the protection of information, systems and services. This policy describes fundamental practices of information security that are to be applied by OSC to ensure that protective measures are implemented and maintained.

OSC-4, OSC Malicious Code Security - This policy requires OSC to implement and operate a malicious code security program. The program should help to ensure that adequate protective measures are in place against introduction of malicious code into OSC information systems and that computer system users are able to maintain a high degree of malicious code awareness.

OSC-5, OSC Remote Access Security - This policy establishes practices wherever a remote access capability is provided to OSC systems so that inherent vulnerabilities in such services may be compensated for.

OSC-6, OSC Security Education and Awareness - This policy requires OSC to provide information technology security education and awareness to employees, contractors, temporary personnel and other agents of OSC who use and administer computer and telecommunications systems.

OSC-7, OSC Security Incident Response - This policy defines adequate security response for identified security incidents.

OSC-8, OSC Password PIN Security - We have implemented a new password change policy, and this portion of our policies page is currently under construction. The Password PIN Security policy is dated, but it establishes minimum requirements regarding the proper selection, use and management of passwords and personal identification numbers (PINs); references in this policy to passwords also apply to PINs, except where explicitly noted.

OSC-9, OSC Portable Security Computing - This policy addresses information technology (IT) security concerns with portable computing devices and provides direction for their use, management and control. This policy includes security concerns with the physical device itself, as well as its applications and data.

OSC-10, OSC Security Notifications - This OSC policy identifies the methods used to inform users of their duty, limitations on use, legal requirements and personal privacy expectations associated with the use of OSC and university computers, networks or telecommunications systems.

OSC-11, OSC User Management Policy - This policy establishes the information and qualifications required to establish an account to use OSC resources. This policy will also define the basic levels of support that users of OSC IT environments can expect.

OSC-12, OSC Intrusion Prevention and Detection - The purpose of this state policy is to establish an intrusion prevention and detection capability that is designed to prevent, monitor and identify system intrusions or misuse.

OSC-13, OSC IT Business Continuity Planning - This document provides guidance in the development and implementation of a comprehensive information technology business continuity plan that, in the event of a business disruption, will help enable the continuation of critical processes and the delivery of essential services at an acceptable level.

OSC-14, OSC Virtual Machine Lifecycle Management - Virtual Machines at OSC are a resource that must be maintained and protected. This document provides guidance in the hosting and maintaining of all systems and virtual environment infrastructure that require support from a limited resource. The following policy and accompanying procedures and resources have been developed to reduce system management overhead and the impact on other users of OSC systems.

Proposed OSC Policies for Public Comments

This page lists all proposed OSC policies open for public comment. Your comments help inform our policies and are encouraged. We will post responses to comments on this webpage after the public comment period closes. Please submit your comments via our online form by the deadline.

Currently Open for Public Comment:

Scratch Storage Policy

OSC provides a scratch file system as high-performance, high-capacity, shared space. It is temporary and not backed up. No customer should rely upon the retention of data placed on scratch storage. The purpose of this policy is to (1) keep sufficient scratch space available at all times, (2) maintain good performance of the scratch file system, (3) protect users from misuse of scratch space that can result in data loss and (4) reduce manual management of scratch data by OSC staff.

View this PDF for the proposed policy. 

We need to receive your comments via the online form by Friday, April 28, 2017.

Comment Form 
