Owens

TIP: Remember to check the menu to the right of the page for related pages with more information about Owens' specifics.

OSC's Owens cluster, installed in 2016, is a Dell-built, Intel® Xeon® processor-based supercomputer.


Hardware

Detailed system specifications:

  • 824 Dell Nodes
  • Dense Compute
    • 648 compute nodes (Dell PowerEdge C6320 two-socket servers with Intel Xeon E5-2680 v4 (Broadwell, 14 cores, 2.40GHz) processors, 128GB memory)

  • GPU Compute

    • 160 ‘GPU-ready’ compute nodes -- Dell PowerEdge R730 two-socket servers with Intel Xeon E5-2680 v4 (Broadwell, 14 cores, 2.40GHz) processors, 128GB memory

    • NVIDIA Tesla P100 (Pascal) GPUs -- 5.3TF peak (double precision), 16GB memory

  • Analytics

    • 16 huge memory nodes (Dell PowerEdge R930 four-socket servers with Intel Xeon E5-4830 v3 (Haswell, 12 cores, 2.10GHz) processors, 1,536GB memory, 12 x 2TB drives)

  • 23,392 total cores
    • 28 cores/node  & 128GB of memory/node
  • Mellanox EDR (100Gbps) Infiniband networking
  • Theoretical system peak performance
    • ~750 teraflops (CPU only)
  • 4 login nodes:
    • Intel Xeon E5-2680 (Broadwell) CPUs
    • 28 cores/node and 256GB of memory/node

How to Connect

  • SSH Method

To log in to Owens at OSC, SSH to the following hostname:

owens.osc.edu 

You can either use an ssh client application or execute ssh on the command line in a terminal window as follows:

ssh <username>@owens.osc.edu

You may see a warning message that includes an SSH key fingerprint. Verify that the fingerprint in the message matches one of the SSH key fingerprints listed here, then type yes.

From there, you are connected to an Owens login node and have access to the compilers and other software development tools. You can run programs interactively or through batch requests. We use control groups on the login nodes to keep them stable. Please use batch jobs for any compute-intensive or memory-intensive work. See the following sections for details.

  • OnDemand Method

You can also log in to Owens at OSC with our OnDemand tool. The first step is to log in to OnDemand. Once logged in, you can access Owens by clicking "Clusters" and then selecting ">_Owens Shell Access".

Instructions on how to connect to OnDemand can be found on the OnDemand documentation page.

File Systems

Owens accesses the same OSC mass storage environment as our other clusters. Therefore, users have the same home directory as on the Oakley and Ruby clusters. Full details of the storage environment are available in our storage environment guide.

Home directories should be accessed through either the $HOME environment variable or the tilde notation ( ~username ). Project directories are located at /fs/project . Scratch storage is located at /fs/scratch .
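For example, a minimal shell session touching each of these locations might look like the following sketch (the project code PAS1234 is a hypothetical placeholder; substitute your own project's directory):

cd $HOME                   # your home directory, shared across OSC clusters
cd /fs/project/PAS1234     # hypothetical project directory; use your own project code
cd /fs/scratch/PAS1234     # hypothetical scratch directory for the same project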

Owens will not have symlinks allowing use of the old file system paths. This is in contrast to Oakley and Ruby, which will have the symlinks.

Software Environment

The module system on Owens is the same as on the Oakley and Ruby systems. Use  module load <package>  to add a software package to your environment. Use  module list  to see what modules are currently loaded and  module avail  to see the modules that are available to load. To search for modules that may not be visible due to dependencies or conflicts, use  module spider . By default, you will have the batch scheduling software modules, the Intel compiler and an appropriate version of mvapich2 loaded.
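As an illustration, a typical sequence of module commands might look like the following (the package name netcdf is only an example; check module avail or module spider for what is actually installed):

module list             # show the modules loaded by default
module avail            # list modules available to load
module spider netcdf    # search for modules hidden by dependencies or conflicts
module load netcdf      # add the package to your environment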

You can keep up to date on the software packages that have been made available on Owens by viewing the Software by System page and selecting the Owens system.

Compiling Code to Use Advanced Vector Extensions (AVX2)

The Haswell and Broadwell processors that make up Owens support the Advanced Vector Extensions (AVX2) instruction set, but you must set the correct compiler flags to take advantage of it. AVX2 has the potential to speed up your code by a factor of 4 or more, depending on the compiler and options you would otherwise use.

In our experience, the Intel and PGI compilers do a much better job than the gnu compilers at optimizing HPC code.

With the Intel compilers, use -xHost and -O2 or higher. With the gnu compilers, use -march=native and -O3 . The PGI compilers by default use the highest available instruction set, so no additional flags are necessary.
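As a quick sketch, assuming a C source file named hello.c, the corresponding compile lines would be:

icc -O2 -xHost hello.c           # Intel
gcc -O3 -march=native hello.c    # gnu
pgcc -fast hello.c               # PGI (-fast already targets the host instruction set)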

This advice assumes that you are building and running your code on Owens. The executables will not be portable.

See the Owens Programming Environment page for details.

Batch Specifics

Refer to the documentation for our batch environment to understand how to use PBS on OSC hardware. Some specifics you will need to know to create well-formed batch scripts:

  • The qsub syntax for node requests is the same on Owens as on Ruby and Oakley
  • Most compute nodes on Owens have 28 cores/processors per node (ppn).  Huge-memory (analytics) nodes have 48 cores/processors per node.
  • Jobs on Owens may request partial nodes.  This is in contrast to Ruby but similar to Oakley.
  • Owens has 6 debug nodes, which are specifically configured for short (< 1 hour) debugging-type work. These nodes have a walltime limit of 1 hour.
    • To schedule a debug node:
      #PBS -l nodes=1:ppn=28 -q debug
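As an illustration, a minimal single-node batch script might look like the following sketch (the job name, walltime, and executable are placeholders, not prescribed values):

#PBS -N example_job
#PBS -l nodes=1:ppn=28
#PBS -l walltime=1:00:00
#PBS -j oe

cd $PBS_O_WORKDIR
./myprog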

Using OSC Resources

For more information about how to use OSC resources, please see our guide on batch processing at OSC. For specific information about modules and file storage, please see the Batch Execution Environment page.


Owens Programming Environment

Compilers

C, C++ and Fortran are supported on the Owens cluster. Intel, PGI and GNU compiler suites are available. The Intel development tool chain is loaded by default. Compiler commands and recommended options for serial programs are listed in the table below. See also our compilation guide.

The Haswell and Broadwell processors that make up Owens support the Advanced Vector Extensions (AVX2) instruction set, but you must set the correct compiler flags to take advantage of it. AVX2 has the potential to speed up your code by a factor of 4 or more, depending on the compiler and options you would otherwise use.

In our experience, the Intel and PGI compilers do a much better job than the GNU compilers at optimizing HPC code.

With the Intel compilers, use -xHost and -O2 or higher. With the GNU compilers, use -march=native and -O3. The PGI compilers by default use the highest available instruction set, so no additional flags are necessary.

This advice assumes that you are building and running your code on Owens. The executables will not be portable.

LANGUAGE    INTEL EXAMPLE               PGI EXAMPLE            GNU EXAMPLE
C           icc -O2 -xHost hello.c      pgcc -fast hello.c     gcc -O3 -march=native hello.c
Fortran 90  ifort -O2 -xHost hello.f90  pgf90 -fast hello.f90  gfortran -O3 -march=native hello.f90
C++         icpc -O2 -xHost hello.cpp   pgc++ -fast hello.cpp  g++ -O3 -march=native hello.cpp
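For completeness, a minimal hello.c that the commands above would compile might look like this sketch:

#include <stdio.h>

int main(void)
{
    /* trivial serial program used only to illustrate the compiler commands */
    printf("Hello from Owens\n");
    return 0;
}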

Parallel Programming

MPI

OSC systems use the MVAPICH2 implementation of the Message Passing Interface (MPI), optimized for the high-speed Infiniband interconnect. MPI is a standard library for performing parallel processing using a distributed-memory model. For more information on building your MPI codes, please visit the MPI Library documentation.

Parallel programs are started with the mpiexec command. For example,

mpiexec ./myprog

The program to be run must either be in your path or have its path specified.

The mpiexec command will normally spawn one MPI process per CPU core requested in a batch job. Use the -n and/or -ppn option to change that behavior.

The table below shows some commonly used options. Use mpiexec -help for more information.

MPIEXEC OPTION                   COMMENT
-ppn 1                           One process per node
-ppn procs                       procs processes per node
-n totalprocs or -np totalprocs  At most totalprocs processes per node
-prepend-rank                    Prepend rank to output
-help                            Get a list of available options

 

Caution: There are many variations on mpiexec and mpiexec.hydra. Information found on non-OSC websites may not be applicable to our installation.
The information above applies to the MVAPICH2 and IntelMPI installations at OSC. See the OpenMPI software page for mpiexec usage with OpenMPI.
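As a sketch, a minimal MPI program and the commands to build and run it might look like the following (mpicc is the usual MVAPICH2 compiler wrapper; treat the exact wrapper name as an assumption and adjust for your loaded modules):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size;
    MPI_Init(&argc, &argv);                  /* start MPI */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* this process's rank */
    MPI_Comm_size(MPI_COMM_WORLD, &size);    /* total number of processes */
    printf("Hello from rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}

Compile and run it inside a batch job:

mpicc hello_mpi.c -o hello_mpi
mpiexec ./hello_mpi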

OpenMP

The Intel, PGI and GNU compilers understand the OpenMP set of directives, which support multithreaded programming. For more information on building OpenMP codes on OSC systems, please visit the OpenMP documentation.
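As a sketch, a minimal OpenMP program and typical compile lines might look like the following (note that the OpenMP flag differs by compiler; -qopenmp applies to recent Intel compilers, while older Intel releases used -openmp):

#include <stdio.h>
#include <omp.h>

int main(void)
{
    /* each thread reports its own ID */
    #pragma omp parallel
    printf("Hello from thread %d of %d\n",
           omp_get_thread_num(), omp_get_num_threads());
    return 0;
}

icc -qopenmp hello_omp.c     # Intel
gcc -fopenmp hello_omp.c     # GNU
pgcc -mp hello_omp.c         # PGI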

GPU Programming

160 NVIDIA P100 GPUs are available on Owens. Please visit our GPU documentation for more information.
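For reference, a GPU node is typically requested by adding a gpus count to the node specification in your batch script; the line below reflects the usual Torque/Moab convention at OSC and should be double-checked against the GPU documentation:

#PBS -l nodes=1:ppn=28:gpus=1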


Technical Specifications

The following are technical specifications for Owens.  

  Owens SYSTEM (2016)

NUMBER OF NODES                824 nodes
NUMBER OF CPU SOCKETS          1,648 (2 sockets/node)
NUMBER OF CPU CORES            23,392 (28 cores/node)
CORES PER NODE                 28 cores/node (48 cores/node for huge-memory nodes)
LOCAL DISK SPACE PER NODE      ~1,500GB in /tmp
COMPUTE CPU SPECIFICATIONS     Intel Xeon E5-2680 v4 (Broadwell): 2.4 GHz, 14 cores per processor
COMPUTE SERVER SPECIFICATIONS  648 Dell PowerEdge C6320; 160 Dell PowerEdge R730 (for accelerator nodes)
ACCELERATOR SPECIFICATIONS     NVIDIA P100 "Pascal" GPUs with 16GB memory
NUMBER OF ACCELERATOR NODES    160 total
TOTAL MEMORY                   ~127 TB
MEMORY PER NODE                128GB (1.5TB for huge-memory nodes)
MEMORY PER CORE                4.5GB (31GB for huge-memory nodes)
INTERCONNECT                   Mellanox EDR InfiniBand networking (100Gbps)
LOGIN SPECIFICATIONS           4 login nodes with Intel Xeon E5-2680 (Broadwell) CPUs: 28 cores/node and 256GB of memory/node
SPECIAL NODES                  16 huge-memory nodes: Dell PowerEdge R930, 4 x Intel Xeon E5-4830 v3 (Haswell, 12 cores, 2.1 GHz), 48 cores, 1.5TB memory, 12 x 2TB drives (20TB usable)
 


Batch Limit Rules

Memory Limit:

We strongly suggest that users weigh their job's memory needs against the available per-core memory when requesting OSC resources. On Owens, this equates to 4GB/core or 124GB/node.

If your job requests less than a full node ( ppn< 28 ), it may be scheduled on a node with other running jobs. In this case, your job is entitled to a memory allocation proportional to the number of cores requested (4GB/core).  For example, without any memory request ( mem=XX ), a job that requests  nodes=1:ppn=1  will be assigned one core and should use no more than 4GB of RAM, a job that requests  nodes=1:ppn=3  will be assigned 3 cores and should use no more than 12GB of RAM, and a job that requests  nodes=1:ppn=28  will be assigned the whole node (28 cores) with 124GB of RAM.  

Please be careful if you include a memory request ( mem=XX ) in your job. A job that requests  nodes=1:ppn=1,mem=12GB  will be assigned one core and have access to 12GB of RAM, and charged for 3 cores' worth of Resource Units (RU). However, a job that requests  nodes=1:ppn=5,mem=12GB  will be assigned 5 cores but have access to only 12GB of RAM, and charged for 5 cores' worth of RU. See Charging for memory use for more details.
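For example, the two requests discussed above would appear in a job script as follows (both lines are illustrative):

#PBS -l nodes=1:ppn=1,mem=12GB    # 1 core with 12GB of RAM; charged for 3 cores' worth of RU
#PBS -l nodes=1:ppn=5,mem=12GB    # 5 cores but only 12GB of RAM; charged for 5 cores' worth of RU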

A multi-node job ( nodes>1 ) will be assigned entire nodes with 124GB/node and charged for the entire nodes regardless of the ppn request. For example, a job that requests  nodes=10:ppn=1  will be charged for 10 whole nodes (28 cores/node * 10 nodes, which is 280 cores' worth of RU).

A job that requests a huge-memory node ( nodes=1:ppn=48 ) will be allocated the entire huge-memory node with 1.5TB of RAM and charged for the whole node (48 cores' worth of RU).

To manage and monitor your memory usage, please refer to Out-of-Memory (OOM) or Excessive Memory Usage.

Walltime Limit

Here are the queues available on Owens:

NAME              MAX WALLTIME  MAX JOB SIZE  NOTES
Serial            168 hours     1 node
longserial        336 hours     1 node        Restricted access (contact OSC Help if you need access)
Parallel          96 hours      8 nodes       Jobs are scheduled to run within a single IB leaf switch
Largeparallel     96 hours      81 nodes      Jobs are scheduled across multiple switches
Hugemem           168 hours     1 node        16 nodes in this class
Parallel hugemem  96 hours      16 nodes      Restricted access (contact OSC Help if you need access); use "-q parhugemem" to access it
Debug             1 hour        2 nodes       6 nodes in this class; use "-q debug" to request it

Job Limit

An individual user can have up to 256 concurrently running jobs and/or up to 3080 processors/cores in use. All the users in a particular group/project can among them have up to 384 concurrently running jobs and/or up to 4620 processors/cores in use. Jobs submitted in excess of these limits are queued but blocked by the scheduler until other jobs exit and free up resources.

A user may have no more than 1,000 jobs submitted to the parallel queue and no more than 1,000 jobs submitted to the serial queue.


Citation

For more information about citations of OSC, visit https://www.osc.edu/citation.

To cite Owens, please use the following Archival Resource Key:

ark:/19495/hpc6h5b1

Please adjust this citation to fit the citation style guidelines required.

Ohio Supercomputer Center. 2016. Owens Supercomputer. Columbus, OH: Ohio Supercomputer Center. http://osc.edu/ark:/19495/hpc6h5b1

Here is the citation in BibTeX format:

@article{Owens2016,
ark = {ark:/19495/hpc6h5b1},
url = {http://osc.edu/ark:/19495/hpc6h5b1},
year  = {2016},
author = {Ohio Supercomputer Center},
title = {Owens supercomputer}
}

And in EndNote format:

%0 Generic
%T Owens supercomputer
%A Ohio Supercomputer Center
%R ark:/19495/hpc6h5b1
%U http://osc.edu/ark:/19495/hpc6h5b1
%D 2016

Here is a .ris file to better suit your needs. Please change the import option to .ris.


Migrating jobs from Oakley or Ruby to Owens

This page includes a summary of differences to keep in mind when migrating jobs from Oakley or Ruby to Owens.

Guidance for Oakley Users

Hardware Specifications

                    OWENS (PER NODE)                              OAKLEY (PER NODE)
Most compute nodes  28 cores and 125GB of RAM                     12 cores and 48GB of RAM
Large memory node   (none)                                        12 cores and 192GB of RAM (8 nodes in this class)
Huge memory node    48 cores and 1.5TB of RAM, 12 x 2TB drives    32 cores and 1TB of RAM (1 node in this class)
                    (16 nodes in this class)

File Systems

Owens accesses the same OSC mass storage environment as our other clusters. Therefore, users have the same home directory as on the Oakley cluster.

Home directories: On both clusters, home directories are accessed through either the $HOME environment variable or the tilde notation ( ~username ). Owens does NOT have symlinks allowing use of the old file system paths, so please modify your scripts with the new paths before you submit jobs to the Owens cluster. Oakley has the symlinks, so no action is required on your part to continue using your existing job scripts on the Oakley cluster.

Project directories: Located at /fs/project

Scratch storage: Located at /fs/scratch

See the 2016 Storage Service Upgrades page for details. 

Software Environment

Owens uses the same module system as Oakley.

Use  module load <package>  to add a software package to your environment. Use  module list  to see what modules are currently loaded and  module avail  to see the modules that are available to load. To search for modules that may not be visible due to dependencies or conflicts, use  module spider .

You can keep up to date on the software packages that have been made available on Owens by viewing the Software by System page and selecting the Owens system.

Programming Environment

Like Oakley, Owens supports three compilers: Intel, PGI, and GNU. The default is Intel. To switch to a different compiler, use  module swap intel gnu  or  module swap intel pgi .

Owens also uses the MVAPICH2 implementation of the Message Passing Interface (MPI), optimized for the high-speed Infiniband interconnect.

In addition, Owens supports the Advanced Vector Extensions (AVX2) instruction set, but you must set the correct compiler flags to take advantage of it. In our experience, the Intel and PGI compilers do a much better job than the GNU compilers at optimizing HPC code.

See the Owens Programming Environment page for details.

PBS Batch-Related Command

The  qpeek  command is not needed on Owens.

On Oakley, a job’s stdout and stderr data streams, which normally show up on the screen, are written to log files. These log files are stored on a server until the job ends, so you can’t look at them directly. The  qpeek  command allows you to peek at their contents. If you used the PBS header line to join the stdout and stderr streams ( #PBS -j oe ), the two streams are combined in the output log.

On Owens, a job's stdout and stderr data streams are written to log files stored in the directory from which the job was submitted, i.e.,  $PBS_O_WORKDIR . You will see the log files immediately after your job starts.
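As an illustration, once a job starts you can watch its output directly from the submission directory; with default Torque naming the files are <jobname>.o<jobid> and <jobname>.e<jobid> (the job name and ID below are placeholders):

cd $PBS_O_WORKDIR
tail -f example_job.o1234567    # hypothetical output log; stderr goes to example_job.e1234567 unless joined with -j oe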

Accounting

The Owens cluster will be charged at a rate of 1 RU per 10 core-hours.

The Oakley cluster will be charged at a rate of 1 RU per 20 core-hours.

Like Oakley, Owens will accept partial-node jobs and charge you for the number of cores proportional to the amount of memory your job requests.
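For example, a job that uses a full 28-core Owens node for 10 hours consumes 280 core-hours, which is charged as 28 RUs on Owens; the same 280 core-hours on Oakley would be charged as 14 RUs.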

Below is a comparison of job limits between Owens and Oakley:

Per user: Owens allows up to 256 concurrently running jobs and/or up to 3080 processors/cores in use; Oakley allows up to 128 concurrently running jobs and/or up to 1500 processors/cores in use.

Per group: Owens allows up to 384 concurrently running jobs and/or up to 3080 processors/cores in use; Oakley allows up to 192 concurrently running jobs and/or up to 1500 processors/cores in use.

 

Please see Queues and Reservations for Owens for more details.

Guidance for Ruby Users

Hardware Specifications

                    OWENS (PER NODE)                              RUBY (PER NODE)
Most compute nodes  28 cores and 125GB of RAM                     20 cores and 64GB of RAM
Huge memory node    48 cores and 1.5TB of RAM, 12 x 2TB drives    32 cores and 1TB of RAM (1 node in this class)
                    (16 nodes in this class)

File Systems

Owens accesses the same OSC mass storage environment as our other clusters. Therefore, users have the same home directory as on the Ruby cluster.

Home directories: On both clusters, home directories are accessed through either the $HOME environment variable or the tilde notation ( ~username ). Owens does NOT have symlinks allowing use of the old file system paths, so please modify your scripts with the new paths before you submit jobs to the Owens cluster. Ruby has the symlinks, so no action is required on your part to continue using your existing job scripts on the Ruby cluster.

Project directories: Located at /fs/project

Scratch storage: Located at /fs/scratch

See the 2016 Storage Service Upgrades page for details. 

Software Environment

Owens uses the same module system as Ruby.

Use  module load <package>  to add a software package to your environment. Use  module list  to see what modules are currently loaded and  module avail  to see the modules that are available to load. To search for modules that may not be visible due to dependencies or conflicts, use  module spider .

You can keep up to date on the software packages that have been made available on Owens by viewing the Software by System page and selecting the Owens system.

Programming Environment

Like Ruby, Owens supports three compilers: Intel, PGI, and GNU. The default is Intel. To switch to a different compiler, use  module swap intel gnu  or  module swap intel pgi .

Owens also uses the MVAPICH2 implementation of the Message Passing Interface (MPI), optimized for the high-speed Infiniband interconnect.

In addition, Owens supports the Advanced Vector Extensions (AVX2) instruction set, but you must set the correct compiler flags to take advantage of it. In our experience, the Intel and PGI compilers do a much better job than the GNU compilers at optimizing HPC code.

See the Owens Programming Environment page for details.

PBS Batch-Related Command

The  qpeek  command is not needed on Owens.

On Ruby, a job’s stdout and stderr data streams, which normally show up on the screen, are written to log files. These log files are stored on a server until the job ends, so you can’t look at them directly. The   qpeek  command allows you to peek at their contents. If you used the PBS header line to join the stdout and stderr streams ( #PBS -j oe ), the two streams are combined in the output log.

On Owens, a job's stdout and stderr data streams are written to log files stored in the directory from which the job was submitted, i.e.,  $PBS_O_WORKDIR . You will see the log files immediately after your job starts.

Accounting

The Owens cluster will be charged at a rate of 1 RU per 10 core-hours.

The Ruby cluster will be charged at a rate of 1 RU per 20 core-hours.

However, Owens will accept partial-node jobs and charge you for the number of cores proportional to the amount of memory your job requests. By contrast, Ruby only accepts full-node jobs and charges for the whole node.

Below is a comparison of job limits between Owens and Ruby:

Per user: Owens allows up to 256 concurrently running jobs and/or up to 3080 processors/cores in use; Ruby allows up to 40 concurrently running jobs and/or up to 800 processors/cores in use.

Per group: Owens allows up to 384 concurrently running jobs and/or up to 3080 processors/cores in use; Ruby allows up to 80 concurrently running jobs and/or up to 1600 processors/cores in use.

 

Please see Queues and Reservations for Owens for more details.

 


Owens SSH key fingerprints

These are the public key fingerprints for Owens (in hexadecimal format):
owens: ssh_host_rsa_key.pub = 18:68:d4:b0:44:a8:e2:74:59:cc:c8:e3:3a:fa:a5:3f
owens: ssh_host_ed25519_key.pub = 1c:3d:f9:99:79:06:ac:6e:3a:4b:26:81:69:1a:ce:83
owens: ssh_host_ecdsa_key.pub = d6:92:d1:b0:eb:bc:18:86:0c:df:c5:48:29:71:24:af


These are the SHA256 hashes (in base64 format):
owens: ssh_host_rsa_key.pub = SHA256:vYIOstM2e8xp7WDy5Dua1pt/FxmMJEsHtubqEowOaxo
owens: ssh_host_ed25519_key.pub = SHA256:FSb9ZxUoj5biXhAX85tcJ/+OmTnyFenaSy5ynkRIgV8
owens: ssh_host_ecdsa_key.pub = SHA256:+fqAIqaMW/DUJDB0v/FTxMT9rkbvi/qVdMKVROHmAP4
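If you would like to check the host keys yourself before connecting, one approach (assuming a reasonably recent OpenSSH client) is to fetch them with ssh-keyscan and print their fingerprints with ssh-keygen:

ssh-keyscan owens.osc.edu > owens_hostkeys.pub
ssh-keygen -lf owens_hostkeys.pub            # SHA256 fingerprints
ssh-keygen -lf owens_hostkeys.pub -E md5     # MD5 (hexadecimal) fingerprints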


Queues and Reservations

Here are the queues available on Owens. Please note that you will be routed to the appropriate queue based on your walltime and job size request.

NAME           NODES AVAILABLE               MAX WALLTIME  MAX JOB SIZE  NOTES
Serial         Available minus reservations  168 hours     1 node
Parallel       Available minus reservations  96 hours      8 nodes
Largeparallel  Available minus reservations  96 hours      81 nodes
Hugemem        16                            96 hours      1 node
Parhugemem     16                            96 hours      16 nodes      Restricted access; use "-q parhugemem" to request it
Debug          6 regular nodes, 4 GPU nodes  1 hour        2 nodes       For small interactive and test jobs during 8AM-6PM, Monday-Friday; use "-q debug" to request it

"Available minus reservations" means all nodes in the cluster currently operational (this will fluctuate slightly), less the reservations listed below. To access one of the restricted queues, please contact OSC Help. Generally, access will only be granted to these queues if the performance of the job cannot be improved, and job size cannot be reduced by splitting or checkpointing the job.

 

Occasionally, reservations will be created for specific projects that will not be reflected in these tables.
