
2014/03 - OSC OnDemand

The March 2014 HPC Tech Talk (Tuesday, March 18th, 4-5 PM) will provide some brief OSC updates, host a user-driven Q&A session, and close with a live demonstration of OSC's OnDemand service. You can register for the WebEx session here. Slides are available below.

Queues and Reservations

Here are the queues available on Glenn. Please note that you will be routed to the appropriate queue based on your walltime and job size request.

Name      Nodes available                 Max walltime    Max job size    Notes
Serial    Available minus reservations    168 hours       1 node
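
For illustration, here is a minimal script for a job that would be routed to the Serial queue under the limits above; the job name, walltime, and executable are placeholders for this sketch, not Glenn-specific requirements:

    #!/bin/bash
    #PBS -N serial_example
    #PBS -l nodes=1:ppn=1       # a one-node request is eligible for Serial
    #PBS -l walltime=24:00:00   # anything up to the 168-hour limit
    #PBS -j oe                  # merge stdout and stderr

    cd $PBS_O_WORKDIR           # run from the directory the job was submitted from
    ./my_program                # placeholder for your executable

Submitting this script with qsub lets the scheduler route it based on the one-node, 24-hour request.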

Queues and Reservations

Here are the queues available on Oakley. Please note that you will be routed to the appropriate queue based on your walltime and job size request.

Name      Nodes available                 Max walltime    Max job size    Notes
Serial    Available minus reservations    168 hours       1 node

Out-of-Memory (OOM) or Excessive Memory Usage

Problem description

A common problem on our systems is a user's job that causes a node to run out of memory, or that uses more than its allocated share of memory when the node is shared with other jobs.

If a job exhausts both the physical memory and the swap space on a node, it causes the node to crash. With a parallel job, many nodes may crash. When a node crashes, OSC staff have to manually reboot and clean up the node. If other jobs were running on the same node, those users have to be notified that their jobs failed.
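
One common safeguard, sketched below, is to request all of the cores on a node so that the job has the node's full memory to itself and a memory spike cannot harm other users' jobs. The core count and program name here are illustrative assumptions (the actual cores per node differ between systems):

    #!/bin/bash
    #PBS -N whole_node_example
    #PBS -l nodes=1:ppn=8       # requesting every core on the node (8 assumed
                                # here) reserves the node's full memory for
                                # this job alone
    #PBS -l walltime=04:00:00

    cd $PBS_O_WORKDIR           # run from the submission directory
    ./memory_hungry_program     # placeholder for your executable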

HOWTO: Use VNC in a batch job

SSHing directly to a compute node at OSC - even if that node has been assigned to you in a current batch job - and starting VNC is an "unsafe" thing to do. When your batch job ends (and the node is assigned to other users), stray processes will be left behind and negatively impact other users. However, it is possible to use VNC on compute nodes safely.
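
As a sketch of the "safe" approach, the VNC server can be started from inside the batch job itself, so that it lives and dies with the job. The display number, geometry, and cleanup commands below are illustrative assumptions (TigerVNC-style syntax), not an OSC-prescribed recipe:

    #!/bin/bash
    #PBS -N vnc_session
    #PBS -l nodes=1:ppn=1
    #PBS -l walltime=02:00:00
    #PBS -j oe

    # Start a VNC server on the compute node assigned to this job.
    # Display ":1" and the geometry are assumed values; adjust as needed.
    vncserver :1 -geometry 1440x900
    echo "Connect your VNC client to $(hostname):1"

    # Kill the VNC server when the job ends so no stray processes are
    # left behind for the next user of this node.
    trap 'vncserver -kill :1' TERM EXIT

    # Keep the job alive (and the VNC server with it) until walltime.
    sleep infinity

Because the server is started and stopped within the batch job, the scheduler's cleanup applies to it just like any other job process.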

Ruby

On 10/13/2016, the Intel Xeon Phi coprocessors on Ruby were removed from service. Please contact OSC Help if you have any questions or need help getting access to alternative resources. 

Ruby was named after the Ohio native actress Ruby Dee.  An HP-built, Intel® Xeon® processor-based supercomputer, Ruby provided almost the same amount of total computing power (~125 TF, formerly ~144 TF with Intel® Xeon® Phi coprocessors) as our former flagship system Oakley on less than half the number of nodes (240 nodes).  Ruby has 20 nodes outfitted with NVIDIA® Tesla K40 accelerators. (Ruby previously featured two distinct sets of hardware accelerators: 20 nodes with NVIDIA® Tesla K40 GPUs and another 20 nodes with Intel® Xeon® Phi coprocessors.)

Storage Documentation

Home directory

Policy

Please review the OSC home directory storage policy on our Policy page.

Usage

Each user ID has a home directory on the NetApp WAFL service. You have the same home directory regardless of what system you’re on, including all login nodes and all compute nodes, so your files are accessible everywhere. Most of your work in the login environment will be done in your home directory.
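
As a quick illustration, the home directory is reachable through the standard $HOME environment variable from any login or compute node. The quota command shown is a common Linux way to check usage; the exact reporting tool OSC provides may differ:

    # The home directory path is the same on every OSC system.
    echo $HOME

    # Return to your home directory from anywhere.
    cd $HOME

    # Check how much of your home directory quota is in use.
    quota -s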
