HPC

2014/07 - OSC Roadmap

The July 2014 HPC Tech Talk (Tuesday, July 22nd, from 4-5PM) will cover the OSC Roadmap, including the OSC business model and service catalog, the "Condo" pilot project, the Ruby cluster, and the FY15 capital budget. To get the WebEX information and add a calendar entry, go here. Slides are available below.

2014/04 - Invited talk: MVAPICH

The April 2014 HPC Tech Talk (Tuesday, April 22nd, from 4-5PM) will provide some brief OSC updates, have a user-driven Q&A session, and will close with an invited talk about MPI-3 by the MVAPICH developers from The Ohio State University. To get the WebEX information and add a calendar entry, go here. Slides are available below.

2014/03 - OSC OnDemand

The March 2014 HPC Tech Talk (Tuesday, March 18th from 4-5PM) will provide some brief OSC updates, have a user-driven Q&A session, and will close with a live demonstration of OSC's OnDemand service. You can register for the WebEX session here. Slides are available below.

Queues and Reservations

Here are the queues available on Glenn. Please note that you will be routed to the appropriate queue based on your walltime and job size request.

Name     Nodes available                Max walltime   Max job size   Notes
Serial   Available minus reservations   168 hours      1 node

Queues and Reservations

Here are the queues available on Oakley. Please note that you will be routed to the appropriate queue based on your walltime and job size request.

Name     Nodes available                Max walltime   Max job size   Notes
Serial   Available minus reservations   168 hours      1 node
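
The routing rule described for Glenn and Oakley (a job is placed in a queue according to its walltime and job size request) can be sketched as below. This is only an illustration, not OSC's scheduler configuration: the `Queue` class and `route` function are hypothetical names, and only the Serial limits (168 hours, 1 node) come from the listing above; a real system defines additional queues.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Queue:
    """Hypothetical description of one batch queue and its limits."""
    name: str
    max_walltime_hours: float
    max_nodes: int


# Only the Serial row is taken from the listing above; other queues are omitted.
QUEUES: List[Queue] = [Queue("Serial", max_walltime_hours=168, max_nodes=1)]


def route(walltime_hours: float, nodes: int,
          queues: List[Queue] = QUEUES) -> Optional[str]:
    """Return the name of the first queue whose limits cover the request.

    Returns None when no listed queue can accept the request.
    """
    for queue in queues:
        fits_nodes = nodes <= queue.max_nodes
        fits_walltime = walltime_hours <= queue.max_walltime_hours
        if fits_nodes and fits_walltime:
            return queue.name
    return None
```

For example, `route(24, 1)` matches the Serial queue, while a request for 200 hours or for 2 nodes matches nothing in this one-row sketch and returns `None`.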

Out-of-Memory (OOM) or Excessive Memory Usage

Problem description

A common problem on our systems is that a user's job causes a node to run out of memory or, if the node is shared with other jobs, uses more than its allocated memory.

If a job exhausts both the physical memory and the swap space on a node, it causes the node to crash. With a parallel job, there may be many nodes that crash. When a node crashes, the OSC staff has to manually reboot and clean up the node. If other jobs were running on the same node, the users have to be notified that their jobs failed.
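
One way to notice excessive memory use before it becomes a problem is for a program to report its own peak memory. The sketch below uses Python's standard `resource` module (Unix-only); the function names `peak_memory_mb` and `within_limit` are hypothetical helpers, and the limit passed in is whatever the user believes their job was allocated, not a value read from the batch system.

```python
import resource
import sys


def peak_memory_mb() -> float:
    """Return this process's peak resident set size in megabytes."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # Linux reports ru_maxrss in kilobytes; macOS reports it in bytes.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def within_limit(limit_mb: float) -> bool:
    """Return True if peak memory use so far is at or below limit_mb."""
    return peak_memory_mb() <= limit_mb


if __name__ == "__main__":
    # Assumed example limit of 4096 MB; replace with the job's real allocation.
    print(f"peak memory: {peak_memory_mb():.1f} MB, "
          f"within limit: {within_limit(4096)}")
```

This only measures the calling process, not child processes or other ranks of a parallel job, so it is a rough self-check rather than a substitute for the batch system's own accounting.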

HOWTO: Use VNC in a batch job

SSHing directly to a compute node at OSC - even if that node has been assigned to you in a current batch job - and starting VNC is an "unsafe" thing to do. When your batch job ends (and the node is assigned to other users), stray processes will be left behind and negatively impact other users. However, it is possible to use VNC on compute nodes safely.

Ruby

TIP: Remember to check the menu to the right of the page for related pages with more information about Ruby's specifics.
On 10/13/2016, Intel Xeon Phi coprocessors on Ruby were removed from service. Please contact OSC Help if you have any questions or need help getting access to alternative resources.

Ruby was named after the Ohio native actress Ruby Dee. An HP-built, Intel® Xeon® processor-based supercomputer, Ruby provided almost the same amount of total computing power (~125 TF; formerly ~144 TF with Intel® Xeon® Phi coprocessors) as our former flagship system Oakley on less than half the number of nodes (240 nodes). Ruby had 20 nodes outfitted with NVIDIA® Tesla K40 accelerators. (Ruby used to feature two distinct sets of hardware accelerators: 20 nodes were outfitted with NVIDIA® Tesla K40 accelerators and another 20 nodes featured Intel® Xeon® Phi coprocessors.)
