Oakley

Oakley is an HP-built, Intel® Xeon® processor-based supercomputer, featuring more cores (8,328) on half as many nodes (694) as the center’s former flagship system, the IBM Opteron 1350 Glenn Cluster. The Oakley Cluster can achieve 88 teraflops, tech-speak for performing 88 trillion floating point operations per second, or, with acceleration from 128 NVIDIA® Tesla graphics processing units (GPUs), a total peak performance of just over 154 teraflops.

Hardware

Photo: OSC Oakley HP Intel Xeon Cluster

Detailed system specifications:

  • 8,328 total cores
    • 12 cores/node & 48 gigabytes of memory/node
  • Intel Xeon x5650 CPUs
  • HP SL390 G7 Nodes
  • 128 NVIDIA Tesla M2070 GPUs
  • 873 GB of local disk space in '/tmp'
  • QDR InfiniBand interconnect
    • Low latency
    • High throughput
    • High quality-of-service
  • Theoretical system peak performance
    • 88.6 teraflops
  • GPU acceleration
    • Additional 65.5 teraflops
  • Total peak performance
    • 154.1 teraflops
  • Memory Increase
    • From 2.5 gigabytes per core to 4.0 gigabytes per core.
  • Storage Expansion
    • Adds 600 terabytes of DataDirect Networks Lustre storage for a total of nearly two petabytes of available disk storage.
  • System Efficiency
    • 1.5x the performance of the former system at just 60 percent of its power consumption.

How to Connect

To connect to Oakley, ssh to oakley.osc.edu.
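
For example, from a terminal on your local machine (the username shown is a placeholder for your own OSC username):

    ssh username@oakley.osc.edu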

Batch Specifics

Refer to the documentation for our batch environment to understand how to use PBS on OSC hardware. Some specifics you will need to know to create well-formed batch scripts:

  • Compute nodes on Oakley have 12 cores per node; the processors-per-node (ppn) request for parallel jobs must be ppn=12. (See the example batch script after this list.)
  • If you need more than 48 GB of RAM per node, you may run on the 8 large-memory (192 GB) nodes on Oakley ("bigmem"). You can request a large-memory node by using the following directive in your batch script: nodes=XX:ppn=12:bigmem, where XX can be 1-8.
  • We have a single huge-memory node ("hugemem") with 1 TB of RAM and 32 cores. You can schedule this node by adding the following directive to your batch script: #PBS -l nodes=1:ppn=32. This node is only for serial jobs and can only have one job running on it at a time, so you must request the entire node to be scheduled on it. In addition, there is a walltime limit of 48 hours for jobs on this node. Requesting fewer than 32 cores with a memory requirement greater than 192 GB will not schedule the 1 TB node! Just request nodes=1:ppn=32 with a walltime of 48 hours or less, and the scheduler will put you on the 1 TB node.
  • GPU jobs may request any number of cores and either 1 or 2 GPUs.
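
For reference, here is a minimal sketch of a parallel batch script that follows these rules; the job name, walltime, and executable are placeholders, and a real script would also load any modules and stage the input files your job needs:

    #PBS -N example_job            # job name (placeholder)
    #PBS -l walltime=01:00:00      # requested walltime (placeholder)
    #PBS -l nodes=2:ppn=12         # two full Oakley nodes, 12 cores each
    #PBS -j oe                     # combine stdout and stderr

    cd $PBS_O_WORKDIR              # run from the directory the job was submitted from
    mpiexec ./a.out                # launch the MPI executable (placeholder)

A large-memory variant would instead request nodes=XX:ppn=12:bigmem, and a huge-memory job nodes=1:ppn=32, as described above.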

Using OSC Resources

For more information about how to use OSC resources, please see our guide on batch processing at OSC. For specific information about modules and file storage, please see the Batch Execution Environment page.


Batch Limit Rules

Memory Limit:

It is strongly suggested that users consider their memory use relative to the available per-core memory when requesting OSC resources for their jobs. On Oakley, this equates to 4 GB/core and 48 GB/node.

If your job requests less than a full node (ppn < 12), it may be scheduled on a node with other running jobs. In this case, your job is entitled to a memory allocation proportional to the number of cores requested (4 GB/core). For example, without any memory request (mem=XX), a job that requests nodes=1:ppn=1 will be assigned one core and should use no more than 4 GB of RAM, a job that requests nodes=1:ppn=3 will be assigned 3 cores and should use no more than 12 GB of RAM, and a job that requests nodes=1:ppn=12 will be assigned the whole node (12 cores) with 48 GB of RAM. However, a job that requests nodes=1:ppn=1,mem=12GB will be assigned one core but have access to 12 GB of RAM, and charged for 3 cores' worth of Resource Units (RU). See Charging for memory use for more details.
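
For instance, either of these illustrative directives (the core and memory figures are examples only) gives a serial job access to roughly 12 GB of memory:

    #PBS -l nodes=1:ppn=3            # 3 cores, entitled to about 12 GB (4 GB/core)
    #PBS -l nodes=1:ppn=1,mem=12GB   # 1 core with 12 GB; charged as 3 cores' worth of RU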

A multi-node job (nodes > 1) will be assigned whole nodes with 48 GB/node and charged for those entire nodes regardless of the ppn request. For example, a job that requests nodes=10:ppn=1 will be charged for 10 whole nodes (12 cores/node * 10 nodes, which is 120 cores' worth of RU). A job that requests a large-memory node (nodes=XX:ppn=12:bigmem, where XX can be 1-8) will be allocated the entire large-memory node with 192 GB of RAM and charged for the whole node (12 cores' worth of RU). A job that requests the huge-memory node (nodes=1:ppn=32) will be allocated the entire huge-memory node with 1 TB of RAM and charged for the whole node (32 cores' worth of RU).

To manage and monitor your memory usage, please refer to Out-of-Memory (OOM) or Excessive Memory Usage.

GPU Limit:

On Oakley, GPU jobs may request any number of cores and either 1 or 2 GPUs (nodes=XX:ppn=XX:gpus=1 or gpus=2). The memory limit depends on the ppn request and follows the rules in Memory Limit.
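
As an illustration, the directive below (the core count is an example, not a requirement) requests one full node and both of its GPUs:

    #PBS -l nodes=1:ppn=12:gpus=2    # 12 cores and 2 GPUs on one Oakley node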

Walltime Limit

Here are the queues available on Oakley:

Name          Max Walltime   Max Job Size   Notes
Serial        168 hours      1 node
Longserial    336 hours      1 node         Restricted access
Parallel      96 hours       170 nodes
Longparallel  250 hours      230 nodes      Restricted access
Hugemem       48 hours       1 node         32 cores with 1 TB RAM; nodes=1:ppn=32
Debug         1 hour         12 nodes

Job Limit

An individual user can have up to 128 concurrently running jobs and/or up to 2040 processors/cores in use. All the users in a particular group/project can among them have up to 192 concurrently running jobs and/or up to 2040 processors/cores in use. Jobs submitted in excess of these limits are queued but blocked by the scheduler until other jobs exit and free up resources.

A user may have no more than 1,000 jobs submitted to each of the parallel and serial job queues. Jobs submitted in excess of this limit will be rejected.


Citation

To cite Oakley, please use the following Archival Resource Key:

ark:/19495/hpc0cvqn

Here is the citation in BibTeX format:

@misc{Oakley2012,
ark = {ark:/19495/hpc0cvqn},
howpublished = {\url{http://osc.edu/ark:/19495/hpc0cvqn}},
year  = {2012},
author = {Ohio Supercomputer Center},
title = {Oakley supercomputer}
}

And in EndNote format:

%0 Generic
%T Oakley supercomputer
%A Ohio Supercomputer Center
%R ark:/19495/hpc0cvqn
%U http://osc.edu/ark:/19495/hpc0cvqn
%D 2012

Queues and Reservations

Here are the queues available on Oakley. Please note that you will be routed to the appropriate queue based on your walltime and job size request.

Name          Nodes Available                Max Walltime   Max Job Size   Notes
Serial        Available minus reservations   168 hours      1 node
Longserial    Available minus reservations   336 hours      1 node         Restricted access
Parallel      Available minus reservations   96 hours       170 nodes
Longparallel  Available minus reservations   250 hours      230 nodes      Restricted access
Hugemem       1                              48 hours       1 node

"Available minus reservations" means all nodes in the cluster currently operational (this will fluctuate slightly), less the reservations listed below. To access one of the restricted queues, please contact OSC Help. Generally, access will only be granted to these queues if performance of the job cannot be improved, and job size cannot be reduced by splitting or checkpointing the job.

In addition, there are a few standing reservations.

Name    Times             Nodes Available   Max Walltime   Max Job Size   Notes
Debug   8AM-6PM Weekdays  12                1 hour         12 nodes       For small interactive and test jobs.
GPU     ALL               62                336 hours      62 nodes       Small jobs not requiring GPUs from the serial and parallel queues will backfill on this reservation.
OneTB   ALL               1                 48 hours       1 node         Holds the 32-core, 1 TB RAM node aside for the hugemem queue.

Occasionally, reservations will be created for specific projects that will not be reflected in these tables.
