Changes to Default Memory Limits

Problem Description

Our current GPFS file system is a distributed system with significant interaction between its clients. Since the compute nodes are GPFS file system clients, a certain amount of memory on each node must be reserved for these interactions. As a result, the maximum physical memory on each node available to users' jobs is reduced in order to keep the file system performing well. In addition, the use of swap memory is no longer allowed.

The tables below summarize the maximum physical memory allowed for each type of node on our systems:

Oakley Cluster

Node type          Physical memory per node    Maximum memory allowed per node
Regular node       48GB                        45GB
Big memory node    192GB                       187GB
Huge memory node   1024GB (1TB)                1008GB

Ruby Cluster

Node type          Physical memory per node    Maximum memory allowed per node
Regular node       64GB                        61GB
Debug node         128GB                       124GB
Huge memory node   1024GB (1TB)                1008GB

Owens Cluster

Node type          Physical memory per node    Maximum memory allowed per node
Regular node       128GB                       124GB
Huge memory node   1536GB                      1510GB

Solutions When You Need Regular Nodes

Starting October 27, 2016, we will implement a new scheduling policy on all of our clusters that reflects these reduced default memory limits.

If you do not request memory explicitly in your job (no -l mem request)

Your job can be submitted and scheduled as before, and resources will be allocated according to your request of cores/nodes ( nodes=XX:ppn=XX ). If you request a partial node, the memory allocated to your job is proportional to the number of cores requested (4GB/core on Oakley and Owens); if you request a whole node, the memory allocated to your job is reduced to the maximum memory allowed per node, as summarized in the tables above. Some examples are provided below.

A request of a partial node:

On Oakley, a request of nodes=1:ppn=1 will be allocated 4GB of memory and charged for 1 core. A request of nodes=1:ppn=4 will be allocated 16GB of memory and charged for 4 cores. A request of nodes=1:ppn=11 will be allocated 44GB of memory and charged for 11 cores.

On Ruby, we always allocate whole nodes to jobs and charge for the whole node, with 61GB of memory allocated to your job.

On Owens, a request of nodes=1:ppn=1 will be allocated 4GB of memory and charged for 1 core. A request of nodes=1:ppn=4 will be allocated 16GB of memory and charged for 4 cores.
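For illustration, a minimal Oakley batch script for such a partial-node request might look like the sketch below (the job name, walltime, and program are hypothetical); with no -l mem line, the job is allocated 4 cores x 4GB/core = 16GB:

#PBS -N partial_node_example
#PBS -l nodes=1:ppn=4
#PBS -l walltime=01:00:00

# No explicit memory request: allocated 16GB on Oakley, charged for 4 cores
cd $PBS_O_WORKDIR
./my_program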

A request of the whole node:

A request for a whole regular node will be allocated the maximum memory allowed per node and charged for the whole node, as summarized below:

Cluster    Request            Memory allocated    Charged for
Oakley     nodes=1:ppn=12     45GB                12 cores
Ruby       nodes=1:ppn=20     61GB                20 cores
Owens      nodes=1:ppn=28     124GB               28 cores
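The same whole-node request can also be given on the qsub command line; for example, on Owens (the script name and walltime are hypothetical):

qsub -l nodes=1:ppn=28 -l walltime=01:00:00 my_job.pbs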

A request of multiple nodes:

If you have a multi-node job ( nodes>1 ), your job will be assigned entire nodes with the maximum memory allowed per node (45GB on Oakley, 61GB on Ruby, and 124GB on Owens) and charged for the entire nodes regardless of the ppn request.
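As a sketch (job name and walltime are hypothetical), a two-node request on Oakley would be assigned two whole nodes with 45GB each and charged for 24 cores:

#PBS -N multi_node_example
#PBS -l nodes=2:ppn=12
#PBS -l walltime=02:00:00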

If you do request memory explicitly in your job (with an -l mem request)

If you request memory explicitly in your script, please revisit your script according to the following information.

A request of a partial node:

On Oakley, a request of nodes=1:ppn=1,mem=4gb will be allocated 4GB of memory and charged for 1 core; a request of nodes=1:ppn=2,mem=8gb will be allocated 8GB of memory and charged for 2 cores; a request of nodes=1:ppn=1,mem=40gb will be allocated 40GB of memory and charged for 10 cores.

On Owens, a request of nodes=1:ppn=1,mem=4gb will be allocated 4GB of memory and charged for 1 core.

On Ruby, we always allocate whole nodes to jobs and charge for the whole node, with 61GB of memory allocated to your job.
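As a sketch of how an explicit memory request can drive the charge (job name and walltime are hypothetical), the Oakley directives below request a single core but 40GB, so the job is charged for 10 cores (40GB / 4GB per core):

#PBS -N mem_request_example
#PBS -l nodes=1:ppn=1,mem=40gb
#PBS -l walltime=01:00:00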

A request of the whole node:

On Oakley, the maximum value you can use for -l mem is 45gb, i.e., -l mem=45gb. A request of nodes=1:ppn=12,mem=45gb will be allocated 45GB of memory and charged for the whole node. If you need more than 45GB of memory for your job, please submit it to the big/huge memory nodes on Oakley, or switch to the Owens cluster. Any job requesting mem>45gb may be rescheduled onto a big memory node on Oakley, or may not be scheduled at all, depending on the rest of the request.

On Ruby, the maximum value you can use for -l mem is 61gb, i.e., -l mem=61gb. A request of nodes=1:ppn=20,mem=61gb will be allocated 61GB of memory and charged for the whole node. If you need more than 61GB of memory for your job, please submit it to the huge memory nodes on Ruby, or switch to the Owens cluster. Any job requesting mem>61gb will not be scheduled.

On Owens, the maximum value you can use for -l mem is 125gb, i.e., -l mem=125gb. A request of nodes=1:ppn=28,mem=124gb will be allocated 124GB of memory and charged for the whole node. If you need more than 124GB of memory for your job, please submit it to the huge memory nodes. Any job requesting mem>=126gb will not be scheduled.
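A minimal whole-node sketch for Owens with an explicit memory request (the walltime is hypothetical):

#PBS -l nodes=1:ppn=28,mem=124gb
#PBS -l walltime=04:00:00

# Allocated 124GB and charged for all 28 cores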

A request of multiple nodes:

If you have a multi-node job ( nodes>1 ), your job will be assigned entire nodes with the maximum memory allowed per node (45GB on Oakley, 61GB on Ruby, and 124GB on Owens) and charged for the entire nodes.

Solutions When You Need Special Nodes

If you need any special resources, we strongly recommend that you omit the memory request and follow the syntax below.

Oakley Cluster:

Node type          How to request                           Memory allocated    Charged for
Big memory node    nodes=XX:ppn=12:bigmem (XX can be 1-8)   187GB               12 cores
Huge memory node   nodes=1:ppn=32                           1008GB              32 cores
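For example (job name and walltime are hypothetical), a big memory node on Oakley can be requested with:

#PBS -N bigmem_example
#PBS -l nodes=1:ppn=12:bigmem
#PBS -l walltime=02:00:00

# Allocated 187GB and charged for 12 cores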

Ruby Cluster:

Node type          How to request             Memory allocated    Charged for
Debug node         nodes=1:ppn=16 -q debug    124GB               16 cores
Huge memory node   nodes=1:ppn=32             1008GB              32 cores
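A debug-node request on Ruby combines the node request with the debug queue, either in the script (#PBS -q debug) or on the command line; for example (the script name and walltime are hypothetical):

qsub -q debug -l nodes=1:ppn=16 -l walltime=00:30:00 my_job.pbs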

Owens Cluster:

Node type          How to request     Memory allocated    Charged for
Huge memory node   nodes=1:ppn=48     1510GB              48 cores
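Similarly, a huge memory node on Owens might be requested with directives such as (job name and walltime are hypothetical):

#PBS -N hugemem_example
#PBS -l nodes=1:ppn=48
#PBS -l walltime=02:00:00

# Allocated 1510GB and charged for 48 cores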