When requesting OSC resources for a job, users are strongly encouraged to compare the job's expected memory use with the memory available per core. See Charging for memory use for more details.
Regular Compute Node
For a regular compute node, the physical memory equates to 4.8 GB/core or 192 GB/node, while the usable memory equates to 4761 MB/core or 183 GB/node. See Changes of Default Memory Limits for more discussion.
If your job requests less than a full node (ppn < 40), it may be scheduled on a node with other running jobs. In this case, your job is entitled to a memory allocation proportional to the number of cores requested (4761 MB/core). For example, without any memory request (mem=XX), a job that requests nodes=1:ppn=1 will be assigned one core and should use no more than 4761 MB of RAM, a job that requests nodes=1:ppn=3 will be assigned 3 cores and should use no more than 14283 MB of RAM, and a job that requests nodes=1:ppn=40 will be assigned the whole node (40 cores).
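The per-core arithmetic above can be sketched in a few lines of Python. This is only an illustration of the proportional rule stated in the text; the function name `default_memory_mb` is hypothetical and not part of any OSC tool.

```python
USABLE_MB_PER_CORE = 4761  # usable memory per core on a regular compute node
CORES_PER_NODE = 40        # cores on a regular compute node


def default_memory_mb(ppn: int) -> int:
    """Memory entitlement in MB for a shared single-node job with no mem= request.

    Hypothetical helper: it only covers ppn < 40. A job with ppn=40 is
    assigned the whole node instead of a per-core share.
    """
    if not 1 <= ppn < CORES_PER_NODE:
        raise ValueError("ppn must be between 1 and 39 for a shared node")
    return ppn * USABLE_MB_PER_CORE


print(default_memory_mb(1))  # 4761
print(default_memory_mb(3))  # 14283
```

The two printed values match the nodes=1:ppn=1 and nodes=1:ppn=3 examples in the paragraph above.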
Please be careful if you include a memory request (mem=XX) in your job. A job that requests nodes=1:ppn=1,mem=14283mb will be assigned one core, have access to 14283 MB of RAM, and be charged for 3 cores' worth of Resource Units (RU). However, a job that requests nodes=1:ppn=5,mem=14283mb will be assigned 5 cores but have access to only 14283 MB of RAM, and be charged for 5 cores' worth of RU.
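The two examples above are consistent with charging for whichever is larger: the cores requested, or the number of 4761 MB shares the memory request covers. The sketch below encodes that reading. The rounding-up rule and the function name `charged_cores` are assumptions inferred from the examples, not a statement of the actual charging policy; see Charging for memory use for the authoritative rules.

```python
from typing import Optional

USABLE_MB_PER_CORE = 4761  # usable memory per core on a regular compute node


def charged_cores(ppn: int, mem_mb: Optional[int] = None) -> int:
    """Cores' worth of RU charged to a single-node job (assumed rule).

    Assumption: a memory request is rounded up to whole per-core shares,
    and the job is charged for the larger of that count and ppn.
    """
    if mem_mb is None:
        return ppn
    # Integer ceiling division: number of 4761 MB shares needed for mem_mb.
    cores_for_memory = -(-mem_mb // USABLE_MB_PER_CORE)
    return max(ppn, cores_for_memory)


print(charged_cores(1, 14283))  # 3
print(charged_cores(5, 14283))  # 5
```

Under this reading, asking for more memory than your cores are entitled to raises the charge, while asking for less does not lower it.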
A multi-node job (nodes > 1) will be assigned entire nodes and charged for entire nodes regardless of the ppn request. For example, a job that requests nodes=10:ppn=1 will be charged for 10 whole nodes (40 cores/node * 10 nodes, which is 400 cores' worth of RU).
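The whole-node charge for multi-node jobs reduces to a single multiplication. A minimal sketch, with the hypothetical function name `multi_node_charged_cores`:

```python
CORES_PER_NODE = 40  # cores on a regular compute node


def multi_node_charged_cores(nodes: int, ppn: int) -> int:
    """Cores' worth of RU charged to a multi-node job on regular nodes.

    ppn is accepted only to show that it does not affect the result.
    """
    if nodes < 2:
        raise ValueError("this rule applies only to jobs with nodes > 1")
    if not 1 <= ppn <= CORES_PER_NODE:
        raise ValueError("ppn must be between 1 and 40")
    return nodes * CORES_PER_NODE


print(multi_node_charged_cores(10, 1))  # 400
```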
For a GPU node, the physical memory equates to 9.6 GB/core or 384 GB/node, while the memory used by the submit filter equates to 4761 MB/core or 374 GB/node.
Huge Memory Node
Node sharing is not allowed on huge memory nodes. A job that requests a huge memory node (nodes=1:ppn=80) will be allocated the entire huge memory node with 3019 GB of RAM and charged for the whole node (80 cores' worth of RU).
In summary, for serial jobs that request a regular compute or GPU node, we will allocate resources based on both the ppn and memory requests. For parallel jobs (nodes > 1) or huge memory jobs, we will allocate entire nodes with their whole memory regardless of the ppn request. Below is a summary of the physical and usable memory of the different types of nodes on Pitzer. To manage and monitor your memory usage, please refer to Out-of-Memory (OOM) or Excessive Memory Usage.
|Type of node||Scope||Physical Memory||Usable Memory|
|Regular compute||Per core||4.8 GB||4761 MB|
|Regular compute||Per node||192 GB (40 cores)||183 GB|
|GPU||Per core||9.6 GB||4761 MB|
|GPU||Per node||384 GB (40 cores)||374 GB|
|Huge memory||Per core||37.5 GB||n/a|
|Huge memory||Per node||3 TB (80 cores)||3019 GB|
There are 2 GPUs per node on Pitzer.
For serial jobs, we will allow node sharing on GPU nodes, so a job may request any number of cores (up to 40) and either 1 or 2 GPUs (nodes=1:ppn=XX:gpus=1 or nodes=1:ppn=XX:gpus=2).
For parallel jobs (nodes > 1), we will not allow node sharing. A job may request 1 or 2 GPUs (gpus=1 or gpus=2), but both GPUs will be allocated to the job.
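The two GPU rules can be summarized as a small function. This is a sketch of the policy as worded above, not of the scheduler itself; the name `allocated_gpus` is hypothetical, and the result is read here as a per-node count, which is an assumption.

```python
GPUS_PER_NODE = 2  # GPUs on each Pitzer GPU node


def allocated_gpus(nodes: int, gpus_requested: int) -> int:
    """GPUs actually allocated on each node for a GPU job.

    Serial jobs (nodes == 1) share the node and get what they request.
    Parallel jobs (nodes > 1) do not share nodes, so both GPUs are allocated.
    """
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    if gpus_requested not in (1, 2):
        raise ValueError("gpus must be 1 or 2")
    if nodes > 1:
        return GPUS_PER_NODE
    return gpus_requested


print(allocated_gpus(1, 1))  # 1
print(allocated_gpus(4, 1))  # 2
```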
Here are the queues available on Pitzer:
|Name||Max walltime||Nodes available||Min job size||Max job size||Notes|
|Serial||168 hours||Available minus reservations||1 core||1 node|||
|Longserial||336 hours||Available minus reservations||1 core||1 node||Restricted access|
|Parallel||96 hours||Available minus reservations||2 nodes||40 nodes|||
|Longparallel||TBD||Available minus reservations||2 nodes||TBD||Restricted access|
|Hugemem||48 hours||4 nodes||1 node||1 node|||
|Parallel hugemem||TBD||4 nodes||2 nodes||4 nodes||Not supported for now|
|Debug-regular||1 hour||6 nodes||1 core||2 nodes|||
|Debug-GPU||1 hour||2 nodes||1 core||2 nodes|||
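To check a planned request against the table, the limits can be copied into a small lookup. The sketch below is illustrative only: `QUEUE_LIMITS` and `fits_queue` are hypothetical names, queues whose limits are listed as TBD are left out, and a minimum job size of "1 core" is treated as 1 node because the check works at node granularity. It does not model access restrictions or how the scheduler routes jobs.

```python
# (max walltime in hours, min nodes, max nodes), copied from the table above.
QUEUE_LIMITS = {
    "Serial": (168, 1, 1),
    "Longserial": (336, 1, 1),
    "Parallel": (96, 2, 40),
    "Hugemem": (48, 1, 1),
    "Debug-regular": (1, 1, 2),
    "Debug-GPU": (1, 1, 2),
}


def fits_queue(queue: str, nodes: int, walltime_hours: float) -> bool:
    """Return True if the request is within the queue's size and walltime limits."""
    max_hours, min_nodes, max_nodes = QUEUE_LIMITS[queue]
    return min_nodes <= nodes <= max_nodes and 0 < walltime_hours <= max_hours


print(fits_queue("Parallel", 10, 72))  # True
print(fits_queue("Serial", 2, 10))     # False
print(fits_queue("Hugemem", 1, 72))    # False
```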
An individual user can have up to 128 concurrently running jobs and/or up to 2040 processors/cores (51 nodes, ~22% of the whole system) in use. All the users in a particular group/project can have up to 192 concurrently running jobs and/or up to 2040 processors/cores (51 nodes, ~22% of the whole system) in use.
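The per-user and per-group caps can be expressed as a simple check. This is a minimal sketch that treats the job-count and core-count caps as limits that must both hold; `LIMITS` and `within_limits` are hypothetical names, not part of any OSC tool.

```python
# (max concurrently running jobs, max cores in use), from the paragraph above.
LIMITS = {
    "user": (128, 2040),
    "group": (192, 2040),
}


def within_limits(scope: str, running_jobs: int, cores_in_use: int) -> bool:
    """Return True if the usage stays within both caps for the given scope."""
    max_jobs, max_cores = LIMITS[scope]
    return running_jobs <= max_jobs and cores_in_use <= max_cores


print(within_limits("user", 100, 2040))  # True
print(within_limits("user", 129, 40))    # False
print(within_limits("group", 129, 40))   # True
```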
A user may have no more than 1000 jobs submitted to each of the parallel and serial job queues. Jobs submitted in excess of this limit will be rejected.