Pitzer

TIP: Remember to check the menu to the right of the page for related pages with more information about Pitzer's specifics.

OSC's original Pitzer cluster was installed in late 2018 and is a Dell-built, Intel® Xeon® 'Skylake' processor-based supercomputer with 260 nodes.

In September 2020, OSC installed additional 398 Intel® Xeon® 'Cascade Lake' processor-based nodes as part of a Pitzer Expansion cluster.

Hardware

Photo of Pitzer Cluster

Detailed system specifications:

	Deployed in 2018	Deployed in 2020	Total
Total Compute Nodes	260 Dell nodes	398 Dell nodes	658 Dell nodes
Total CPU Cores	10,560 total cores	19,104 total cores	29,664 total cores
Standard Dense Compute Nodes	224 nodes Dual Intel Xeon 6148s Skylakes 40 cores per node @ 2.4 GHz 192 GB memory 1 TB disk space	340 nodes Dual Intel Xeon 8268s Cascade Lakes 48 cores per node @ 2.9 GHz 192 GB memory 1 TB disk space	564 nodes
Dual GPU Compute Nodes	32 nodes Dual Intel Xeon 6148s Dual NVIDIA Volta V100 w/ 16 GB GPU memory 40 cores per node @ 2.4 GHz 384 GB memory 1 TB disk space	42 nodes Dual Intel Xeon 8268s Dual NVIDIA Volta V100 w/32 GB GPU memory 48 cores per node @ 2.9 GHz 384 GB memory 1 TB disk space	74 dual GPU nodes
Quad GPU Compute Nodes	N/A	4 nodes Dual Intel Xeon 8260s Cascade Lakes Quad NVIDIA Volta V100s w/32 GB GPU memory and NVLink 48 cores per node @ 2.4 GHz 768 GB memory 4 TB disk space	4 quad GPU nodes
Large Memory Compute Nodes	4 nodes Quad Processor Intel Xeon 6148 Skylakes 80 cores per node @ 2.4 GHz 3 TB memory 1 TB disk space	12 nodes Dual Intel Xeon 8268 Cascade Lakes 48 cores per node @ 2.9 GHz 768 GB memory 0.5 TB disk space	16 nodes
Interactive Login Nodes	4 nodes Dual Intel Xeon 6148s 368 GB memory IP address: 192.148.247.[176-179]		4 nodes
InfiniBand High-Speed Network	Mellanox EDR (100 Gbps) Infiniband networking	Mellanox EDR (100 Gbps) Infiniband networking
Theoretical Peak Performance	~850 TFLOPS (CPU only) ~450 TFLOPS (GPU only) ~1300 TFLOPS (total)	~1900 TFLOPS (CPU only) ~700 TFLOPS (GPU only) ~2600 TFLOPS (total)	~2750 TFLOPS (CPU only) ~1150 TFLOPS (GPU only) ~3900 TFLOPS (total)

How to Connect

SSH Method

To login to Pitzer at OSC, ssh to the following hostname:

pitzer.osc.edu

You can either use an ssh client application or execute ssh on the command line in a terminal window as follows:

ssh <username>@pitzer.osc.edu

You may see a warning message including SSH key fingerprint. Verify that the fingerprint in the message matches one of the SSH key fingerprints listed here, then type yes.

From there, you are connected to the Pitzer login node and have access to the compilers and other software development tools. You can run programs interactively or through batch requests. We use control groups on login nodes to keep the login nodes stable. Please use batch jobs for any compute-intensive or memory-intensive work. See the following sections for details.

OnDemand Method

You can also login to Pitzer at OSC with our OnDemand tool. The first step is to log into OnDemand. Then once logged in you can access Pitzer by clicking on "Clusters", and then selecting ">_Pitzer Shell Access".

Instructions on how to connect to OnDemand can be found at the OnDemand documentation page.

File Systems

Pitzer accesses the same OSC mass storage environment as our other clusters. Therefore, users have the same home directory as on the old clusters. Full details of the storage environment are available in our storage environment guide.

Software Environment

The module system on Pitzer is the same as on the Owens and Ruby systems. Use module load <package> to add a software package to your environment. Use module list to see what modules are currently loaded and module avail to see the modules that are available to load. To search for modules that may not be visible due to dependencies or conflicts, use module spider . By default, you will have the batch scheduling software modules, the Intel compiler, and an appropriate version of mvapich2 loaded.

You can keep up to the software packages that have been made available on Pitzer by viewing the Software by System page and selecting the Pitzer system.

Compiling Code to Use Advanced Vector Extensions (AVX2)

The Skylake processors that make Pitzer support the Advanced Vector Extensions (AVX2) instruction set, but you must set the correct compiler flags to take advantage of it. AVX2 has the potential to speed up your code by a factor of 4 or more, depending on the compiler and options you would otherwise use.

In our experience, the Intel and PGI compilers do a much better job than the gnu compilers at optimizing HPC code.

With the Intel compilers, use -xHost and -O2 or higher. With the gnu compilers, use -march=native and -O3 . The PGI compilers by default use the highest available instruction set, so no additional flags are necessary.

This advice assumes that you are building and running your code on Pitzer. The executables will not be portable. Of course, any highly optimized builds, such as those employing the options above, should be thoroughly validated for correctness.

See the Pitzer Programming Environment page for details.

Batch Specifics

On September 22, 2020, OSC switches to Slurm for job scheduling and resource management on the Pitzer Cluster.

Refer to this Slurm migration page to understand how to use Slurm on the Pitzer cluster. Some specifics you will need to know to create well-formed batch scripts:

OSC enables PBS compatibility layer provided by Slurm such that PBS batch scripts that used to work in the previous Torque/Moab environment mostly still work in Slurm.
Pitzer is a heterogeneous system with mixed types of CPUs after the expansion as shown in the above table. Please be cautious when requesting resources on Pitzer and check this page for more detailed discussions
Jobs on Pitzer may request partial nodes.

Using OSC Resources

For more information about how to use OSC resources, please see our guide on batch processing at OSC and Slurm migration. For specific information about modules and file storage, please see the Batch Execution Environment page.

Technical Specifications

Login Specifications

4 Intel Xeon Gold 6148 (Skylake) CPUs

40 cores/node and 384 GB of memory/node

Technical specifications for 2018 Pitzer:

Number of Nodes

260 nodes

Number of CPU Sockets

528 (2 sockets/node for standard node)

Number of CPU Cores

10,560 (40 cores/node for standard node)

Cores Per Node

40 cores/node (80 cores/node for Huge Mem Nodes)

Local Disk Space Per Node

850 GB in /tmp

Compute CPU Specifications

Intel Xeon Gold 6148 (Skylake) for compute

2.4 GHz
20 cores per processor

Computer Server Specifications

224 Dell PowerEdge C6420
32 Dell PowerEdge R740 (for accelerator nodes)
4 Dell PowerEdge R940

Accelerator Specifications

NVIDIA V100 "Volta" GPUs 16GB memory

Number of Accelerator Nodes

32 total (2 GPUs per node)

Total Memory

~67 TB

Memory Per Node

192 GB for standard nodes
384 GB for accelerator nodes
3 TB for Huge Mem Nodes

Memory Per Core

4.8 GB for standard nodes
9.6 GB for accelerator nodes
76.8 GB for Huge Mem

Interconnect

Mellanox EDR Infiniband Networking (100Gbps)

Special Nodes

4 Huge Memory Nodes

Dell PowerEdge R940
4 Intel Xeon Gold 6148 (Skylake)
- 20 Cores
- 2.4 GHz
80 cores (20 cores/CPU)
3 TB Memory
2x Mirror 1 TB Drive (1 TB usable)

Technical specifications for 2020 Pitzer:

Number of Nodes

398 nodes

Number of CPU Sockets

796 (2 sockets/node for all nodes)

Number of CPU Cores

19,104 (48 cores/node for all nodes)

Cores Per Node

48 cores/node for all nodes

Local Disk Space Per Node

1 TB for most nodes
4 TB for quad GPU
0.5 TB for large mem

Compute CPU Specifications

Intel Xeon 8268s Cascade Lakes for most compute

2.9 GHz
24 cores per processor

Computer Server Specifications

352 Dell PowerEdge C6420
42 Dell PowerEdge R740 (for dual GPU nodes)
4 Dell Poweredge c4140 (for quad GPU nodes)

Accelerator Specifications

NVIDIA V100 "Volta" GPUs 32GB memory for dual GPU
NVIDIA V100 "Volta" GPUs 32GB memory and NVLink for quad GPU

Number of Accelerator Nodes

42 dual GPU nodes (2 GPUs per node)
4 quad GPU nodes (4 GPUs per node)

Total Memory

~95 TB

Memory Per Node

192 GB for standard nodes
384 GB for dual GPU nodes
768 GB for quad and Large Mem Nodes

Memory Per Core

4.0 GB for standard nodes
8.0 GB for dual GPU nodes
16.0 GB for quad and Large Mem Nodes

Interconnect

Mellanox EDR Infiniband Networking (100Gbps)

Special Nodes

4 quad GPU Nodes

Dual Intel Xeon 8260s Cascade Lakes
Quad NVIDIA Volta V100s w/32GB GPU memory and NVLink
48 cores per node @ 2.4GHz
768GB memory
4 TB disk space

12 Large Memory Nodes

Dual Intel Xeon 8268 Cascade Lakes
48 cores per node @ 2.9GHz
768GB memory
0.5 TB disk space

Supercomputer:

Pitzer

Pitzer Programming Environment (PBS)

This document is obsoleted and kept as a reference to previous Pitzer programming environment. Please refer to here for the latest version.

Compilers

C, C++ and Fortran are supported on the Pitzer cluster. Intel, PGI and GNU compiler suites are available. The Intel development tool chain is loaded by default. Compiler commands and recommended options for serial programs are listed in the table below. See also our compilation guide.

The Skylake processors that make up Pitzer support the Advanced Vector Extensions (AVX512) instruction set, but you must set the correct compiler flags to take advantage of it. AVX512 has the potential to speed up your code by a factor of 8 or more, depending on the compiler and options you would otherwise use. However, bare in mind that clock speeds decrease as the level of the instruction set increases. So, if your code does not benefit from vectorization it may be beneficial to use a lower instruction set.

In our experience, the Intel and PGI compilers do a much better job than the GNU compilers at optimizing HPC code.

With the Intel compilers, use -xHost and -O2 or higher. With the GNU compilers, use -march=native and -O3. The PGI compilers by default use the highest available instruction set, so no additional flags are necessary.

This advice assumes that you are building and running your code on Owens. The executables will not be portable. Of course, any highly optimized builds, such as those employing the options above, should be thoroughly validated for correctness.

LANGUAGE	INTEL EXAMPLE	PGI EXAMPLE	GNU EXAMPLE
C	icc -O2 -xHost hello.c	pgcc -fast hello.c	gcc -O3 -march=native hello.c
Fortran 90	ifort -O2 -xHost hello.f90	pgf90 -fast hello.f90	gfortran -O3 -march=native hello.f90
C++	icpc -O2 -xHost hello.cpp	pgc++ -fast hello.cpp	g++ -O3 -march=native hello.cpp

Parallel Programming

MPI

OSC systems use the MVAPICH2 implementation of the Message Passing Interface (MPI), optimized for the high-speed Infiniband interconnect. MPI is a standard library for performing parallel processing using a distributed-memory model. For more information on building your MPI codes, please visit the MPI Library documentation.

Parallel programs are started with the mpiexec command. For example,

mpiexec ./myprog

The program to be run must either be in your path or have its path specified.

The mpiexec command will normally spawn one MPI process per CPU core requested in a batch job. Use the -n and/or -ppn option to change that behavior.

The table below shows some commonly used options. Use mpiexec -help for more information.

MPIEXEC OPTION	COMMENT
`-ppn 1`	One process per node
`-ppn procs`	procs processes per node
`-n totalprocs` `-np totalprocs`	At most totalprocs processes per node
`-prepend-rank`	Prepend rank to output
`-help`	Get a list of available options

Caution: There are many variations on mpiexec and mpiexec.hydra. Information found on non-OSC websites may not be applicable to our installation.

The information above applies to the MVAPICH2 and IntelMPI installations at OSC. See the OpenMPI software page for mpiexec usage with OpenMPI.

OpenMP

The Intel, PGI and GNU compilers understand the OpenMP set of directives, which support multithreaded programming. For more information on building OpenMP codes on OSC systems, please visit the OpenMP documentation.

Process/Thread placement

Processes and threads are placed differently depending on the compiler and MPI implementation used to compile your code. This section summarizes the default behavior and how to modify placement.

For all three compilers (Intel, GNU, PGI), purely threaded codes do not bind to particular cores by default.

For MPI-only codes, Intel MPI first binds the first half of processes to one socket, and then second half to the second socket so that consecutive tasks are located near each other. MVAPICH2 first binds as many processes as possible on one socket, then allocates the remaining processes on the second socket so that consecutive tasks are near each other. OpenMPI alternately binds processes on socket 1, socket 2, socket 1, socket 2, etc, with no particular order for the core id.

For Hybrid codes, Intel MPI first binds the first half of processes to one socket, and then second half to the second socket so that consecutive tasks are located near each other. Each process is allocated ${OMP_NUM_THREADS} cores and the threads of each process are bound to those cores. MVAPICH2 allocates ${OMP_NUM_THREADS} cores for each process and each thread of a process is placed on a separate core. By default, OpenMPI behaves the same for hybrid codes as it does for MPI-only codes, allocating a single core for each process and all threads of that process.

The following tables describe how to modify the default placements for each type of code.

OpenMP options:

Option	Intel	GNU	Pgi	description
Scatter	KMP_AFFINITY=scatter	OMP_PLACES=cores OMP_PROC_BIND=close/spread	MP_BIND=yes	Distribute threads as evenly as possible across system
Compact	KMP_AFFINITY=compact	OMP_PLACES=sockets	MP_BIND=yes MP_BLIST="0,2,4,6,8,10,1,3,5,7,9"	Place threads as closely as possible on system

MPI options:

OPTION	INTEL	MVAPICh2	openmpi	DESCRIPTION
Scatter	I_MPI_PIN_DOMAIN=core I_MPI_PIN_ORDER=scatter	MV2_CPU_BINDING_POLICY=scatter	-map-by core --rank-by socket:span	Distribute processes as evenly as possible across system
Compact	I_MPI_PIN_DOMAIN=core I_MPI_PIN_ORDER=compact	MV2_CPU_BINDING_POLICY=bunch	-map-by core	Distribute processes as closely as possible on system

Hybrid MPI+OpenMP options (combine with options from OpenMP table for thread affinity within cores allocated to each process):

OPTION	INTEL	MVAPICH2	OPENMPI	DESCRIPTION
Scatter	I_MPI_PIN_DOMAIN=omp I_MPI_PIN_ORDER=scatter	MV2_CPU_BINDING_POLICY=hybrid MV2_HYBRID_BINDING_POLICY=linear	-map-by node:PE=$OMP_NUM_THREADS --bind-to core --rank-by socket:span	Distrubute processes as evenly as possible across system ($OMP_NUM_THREADS cores per process)
Compact	I_MPI_PIN_DOMAIN=omp I_MPI_PIN_ORDER=compact	MV2_CPU_BINDING_POLICY=hybrid MV2_HYBRID_BINDING_POLICY=spread	-map-by node:PE=$OMP_NUM_THREADS --bind-to core	Distribute processes as closely as possible on system ($OMP_NUM_THREADS cores per process)

The above tables list the most commonly used settings for process/thread placement. Some compilers and Intel libraries may have additional options for process and thread placement beyond those mentioned on this page. For more information on a specific compiler/library, check the more detailed documentation for that library.

GPU Programming

64 Nvidia V100 GPUs are available on Pitzer. Please visit our GPU documentation.

Supercomputer:

Pitzer

Pitzer Programming Environment

Compilers

The Skylake and Cascade Lake processors that make up Pitzer support the Advanced Vector Extensions (AVX512) instruction set, but you must set the correct compiler flags to take advantage of it. AVX512 has the potential to speed up your code by a factor of 8 or more, depending on the compiler and options you would otherwise use. However, bare in mind that clock speeds decrease as the level of the instruction set increases. So, if your code does not benefit from vectorization it may be beneficial to use a lower instruction set.

In our experience, the Intel compiler usually does the best job of optimizing numerical codes and we recommend that you give it a try if you’ve been using another compiler.

With the Intel compilers, use -xHost and -O2 or higher. With the GNU compilers, use -march=native and -O3. The PGI compilers by default use the highest available instruction set, so no additional flags are necessary.

LANGUAGE	INTEL	GNU	PGI
C	icc -O2 -xHost hello.c	gcc -O3 -march=native hello.c	pgcc -fast hello.c
Fortran 77/90	ifort -O2 -xHost hello.F	gfortran -O3 -march=native hello.F	pgfortran -fast hello.F
C++	icpc -O2 -xHost hello.cpp	g++ -O3 -march=native hello.cpp	pgc++ -fast hello.cpp

Parallel Programming

MPI

MPI programs are started with the srun command. For example,

#!/bin/bash
#SBATCH --nodes=2

srun [ options ] mpi_prog

Note: the program to be run must either be in your path or have its path specified.

The srun command will normally spawn one MPI process per task requested in a Slurm batch job. Use the --ntasks-per-node=n option to change that behavior. For example,

#!/bin/bash
#SBATCH --nodes=2

# Use the maximum number of CPUs of two nodes
srun ./mpi_prog

# Run 8 processes per node
srun -n 16 --ntasks-per-node=8  ./mpi_prog

The table below shows some commonly used options. Use srun -help for more information.

OPTION	COMMENT
`--ntasks-per-node=n`	number of tasks to invoke on each node
`-help`	Get a list of available options

Note: The information above applies to the MVAPICH2, Intel MPI and OpenMPI installations at OSC.

Caution: mpiexec or mpiexec.hydra is still supported with Intel MPI and OpenMPI, but it is not fully compatible in our Slurm environment. We recommand using srun in any circumstances.

OpenMP

The Intel, GNU and PGI compilers understand the OpenMP set of directives, which support multithreaded programming. For more information on building OpenMP codes on OSC systems, please visit the OpenMP documentation.

An OpenMP program by default will use a number of threads equal to the number of CPUs requested in a Slurm batch job. To use a different number of threads, set the environment variable OMP_NUM_THREADS. For example,

#!/bin/bash
#SBATCH --ntasks-per-node=8

# Run 8 threads
./omp_prog

# Run 4 threads
export OMP_NUM_THREADS=4
./omp_prog

To run a OpenMP job on an exclusive node:

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --exclusive

export OMP_NUM_THREADS=$SLURM_CPUS_ON_NODE
./omp_prog

Interactive job only

Please use -c, --cpus-per-task=X to request an interactive job. This results in an interactive job with X CPUs available and automatically assigns the correct number of threads to the OpenMP program.

Hybrid (MPI + OpenMP)

An example of running a job for hybrid code:

#!/bin/bash
#SBATCH --nodes=2
#SBATCH --constraint=48core

# Run 4 MPI processes on each node and 12 OpenMP threads spawned from a MPI process
export OMP_NUM_THREADS=12
srun -n 8 -c 12 --ntasks-per-node=4 ./hybrid_prog

To run a job across either 40-core or 48-core nodes exclusively:

#!/bin/bash
#SBATCH --nodes=2

# Run 4 MPI processes on each node and the maximum available OpenMP threads spawned from a MPI process 
export OMP_NUM_THREADS=$(($SLURM_CPUS_ON_NODE/4))
srun -n 8 -c $OMP_NUM_THREADS --ntasks-per-node=4 ./hybrid_prog

Tuning Parallel Program Performance: Process/Thread Placement

To get the maximum performance, it is important to make sure that processes/threads are located as close as possible to their data, and as close as possible to each other if they need to work on the same piece of data, with given the arrangement of node, sockets, and cores, with different access to RAM and caches.

While cache and memory contention between threads/processes are an issue, it is best to use scatter distribution for code.

Processes and threads are placed differently depending on the computing resources you requste and the compiler and MPI implementation used to compile your code. For the former, see the above examples to learn how to run a job on exclusive nodes. For the latter, this section summarizes the default behavior and how to modify placement.

OpenMP only

For all three compilers (Intel, GNU, PGI), purely threaded codes do not bind to particular CPU cores by default. In other words, it is possible that multiple threads are bound to the same CPU core.

The following table describes how to modify the default placements for pure threaded code:

DISTRIBUTION	Compact	Scatter/Cyclic
DESCRIPTION	Place threads as closely as possible on sockets	Distribute threads as evenly as possible across sockets
INTEL	KMP_AFFINITY=compact	KMP_AFFINITY=scatter
GNU	OMP_PLACES=sockets^[1]	OMP_PROC_BIND=spread/close
PGI^[2]	MP_BIND=yes MP_BLIST="$(seq -s, 0 2 47),$(seq -s, 1 2 47)"	MP_BIND=yes

Threads in the same socket might be bound to the same CPU core.
PGI LLVM-backend (version 19.1 and later) does not support thread/processors affinity on NUMA architecture. To enable this feature, compile threaded code with --Mnollvm to use proprietary backend.

MPI Only

For MPI-only codes, MVAPICH2 first binds as many processes as possible on one socket, then allocates the remaining processes on the second socket so that consecutive tasks are near each other. Intel MPI and OpenMPI alternately bind processes on socket 1, socket 2, socket 1, socket 2 etc, as cyclic distribution.

For process distribution across nodes, all MPIs first bind as many processes as possible on one node, then allocates the remaining processes on the second node.

The following table describe how to modify the default placements on a single node for MPI-only code with the command srun:

DISTRIBUTION (single node)	Compact	Scatter/Cyclic
DESCRIPTION	Place processs as closely as possible on sockets	Distribute process as evenly as possible across sockets
MVAPICH2^[1]	Default	MV2_CPU_BINDING_POLICY=scatter
INTEL MPI	srun --cpu-bind="map_cpu:$(seq -s, 0 2 47),$(seq -s, 1 2 47)"	Default
OPENMPI	srun --cpu-bind="map_cpu:$(seq -s, 0 2 47),$(seq -s, 1 2 47)"	Default

MV2_CPU_BINDING_POLICY will not work if MV2_ENABLE_AFFINITY=0 is set.

To distribute processes evenly across nodes, please set SLURM_DISTRIBUTION=cyclic.

Hybrid (MPI + OpenMP)

For Hybrid codes, each MPI process is allocated OMP_NUM_THREADS cores and the threads of each process are bound to those cores. All MPI processes (as well as the threads bound to the process) behave as we describe in the previous sections. It means the threads spawned from a MPI process might be bound to the same core. To change the default process/thread placmements, please refer to the tables above.

Summary

GPU Programming

164 Nvidia V100 GPUs are available on Pitzer. Please visit our GPU documentation.

Reference

Supercomputer:

Pitzer

Batch Limit Rules

Pitzer includes two types of processors, Intel® Xeon® 'Skylake' processor and Intel® Xeon® 'Cascade Lake' processor. This document provides you information on how to request resources based on the requirements of # of cores, memory, etc despite the heterogeneous nature of the Pitzer cluster. Therefore, in some cases, your job can land on either type of processor. Please check guidance on requesting resources on pitzer for your job to obtain a certain type of processor on Pitzer.

We use Slurm syntax for all the discussions on this page. Please check how to pre pare slurm job script if your script is prepared in PBS syntax.

Memory limit

A small portion of the total physical memory on each node is reserved for distributed processes. The actual physical memory available to user jobs is tabulated below.

Summary

Node type	default and max memory per core	max memory per node
Skylake 40 core - regular compute	4.449 GB	177.96 GB
Cascade Lake 48 core - regular compute	3.708 GB	177.98 GB
large memory	15.5 GB	744 GB
huge memory	37.362 GB	2988.98 GB
Skylake 40 core dual gpu	9.074 GB	363 GB
Cascade 48 core dual gpu	7.562 GB	363 GB
quad gpu (48 core)	15.5 GB	744 GB

A job may request more than the max memory per core, but the job will be allocated more cores to satisfy the memory request instead of just more memory.
e.g. The following slurm directives will actually grant this job 3 cores, with 10 GB of memory
(since 2 cores * 4.5 GB = 9 GB doesn't satisfy the memory request).
#SBATCH --ntasks-per-node=2 #SBATCH --mem=10g

It is recommended to let the default memory apply unless more control over memory is needed.
Note that if an entire node is requested, then the job is automatically granted the entire node's main memory. On the other hand, if a partial node is requested, then memory is granted based on the default memory per core.

See a more detailed explanation below.

Regular Compute Node

For the regular 'Skylake' processor-based node, it has 40 cores/node. The physical memory equates to 4.8 GB/core or 192 GB/node; while the usable memory equates to 4,556 MB/core or 182,240 MB/node (177.96 GB/node).
For the regular 'Cascade Lake' processor-based node, it has 48 cores/node. The physical memory equates to 4.0 GB/core or 192 GB/node; while the usable memory equates to 3,797 MB/core or 182,256 MB/node (177.98 GB/node).

Jobs requesting no more than 1 node

If your job requests less than a full node, it may be scheduled on a node with other running jobs. In this case, your job is entitled to a memory allocation proportional to the number of cores requested (4,556 MB/core or 3,797 MB/core depending on which type of node your job lands on). For example, without any memory request ( --mem=XX ):

A job that requests --ntasks-per-node=1 and lands on a 'Skylake' node will be assigned one core and should use no more than 4556 MB of RAM; a job that requests --ntasks-per-node=1 and lands on a 'Cascade Lake' node will be assigned one core and should use no more than 3797 MB of RAM
A job that requests --ntasks-per-node=3 and lands on a 'Skylake' node will be assigned 3 cores and should use no more than 3*4556 MB of RAM; a job that requests --ntasks-per-node=3 and lands on a 'Cascade Lake' node will be assigned 3 cores and should use no more than 3*3797 MB of RAM
A job that requests --ntasks-per-node=40 and lands on a 'Skylake' node will be assigned the whole node (40 cores) with 178 GB of RAM; a job that requests --ntasks-per-node=40 and lands on a 'Cascade Lake' node will be assigned 40 cores (partial node) and should use no more than 40* 3797 MB of RAM
A job that requests --exclusive and lands on a 'Skylake' node will be assigned the whole node (40 cores) with 178 GB of RAM; a job that requests --exclusive and lands on a 'Cascade Lake' node will be assigned the whole node (48 cores) with 178 GB of RAM
A job that requests --exclusive --constraint=40core will land on a 'Skylake' node and will be assigned the whole node (40 cores) with 178 GB of RAM.

For example, with a memory request:
A job that requests --ntasks-per-node=1 --mem=16000MB and lands on 'Skylake' node will be assigned 4 cores and have access to 16000 MB of RAM, and charged for 4 cores worth of usage; a job that requests --ntasks-per-node=1 --mem=16000MB and lands on 'Cascade Lake' node will be assigned 5 cores and have access to 16000 MB of RAM, and charged for 5 cores worth of usage
A job that requests --ntasks-per-node=8 --mem=16000MB and lands on 'Skylake' node will be assigned 8 cores but have access to only 16000 MB of RAM , and charged for 8 cores worth of usage; a job that requests --ntasks-per-node=8 --mem=16000MB and lands on 'Cascade Lake' node will be assigned 8 cores but have access to only 16000 MB of RAM , and charged for 8 cores worth of usage

Jobs requesting more than 1 node

A multi-node job ( --nodes > 1 ) will be assigned the entire nodes and charged for the entire nodes regardless of --ntasks-per-node request. For example, a job that requests --nodes=10 --ntasks-per-node=1 and lands on 'Skylake' node will be charged for 10 whole nodes (40 cores/node*10 nodes, which is 400 cores worth of usage); a job that requests --nodes=10 --ntasks-per-node=1 and lands on 'Cascade Lake' node will be charged for 10 whole nodes (48 cores/node*10 nodes, which is 480 cores worth of usage).

Large Memory Node

On Pitzer, it has 48 cores per node. The physical memory equates to 16.0 GB/core or 768 GB/node; while the usable memory equates to 15,872 MB/core or 761,856 MB/node (744 GB/node).

For any job that requests no less than 363 GB/node but less than 744 GB/node, the job will be scheduled on the large memory node.To request no more than a full large memory node, you need to specify the memory request between 363 GB and 744 GB, i.e., 363GB <= mem <744GB. --mem is the total memory per node allocated to the job. You can request a partial large memory node, so consider your request more carefully when you plan to use a large memory node, and specify the memory based on what you will use.

Huge Memory Node

On Pitzer, it has 80 cores per node. The physical memory equates to 37.5 GB/core or 3 TB/node; while the usable memory equates to 38,259 MB/core or 3,060,720 MB/node (2988.98 GB/node).

To request no more than a full huge memory node, you have two options:

The first is to specify the memory request between 744 GB and 2988 GB, i.e., 744GB <= mem <=2988GB).
The other option is to use the combination of --ntasks-per-node and --partition, like --ntasks-per-node=4 --partition=hugemem . When no memory is specified for the huge memory node, your job is entitled to a memory allocation proportional to the number of cores requested (38,259 MB/core). Note, --ntasks-per-node should be no less than 20 and no more than 80

Summary

In summary, for serial jobs, we will allocate the resources considering both the # of cores and the memory request. For parallel jobs (nodes>1), we will allocate the entire nodes with the whole memory regardless of other requests. Check requesting resources on pitzer for information about the usable memory of different types of nodes on Pitzer. To manage and monitor your memory usage, please refer to Out-of-Memory (OOM) or Excessive Memory Usage.

GPU Jobs

Dual GPU Node

For the dual GPU node with 'Skylake' processor, it has 40 cores/node. The physical memory equates to 9.6 GB/core or 384 GB/node; while the usable memory equates to 9292 MB/core or 363 GB/node. Each node has 2 NVIDIA Volta V100 w/ 16 GB GPU memory.
For the dual GPU node with 'Cascade Lake' processor, it has 48 cores/node. The physical memory equates to 8.0 GB/core or 384 GB/node; while the usable memory equates to 7744 MB/core or 363 GB/node. Each node has 2 NVIDIA Volta V100 w/32GB GPU memory.

For serial jobs, we will allow node sharing on GPU nodes so a job may request either 1 or 2 GPUs (--ntasks-per-node=XX --gpus-per-node=1 or --ntasks-per-node=XX --gpus-per-node=2)

For parallel jobs (nodes>1), we will not allow node sharing. A job may request 1 or 2 GPUs ( gpus-per-node=1 or gpus-per-node=2 ) but both GPUs will be allocated to the job.

Quad GPU Node

For quad GPU node, it has 48 cores/node. The physical memory equates to 16.0 GB/core or 768 GB/node; while the usable memory equates to 15,872 MB/core or 744 GB/node.. Each node has 4 NVIDIA Volta V100s w/32 GB GPU memory and NVLink.

For serial jobs, we will allow node sharing on GPU nodes, so a job can land on a quad GPU node if it requests 3-4 GPUs per node (--ntasks-per-node=XX --gpus-per-node=3 or --ntasks-per-node=XX --gpus-per-node=4), or requests quad GPU node explicitly with using --gpus-per-node=v100-quad:4, or gets backfilled with requesting 1-2 GPUs per node with less than 4 hours long.

For parallel jobs (nodes>1), only up to 2 quad GPU nodes can be requested in a single job. We will not allow node sharing and all GPUs will be allocated to the job.

Partition time and job size limits

Here is the walltime and node limits per job for different queues/partitions available on Pitzer:

NAME	MAX TIME LIMIT (dd-hh:mm:ss)	MIN JOB SIZE	MAX JOB SIZE	NOTES
serial	7-00:00:00	1 core	1 node
longserial	14-00:00:00	1 core	1 node	Restricted access Only 40 core nodes are available
parallel	96:00:00	2 nodes	40 nodes
hugemem	7-00:00:00	1 core	1 node	There are only 4 pitzer huge memory nodes
largemem	7-00:00:00	1 core	1 node	There are 12 large memory nodes
gpuserial	7-00:00:00	1 core	1 node	Includes dual and quad GPU nodes
gpuparallel	96:00:00	2 nodes	10 nodes	Includes dual and quad GPU nodes Only up to 2 quad GPU nodes can be requested in a single job
debug	1:00:00	1 core	2 nodes
gpudebug	1:00:00	1 core	2 nodes

Total available nodes shown for pitzer may fluctuate depending on the amount of currently operational nodes and nodes reserved for specific projects.

To specify a partition for a job, either add the flag --partition=<partition-name> to the sbatch command at submission time or add this line to the job script:
#SBATCH --paritition=<partition-name>

To access one of the restricted queues, please contact OSC Help. Generally, access will only be granted to these queues if the performance of the job cannot be improved, and job size cannot be reduced by splitting or checkpointing the job.

Job/Core Limits

Max Running Job Limit					Max Core/Processor Limit
	For all types	GPU jobs	Regular debug jobs	GPU debug jobs	For all types
Individual User	384	140	4	4	3240
Project/Group	576	140	n/a	n/a	3240

An individual user can have up to the max concurrently running jobs and/or up to the max processors/cores in use. However, among all the users in a particular group/project, they can have up to the max concurrently running jobs and/or up to the max processors/cores in use.

A user may have no more than 1000 jobs submitted to both the parallel and serial job queue separately.

Supercomputer:

Pitzer

Service:

HPC

Citation

For more information about citations of OSC, visit https://www.osc.edu/citation.

To cite Pitzer, please use the following Archival Resource Key:

ark:/19495/hpc56htp

Please adjust this citation to fit the citation style guidelines required.

Ohio Supercomputer Center. 2018. Pitzer Supercomputer. Columbus, OH: Ohio Supercomputer Center. http://osc.edu/ark:19495/hpc56htp

Here is the citation in BibTeX format:

@misc{Pitzer2018,
ark = {ark:/19495/hpc56htp},
url = {http://osc.edu/ark:/19495/hpc56htp},
year  = {2018},
author = {Ohio Supercomputer Center},
title = {Pitzer Supercomputer}
}

And in EndNote format:

%0 Generic
%T Pitzer Supercomputer
%A Ohio Supercomputer Center
%R ark:/19495/hpc56htp
%U http://osc.edu/ark:/19495/hpc56htp
%D 2018

Here is an .ris file to better suit your needs. Please change the import option to .ris.

Documentation Attachment:

Pitzer Endnote Citation

Supercomputer:

Pitzer

Pitzer SSH key fingerprints

These are the public key fingerprints for Pitzer:
pitzer: ssh_host_rsa_key.pub = 8c:8a:1f:67:a0:e8:77:d5:4e:3b:79:5e:e8:43:49:0e
pitzer: ssh_host_ed25519_key.pub = 6d:19:73:8e:b4:61:09:a9:e6:0f:e5:0d:e5:cb:59:0b
pitzer: ssh_host_ecdsa_key.pub = 6f:c7:d0:f9:08:78:97:b8:23:2e:0d:e2:63:e7:ac:93

These are the SHA256 hashes:
pitzer: ssh_host_rsa_key.pub = SHA256:oWBf+YmIzwIp+DsyuvB4loGrpi2ecow9fnZKNZgEVHc
pitzer: ssh_host_ed25519_key.pub = SHA256:zUgn1K3+FK+25JtG6oFI9hVZjVxty1xEqw/K7DEwZdc
pitzer: ssh_host_ecdsa_key.pub = SHA256:8XAn/GbQ0nbGONUmlNQJenMuY5r3x7ynjnzLt+k+W1M

Supercomputer:

Pitzer

Migrating jobs from other clusters

This page includes a summary of differences to keep in mind when migrating jobs from other clusters to Pitzer.

Guidance for Oakley Users

The Oakley cluster is removed from service on December 18, 2018.

Guidance for Owens Users

Hardware Specifications

	pitzer (PER NODE)	owens (PER NODE)
Regular compute node	40 cores and 192GB of RAM 48 cores and 192GB of RAM	28 cores and 125GB of RAM
Huge memory node	48 cores and 768GB of RAM (12 nodes in this class) 80 cores and 3.0 TB of RAM (4 nodes in this class)	48 cores and 1.5TB of RAM (16 nodes in this class)

pitzer (PER NODE)

owens (PER NODE)

Regular compute node

40 cores and 192GB of RAM

48 cores and 192GB of RAM

28 cores and 125GB of RAM

Huge memory node

48 cores and 768GB of RAM

(12 nodes in this class)

80 cores and 3.0 TB of RAM

(4 nodes in this class)

48 cores and 1.5TB of RAM

(16 nodes in this class)

File Systems

Pitzer accesses the same OSC mass storage environment as our other clusters. Therefore, users have the same home directory, project space, and scratch space as on the Owens cluster.

Software Environment

Pitzer uses the same module system as Owens.

Use module load <package> to add a software package to your environment. Use module list to see what modules are currently loaded and module avail to see the modules that are available to load. To search for modules that may not be visible due to dependencies or conflicts, use module spider .

You can keep up to on the software packages that have been made available on Pitzer by viewing the Software by System page and selecting the Pitzer system.

Programming Environment

Like Owens, Pitzer supports three compilers: Intel, PGI, and gnu. The default is Intel. To switch to a different compiler, use module swap intel gnu or module swap intel pgi .

Pitzer also use the MVAPICH2 implementation of the Message Passing Interface (MPI), optimized for the high-speed Infiniband interconnect and support the Advanced Vector Extensions (AVX2) instruction set.

See the Pitzer Programming Environment page for details.

Accounting

Below is a comparison of job limits between Pitzer and Owens:

	PItzer	Owens
Per User	Up to 256 concurrently running jobs and/or up to 3240 processors/cores in use	Up to 256 concurrently running jobs and/or up to 3080 processors/cores in use
Per group	Up to 384 concurrently running jobs and/or up to 3240 processors/cores in use	Up to 384 concurrently running jobs and/or up to 4620 processors/cores in use

Please see Queues and Reservations for Pitzer and Batch Limit Rules for more details.

Guidance for Ruby Users

The Ruby cluster is removed from service on October 29, 2020.

Supercomputer:

Pitzer

Service:

HPC

Guidance on Requesting Resources on Pitzer

In late 2018, OSC installed 260 Intel® Xeon® 'Skylake' processor-based nodes as the original Pitzer cluster. In September 2020, OSC installed additional 398 Intel® Xeon® 'Cascade Lake' processor-based nodes as part of a Pitzer Expansion cluster. This expansion makes Pitzer a heterogeneous cluster, which means that the jobs may land on different types of CPU and behaves differently if the user submits the same job script repeatedly to Pitzer but does not request the resources properly. This document provides you some general guidance on how to request resources on Pitzer due to this heterogeneous nature.

Step 1: Identify your job type

	Nodes the job may be allocated on	# of cores per node	Usable Memory	GPU
Jobs requesting standard compute node(s)	Dual Intel Xeon 6148s Skylake @2.4GHz	40	178 GB memory/node 4556 MB memory/core	N/A
Jobs requesting standard compute node(s)	Dual Intel Xeon 8268s Cascade Lakes @2.9GHz	48	178 GB memory/node 3797 MB memory/core	N/A
Jobs requesting dual GPU node(s)	Dual Intel Xeon 6148s Skylake @2.4GHz	40	363 GB memory/node 9292 MB memory/core	2 NVIDIA Volta V100 w/ 16GB GPU memory
Jobs requesting dual GPU node(s)	Dual Intel Xeon 8268s Cascade Lakes @2.9GHz	48	363 GB memory/node 7744 MB memory/core	2 NVIDIA Volta V100 w/32GB GPU memory
Jobs requesting quad GPU node(s)	Dual Intel Xeon 8260s Cascade Lakes @2.4GHz	48	744 GB memory/node 15872 MB memory/core	4 NVIDIA Volta V100s w/32GB GPU memory and NVLink
Jobs requesting large memory node(s)	Dual Intel Xeon 8268s Cascade Lakes @2.9GHz	48	744 GB memory/node 15872 MB memory/core	N/A
Jobs requesting huge memory node(s)	Quad Processor Intel Xeon 6148 Skylakes @2.4GHz	80	2989 GB memory/node 38259 MB memory/core	N/A

According to this table,

If your job requests standard compute node(s) or dual GPU node(s), it can potentially land on different types of nodes and may result in different job performance. Please follow the steps below to determine whether you would like to restrain your job to a certain type of node(s).
If your job requests quad GPU node(s), large memory node(s), or huge memory node(s), please check pitzer batch limit rules on how to request these special types of resources properly.

Step 2: Perform test

This step is to submit your jobs requesting the same resources to different types of nodes on Pitzer. For your job script is prepared with either PBS syntax or Slurm syntax:

Request 40 or 48 core nodes

#SBATCH --constraint=40core
#SBATCH --constraint=48core

Request 16gb, 32gb gpu

#SBATCH --constraint=v100
#SBATCH --constraint=v100-32g --partition=gpuserial-48core

Once the script is ready, submit your jobs to Pitzer and wait till the jobs are completed.

Step 3: Compare the results

Once the jobs are completed, you can compare the job performances in terms of core-hours, gpu-hours, walltime, etc. to determine how your job is sensitive to the type of the nodes. If you would like to restrain your job to land on a certain type of nodes based on the testing, you can add #SBATCH --constraint=. The disadvantage of this is that you may have a longer queue wait time on the system. If you would like to have your jobs scheduled as fast as possible and do not care which type of nodes your job will land on, do not include the constraint in the job request.

Supercomputer:

Pitzer

Pitzer

How to Connect

SSH Method

OnDemand Method

File Systems

Software Environment

Compiling Code to Use Advanced Vector Extensions (AVX2)

Batch Specifics

Using OSC Resources

Technical Specifications

Technical specifications for 2018 Pitzer:

Technical specifications for 2020 Pitzer:

Pitzer Programming Environment (PBS)

Compilers

Parallel Programming

MPI

OpenMP

Process/Thread placement

GPU Programming

Pitzer Programming Environment

Compilers

Parallel Programming

MPI

OpenMP

Interactive job only

Hybrid (MPI + OpenMP)

Tuning Parallel Program Performance: Process/Thread Placement

OpenMP only

MPI Only

Hybrid (MPI + OpenMP)

Summary

GPU Programming

Reference

Batch Limit Rules

Memory limit

Summary

Regular Compute Node

Jobs requesting no more than 1 node

Jobs requesting more than 1 node

Large Memory Node

Huge Memory Node

Summary

GPU Jobs

Dual GPU Node

Quad GPU Node

Partition time and job size limits

Job/Core Limits

Citation

Pitzer SSH key fingerprints

Migrating jobs from other clusters

Guidance for Oakley Users

Guidance for Owens Users

Hardware Specifications

File Systems

Software Environment

Programming Environment

Accounting

Guidance for Ruby Users

Guidance on Requesting Resources on Pitzer

Step 1: Identify your job type

Step 2: Perform test

Request 40 or 48 core nodes

Request 16gb, 32gb gpu

Step 3: Compare the results