OSC's Ascend cluster was installed in fall 2022. It is a Dell-built cluster with AMD EPYC™ CPUs and NVIDIA A100 GPUs, devoted entirely to intensive GPU processing. In preparation for the deployment of the new hardware, OSC would like to invite selected members of the client community to participate in the Ascend Early User Program.
Who is eligible to participate in the Early User Program?
Access to the Ascend cluster is restricted. The invitations to the early user program were sent to selected PIs in July 2022. The acceptance letters were sent to PIs in September 2022.
Early user period
October 24 - December 12, 2022 (tentative)
Detailed system specifications:
- 24 Dell PowerEdge XE8545 nodes, each with:
- 2 AMD EPYC 7643 (Milan) processors (each with 44 usable cores); 88 usable cores/node
- NVIDIA A100 4-GPU baseboard, ~300GB usable GPU memory, supercharged by NVIDIA NVLink
- 921GB usable RAM
- 12.8TB of NVMe drives
- NVIDIA Quantum 200Gb/s InfiniBand networking
- 2,112 total usable cores
- 96 total GPUs
Available software packages
Selected software packages have been installed on the Ascend cluster. You can use 'module spider' to see the available packages after logging into Ascend. You can also check this page after selecting "Ascend" under "System" to see the available packages. Note that the package list on the web page is not complete at this moment.
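As an illustration, a typical module query session on Ascend might look like the following sketch (fftw3 is used as the example package; the commands assume you are logged in to an Ascend login node):

```shell
# List all packages installed on the cluster
module spider

# Show details and available versions for a specific package
module spider fftw3

# Load a specific version into your environment
module load fftw3/3.3.10
```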
Jupyter and RStudio apps through OnDemand are also provided.
If you need other software you want to use on Ascend, please contact OSC Help.
- Compilers
C, C++ and Fortran are supported on the Ascend cluster. Intel and GNU compilers are recommended; use the module spider command to see the compiler versions currently available. Intel oneAPI (oneapi/2022.0.0), AMD AOCC (aocc/3.2.0) and the NVIDIA compiler (nvhpc/21.9) are also available for testing, but are not fully supported on Ascend at this time.
- Math and I/O Libraries
Intel MKL (mkl/2021.4.0), AMD BLIS (amdblis/3.1), ScaLAPACK (scalapack/2.2.0), FFTW3 (fftw3/3.3.10), GSL (gsl/2.7.1) and Boost (boost/1.78.0) math libraries are available on Ascend. HDF5 (hdf5/1.12.2) and NetCDF (netcdf-cxx4/4.3.1) I/O libraries are also available. Use the module spider command as described above for details on available versions.
- Python environment
We have two Python modules: python/3.9 and miniconda3/4.10.3. You can use python/3.9 for general Python programming if you do not need Python packages beyond those already installed. miniconda3/4.10.3 can be used to create your own conda virtual environment. Please check this page for more details about the conda environment.
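As a sketch, creating and activating a personal conda environment with the miniconda3 module might look like this (the environment name and package list are hypothetical examples):

```shell
# Load the miniconda3 module listed above
module load miniconda3/4.10.3

# Create a personal environment; "myenv" and the packages are examples
conda create -n myenv python=3.9 numpy scipy

# Activate the new environment
source activate myenv
```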
- GPU Programming
Ascend has 4 NVIDIA A100 GPUs per node, each with 80GB of device memory, connected via NVLink. CUDA (cuda/11.7.1) has been installed along with the corresponding libraries (cuBLAS, cuSPARSE, cuFFT, etc.). The NCCL library (nccl/2.11.4-1) and the NVIDIA HPC SDK (nvhpc/21.9) are also available.
- Parallel Programming
MPI, OpenMP and hybrid MPI/OpenMP parallel programming paradigms are supported on Ascend. The default MPI, mvapich2/2.3.7, is recommended at this time, but OpenMPI and Intel MPI are also available. For hybrid MPI/OpenMP programs, MPI performance can be sensitive to process placement on Ascend. Contact OSC Help if you experience unexpected performance with your application.
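As an illustration of controlling thread placement for a hybrid MPI/OpenMP job, a Slurm batch script might look like the sketch below. The account, job geometry, and executable name are hypothetical; mvapich2/2.3.7 is the default MPI noted above.

```shell
#!/bin/bash
#SBATCH --job-name=hybrid_test   # hypothetical job name
#SBATCH --account=PAS1234        # hypothetical project account
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4      # 4 MPI ranks on the node
#SBATCH --cpus-per-task=21       # 21 OpenMP threads per rank (4 x 21 = 84 CPU-only cores)
#SBATCH --time=1:00:00

module load mvapich2/2.3.7

# Match the OpenMP thread count to the cores allotted to each MPI rank
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

srun ./my_hybrid_app             # hypothetical executable
```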
How to log into Ascend
To log in to Ascend at OSC, ssh to the Ascend login hostname, using either an ssh client application or the ssh command in a terminal window.
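For example, from a terminal window, the command takes this shape (both placeholders are hypothetical and must be replaced with your OSC username and the Ascend login hostname provided by OSC):

```shell
# Placeholder invocation; substitute the real Ascend login hostname
ssh <username>@<ascend-login-hostname>
```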
From there, you are connected to the Ascend login node and have access to the compilers and other software development tools. You can run programs interactively or through batch requests. We use control groups on login nodes to keep the login nodes stable. Please use batch jobs for any compute-intensive or memory-intensive work.
You can also log in to Ascend at OSC with our OnDemand tool. The first step is to log into OnDemand. Once logged in, you can access Ascend by clicking "Clusters" and then selecting ">_Ascend Shell Access."
It is strongly suggested that users compare their memory use with the available per-core memory when requesting OSC resources for their jobs. Each Ascend node has 88 usable cores; the usable memory is 10,724 MB/core, or 921GB/node.
We reserve 1 core per GPU on each Ascend node. CPU-only jobs can be scheduled but can request at most 84 cores per node. You can also request multiple nodes for a single CPU-only job.
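As a sketch, a job requesting a partial node (for example, one GPU and a share of the cores) might use Slurm directives like these. The account, core count, and executable name are illustrative, not prescribed values.

```shell
#!/bin/bash
#SBATCH --account=PAS1234      # hypothetical project account
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=22   # illustrative: a quarter of the 88 usable cores
#SBATCH --gpus-per-node=1      # 1 of the 4 A100 GPUs on the node
#SBATCH --time=2:00:00

module load cuda/11.7.1
./my_gpu_app                   # hypothetical executable
```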
| Max # of cores in use | Max # of GPUs in use | Max # of running jobs | Max # of jobs to submit |
|---|---|---|---|
| | Max walltime limit | Min job size | Max job size | Note |
|---|---|---|---|---|
| GPU | 7-00:00:00 (168 hours) | 1 core with 1 GPU | 528 cores/24 GPUs | Can request multiple partial nodes |
| CPU only | 4-00:00:00 (96 hours) | 1 core | 84 cores per node, 528 cores in total | Can request multiple partial nodes |
| Debug | 1:00:00 | 1 core | 2 nodes | 4 GPUs per node |
| Preemptible | To be provided later | | | |
How do the jobs get charged?
Jobs that are eligible for the early user program will not be charged. All queued jobs submitted during the early user program will be deleted from the system at the end of the early user program to avoid any unwanted charges.
All jobs submitted after the official deployment of the Ascend hardware will be charged.
The charge for core-hour and GPU-hour on Ascend is the same as the Standard compute core-hour and GPU-hour on Pitzer and Owens. Academic users can check the service costs page for more information. Please contact OSC Help if you have any questions about the charges.
How do I find my jobs submitted during the Early User Program?
For any queued or running jobs, you can check the job information with either Slurm commands (which are discussed here) or the OSC OnDemand Jobs app by clicking "Active Jobs" and choosing "Ascend" as the cluster name.
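For instance, from an Ascend login node, the standard Slurm client commands can be used to inspect your jobs (the job ID below is a placeholder):

```shell
# Show your own queued and running jobs
squeue -u $USER

# Show detailed information for one job (12345 is a placeholder ID)
scontrol show job 12345
```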
For any completed jobs, you can check the job information using the OSC XDMoD Tool. Choose "Ascend" as "Resource." Check here for more information on how to use XDMoD.
How do I get help?
Please feel free to contact OSC Help if you have any questions.