Ascend Early User Program

OSC's Ascend cluster was installed in fall 2022. It is a Dell-built cluster with AMD EPYC™ CPUs and NVIDIA A100 GPUs, devoted entirely to intensive GPU processing. In preparation for the deployment of the new hardware, OSC would like to invite selected members of the client community to participate in the Ascend Early User Program.

Who is eligible to participate in the Early User Program?

Access to the Ascend cluster is restricted. Invitations to the early user program were sent to selected PIs in July 2022, and acceptance letters were sent to the PIs in September 2022.

Early user period

October 24 - December 12, 2022 (tentative)

Hardware

Detailed system specifications:

  • 24 Dell PowerEdge XE8545 nodes, each with:
    • 2 AMD EPYC 7643 (Milan) processors (each with 44 usable cores); 88 usable cores/node
    • NVIDIA A100 4-GPU baseboard, ~300GB usable GPU memory, supercharged by NVIDIA NVLink
    • 921GB usable RAM
    • 12.8TB of NVMe drives  
  • NVIDIA Quantum 200Gb/s InfiniBand networking
  • 2,112 total usable cores
  • 96 total GPUs

Available software packages 

During the early access period, the programming environment and software packages will continue to be updated, and the system may go down or jobs may be killed with little or no warning. If your work cannot tolerate this level of instability, we recommend that you use Owens or Pitzer instead.

Selected software packages have been installed on the Ascend cluster. You can use the 'module spider' command to see the available packages after logging into Ascend. You can also check this page after selecting "Ascend" under "System" to see the available packages. Note that the package list on the web page is not yet complete.
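For example, after logging into Ascend you can search the module tree for a specific package (the package name below is only an illustration):

module spider                  # list all available packages
module spider cuda             # list available versions of a package
module spider cuda/11.6.2      # show how to load a specific version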

Jupyter and RStudio apps through OnDemand are also provided.

If there is other software you would like to use on Ascend, please contact OSC Help.

Programming environment

  • Compilers

C, C++, and Fortran are supported on the Ascend cluster. The Intel and GNU compilers are recommended. Currently the following compiler versions are available: intel/2021.4.0, intel/2021.5.0, gnu/11.2.0, gnu/10.3.0, and gnu/9.1.0.

OneAPI (oneapi/2021.4.0, oneapi/2022.0.0), AMD AOCC (aocc/3.2.0) and NVIDIA compiler (nvhpc/21.9) are also available for testing, but are not fully supported on Ascend at this time.
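As a minimal sketch of compiling a serial C program (hello.c is a placeholder for your own source file):

module load intel/2021.5.0
icc -O2 -o hello hello.c

or, with the GNU compiler:

module load gnu/11.2.0
gcc -O2 -o hello hello.c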

  • Math and I/O Libraries

Intel MKL (mkl/2021.4.0), AMD BLIS (amdblis/3.1), ScaLAPACK (scalapack/2.2.0), FFTW3 (fftw3/3.3.10), GSL (gsl/2.7.1), and Boost (boost/1.78.0) math libraries are available on Ascend. HDF5 (hdf5/1.10.8, hdf5/1.12.2) and NetCDF (netcdf-c/4.8.1, netcdf-cxx4/4.3.1) are also available. Use the module spider command as described above for details on available versions.
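For example, a program that uses HDF5 can be built with the library's compiler wrapper after loading the module (example.c is a placeholder; depending on the module's prerequisites you may need to load a compiler module first):

module load hdf5/1.12.2
h5cc -o example example.c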

  • Python environment

We provide two Python modules: python/3.9 and miniconda3/4.10.3. You can use python/3.9 for general Python programming if you do not need any Python packages beyond those already installed. miniconda3/4.10.3 can be used to create your own conda virtual environment. Please check this page for more details about the conda environment.
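A minimal sketch of creating and activating a conda environment (the environment name my-env and the package list are placeholders):

module load miniconda3/4.10.3
conda create -n my-env python=3.9 numpy scipy
source activate my-env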

  • GPU Programming

Ascend has 4 NVIDIA A100 GPUs per node, each with 80GB of device memory, connected via NVLink. Several CUDA versions (cuda/11.0.3, cuda/11.6.2, cuda/11.7.1) have been installed with the corresponding libraries (cuBLAS, cuSPARSE, cuFFT, etc.). The NCCL library (nccl/2.11.4-1) and the NVIDIA HPC SDK (nvhpc/21.9) are also available.
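As a sketch, a CUDA source file can be compiled for the A100 (compute capability 8.0) as follows; saxpy.cu is a placeholder for your own code:

module load cuda/11.6.2
nvcc -O2 -arch=sm_80 -o saxpy saxpy.cu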

  • Parallel Programming

MPI, OpenMP, and hybrid MPI/OpenMP parallel programming paradigms are supported on Ascend. The default MPI, mvapich2/2.3.7, is recommended at this time, but OpenMPI and Intel MPI are also available. MPI performance can be sensitive to process placement on Ascend for hybrid MPI/OpenMP programs. Contact OSC Help if you experience unexpected performance with your application.
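A minimal sketch of building and running a hybrid MPI/OpenMP code with the default MPI inside a batch job (hybrid.c and the task/thread counts are placeholders; -fopenmp assumes a GNU-based mpicc, while an Intel-based wrapper would use -qopenmp):

module load mvapich2/2.3.7
mpicc -O2 -fopenmp -o hybrid hybrid.c
export OMP_NUM_THREADS=22
srun --ntasks=4 --cpus-per-task=22 ./hybrid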

How to log into Ascend

  • SSH Method

To log in to Ascend at OSC, ssh to the following hostname:

ascend.osc.edu 

You can either use an ssh client application or execute ssh on the command line in a terminal window as follows:

ssh <username>@ascend.osc.edu

From there, you are connected to an Ascend login node and have access to the compilers and other software development tools. You can run programs interactively or through batch requests. We use control groups (cgroups) on the login nodes to keep them stable. Please use batch jobs for any compute-intensive or memory-intensive work.

  • OnDemand Method

You can also log into Ascend at OSC with our OnDemand tool. The first step is to log into OnDemand. Once logged in, you can access Ascend by clicking on "Clusters" and then selecting ">_Ascend Shell Access."

Scheduling policy

  • Memory limit

When requesting OSC resources for their jobs, users are strongly encouraged to weigh their memory requirements against the available per-core memory. Each Ascend node has 88 usable cores, and the usable memory is 10,724 MB per core (921GB per node).

  • CPU-only jobs

We reserve 1 core per GPU on each Ascend node, so a CPU-only job can be scheduled but can request at most 84 cores per node. You can also request multiple nodes for one CPU-only job.
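For example, a CPU-only job that uses a full node's 84 schedulable cores could include batch directives like the following (the walltime is a placeholder):

#SBATCH --time=02:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=84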

  • Job limits 

  • Per user: max 528 cores in use, 24 GPUs in use, 256 running jobs, and 1000 submitted jobs
  • Per project: max 528 cores in use, 24 GPUs in use, and 512 running jobs; no limit on the number of submitted jobs
  • Partition 

  • GPU: max walltime 7-00:00:00 (168 hours); min job size 1 core with 1 GPU; max job size 528 cores/24 GPUs; multiple partial nodes can be requested
  • CPU only: max walltime 4-00:00:00 (96 hours); min job size 1 core; max job size 84 cores per node and 528 cores in total; multiple partial nodes can be requested
  • Debug: max walltime 1:00:00; min job size 1 core; max job size 2 nodes (4 GPUs per node)
  • Preemptible: details to be provided later
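A minimal sketch of a batch script requesting 1 GPU with a proportional share of one node's cores (the project code PAS1234, the walltime, and the executable name are placeholders):

#!/bin/bash
#SBATCH --account=PAS1234
#SBATCH --job-name=gpu-test
#SBATCH --time=01:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=22
#SBATCH --gpus-per-node=1

module load cuda/11.6.2
./my_gpu_app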

How do the jobs get charged?

Jobs that are eligible for the early user program will not be charged. All queued jobs submitted during the early user program will be deleted from the system at the end of the early user program to avoid any unwanted charges.

All jobs submitted after the official deployment of the Ascend hardware will be charged.

The charge for core-hour and GPU-hour on Ascend is the same as the Standard compute core-hour and GPU-hour on Pitzer and Owens. Academic users can check the service costs page for more information. Please contact OSC Help if you have any questions about the charges.  

How do I find my jobs submitted during the Early User Program?

For any queued or running jobs, you can check the job information with either Slurm commands (which are discussed here) or the OSC OnDemand Jobs app by clicking "Active Jobs" and choosing "Ascend" as the cluster name.
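For example, after logging into Ascend you can list your own queued and running jobs with:

squeue -u <username>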

For any completed jobs, you can check the job information using the OSC XDMoD Tool. Choose "Ascend" as "Resource." Check here for more information on how to use XDMoD.  

How do I get help?

Please feel free to contact OSC Help if you have any questions. 
