Pitzer

Reduced Pitzer capacity starting Dec. 12, 2022

Beginning Monday, December 12, 2022, at 7 a.m., OSC will be taking the 40-core Pitzer nodes offline to replace the liquid cooling unit. We anticipate this work may take until Friday, December 16 to complete. Given uncertainties about the time necessary to complete this work, OSC engineers opted to begin this work before the December 13 downtime to increase the likelihood of the nodes returning online before the following weekend and to minimize the total outage. We are working with our vendors to reduce the outage duration as much as possible.

NVHPC

NVHPC, the NVIDIA HPC SDK, provides C, C++, and Fortran compilers that support GPU acceleration of HPC modeling and simulation applications with standard C++ and Fortran, OpenACC® directives, and CUDA®. GPU-accelerated math libraries maximize performance on common HPC algorithms, and optimized communications libraries enable standards-based multi-GPU and scalable systems programming. Performance profiling and debugging tools simplify porting and optimization of HPC applications, and containerization tools enable easy deployment on-premises or in the cloud.

System Downtime December 13, 2022

A downtime for OSC HPC systems is scheduled from 7 a.m. to 9 p.m., Tuesday, December 13, 2022. The downtime will affect the Pitzer, Owens and Ascend Clusters, web portals, and HPC file servers. MyOSC (https://my.osc.edu) and state-wide licenses will be available during the downtime. In preparation for the downtime, the batch scheduler will not start jobs that cannot be completed before 7 a.m., December 13, 2022. Jobs that are not started on clusters will be held until after the downtime and then started once the system is returned to production status.

NCCL

The NVIDIA Collective Communication Library (NCCL) implements multi-GPU and multi-node communication primitives optimized for NVIDIA GPUs and networking. NCCL provides routines such as all-gather, all-reduce, broadcast, reduce, and reduce-scatter, as well as point-to-point send and receive, that are optimized to achieve high bandwidth and low latency over PCIe and NVLink high-speed interconnects within a node and over NVIDIA Mellanox networking across nodes.
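To make the collective primitives concrete, the sketch below shows the *semantics* of a sum all-reduce in plain Python. This is not NCCL code (NCCL's `ncclAllReduce` operates on GPU buffers in C/CUDA); it only models the outcome: every rank ends up holding the elementwise sum of all ranks' buffers.

```python
def all_reduce_sum(buffers):
    """Model of a sum all-reduce across ranks.

    buffers: list of equal-length lists, one per rank ("GPU").
    Returns the post-collective state: every rank holds the
    elementwise sum of all ranks' input buffers.
    """
    reduced = [sum(vals) for vals in zip(*buffers)]
    # NCCL leaves the identical reduced result in every rank's buffer.
    return [list(reduced) for _ in buffers]


# Three hypothetical ranks, each contributing a 2-element buffer.
rank_buffers = [[1, 2], [10, 20], [100, 200]]
print(all_reduce_sum(rank_buffers))
# each rank receives [111, 222]
```

Reduce-scatter and all-gather can be understood the same way: reduce-scatter gives each rank one reduced slice of the result, and all-gather concatenates every rank's buffer onto every rank.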

oneAPI

oneAPI is an open, cross-industry, standards-based, unified, multiarchitecture, multi-vendor programming model that delivers a common developer experience across accelerator architectures, enabling faster application performance, more productivity, and greater innovation. The oneAPI initiative encourages collaboration on the oneAPI specification and compatible oneAPI implementations across the ecosystem.

Miniconda3

Miniconda3 is a free minimal installer for conda. It is a small, bootstrap version of Anaconda that includes only conda, Python, the packages they depend on, and a small number of other useful packages such as pip and zlib.
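A typical workflow with Miniconda is to create an isolated environment per project. The commands below are a generic sketch: the environment name `myenv`, the package choices, and the `module load miniconda3` step are illustrative assumptions (the exact module name on an OSC cluster may differ; check `module spider miniconda3`).

```shell
# Load the Miniconda module (module name is an assumption; verify locally)
module load miniconda3

# Create an isolated environment with a chosen Python and packages
# ("myenv" and the package list are examples, not defaults)
conda create -n myenv python=3.10 numpy

# Activate it, use it, then deactivate
source activate myenv
python -c "import numpy; print(numpy.__version__)"
conda deactivate
```

Keeping per-project environments avoids dependency conflicts between workflows and leaves the base installation untouched.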

Availability and Restrictions

Versions

Miniconda is available on the Ascend Cluster. The versions currently available at OSC are:

2022 Storage Service Upgrades

In October 2022, OSC retired the Data Direct Networks (DDN) GRIDScaler system deployed in 2016 and expanded the IBM Elastic Storage System (ESS) for both Project and global Scratch services. This expansion brings the total capacity of Project and Scratch storage at OSC to approximately 16 petabytes, with improved performance.

System Downtime October 11, 2022

A downtime for OSC HPC systems is scheduled from 7 a.m. to 9 p.m., Tuesday, Oct. 11, 2022. The downtime will affect the Pitzer and Owens Clusters, web portals, and HPC file servers. MyOSC (my.osc.edu) and state-wide licenses will be available during the downtime. In preparation for the downtime, the batch scheduler will not start jobs that cannot be completed before 7 a.m., Oct. 11, 2022. Jobs that are not started on clusters will be held until after the downtime and then started once the system is returned to production status.
