Owens

Rolling reboots of Owens cluster, starting from Feb 18, 2021

Updated on March 2:

The rolling reboots are complete.

Original Post:

We will perform rolling reboots of the Owens cluster, including login and compute nodes, starting at 9 AM on Feb 18, 2021. The reboots apply BIOS updates that address urgent security issues. They will not affect running jobs, but users may experience longer queue wait times than usual on the cluster. Users should also expect an outage of about 10 minutes on the login nodes while those nodes reboot.

OpenMPI job stopped at 'There are not enough slots available in the system to satisfy the slots'

Users may see MPI jobs fail with openmpi/3.1.0-hpcx on Owens and Pitzer. The job stops with an error like "There are not enough slots available in the system to satisfy the slots". Please switch to openmpi/3.1.4-hpcx. The buggy version, openmpi/3.1.0-hpcx, will be removed on August 18, 2020.
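Job scripts that still name the old module need a one-line change. The sketch below shows one way to preview that change. The function name fix_openmpi_module is hypothetical and is not an OSC-provided tool; it only rewrites text read from standard input and does not touch any files.

```shell
# Hypothetical helper (not an OSC tool): replace references to the buggy
# module with the fixed version in job-script text read from stdin.
fix_openmpi_module() {
  sed 's#openmpi/3\.1\.0-hpcx#openmpi/3.1.4-hpcx#g'
}

# Preview the corrected line for a typical job-script entry.
printf 'module load openmpi/3.1.0-hpcx\n' | fix_openmpi_module
# prints: module load openmpi/3.1.4-hpcx
```

To check a real script, pipe it through the function (for example, fix_openmpi_module < job.sh, where job.sh is a placeholder name) and review the output before saving it.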

==========

Resolved: We removed openmpi/3.1.0-hpcx on August 18, 2020.

Error 'libim_client.so: undefined reference to uuid@' with MVAPICH2 in Conda environment

Users may encounter an error like "libim_client.so: undefined reference to `uuid_unparse@UUID_1.0'" while compiling MPI applications with mvapich2 in some Conda environments. We found that the libuuid package pre-installed by Conda conflicts with the system libuuid libraries. The affected Conda packages are python/2.7-conda5.2, python/3.6-conda5.2 and python/3.7-2019.10.

Incorrect MPI launcher and compiler wrappers with Conda environments python/2.7-conda5.2 and python/3.6-conda5.2

Users may encounter under-performing MPI jobs or failures when compiling MPI applications while using the system-provided Conda. We found that the mpich2 package pre-installed in some Conda environments overrides the default MPI path. The affected Conda packages are python/2.7-conda5.2 and python/3.6-conda5.2. If you experience these issues, please re-load the MPI module (e.g. module load mvapich2/2.3.2) after setting up your Conda environment.
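The load order described above can be sketched as follows, together with a quick check of which mpicc wrapper ends up first on PATH. The helper name mpi_wrapper_source is hypothetical, and the check rests on an assumption: that a Conda-provided wrapper sits under a directory whose path contains "conda". The module commands are guarded so the snippet does nothing on machines without the module command.

```shell
# Hypothetical helper: report where the first mpicc on PATH comes from.
# Assumption: Conda-provided wrappers live under a path containing "conda".
mpi_wrapper_source() {
  wrapper=$(command -v mpicc 2>/dev/null) || { echo "none"; return 0; }
  case "$wrapper" in
    *conda*) echo "conda: $wrapper" ;;
    *)       echo "other: $wrapper" ;;
  esac
}

# Order from the text: set up Conda first, then re-load the MPI module.
if command -v module >/dev/null 2>&1; then
  module load python/3.6-conda5.2   # one of the affected Conda modules
  module load mvapich2/2.3.2        # re-load MPI so its wrappers take priority
fi

# Confirm which wrapper the shell will use.
mpi_wrapper_source
```

If the helper still reports a Conda path after the MPI module is re-loaded, the MPI module was probably loaded before the Conda environment was activated.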
