Emergency maintenance in OSC’s data center, Feb 10, 2022
OSC will shut down significant portions of the Owens and Pitzer clusters for several hours this afternoon (Thursday, Feb. 10).
When requesting an interactive session in OnDemand with GPU resources, users may see an error similar to "sbatch: error: Invalid generic resource (gres) specification".
OSC staff are currently looking into this.
Update: This issue has been resolved.
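For reference, a GPU request that Slurm normally accepts looks like the following sketch; the node and GPU counts are placeholders, and site-specific options may differ:

    # Hypothetical batch directives requesting one GPU on one node
    #SBATCH --nodes=1
    #SBATCH --gres=gpu:1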
CP2K 6.1 fails with the following error when running on a Pitzer Cascade Lake (48-core) node:
Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation. Backtrace for this error:
This could be a bug in libxsmm 1.9.0, which was released on March 15, 2018, before Cascade Lake launched in 2019. The bug has been fixed in cp2k/7.1.
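If your site provides the fixed release as a module, switching is a one-line change; the module name below is assumed from the version mentioned above, so verify it first:

    # Check which CP2K versions are installed, then load the fixed one
    module spider cp2k
    module load cp2k/7.1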
You might encounter an error like the following while pulling a large Docker image:
ERROR: toomanyrequests: Too Many Requests.
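A common cause of "toomanyrequests" is Docker Hub's rate limit on anonymous pulls. A hedged workaround, assuming you pull through Singularity and have a Docker Hub account, is to authenticate the pull:

    # Singularity reads these variables when pulling from Docker Hub
    export SINGULARITY_DOCKER_USERNAME=<your_dockerhub_username>
    export SINGULARITY_DOCKER_PASSWORD=<your_dockerhub_token>
    singularity pull docker://ubuntu:18.04

Authenticated pulls are subject to a higher rate limit than anonymous ones.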
We found that mpiexec/mpirun from OpenMPI cannot be used in an interactive session (launched by sinteractive) after upgrading Pitzer and Owens to Slurm 20.11.4. Please use srun instead when running OpenMPI programs in an interactive session.
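For example, a session like the following sketch works, whereas replacing srun with mpiexec or mpirun does not; the core count and module name are illustrative:

    # Request an interactive session (flags are illustrative; see sinteractive --help)
    sinteractive -n 4
    # Inside the session, launch MPI ranks with srun only
    module load openmpi
    srun -n 4 ./my_mpi_program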
We will have a rolling reboot of the Pitzer cluster. Users should expect a ~10-minute outage of the login nodes at about 9 a.m. on Feb 5, 2021.
We have found that some types of CP2K jobs fail or perform poorly when using cp2k.popt and cp2k.psmp from the MVAPICH2 build (gnu/4.8.5 mvapich2/2.3). This version will be removed on December 15, 2020. Please switch to the Intel MPI build (gnu/7.3.0 intelmpi/2018.3).
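Switching builds is a module change; the names below are taken from this entry, so verify availability with module spider cp2k first:

    # Load the Intel MPI toolchain, then CP2K (Lmod swaps compiler/MPI families automatically)
    module load gnu/7.3.0 intelmpi/2018.3
    module load cp2k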
A partial-node MPI job may fail to start with startup error messages when launched with mpiexec from intelmpi/2019.3 or intelmpi/2019.7.
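"Partial-node" here means a job that requests fewer cores than a full node, as in this minimal sketch; the task count and program name are placeholders:

    #!/bin/bash
    #SBATCH --ntasks=4            # fewer tasks than the node has cores
    module load intelmpi/2019.3   # affected version
    mpiexec ./my_mpi_program      # may fail to start under this configuration

Requesting a full node, or using an unaffected Intel MPI version, avoids this failure mode.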
OSC is currently experiencing problems with its internal network. Interactive sessions may be slow or unresponsive, but running jobs should not be affected.
Users may encounter MPI job failures with openmpi/3.1.0-hpcx on Owens and Pitzer. The job stops with an error like "There are not enough slots available in the system to satisfy the slots". Please switch to openmpi/3.1.4-hpcx. The buggy version openmpi/3.1.0-hpcx will be removed on August 18, 2020.
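A minimal swap, assuming Lmod-style modules:

    # Replace the buggy build with the fixed one
    module swap openmpi/3.1.0-hpcx openmpi/3.1.4-hpcx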
==========
Resolved: We removed openmpi/3.1.0-hpcx on August 18, 2020.