Slurm to be Upgraded to Version 23.11.4
OSC is preparing to update Slurm on its production systems to version 23.11.4 on March 27.
The Slurm upgrades performed today (Oct 25 2023) during the rolling reboots of Ascend, Owens and Pitzer caused all running jobs on those systems to be requeued around 8:45am. You will not be billed for resources consumed before the jobs were requeued.
We apologize for the inconvenience this causes you. Please contact oschelp@osc.edu if you have any questions.
Update on Nov 8 2023:
Rolling reboots of all clusters are completed.
Update on Nov 3 2023:
Rolling reboots of Ascend and Pitzer clusters are completed.
Original Post:
We will have rolling reboots of Ascend, Owens and Pitzer clusters including login and compute nodes, starting from 9AM Wednesday October 25, to perform NVIDIA driver and Slurm upgrades.
Multithreaded MPI code in which MPI_Init_thread requests MPI_THREAD_MULTIPLE will fail because UCX from the HPCX package is built without multi-threading enabled. UCX is the default framework for OMPI 4.0 and above.
Affects versions
Owens: openmpi/4.0.3-hpcx, openmpi/4.1.2-hpcx, openmpi/4.1.4-hpcx
Ascend: openmpi/4.1.3
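As a quick way to check a job script against this notice, the affected modules can be put in a small lookup. This is only an illustrative sketch: the module names are taken from the list above, while the `AFFECTED` table and the `is_affected` helper are hypothetical names, not part of any OSC tooling.

```python
# Affected OpenMPI modules per cluster, as listed in this notice.
# AFFECTED and is_affected are hypothetical names used for illustration.
AFFECTED = {
    "owens": {"openmpi/4.0.3-hpcx", "openmpi/4.1.2-hpcx", "openmpi/4.1.4-hpcx"},
    "ascend": {"openmpi/4.1.3"},
}


def is_affected(cluster: str, module: str) -> bool:
    """Return True if the module on the given cluster is listed as affected."""
    return module.strip() in AFFECTED.get(cluster.strip().lower(), set())


if __name__ == "__main__":
    for cluster, module in [("Owens", "openmpi/4.1.2-hpcx"), ("Ascend", "openmpi/4.1.3")]:
        print(cluster, module, is_affected(cluster, module))
```

A module that is not in the table simply returns False. That means it is not listed here, not that it is guaranteed to support MPI_THREAD_MULTIPLE.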
We will have rolling reboots of Owens and Pitzer clusters including login and compute nodes, starting from 9AM Monday, July 11 2022.
Updates on Feb 25 2022:
This issue is fixed.
Original Post:
Users may see a missing shared library error with some mvapich2 modules on Pitzer and Owens. The error looks like:
<path_to_executable>: error while loading shared libraries: libim_client.so.0: cannot open shared object file: No such file or directory
We are in the process of rebuilding mvapich2 versions that are affected.
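To see whether a given executable is affected before submitting a job, one option is to scan the output of the standard Linux `ldd` tool for entries marked "not found". The sketch below assumes `ldd` is available on the node; the helper names `missing_libraries` and `check_executable` are hypothetical and only illustrate the idea.

```python
import subprocess
from typing import List


def missing_libraries(ldd_output: str) -> List[str]:
    """Return the library names that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # Unresolved entries look like: "libname.so.0 => not found"
        if "=>" in line and "not found" in line:
            missing.append(line.split("=>")[0].strip())
    return missing


def check_executable(path: str) -> List[str]:
    """Run ldd on an executable and list its unresolved shared libraries."""
    result = subprocess.run(["ldd", path], capture_output=True, text=True, check=False)
    return missing_libraries(result.stdout)
```

For example, `check_executable("./a.out")` would return a list containing `libim_client.so.0` if that library cannot be resolved in the current module environment.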
OSC will shut down significant portions of the Owens and Pitzer clusters for several hours this afternoon (Thursday, Feb. 10).
You might encounter an error while pulling a large Docker image:
ERROR: toomanyrequests: Too Many Requests.
or
We found that mpiexec/mpirun from OpenMPI cannot be used in an interactive session (launched by sinteractive) after upgrading Pitzer and Owens to Slurm 20.11.4. Please use only srun when running OpenMPI in an interactive session.
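For scripts that assemble the launch line programmatically, the workaround amounts to building an `srun` command instead of an `mpiexec`/`mpirun` one. The following is a minimal sketch; the helper `srun_command` and the executable name `./my_mpi_app` are hypothetical, and `-n` is the standard srun option for the number of tasks.

```python
import shlex
from typing import List, Sequence


def srun_command(executable: str, ntasks: int, args: Sequence[str] = ()) -> List[str]:
    """Build an srun launch line to use in place of mpiexec/mpirun."""
    if ntasks < 1:
        raise ValueError("ntasks must be at least 1")
    # -n sets the number of tasks (MPI ranks) that srun starts.
    return ["srun", "-n", str(ntasks), executable, *args]


if __name__ == "__main__":
    # "./my_mpi_app" is a placeholder for your own OpenMPI program.
    print(shlex.join(srun_command("./my_mpi_app", 4)))
```

The returned list can be passed directly to `subprocess.run` inside the interactive session.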
Updated on Feb 25:
The StarCCM license has been restored.
Original post:
OSC's starccm software license will expire at 12 a.m., Sunday, Feb 21, 2021, making the software unavailable until the license is renewed.