Gateway

Slurm to be Upgraded to Version 23.11.4

OSC is preparing to update Slurm on its production systems to version 23.11.4 ahead of the deployment of the new Cardinal cluster. This version of Slurm includes a number of improvements, but it also has a known regression: if a job requests both a total number of tasks (--ntasks=N) and a number of tasks per node (--ntasks-per-node=n), the number of tasks per node takes precedence. OSC users are strongly encouraged to review their job scripts for jobs that request both --ntasks and --ntasks-per-node. Jobs should request either --ntasks or --ntasks-per-node, not both.
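As a rough aid for that review, the following Python sketch flags a job script whose #SBATCH lines set both options. It is not an OSC tool. The function name `requests_both` is hypothetical, and the sketch only inspects #SBATCH directive lines, so options passed on the sbatch command line are not detected. It also treats -n as the short form of --ntasks.

```python
import re

# Matches "#SBATCH" directive lines and captures the options that follow.
SBATCH_LINE = re.compile(r"^\s*#SBATCH\s+(.*)$")


def requests_both(script_text: str) -> bool:
    """Return True if #SBATCH lines set both --ntasks (or -n) and --ntasks-per-node.

    Hypothetical helper for illustration; it does not look at options
    given on the sbatch command line.
    """
    has_ntasks = False
    has_per_node = False
    for line in script_text.splitlines():
        match = SBATCH_LINE.match(line)
        if not match:
            continue
        # Drop a trailing comment on the directive line, if any.
        options = match.group(1).split("#", 1)[0]
        for token in options.split():
            if token.startswith("--ntasks-per-node"):
                has_per_node = True
            elif (
                token == "--ntasks"
                or token.startswith("--ntasks=")
                or token == "-n"
                or re.fullmatch(r"-n\d+", token)
            ):
                has_ntasks = True
    return has_ntasks and has_per_node


if __name__ == "__main__":
    example = (
        "#!/bin/bash\n"
        "#SBATCH --nodes=2\n"
        "#SBATCH --ntasks=8\n"
        "#SBATCH --ntasks-per-node=4\n"
        "srun ./a.out\n"
    )
    if requests_both(example):
        print("Script requests both --ntasks and --ntasks-per-node; keep only one.")
```

A script flagged this way should be edited to keep only one of the two options, as recommended above.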

Submission deadline for OSC Research Symposium is March 9

We invite you to submit a presentation to the OSC Research Symposium, a new opportunity for members of the research computing community to share and discuss their work. The deadline for submissions is March 9. The event, which will be held on Tuesday, April 9, will feature posters, flash talks, birds of a feather sessions and OSC-facilitated breakout sessions. Please visit https://www.osc.edu/supercomputing/sug to learn more about contributing to the Research Symposium.

Slurm database repair on 01/25/2024

We have scheduled a Slurm database repair, which is planned to start at 8:30 a.m. US/Eastern on Thursday, January 25, 2024. During the repair, the Slurm database will be offline. Running jobs and OnDemand jobs will not be affected, but any jobs using the '--clusters/-M' flag will receive errors, and commands including sacct will be unavailable. The outage is expected to last up to 2 hours.

System Downtime December 19 2023

A downtime for OSC HPC systems is scheduled from 7 a.m. to 9 p.m., Tuesday, December 19, 2023. The downtime will affect the Pitzer, Owens and Ascend Clusters, web portals, HPC file servers, and state-wide licenses. MyOSC (the client portal) will be available during the downtime. In preparation for the downtime, the batch scheduler will not start jobs that cannot be completed before 7 a.m., December 19. Jobs that are not started on clusters will be held until after the downtime and then started once the system is returned to production status.
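The start rule above can be illustrated with a small Python sketch: a job starts only if its requested walltime would end by the beginning of the downtime. The function name `fits_before_downtime` is hypothetical and this is not the scheduler's actual logic. Treating a job that ends exactly at 7 a.m. as acceptable is also an assumption.

```python
from datetime import datetime, timedelta


def fits_before_downtime(
    start: datetime, walltime: timedelta, downtime_start: datetime
) -> bool:
    """Return True if a job starting at `start` with the requested
    `walltime` would finish no later than `downtime_start`.

    Illustrative only; the boundary case (finishing exactly at the
    downtime start) is assumed to be allowed.
    """
    return start + walltime <= downtime_start


if __name__ == "__main__":
    downtime = datetime(2023, 12, 19, 7, 0)   # 7 a.m., December 19, 2023
    candidate = datetime(2023, 12, 18, 19, 0)  # 7 p.m. the evening before
    for hours in (8, 12, 24):
        ok = fits_before_downtime(candidate, timedelta(hours=hours), downtime)
        print(f"{hours}h walltime: {'could start' if ok else 'held until after downtime'}")
```

Under this reading, requesting a shorter walltime can allow a job to start before the downtime rather than being held.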

Rolling reboot of all clusters starting from Oct 25 2023

We will perform rolling reboots of the Ascend, Owens and Pitzer clusters, including login and compute nodes, starting at 9 a.m. on Wednesday, October 25, to apply NVIDIA driver and Slurm upgrades. At the start of the rolling reboots, all login nodes will be unavailable for about 10 minutes. The rolling reboots won't affect any running batch jobs, but users may experience longer queue wait times than usual on the clusters.

System Downtime August 8 2023

A downtime for OSC HPC systems is scheduled from 7 a.m. to 9 p.m., Tuesday, August 8, 2023. The downtime will affect the Pitzer, Owens and Ascend Clusters, web portals, and HPC file servers. MyOSC and state-wide licenses will be available during the downtime. See this link for more details: https://www.osc.edu/calendar/events/2023_08_08-system_downtime_august_8_2023
