The Ohio Supercomputer Center (OSC) is experiencing an email delivery problem with several types of messages from MyOSC. 

OSC is preparing to update Slurm on its production systems to version 23.11.4 on March 27.

Known issues

Unresolved known issues

A known issue with an Unresolved Resolution state is an active problem under investigation; a temporary workaround may be available.

Resolved known issues

A known issue with a Resolved (workaround) Resolution state is an ongoing problem; a permanent workaround is available, which may include using different software or hardware.

A known issue with a Resolved Resolution state has been corrected.

Known Issues

Title Category Resolution Description Posted Updated
Emergency InfiniBand Shutdown (All systems) Network Resolved

We have returned to service. It appears that we have resolved the networking issues enough to allow jobs to run safely. We will continue working with our vendors to fix any remaining hardware...

9 years 8 months ago 9 years 8 months ago
Lustre, InfiniBand Operational and Being Monitored Closely filesystem Resolved

UPDATE: Most users should no longer see any issues with Lustre.


Again, please continue to notify OSC Help of any errors you see in job output. For example, you might see "...

9 years 8 months ago 9 years 7 months ago
my.osc.edu logins failing Account Management Resolved

Logins to my.osc.edu are failing. This is unrelated to our InfiniBand issue; a router change at OARnet is believed to be the cause. They are working on re-establishing the necessary routing.

9 years 8 months ago 9 years 8 months ago
issue with OnDemand 6:09 - 8:39 pm Resolved

OnDemand, epi accounting queries, the Viper DB, the Medline DB, the Eweld DB,...

9 years 7 months ago 9 years 7 months ago
Lustre jobs suspended filesystem Resolved

The Lustre filesystem ($PFSDIR and /fs/lustre) crashed several times on Friday evening (8/15). We have degraded this service temporarily while we work to isolate the actions that are triggering...

9 years 7 months ago 9 years 7 months ago
Armstrong offline until Noon Resolved

Armstrong will need to be taken down today until noon. In the meantime, contact OSC Help (OSCHelp@osc.edu) for account assistance.

9 years 7 months ago 9 years 7 months ago
Lustre Updates filesystem Resolved

9/10/14 - We have not seen any additional crashes of the Lustre servers since making this change.

8/26/14
- Lustre jobs are being accepted as of 10AM this...

9 years 7 months ago 9 years 6 months ago
Statewide Intel compiler license checkout failures Licensing Resolved

This morning (9/10/14) we updated our Intel compiler licenses. We are seeing some unexpected license checkout failures in the logs (please click through to see details):

10:44:...

9 years 6 months ago 9 years 6 months ago
Oakley login node instability Operations Resolved

Oakley login nodes are seeing some instability related to Lustre. We will reboot the nodes on Thursday, October 2nd, 2014 to resolve the issue. If a login node crashes before then and we have the...

9 years 6 months ago 9 years 5 months ago
Scheduling suspended Batch Resolved

We have temporarily suspended scheduling due to some problems with the parallel scratch file system.

9 years 6 months ago 9 years 6 months ago
