We are currently experiencing outages affecting multiple services, including OnDemand (ondemand.osc.edu) and login nodes of HPC systems.

Known issues

Unresolved known issues

A known issue with an Unresolved Resolution state is an active problem under investigation; a temporary workaround may be available.

Resolved known issues

A known issue with a Resolved (workaround) Resolution state is an ongoing problem; a permanent workaround is available, which may include using different software or hardware.

A known issue with a Resolved Resolution state has been corrected.

Known Issues

Title Category Resolution Description Posted Updated
Lustre bug causing Oakley login node crashes filesystem, login Resolved

Over the past two weeks we have experienced Oakley login node crashes potentially caused by a Lustre bug. The bug (or other issue) seems to be triggered when a user performs operations on a... Read more

9 years 9 months ago 9 years 8 months ago
"Forgot your password?" Unavailable Account Management, client portal Resolved

Password changes cannot be completed via the "Forgot your password?" tool at the login page of the Client Portal (my.osc.edu).

Passwords can be changed once you log into the Client Portal... Read more

5 years 11 months ago 5 years 11 months ago
Security Vulnerability for GPFS filesystem Resolved

Update: The fix was deployed during the May 19 downtime.

Clients are not able to use mm* commands to manipulate GPFS ACLs on most OSC systems, due to a security vulnerability... Read more

5 years 1 month ago 5 years 3 weeks ago
Handling full-node MPI warnings with MVAPICH 3.0 Ascend, Cardinal Resolved (workaround)

When running a full-node MPI job with MVAPICH 3.0, you may encounter the following warning message:

[][mvp_generate_implicit_cpu_mapping] WARNING: You appear to be running at full... Read more          
7 months 2 weeks ago 1 month 1 week ago
Scratch and Project are hung; schedulings have been paused Batch, filesystem Resolved

1:00PM 4/6/2017 Update: The Scratch and Project file systems are back to normal service. Scheduling on the systems has resumed. We are still investigating the causes of this problem... Read more

8 years 2 months ago 8 years 2 months ago
MyOSC budget balance may not be correct client portal Resolved

Resolved

Version 3.0.1 was deployed which patches this issue. View the changelog for details.

Original post

... Read more

3 years 3 months ago 2 years 9 months ago
Oakley login node instability Operations Resolved

Oakley login nodes are seeing some instability related to Lustre. We will reboot the nodes on Thursday, October 2nd 2014 to resolve the issue. If a login node crashes before then and we have the... Read more

10 years 8 months ago 10 years 7 months ago
Rolling reboots of Owens and Pitzer, starting from Tuesday, Jan 22, 2019 Batch, login, Owens Resolved

... Read more

6 years 5 months ago 6 years 4 months ago
Email Issues client portal Resolved

OSU is having ongoing periodic problems with Microsoft (their mail hosting provider) severely delaying outbound email. There is no solution being offered and no timeline for getting it resolved.... Read more

5 years 8 months ago 5 years 4 months ago
Slurm database repair on 01/25/2024 Outage Resolved

We have scheduled a Slurm database repair, which is planned to start at 8:30 am US/Eastern on Thursday, January 25, 2024. During the repair, the Slurm database will be offline; running jobs and... Read more

1 year 4 months ago 1 year 4 months ago
