The Ohio Supercomputer Center (OSC) is experiencing an email delivery problem with several types of messages from MyOSC. 

Known issues

Unresolved known issues

A known issue with an Unresolved Resolution state is an active problem under investigation; a temporary workaround may be available.

Resolved known issues

A known issue with a Resolved (workaround) Resolution state is an ongoing problem for which a permanent workaround is available; the workaround may include using different software or hardware.

A known issue with a Resolved Resolution state has been corrected.

Known Issues

Title Category Resolution Description Posted Updated
Nsight GPU profiler not working due to DCGM conflict GPU, Infrastructure Resolved

UPDATE (Mar 15, 2023)

After the downtime on Mar. 14, 2023, OSC enabled a new Slurm option --gres=nsight. DCGM will be disabled on the nodes for the job with the Slurm option,...

1 year 1 month ago 1 year 1 month ago
ORCA Bind to CORE Failure Software Resolved (workaround)

The default CPU binding for ORCA jobs can fail sporadically.  The failure is almost immediate and produces a cryptic error message, e.g.:

...
12 months 22 hours ago 12 months 22 hours ago
Possible job failures due to MPI library change on Pitzer after May 20 Software Resolved

The MPI libraries on Pitzer will change after May 20. We will upgrade MOFED from 4.9 to 5.6 and recompile all OpenMPI and Mvapich2 against the newer MOFED version. Users with their own MPI...

11 months 3 weeks ago 11 months 3 weeks ago
Intermittent home directory performance issues filesystem Resolved

Users may experience performance issues in their home directories. It is recommended to use a temporary directory ($TMPDIR or scratch) or project storage to minimize the impact on...

9 months 4 weeks ago 9 months 1 week ago
PyTorch jobs timeout and hanging GPU Resolved

We have observed that many PyTorch users frequently encounter random timeouts, which result in the termination of their jobs but leave the process running on the node....

9 months 2 weeks ago 3 months 2 weeks ago
Outbound Emails from MyOSC are Blocked at MS 365 Servers Account Management, client portal Resolved

Outbound emails, including account verification emails, from MyOSC to institutional email addresses utilizing MS 365 are being blocked as phishing attempts at the institution's server.

...

8 months 1 day ago 7 months 1 week ago
MOE license server down Licensing Resolved

The MOE license server is experiencing an unknown issue and is potentially down. We are working to resolve the issue.

7 months 5 days ago 7 months 4 days ago
Emergency UPS Maintenance Maintenance Resolved

A UPS in the data center requires emergency maintenance, to be undertaken at 2PM on Oct 11 2023. There is a very small chance that parts of Owens and of the C18 Pitzer nodes may lose power as...

6 months 1 week ago 6 months 1 week ago
Rolling reboot of Ascend, Owens and Pitzer starting from Oct 25 2023 Owens, Pitzer Resolved

Update on Nov 8 2023:

Rolling reboots of all clusters are completed. 

Update on Nov 3 2023:

Rolling reboots of Ascend and Pitzer clusters...

6 months 1 day ago 5 months 2 weeks ago
Running jobs requeued on all clusters Owens, Pitzer Resolved

The Slurm upgrades we performed today (Oct 25 2023) during the rolling reboots of Ascend, Owens and Pitzer caused all running jobs on the systems to be requeued around 8:45am. You will not be billed for the...

6 months 1 hour ago 5 months 4 weeks ago
