Job failures on some rolling-rebooted nodes on Owens since April 16, 2018

Resolution: Resolved

3:35 PM 4/30/2018 Update:

The cause was that NFSv4.1 was not configured correctly after the OS on Owens was updated from RHEL 7.3 to 7.4. We rebooted the Owens compute nodes again, this time using NFSv4.0 with the correct configuration, which fixed the problem. Please contact oschelp@osc.edu if you have any questions.
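
For readers who want to see which NFS protocol version a Linux node has negotiated, the following is a minimal sketch, not OSC's own tooling. It assumes the node exposes its mounts in /proc/mounts and that NFS entries carry a vers= option there; the function name nfs_versions is hypothetical.

```python
def nfs_versions(mounts_text):
    """Map each NFS mount point to the protocol version in its mount options.

    Assumes the text follows the /proc/mounts layout:
    device, mount point, filesystem type, options, dump, pass.
    """
    versions = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Skip malformed lines and anything that is not an NFS mount.
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        for option in fields[3].split(","):
            if option.startswith("vers="):
                versions[fields[1]] = option[len("vers="):]
    return versions
```

On a compute node this could be called as nfs_versions(open("/proc/mounts").read()); a mount reporting 4.0 rather than 4.1 would be consistent with the fix described above.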

++++++++++++++++++++++++++++++++++++++++++++

9:30 AM 4/18/2018 Original Post:

Users may have experienced job failures on the Owens cluster since April 16, 2018. After being rebooted, some Owens nodes fail to pick up the new filesystem and simply hang once users' jobs have been allocated to them. Your job may be affected by this issue if it reports almost zero CPU usage.
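
As a rough way to spot an affected job, the sketch below compares a job's CPU time with its wall time across the cores it was given. It assumes both times are available as HH:MM:SS strings, as many batch schedulers report them; the function names and the 1% threshold are illustrative assumptions, not an OSC rule.

```python
def to_seconds(hms):
    """Convert an 'HH:MM:SS' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def looks_hung(cpu_time, wall_time, cores, threshold=0.01):
    """Return True if CPU usage is near zero relative to the time allocated.

    threshold is a hypothetical cutoff: the fraction of available
    core-seconds below which the job is flagged.
    """
    available = to_seconds(wall_time) * cores
    if available == 0:
        # A job with no elapsed time cannot be judged either way.
        return False
    return to_seconds(cpu_time) / available < threshold
```

For example, a job that ran for two hours on 28 cores but accumulated only five seconds of CPU time would be flagged by looks_hung("00:00:05", "02:00:00", 28).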

We are actively investigating the issue and will update the community as more is known. We apologize for any inconvenience this may cause you. Please contact oschelp@osc.edu if you have any questions.