Ruby

Rolling reboot of all clusters, starting from 8 AM Tuesday, June 19, 2018

We will have rolling reboots of all clusters starting from 8 AM on Tuesday, June 19, 2018.

Rolling reboots of all clusters starting from Monday Feb 5, 2018

We will have rolling reboots of Oakley, Ruby and Owens clusters starting from Monday Feb 5, 2018.

Rolling reboot of Oakley and Ruby clusters, starting from 8:30AM October 9, 2017

We will have rolling reboots of Oakley and Ruby clusters starting from 8:30AM on Monday October 9, 2017.

Rolling reboot of all clusters, starting from Wednesday morning, April 19, 2017

1:40PM 4/27/2017 Update: Rolling reboots are completed.

3:10PM 4/18/2017 Update: Rolling reboots on Owens have started to address GPFS errors that occurred late Friday.

Rolling reboot of Owens, Oakley, and Ruby clusters is scheduled to start from Wednesday morning, April 19, 2017. Highlights of the rolling reboot activities:

Rolling reboot of compute and login nodes of all clusters, starting from Wednesday morning, March 22, 2017

4:56PM 3/28/2017 Update: The rolling reboots of all systems are completed.

Performance Regression of GPU Nodes on Ruby

We currently have a performance regression on Ruby's GPU nodes. Some of the GPU nodes on Ruby remain in a power-saving state even after an application starts using them, which reduces performance in some cases. We currently have a reservation on the GPU nodes so that we can perform a rolling reboot and return them to a known-good state.

We have opened a bug report with the vendor about this performance regression and how to monitor for it.

Problems with MVAPICH2

Some MVAPICH2 MPI installations on Oakley, Ruby, and Owens, such as the default module mvapich2/2.2 as well as mvapich2/2.1, appear to have a bug that is triggered by certain programs. The symptoms are 1) the program hangs or 2) the program fails with an error related to Allreduce or Bcast.

module spider/avail/show not showing MPI dependent modules

On Ruby, the commands:

module spider
module avail
module show

are not listing modules which depend on an MPI module, for example, fftw3. This is believed to be due to the way the system cache is built.

The workaround to see these modules is to use the --ignore_cache argument.
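As a minimal sketch of that workaround, assuming the option is placed before the subcommand as in standard Lmod usage, the commands might look like the following. The `module_nocache` helper is a hypothetical convenience wrapper introduced only for illustration; it is not a command provided on Ruby. fftw3 is used as the example module because it is the one named above.

```shell
# Hypothetical wrapper (not an OSC-provided command): prepend --ignore_cache
# to any module subcommand so the listing is built without the system cache.
module_nocache() {
    module --ignore_cache "$@"
}

# Only run the examples where the module command is actually available.
if command -v module >/dev/null 2>&1; then
    module_nocache spider fftw3   # search for an MPI-dependent module
    module_nocache avail          # list modules available with the current MPI
    module_nocache show fftw3     # display what the module would set
fi
```

Typing `module --ignore_cache avail` directly is equivalent; the wrapper only saves repeating the option. Expect these commands to run more slowly than usual, since the module tree is scanned instead of being read from the cache.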

MVAPICH broken on Ruby

Update Monday February 16th -- Ruby MVAPICH2 build fixed.

Ruby's MVAPICH2 build has been fixed. Please email oschelp@osc.edu with any issues.

We are currently experiencing issues with Ruby's MVAPICH builds. This issue is expected to cause all MPI jobs to fail. System administrators are currently investigating the issue. An update will be posted as more information is available.
