
Slurm Migration Issues

This page documents the known issues for migrating jobs from Torque to Slurm.

$PBS_NODEFILE and $SLURM_JOB_NODELIST

Please be aware that $PBS_NODEFILE is the path to a file listing the allocated hosts, while $SLURM_JOB_NODELIST is an environment variable holding a compressed hostlist string.

The Slurm analog of cat $PBS_NODEFILE is srun hostname | sort -n
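A minimal sketch of the difference in kind, runnable without Torque or Slurm installed; the hostnames and the compressed range below are hypothetical stand-ins for what each system would actually set:

```shell
# Torque: $PBS_NODEFILE is a path to a file, one hostname per allocated task.
PBS_NODEFILE=$(mktemp)
printf 'o0001\no0001\no0002\n' > "$PBS_NODEFILE"

# Slurm: $SLURM_JOB_NODELIST is a plain string in compressed hostlist notation.
SLURM_JOB_NODELIST='o[0001-0002]'

cat "$PBS_NODEFILE"         # prints the three hostnames, one per line
echo "$SLURM_JOB_NODELIST"  # prints the compressed string as-is
```

Inside a real Slurm job, scontrol show hostnames "$SLURM_JOB_NODELIST" expands the compressed string into one hostname per line, which can be redirected to a file if a PBS_NODEFILE-style file is needed.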


Please report any problems you encounter when using Slurm.


Slurm Migration

Overview

Slurm, which stands for Simple Linux Utility for Resource Management, is a widely used open-source HPC resource management and scheduling system that originated at Lawrence Livermore National Laboratory.

OSC has decided to implement Slurm for job scheduling and resource management over the course of 2020, replacing the Torque resource manager and Moab scheduling system currently in use.

Backup failures for Project on August 1st and 2nd

OSC experienced backup failures on our GPFS file systems (both Project file systems, /fs/project and /fs/ess) the mornings of August 1st and 2nd. The underlying cause was identified and backups were operating as expected the morning of August 3rd. As a result of these failed backups, OSC will not be able to complete some file restore requests for files changed between approximately 2020-07-31 02:30 through 2020-08-02 02:30.

System Downtime August 18, 2020

A downtime for all OSC HPC systems is scheduled from 7 a.m. to 9 p.m., Tuesday, August 18, 2020. The downtime will affect the Pitzer, Ruby and Owens Clusters, web portals and HPC file servers. Login services, except for my.osc.edu, will not be available during this time. OSC clients are able to log into my.osc.edu during the downtime but no changes will take place until the downtime is completed. In preparation for the downtime, the batch scheduler will begin holding jobs that cannot be completed before 7 a.m., August 18, 2020.
