Known issues

Unresolved known issues

A known issue with an Unresolved Resolution state is an active problem under investigation; a temporary workaround may be available.

Resolved known issues

A known issue with a Resolved (workaround) Resolution state is an ongoing problem; a permanent workaround is available, which may include using different software or hardware.

A known issue with a Resolved Resolution state has been corrected.

Known Issues

Title Category Resolution Description Posted Updated
Segmentation fault from openmpi/1.10-hpcx and 2.0-hpcx on Owens Owens, Software Resolved

We have found that recent MPI jobs using openmpi/1.10-hpcx and openmpi/2.0-hpcx on Owens may complete, or hang until the job is killed, but report a segmentation fault. Some applications might be ... Read more

5 years 4 months ago 5 years 4 months ago
Terminal function from RStudio app through OnDemand does not work OnDemand Resolved

The terminal function in the RStudio app launched through OnDemand does not work. It returns the following error:

Rstudio cannot launch terminals.

We are working on the... Read more

1 year 12 months ago 1 year 11 months ago
Rolling reboot of all clusters, starting from Wednesday morning, April 19, 2017 Batch, Maintenance, Owens, Ruby Resolved

1:40PM 4/27/2017 Update: Rolling reboots are completed. 

3:10PM 4/18/2017 Update: Rolling reboots on Owens have started to address GPFS errors occurred... Read more

7 years 8 months ago 7 years 7 months ago
Jobs report 'excessive memory usage' message Owens, Pitzer Resolved

... Read more

4 years 9 months ago 2 years 7 months ago
Lustre is still offline. HPC systems back up Maintenance Resolved

Day One of the scheduled downtime has been completed, and HPC operations have resumed. As planned, Lustre work will extend into Day Two. Jobs using /fs/lustre or $PFSDIR cannot run until this work... Read more

10 years 5 months ago 10 years 5 months ago
HCOLL-related failures in OpenMPI applications Cardinal, Software Resolved (workaround)

Several applications using OpenMPI, including HDF5, Boost, Rmpi, ORCA, and CP2K, may fail with errors such as

mca_coll_hcoll_module_enable() coll_hcol: mca_coll_hcoll_save_coll_handlers... Read more          
1 month 3 weeks ago 1 month 3 weeks ago
Replacement of Owens Ethernet switches from Dec 14, 2018 Network, Owens Resolved

Updated on Jan 16, 2019, at 09:20 AM:

The replacement is done except for the three switches including the login nodes of Owens. We posted another notice for more... Read more

6 years 3 months ago 5 years 11 months ago
NFS outage on Thursday Jan 17 from 7am to 8am filesystem Resolved

Update:

This work has been canceled and will be done during downtime on Feb. 5. 

Original Post:

On Thursday, January 17th from 7 am to 8 am OSC will have a GPFS... Read more

5 years 11 months ago 5 years 11 months ago
OnDemand GPU request error Nov 2021 Batch, OnDemand, Pitzer Resolved

When requesting an interactive session in OnDemand with GPU resources, users may see an error similar to "sbatch: error: Invalid generic resource (gres) specification"

... Read more

3 years 1 week ago 3 years 1 week ago
June 7th downtime to finish at 6:30PM Connectivity, filesystem, Infrastructure, login, Login Problems, Maintenance, Operations, Outage Resolved

Update: Downtime completed at 6:30PM, June 7th.

The June 7th downtime is now slated to be completed at 6:30PM. The previous estimate was 5PM.

All systems and services will... Read more

8 years 6 months ago 8 years 6 months ago
