We've been experiencing some instability on the clusters (particularly Cardinal and Ascend). 

Known issues

Unresolved known issues

A known issue with an Unresolved Resolution state is an active problem under investigation; a temporary workaround may be available.

Resolved known issues

A known issue with a Resolved (workaround) Resolution state is an ongoing problem for which a permanent workaround is available; the workaround may involve using different software or hardware.

A known issue with a Resolved Resolution state has been corrected.

Known Issues

Python version mismatch in Jupyter + Spark instance
Category: Software | Resolution: Resolved (workaround)

You may encounter the following error message when running a Spark instance using a custom kernel in the Jupyter + Spark app:

25/04/25 10:49:01 WARN TaskSetManager: Lost task 0.0 in...

Posted: 18 hours 37 min ago | Updated: 18 hours 37 min ago
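A version mismatch of this kind usually means the Spark driver and the executors are launching different Python interpreters. As a hedged sketch (not OSC's documented workaround; the interpreter path below is a placeholder for your custom kernel's Python), pointing both sides at the same interpreter before starting the session is a common fix:

```shell
# Point Spark's driver and executors at the same Python interpreter.
# The path is a placeholder for your custom kernel's environment.
export PYSPARK_PYTHON="$HOME/envs/custom/bin/python"
export PYSPARK_DRIVER_PYTHON="$PYSPARK_PYTHON"
```

Set these in the environment the Jupyter kernel inherits (for example, in the kernel spec or a startup script) so every Spark task sees the same interpreter.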
WARN SparkSession in Jupyter + Spark instance
Category: Software | Resolution: Resolved (workaround)

You may encounter the following warning message when running a Spark instance using the default PySpark kernel in a Jupyter + Spark application:

WARN SparkSession: Using an...

Posted: 2 days 16 hours ago | Updated: 19 hours 17 min ago
Instability on Clusters after May 13 Downtime
Resolution: Unresolved

We've been experiencing some instability on the clusters (particularly Cardinal and Ascend) following the recent May 13 downtime, especially with parallel job processing. If you notice any unusual...

Posted: 21 hours 25 min ago | Updated: 21 hours 25 min ago
STAR error bgzf_open: Assertion failed
Category: Cardinal, Software | Resolution: Resolved (workaround)

You may encounter errors that look similar to these when running STAR 2.7.10b on Cardinal:

STAR: bgzf.c:158: bgzf_open: Assertion `compressBound(0xff00) < 0x10000' failed.
...

Posted: 21 hours 40 min ago | Updated: 21 hours 33 min ago
Singularity: failed to run a container directly or pull an image from Singularity or Docker hub
Category: Software | Resolution: Resolved (workaround)

You might encounter an error while running a container directly from a hub:

[pitzer-login01]$ apptainer run shub://vsoch/hello-world
Progress |===================================| 100.0%...

Posted: 2 days 20 hours ago | Updated: 2 days 20 hours ago
Singularity: failed to pull a large Docker image
Category: Software | Resolution: Resolved (workaround)

You might encounter an error while pulling a large Docker image:

[pitzer-login01]$ apptainer pull docker://qiime2/core
FATAL: Unable to pull docker://qiime2/core While running...

Posted: 2 days 20 hours ago | Updated: 2 days 20 hours ago
Singularity: reached your pull rate limit
Category: Software | Resolution: Resolved (workaround)

You might encounter one of the following errors while pulling a Docker image:

ERROR: toomanyrequests: Too Many Requests.

or

You have reached your pull rate limit. You may...

Posted: 3 years 11 months ago | Updated: 2 days 20 hours ago
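Docker Hub's pull rate limit is per client and is higher for authenticated users, so a common mitigation (an assumption here, not necessarily OSC's documented workaround) is to authenticate before pulling. The credential values below are placeholders, and the variable names assume an Apptainer version that reads them:

```shell
# Authenticated pulls get a higher Docker Hub rate limit than
# anonymous ones. Credentials are placeholders; use an access token
# rather than your account password where possible.
export APPTAINER_DOCKER_USERNAME="your-dockerhub-user"
export APPTAINER_DOCKER_PASSWORD="your-access-token"

# Then pull as usual, e.g.:
#   apptainer pull docker://qiime2/core
# (apptainer/singularity also offer an interactive --docker-login flag)
```
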
MPI-IO issues on home directories with Intel MPI 2019.3
Category: Pitzer, Software | Resolution: Resolved (workaround)

Certain MPI-IO operations with intelmpi/2019.3 may crash, fail, or proceed with errors on the home directory. We do not expect the same issue on our GPFS file system, such as the...

Posted: 2 days 21 hours ago | Updated: 2 days 21 hours ago
Using mpiexec/mpirun with Intel MPI on Slurm
Category: Software | Resolution: Resolved (workaround)

Intel MPI on the Slurm batch system is configured to support the PMI process manager. We recommend using srun as the MPI program launcher. If you prefer using mpiexec/...

Posted: 2 days 21 hours ago | Updated: 2 days 21 hours ago
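The recommendation above can be sketched as a minimal Slurm job script. This is a job-script fragment under stated assumptions: the module name, node counts, and program name are placeholders, not OSC-specific values:

```shell
#!/bin/bash
# Minimal Slurm job script sketch for Intel MPI.
# Module name and program are placeholders.
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4

module load intelmpi        # load an Intel MPI module (version placeholder)
srun ./my_mpi_program       # srun launches ranks via Slurm's PMI support
```

Launching with srun lets Slurm's PMI integration handle process placement directly, rather than relying on mpiexec/mpirun to re-derive it from the allocation.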
A partial-node MPI job failed to start using Intel MPI mpiexec
Category: Owens, Pitzer, Software | Resolution: Resolved (workaround)

A partial-node MPI job may fail to start using mpiexec from intelmpi/2019.3 and intelmpi/2019.7, with error messages like:

[mpiexec@o0439.ten.osc....

Posted: 4 years 6 months ago | Updated: 2 days 21 hours ago
