cp2k/2023.2 can produce huge output containing MKL messages
On all clusters, the cp2k executables from module cp2k/2023.2 can produce huge output files because of many repeated MKL error messages, e.g.:
RELION versions prior to 5 may exhibit suboptimal performance in hybrid MPI+OpenMP jobs when the number of MPI tasks exceeds four across multiple nodes.
If possible, limit the number of MPI tasks to four or fewer for best performance. Alternatively, consider upgrading to RELION 5 or later, which may include optimizations that resolve this issue.
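The task limit above can be sketched as a small shell calculation. This is only an illustration: the helper name plan_layout is hypothetical, it assumes a positive core count, and the idea of giving the remaining cores to OpenMP threads is an assumption about how a hybrid job would typically be laid out, not a documented RELION requirement.

```shell
#!/bin/sh
# Hypothetical helper: split a core budget into at most four MPI tasks,
# giving the remaining cores to OpenMP threads per task.
# Assumes the argument is a positive integer.
plan_layout() {
    total_cores=$1
    max_tasks=4                       # cap taken from the recommendation above
    if [ "$total_cores" -lt "$max_tasks" ]; then
        tasks=$total_cores
    else
        tasks=$max_tasks
    fi
    threads=$((total_cores / tasks))  # integer division; leftover cores stay unused
    echo "ntasks=$tasks cpus_per_task=$threads"
}
```

For example, plan_layout 96 prints ntasks=4 cpus_per_task=24. The two values correspond to the Slurm options --ntasks and --cpus-per-task.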
When running a container using the podman or docker command on a desktop system, you may encounter an error like the following:
Error: OCI runtime error: crun: sd-bus call: Process org.freedesktop.systemd1 exited with status 1: Input/output error
A similar issue has been discussed in Podman GitHub Issue #13429, and it has been concluded that this is not a Podman bug.
If you experience a storage error such as:
write /var/tmp/storage388772891/1: no space left on device.
you may have an outdated ~/.config/containers/storage.conf file, possibly generated by an older Podman installation on a RHEL 7 system. Removing this file should resolve the issue.
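A cautious way to apply this fix is to move the file aside rather than delete it, so it can be restored if needed. The sketch below does that; note that this is a substitution for the plain removal described above, and the function name retire_storage_conf is hypothetical.

```shell
#!/bin/sh
# Move an outdated Podman storage.conf aside instead of deleting it outright.
# With no argument, the default per-user path from the text above is used.
retire_storage_conf() {
    conf=${1:-"$HOME/.config/containers/storage.conf"}
    if [ -f "$conf" ]; then
        mv "$conf" "$conf.bak"
        echo "moved $conf to $conf.bak"
    else
        echo "no storage.conf found at $conf"
    fi
}
```

Calling retire_storage_conf with no arguments acts on ~/.config/containers/storage.conf; once the storage error is confirmed gone, the .bak copy can be deleted.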
Users may encounter errors when attempting to run a sandbox on GPFS-mounted directories such as /fs/scratch or /fs/ess. The error output may look like the following:
While using MVAPICH3 builds of Quantum ESPRESSO (QE), users may encounter hangs when running the CP package, which can lead to job timeouts. We recommend switching to the OpenMPI build of any QE version.
Please switch to the Intel/OpenMPI build, which can be loaded with:
module load intel/2021.10.0 openmpi/5.0.2
module load quantum-espresso/7.3.1
Note that MVAPICH3 variants will be deprecated on August 19, 2025.
Update on November 24, 2025: MVAPICH 4 variants are now available for version 7.4.1.
MKL module files define some helper environment variables with incorrect paths, which can cause link-time errors. All three clusters are affected, and we are working to correct the module files. As a workaround, users can redefine the affected environment variable with the correct path, although this requires some technical experience; we recommend contacting oschelp@osc.edu for assistance. An example error from Cardinal, where module intel-oneapi-mkl/2023.2.0 defines the environment variable MKL_LIBS_INT64, follows:
Users may find their home directory running out of space after running multiple MATLAB 2024a jobs. The cause is an accumulation of copies of the MathWorks Service Host in $HOME/.MathWorks/ServiceHost.
To address this, we have upgraded MATLAB 2024a to Update 7 on all clusters, as recommended in the MathWorks article "Why is the MathWorks Service Host causing issues with my cluster and/or HPC?"
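To see whether the Service Host directory is what is filling a home directory, its disk usage can be checked directly. The following is a minimal sketch; the function name service_host_usage_kb is hypothetical, and the default path is the one given above.

```shell
#!/bin/sh
# Print the disk usage, in kilobytes, of the MathWorks Service Host directory.
# Prints 0 if the directory does not exist.
service_host_usage_kb() {
    dir=${1:-"$HOME/.MathWorks/ServiceHost"}
    if [ -d "$dir" ]; then
        du -sk "$dir" | cut -f1   # du prints "<size><TAB><path>"; keep the size
    else
        echo 0
    fi
}
```

Running service_host_usage_kb with no arguments reports on $HOME/.MathWorks/ServiceHost; a large value suggests that accumulated copies are the cause.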
When running multi-GPU training, users may encounter the following message and experience NCCL hangs if the first operation is a barrier. We have found that this issue occurs only on a single Ascend Next Gen (dual-GPU) node, where the GPUs are connected via the SMP interconnect across NUMA nodes rather than through NVLink.
You may encounter the following error message when running a Spark instance using a custom kernel in the Jupyter + Spark app: