Software

Some MKL environment variables have incorrect paths

MKL module files define some helper environment variables with incorrect paths, which can lead to link-time errors. All three clusters are affected, and we are working to correct the module files. As a workaround, users can redefine the affected environment variable with the correct path; this requires some computational maturity, so we recommend contacting oschelp@osc.edu for assistance. For example, on Cardinal the module intel-oneapi-mkl/2023.2.0 defined the environment variable MKL_LIBS_INT64 with an incorrect path.
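
As a rough sketch of the workaround (the value assigned below is an illustrative assumption, not the actual path set by the module; verify the correct value for your cluster, or ask oschelp@osc.edu, before using it):

module load intel-oneapi-mkl/2023.2.0
echo "$MKL_LIBS_INT64"      # inspect the (incorrect) value the module set
# Override with a corrected value; the library list here is an assumption for illustration only.
export MKL_LIBS_INT64="-L${MKLROOT}/lib/intel64 -lmkl_intel_ilp64 -lmkl_sequential -lmkl_core"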

Resolved: Home directory space issue with MATLAB 2024a

Users may experience their home directory running out of space after executing multiple MATLAB 2024a jobs. This issue is caused by the accumulation of multiple copies of the MathWorks Service Host in $HOME/.MathWorks/ServiceHost.

To address this, we have upgraded MATLAB 2024a to Update 7 on all clusters, as recommended in the following article: Why is the MathWorks Service Host causing issues with my cluster and/or HPC?
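
If your home directory is still near its quota from jobs run before the update, a minimal cleanup sketch (assuming the accumulated copies live under $HOME/.MathWorks/ServiceHost as described above) is:

du -sh $HOME/.MathWorks/ServiceHost      # check how much space the accumulated copies use
rm -rf $HOME/.MathWorks/ServiceHost      # remove them; the Service Host is recreated as needed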

Singularity: failed to run a container directly or pull an image from Singularity Hub or Docker Hub

You might encounter an error while running a container directly from a hub:

[pitzer-login01]$ apptainer run shub://vsoch/hello-world
Progress |===================================| 100.0%
FATAL: container creation failed: mount error: can't mount image /proc/self/fd/13: failed to find loop device: could not attach image file to loop device: No loop devices available

One solution is to remove the cached Singularity images from the local cache directory $HOME/.apptainer/cache.
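
For example (a hedged sketch; either remove the directory by hand or use Apptainer's cache subcommand):

[pitzer-login01]$ rm -rf $HOME/.apptainer/cache    # delete the cache directory directly
[pitzer-login01]$ apptainer cache clean            # or let apptainer clean its own cache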

Singularity: failed to pull a large Docker image

You might encounter an error while pulling a large Docker image:

[pitzer-login01]$ apptainer pull docker://qiime2/core
FATAL: Unable to pull docker://qiime2/core While running mksquashfs: signal: killed

The process may be killed either because the image is cached in the home directory, which sits on a slower file system, or because the image size exceeds the single-file size limit.

The solution is to use other file systems, such as /fs/ess/scratch and $TMPDIR, for the cache and the temporary files used to build the squashfs filesystem.
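
A minimal sketch using Apptainer's standard APPTAINER_CACHEDIR and APPTAINER_TMPDIR variables (the scratch path and project ID PAS1234 are hypothetical placeholders; substitute a directory your project owns):

[pitzer-login01]$ export APPTAINER_CACHEDIR=/fs/ess/scratch/PAS1234/$USER/apptainer_cache
[pitzer-login01]$ export APPTAINER_TMPDIR=/fs/ess/scratch/PAS1234/$USER/apptainer_tmp
[pitzer-login01]$ mkdir -p $APPTAINER_CACHEDIR $APPTAINER_TMPDIR
[pitzer-login01]$ apptainer pull docker://qiime2/core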

MPI-IO issues on home directories with Intel MPI 2019.3

Certain MPI-IO operations with intelmpi/2019.3 may crash, fail, or proceed with errors on the home directory. We do not expect the same issue on our GPFS file systems, such as the project and scratch spaces. The problem might be related to a known issue reported by the HDF5 group; see the section "Problem Reading A Collectively Written Dataset in Parallel" in the HDF5 Known Issues for more detail.

Affected versions

2019.3
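
As a hedged illustration of avoiding the issue, run the MPI-IO portion of the job from GPFS (project or scratch space) rather than from the home directory; the project path and executable name below are hypothetical:

# Inside the batch script: work from project/scratch space instead of $HOME
cd /fs/ess/PAS1234/mpiio_run        # hypothetical project directory on GPFS
srun ./my_mpiio_app                 # hypothetical MPI-IO executable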

Using mpiexec/mpirun with Intel MPI on Slurm

Intel MPI on the Slurm batch system is configured to support the PMI process manager, and we recommend using srun as the MPI program launcher. If you prefer to use mpiexec/mpirun with the Hydra process manager under Slurm, add the following lines to your batch script before running any MPI executable:

unset I_MPI_PMI_LIBRARY I_MPI_HYDRA_BOOTSTRAP
export I_MPI_JOB_RESPECT_PROCESS_PLACEMENT=0   # required for the -ppn option to take effect
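
For reference, a minimal sketch of a batch script that applies these settings (module version, node and task counts, and the executable name are placeholders, not a prescription):

#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --time=00:10:00

module load intelmpi/2019.3                      # placeholder version

# Bypass Slurm's PMI so mpiexec/mpirun use the Hydra process manager
unset I_MPI_PMI_LIBRARY I_MPI_HYDRA_BOOTSTRAP
export I_MPI_JOB_RESPECT_PROCESS_PLACEMENT=0     # required for -ppn to take effect

mpiexec -ppn 4 ./my_mpi_app                      # hypothetical MPI executable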
