Python version mismatch in Jupyter + Spark instance
You may encounter the following error message when running a Spark instance using a custom kernel in the Jupyter + Spark app:
You may encounter errors similar to the following when running STAR 2.7.10b on Cardinal:
STAR: bgzf.c:158: bgzf_open: Assertion `compressBound(0xff00) < 0x10000' failed.
It seems to be related to this issue: https://github.com/alexdobin/STAR/issues/2063
STAR 2.7.10b bundles an older version of HTSlib that is incompatible with zlib-ng, the library we build STAR with.
Use star/2.7.11b on Cardinal instead.
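For example, switch to the newer module before running your alignment (assuming the standard module command on Cardinal):
[pitzer-login01]$ module load star/2.7.11b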
You may encounter the following warning message when running a Spark instance using the default PySpark kernel in a Jupyter + Spark application:
You might encounter an error while running a container directly from a hub:
[pitzer-login01]$ apptainer run shub://vsoch/hello-world
Progress |===================================| 100.0%
FATAL: container creation failed: mount error: can't mount image /proc/self/fd/13: failed to find loop device: could not attach image file to loop device: No loop devices available
One solution is to remove the cached images from the local cache directory $HOME/.apptainer/cache.
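For example, either of the following removes the cached images (a minimal sketch; both commands operate on the same local cache):
[pitzer-login01]$ apptainer cache clean
[pitzer-login01]$ rm -rf $HOME/.apptainer/cache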
You might encounter an error while pulling a large Docker image:
[pitzer-login01]$ apptainer pull docker://qiime2/core
FATAL: Unable to pull docker://qiime2/core While running mksquashfs: signal: killed
The process may be killed because the image is cached in the home directory, which is a slower file system, or because the image size exceeds the single-file size limit.
The solution is to use other file systems, such as /fs/ess/scratch and $TMPDIR, for the cache and the temporary files used to build the squashfs filesystem:
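A minimal sketch, assuming an existing scratch directory (replace the project path with your own); APPTAINER_CACHEDIR and APPTAINER_TMPDIR redirect the cache and the temporary build space, respectively:
[pitzer-login01]$ export APPTAINER_CACHEDIR=/fs/ess/scratch/<project>/$USER/apptainer_cache
[pitzer-login01]$ export APPTAINER_TMPDIR=$TMPDIR
[pitzer-login01]$ mkdir -p $APPTAINER_CACHEDIR
[pitzer-login01]$ apptainer pull docker://qiime2/core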
Certain MPI-IO operations with intelmpi/2019.3 may crash, fail, or proceed with errors on the home directory. We do not expect the same issue on our GPFS file systems, such as the project space and the scratch space. The problem might be related to a known issue reported by the HDF5 group; see the section "Problem Reading A Collectively Written Dataset in Parallel" in the HDF5 Known Issues for more detail.
Intel MPI on the Slurm batch system is configured to support the PMI process manager. It is recommended to use srun as the MPI program launcher. If you prefer using mpiexec/mpirun with the Hydra process manager under Slurm, please add the following to your batch script before running any MPI executable:
unset I_MPI_PMI_LIBRARY I_MPI_HYDRA_BOOTSTRAP
export I_MPI_JOB_RESPECT_PROCESS_PLACEMENT=0   # the option -ppn only works if you set this before
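A minimal batch-script sketch using mpiexec with these settings (the job parameters and the executable name a.out are illustrative, not part of the original entry):
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --time=00:10:00
module load intelmpi/2019.3
# switch from the Slurm PMI integration to the Hydra process manager
unset I_MPI_PMI_LIBRARY I_MPI_HYDRA_BOOTSTRAP
export I_MPI_JOB_RESPECT_PROCESS_PLACEMENT=0
mpiexec -ppn 4 ./a.out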
GNU compiler versions 10+ may have C compiler errors like
/.libs/libmca_mtl_psm.a(mtl_psm_component.o): multiple definition of `mca_mtl_psm_component'
A common mistake in C is omitting extern when declaring a global variable in a header file. Previous GCC versions ignored this error, but GCC 10 defaults to -fno-common, which means a linker error is now reported. It can be bypassed by appending -fcommon to the compilation flags.
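For example, when building an autotools-based package (a sketch; pass the flag however your build system sets CFLAGS):
./configure CFLAGS="-O2 -fcommon"
make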
GNU compiler versions 10+ may have Fortran compiler errors like
Error: Type mismatch between actual argument at (1) and actual argument at (2) (REAL(4)/REAL(8))
that result in an error during configuration.
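One common workaround with GCC 10+ (a general gfortran option, not specific to this entry) is -fallow-argument-mismatch, which downgrades the type-mismatch error back to a warning; for example, with an autotools-based build:
./configure FFLAGS="-fallow-argument-mismatch" FCFLAGS="-fallow-argument-mismatch"
make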
The default CPU binding for ORCA jobs can fail sporadically. The failure is almost immediate and produces a cryptic error message, e.g.