cp2k/2023.2 can produce huge output containing MKL messages
On all clusters, the cp2k executables from the module cp2k/2023.2 can produce huge output files due to many repeated MKL error messages, e.g.:
MKL module files define some helper environment variables with incorrect paths, which can cause link-time errors. All three clusters are affected, and we are working to correct the module files. As a workaround, users can redefine the affected environment variable with the correct path; this requires some experience with the module environment, so we recommend contacting oschelp@osc.edu for assistance. An example error from Cardinal, where the module intel-oneapi-mkl/2023.2.0 defined the environment variable MKL_LIBS_INT64, follows:
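A minimal sketch of that workaround, assuming the incorrect value can simply be overridden in your environment or job script; the replacement value below is only a placeholder, not the actual correct path, so please confirm it with OSC Help before relying on it:

module load intel-oneapi-mkl/2023.2.0
echo "$MKL_LIBS_INT64"                                          # inspect the incorrect value set by the module file
export MKL_LIBS_INT64="<correct MKL link line or library path>" # placeholder; replace with the value confirmed by OSC Help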
Certain MPI-IO operations with intelmpi/2019.3
Certain MPI-IO operations with intelmpi/2019.3 may crash, fail, or proceed with errors when performed on the home directory. We do not expect the same issue on our GPFS file systems, such as the project space and the scratch space. The problem might be related to a known issue reported by the HDF5 group; see the section "Problem Reading A Collectively Written Dataset in Parallel" in the HDF5 Known Issues for more details.
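A minimal sketch of how to avoid the problem under that assumption, by staging the files involved in MPI-IO on GPFS instead of the home directory; the scratch path and executable name below are placeholders:

cd /fs/scratch/<your_project>/run_dir   # placeholder GPFS scratch location; project space also works
cp ~/inputs/* .                         # stage input files away from the home directory
mpiexec ./my_mpi_io_app                 # placeholder executable; its MPI-IO now targets GPFS, not $HOME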
According to https://github.com/cp2k/cp2k/issues/1830 and user feedback, you may encounter Out-of-Memory (OOM) errors during long molecular dynamics (MD) simulations with CP2K 7.1 on the Pitzer and Owens clusters, due to a memory leak in Intel MPI. If you experience this problem, consider switching to a newer CP2K version available on the system.
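For example, assuming the standard Lmod commands available on OSC clusters, you can check which CP2K versions are installed and load a newer one (the version name below is a placeholder):

module spider cp2k                 # list the CP2K versions installed on the cluster
module load cp2k/<newer_version>   # placeholder; substitute a specific newer version from the list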
Ascend, Cardinal, and Owens are completed; Pitzer is ongoing.
Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation. Backtrace for this error:
This could be a bug in libxsmm 1.9.0, which was released on Mar 15, 2018 (Cascade Lake launched in 2019). The bug has been fixed in cp2k/7.1.
Users may encounter the following errors when compiling a C++ program with GCC 13:
error: 'uint64_t' in namespace 'std' does not name a type
or
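This typically happens because GCC 13's libstdc++ no longer includes <cstdint> transitively from other standard headers, so fixed-width types such as std::uint64_t must be pulled in explicitly. A minimal sketch of the fix (the source file name is a placeholder):

# Preferred fix: add the missing header to the affected source file:
#     #include <cstdint>
# Temporary workaround if editing the source is not practical: force-include the header at compile time.
g++ -include cstdint -c your_program.cpp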
You may experience a multi-node job hang if the job runs a calculation module that requires heavy I/O, e.g., MP2 or CCSD. This can also lead to GPFS performance issues. We have identified the problem as related to an MPI I/O issue in OpenMPI 4.1. To remedy this, we will take the following steps:
OSC is preparing to update Slurm on its production systems to version 23.11.4 on March 27.
The Slurm upgrades we performed today (Oct 25, 2023) during the rolling reboots of Ascend, Owens, and Pitzer caused all running jobs on those systems to be requeued around 8:45 a.m. You will not be billed for the resources consumed before the jobs were requeued.
We apologize for the inconvenience this may have caused. Please contact oschelp@osc.edu if you have any questions.