Ohio Supercomputer Center

Addressing CP2K 7.1 Memory Issues on Pitzer and Owens Clusters

According to https://github.com/cp2k/cp2k/issues/1830 and user feedback, long molecular dynamics (MD) simulations with CP2K 7.1 on the Pitzer and Owens clusters may fail with Out-of-Memory (OOM) errors caused by a memory leak in Intel MPI. If you experience this problem, consider switching to a newer version available on the system.

Rolling reboots on all HPC systems starting Oct 31 2024

Ascend, Cardinal, and Owens completed. Pitzer is ongoing.

CP2K 6.1 Floating-point exception on Pitzer Cascade Lakes (48-core) node

Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.

Backtrace for this error:

This could be a bug in libxsmm 1.9.0, which was released on Mar 15, 2018 (Cascade Lake was launched in 2019). The bug has been fixed in cp2k/7.1.

Multi-node job hang with ORCA 5

A multi-node job may hang when it reaches a module that requires heavy I/O, e.g., MP2 or CCSD. Such hangs can also degrade the performance of our GPFS file system. We have traced the problem to an MPI I/O issue in OpenMPI 4.1. To remedy this, we will take the following steps:

Slurm to be Upgraded to Version 23.11.4

OSC is preparing to update Slurm on its production systems to version 23.11.4 on March 27.

Running jobs requeued on all clusters

The Slurm upgrades performed today (Oct 25 2023) during the rolling reboots of Ascend, Owens and Pitzer caused all running jobs on the systems to be requeued around 8:45am. You will not be billed for resources consumed before the jobs were requeued.

We apologize for the inconvenience this causes you. Please contact oschelp@osc.edu if you have any questions.

Rolling reboot of Ascend, Owens and Pitzer starting from Oct 25 2023

Update on Nov 8 2023:

Rolling reboots of all clusters are completed.

Update on Nov 3 2023:

Rolling reboots of Ascend and Pitzer clusters are completed.

Original Post:

We will have rolling reboots of Ascend, Owens and Pitzer clusters including login and compute nodes, starting from 9AM Wednesday October 25, to perform NVIDIA driver and Slurm upgrades.

MPI_THREAD_MULTIPLE is not supported with OpenMPI-HPCX 4.x

Threaded MPI code that calls MPI_Init_thread with MPI_THREAD_MULTIPLE will fail because the UCX library from the HPCX package is built without multi-threading support. UCX is the default framework for OpenMPI 4.0 and above.

Affected versions

Owens: openmpi/4.0.3-hpcx, openmpi/4.1.2-hpcx, openmpi/4.1.4-hpcx
Ascend: openmpi/4.1.3
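
A quick way to tell whether a session is using one of the versions listed above is to compare the loaded modules against that list. The following Python sketch illustrates the idea. It assumes that the module system exports the loaded modules in a colon-separated LOADEDMODULES environment variable, and the function names and the hand-supplied cluster name are hypothetical, chosen only for this example.

```python
import os

# Module versions listed as affected in this notice, keyed by cluster.
AFFECTED = {
    "Owens": {"openmpi/4.0.3-hpcx", "openmpi/4.1.2-hpcx", "openmpi/4.1.4-hpcx"},
    "Ascend": {"openmpi/4.1.3"},
}


def affected_modules(loaded, cluster):
    """Return the loaded modules that are on the affected list for `cluster`."""
    return sorted(set(loaded) & AFFECTED.get(cluster, set()))


def loaded_modules_from_env(environ=os.environ):
    """Read the loaded modules from the environment.

    Assumption: the module system sets LOADEDMODULES to a colon-separated
    list of module names; an unset variable yields an empty list.
    """
    return [name for name in environ.get("LOADEDMODULES", "").split(":") if name]


if __name__ == "__main__":
    # Hypothetical usage: the cluster name is supplied by hand here.
    hits = affected_modules(loaded_modules_from_env(), "Owens")
    if hits:
        print("Loaded modules without MPI_THREAD_MULTIPLE support:", ", ".join(hits))
    else:
        print("No affected OpenMPI module is loaded.")
```

If an affected module is reported, code that requests MPI_THREAD_MULTIPLE should be run with a different MPI module.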

Rolling reboot of Owens and Pitzer starting from July 11, 2022

We will have rolling reboots of Owens and Pitzer clusters including login and compute nodes, starting from 9AM Monday, July 11 2022.

Missing shared library of some mvapich2 modules

Update on Feb 25 2022:

This issue is fixed.

Original Post:

Users may encounter a missing shared library error with some mvapich2 modules on Pitzer and Owens. The error looks like this:

<path_to_executable>: error while loading shared libraries: libim_client.so.0: cannot open shared object file: No such file or directory

We are in the process of rebuilding the affected mvapich2 versions.
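
Errors of this kind can usually be confirmed before submitting a job by asking the dynamic linker which libraries an executable cannot resolve. The Python sketch below is one possible way to do that with the standard Linux `ldd` tool. The function names are hypothetical, and the parsing assumes the usual `name => not found` form that `ldd` prints for unresolved libraries.

```python
import subprocess


def missing_libraries(ldd_output):
    """Return the library names that ldd output marks as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # Unresolved libraries appear as: "<name> => not found"
        if "=> not found" in line:
            missing.append(line.split("=>")[0].strip())
    return missing


def check_executable(path):
    """Run ldd on `path` and return the libraries it cannot resolve."""
    result = subprocess.run(
        ["ldd", path], capture_output=True, text=True, check=False
    )
    return missing_libraries(result.stdout)


if __name__ == "__main__":
    import sys

    # Usage: python check_libs.py <path_to_executable>
    for name in check_executable(sys.argv[1]):
        print("missing:", name)
```

Running the check with the same modules loaded as in the job script gives the most representative result, since the library search path depends on the loaded modules.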
