Software
Multi-node job hang with ORCA 5
A multi-node job may hang when it reaches a module that requires heavy I/O, e.g., MP2 or CCSD. Such hangs can also contribute to performance problems on our GPFS file system. We have traced the problem to an MPI I/O issue in OpenMPI 4.1. To remedy this, we will take the following steps:
Possible job failures due to MPI library change on Pitzer after May 20
The MPI libraries on Pitzer will change after May 20. We will upgrade MOFED from 4.9 to 5.6 and recompile all OpenMPI and Mvapich2 builds against the newer MOFED version. Users with their own MPI libraries may see job failures and will need to rebuild the applications linked against those libraries.
ORCA Bind to CORE Failure
The default CPU binding for ORCA jobs can fail sporadically. The failure is almost immediate and produces a cryptic error message, e.g.:
MPI_THREAD_MULTIPLE is not supported with OpenMPI-HPCX 4.x
Threaded MPI code that calls MPI_Init_thread with MPI_THREAD_MULTIPLE will fail because the UCX shipped in the HPCX package is built without multi-threading enabled. UCX is the default framework for OpenMPI 4.0 and above.
Affects versions
Owens: openmpi/4.0.3-hpcx, openmpi/4.1.2-hpcx, openmpi/4.1.4-hpcx
Ascend: openmpi/4.1.3
Frame Renderer - Maya Ondemand app issue
There are currently issues with the Frame Renderer (Maya) interactive app in ondemand.osc.edu.
OSC staff are working with the vendor to resolve the issue.
Please contact oschelp@osc.edu if there are questions.
[Resolved: May 1, 2023]
weld predictor - slurm account error
Updated on 09/08/2022:
Users can choose the project code from a dropdown list to use.
Original Post:
Users of weld predictor software in ondemand will receive an error when trying to start these jobs.
The error shows:
Simulation failed to be submitted ... sbatch:error job failed to be submitted Job invalid: must specify account for job sbatch error
Workaround
Create the file ~/.slurm/defaults and add an entry for account.
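The workaround above can be sketched in Python as follows. This is a minimal sketch, not an official OSC tool: the helper name `set_default_account` is hypothetical, the project code `PAS1234` is a placeholder, and the `account=<value>` line format is an assumption based on the `option=value` convention used by Slurm's user defaults file. Check the exact format with OSC before relying on it.

```python
from pathlib import Path
from typing import Optional, Union


def set_default_account(account: str,
                        home: Optional[Union[str, Path]] = None) -> Path:
    """Write an account entry to <home>/.slurm/defaults.

    Assumption: the file uses one `option=value` entry per line.
    Existing entries other than a previous account line are kept.
    """
    home_dir = Path.home() if home is None else Path(home)
    defaults = home_dir / ".slurm" / "defaults"
    defaults.parent.mkdir(parents=True, exist_ok=True)

    kept = []
    if defaults.exists():
        for line in defaults.read_text().splitlines():
            # Drop any earlier account entry so it is not duplicated.
            if line.split("=", 1)[0].strip() != "account":
                kept.append(line)

    kept.append(f"account={account}")
    defaults.write_text("\n".join(kept) + "\n")
    return defaults
```

Calling `set_default_account("PAS1234")` with your own project code in place of the placeholder writes the entry under your home directory. The `home` argument exists only so the helper can be tried against a scratch directory first.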
Security vulnerabilities on ARM Forge versions prior to 22.0.x
ARM identified security vulnerabilities on ARM Forge versions prior to 22.0.x as follows:
Jupyter security issue Aug. 13, 2021
Please do not run any Jupyter applications at OSC until further notice due to a security vulnerability.
OSC will update JupyterLab and Jupyter Notebook applications to rectify this as soon as possible.
List of versions changed:
- 0.35: removed because there is no official patch release.
- 1.2: upgraded to 1.2.21
- 2.1: replaced with 2.2.10 because there is no official patch release.
- 3.0: upgraded to 3.0.17
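For scripts that pin one of the versions listed above, the changes can be captured in a small lookup table. This is only an illustrative sketch: the names `VERSION_CHANGES` and `describe_change` are hypothetical, and the table contains nothing beyond the four entries in the list.

```python
# Version changes copied from the list above.
# None means the version was removed with no replacement.
VERSION_CHANGES = {
    "0.35": None,
    "1.2": "1.2.21",
    "2.1": "2.2.10",
    "3.0": "3.0.17",
}


def describe_change(old: str) -> str:
    """Return a one-line description of what happened to a listed version."""
    if old not in VERSION_CHANGES:
        return f"{old}: not in the list of changed versions"
    new = VERSION_CHANGES[old]
    if new is None:
        return f"{old}: removed, no replacement"
    return f"{old}: use {new}"
```

For example, `describe_change("2.1")` reports that 2.2.10 should be used instead.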
References for more information:
Singularity: reached your pull rate limit
You might encounter an error while pulling a large Docker image:
ERROR: toomanyrequests: Too Many Requests.
or