Abaqus Parallel Job Failure with PMPI Due to Out-of-Memory (OOM) Error

Category: 
Resolution: Resolved

You may encounter the following error while running an Abaqus parallel job with PMPI:

Traceback (most recent call last):
  File "SMAPylModules/SMAPylDriverPy.m/src/driverAnalysis.py", line 263, in run
  File "SMAPylModules/SMAPylDriverPy.m/src/driverExplicit.py", line 214, in analyze
  File "SMAPylModules/SMAPylDriverPy.m/src/driverExplicitMPI.py", line 36, in runXpl
  File "SMAPylModules/SMAPylDriverPy.m/src/driverPhase.py", line 575, in run
  File "SMAPylModules/SMAPylDriverPy.m/src/driverPhase.py", line 567, in _run
driverExceptions.AbaqusExecutionError: ('Abaqus/Explicit Analysis', 255, 'knee_bolster_nsm')
slurmstepd: error: Detected 1 oom_kill event in StepId=2822.batch. Some of the step tasks have been OOM Killed.

Cause of the Error

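The job fails because an MPI process started under PMPI uses an abnormal amount of memory and runs out of it. This triggers an Out-of-Memory (OOM) kill event, which slurmstepd reports in the job output, and Slurm terminates the job. Abaqus then raises the AbaqusExecutionError shown above.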
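
To check whether a failed job matches this pattern, the Slurm job output can be scanned for the oom_kill message. The following is a minimal sketch, assuming the message format shown in the log above. The function name find_oom_events is hypothetical, and other Slurm versions may word the message differently.

```python
import re

# Matches the slurmstepd message format shown in the log above, e.g.
# "Detected 1 oom_kill event in StepId=2822.batch. Some of the step tasks ..."
OOM_PATTERN = re.compile(r"Detected (\d+) oom_kill events? in StepId=(\S+?)\.(?:\s|$)")


def find_oom_events(log_text):
    """Return a list of (event_count, step_id) tuples found in log_text."""
    return [(int(count), step) for count, step in OOM_PATTERN.findall(log_text)]


if __name__ == "__main__":
    sample = (
        "slurmstepd: error: Detected 1 oom_kill event in StepId=2822.batch. "
        "Some of the step tasks have been OOM Killed."
    )
    print(find_oom_events(sample))  # [(1, '2822.batch')]
```

An empty result means no oom_kill message in this format was found, so the failure may have a different cause.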

Affected versions

Abaqus 2022 and 2024

Workaround

Run the job with the default MPI implementation (Intel MPI) instead of PMPI. This avoids the memory issue associated with PMPI.
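
One way to make the switch is sketched below. It assumes the MPI implementation is selected with the mp_mpi_implementation parameter in the Abaqus environment file (abaqus_v6.env, which uses Python syntax). The file location and the accepted values can differ between releases and sites, so verify this against the documentation for your installation.

```python
# abaqus_v6.env (assumed location: the job's working directory or the site configuration)
# Assumption: IMPI selects Intel MPI; PMPI selects the implementation affected by this issue.
mp_mpi_implementation = IMPI
```

If the environment file contains an explicit mp_mpi_implementation = PMPI line, removing that line should have the same effect, since Intel MPI is the default.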