The default CPU binding for ORCA jobs can fail sporadically. The failure occurs almost immediately after launch and produces a cryptic error message, for example:
$ORCA/orca h2o.in
.
.
.
--------------------------------------------------------------------------
A request was made to bind to that would result in binding more
processes than cpus on a resource:

   Bind to:     CORE
   Node:        o0033
   #processes:  2
   #cpus:       1

You can override this protection by adding the "overload-allowed"
option to your binding directive.
--------------------------------------------------------------------------
.
.
.
[file orca_tools/qcmsg.cpp, line 465]:
  .... aborting the run
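When checking many runs for this problem, for instance while benchmarking the workarounds below, it can help to search the job log for the error rather than read it by eye. The sketch below is a hypothetical helper, not part of ORCA or SLURM. It assumes the log captured both stdout and stderr (e.g. `$ORCA/orca h2o.in > h2o.out 2>&1`) and keys on the literal string "overload-allowed", which appears on a single line of the OpenMPI message above.

```shell
#!/bin/bash
# Hypothetical helper: return 0 if an ORCA log contains the OpenMPI
# CPU-binding error, 1 otherwise.
# Assumes the log holds both stdout and stderr of the ORCA run.
orca_binding_failed() {
  local log="$1"
  # "overload-allowed" sits on one line of the OpenMPI message, so a
  # literal (-F), quiet (-q), line-based grep is sufficient.
  grep -qF 'overload-allowed' "$log"
}

# Example use in a job script (h2o.out is a hypothetical log name):
# if orca_binding_failed h2o.out; then
#   echo "ORCA hit the CPU-binding failure" >&2
# fi
```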
Affected versions
ORCA versions through at least 5.0.4
Workarounds
Three workarounds are known. The first is to invoke ORCA without CPU binding; ORCA passes the quoted second argument through to OpenMPI's mpirun:
$ORCA/orca h2o.in "--bind-to none"
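The first workaround can be wrapped in a small function so that job scripts apply it consistently. This is a minimal sketch, assuming `$ORCA` points at the ORCA installation (as after loading the ORCA module); the function name `run_orca_unbound` is hypothetical. The MPI options are kept as a single quoted argument, exactly as in the command above.

```shell
#!/bin/bash
# Hypothetical wrapper for the "--bind-to none" workaround.
# Assumes $ORCA is set to the ORCA installation directory.
run_orca_unbound() {
  local input="$1"
  if [[ -z "${ORCA:-}" || ! -x "$ORCA/orca" ]]; then
    echo "ORCA is not set or \$ORCA/orca is not executable" >&2
    return 1
  fi
  # Call ORCA by its full path, as in the example above, and pass the
  # OpenMPI binding option as one quoted argument.
  "$ORCA/orca" "$input" "--bind-to none"
}
```

In a batch script, load the ORCA module first and then call `run_orca_unbound h2o.in`, redirecting output as usual.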
The second is to use a non-HPC-X OpenMPI module with ORCA, for example:
module load openmpi/4.1.2-tcp orca/5.0.4
$ORCA/orca h2o.in
The third is to request more SLURM tasks (ntasks) than ORCA processes (nprocs). This does not prevent the failure but merely reduces its likelihood:
#SBATCH --ntasks=10

cat << EOF > h2o.in
%pal
  nprocs 5
end
.
.
.
EOF

$ORCA/orca h2o.in
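To keep ORCA's nprocs tied to the SLURM allocation instead of hard-coding both numbers, the %pal block can be generated from `SLURM_NTASKS`. This is a sketch under an explicit assumption: the 2:1 ratio of tasks to ORCA processes simply mirrors the example above (ntasks=10, nprocs 5) and is not a tested recommendation. The function names are hypothetical.

```shell
#!/bin/bash
# Hypothetical helpers for the third workaround.
# ASSUMPTION: give ORCA half of the SLURM tasks, as in the 10 -> 5 example.
orca_nprocs_for() {
  local ntasks="$1"
  local n=$(( ntasks / 2 ))
  # Never request fewer than one ORCA process.
  if (( n < 1 )); then
    n=1
  fi
  echo "$n"
}

# Print an ORCA %pal block for the given number of processes.
write_pal_block() {
  local nprocs="$1"
  printf '%%pal\n  nprocs %d\nend\n' "$nprocs"
}

# Example use in a job script (h2o_body.in is a hypothetical file holding
# the rest of the ORCA input):
# NPROCS=$(orca_nprocs_for "${SLURM_NTASKS:-1}")
# { write_pal_block "$NPROCS"; cat h2o_body.in; } > h2o.in
# $ORCA/orca h2o.in
```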
Note that each workaround can affect performance, and the last one can directly increase charges, since it requests more tasks than ORCA uses. We recommend that users benchmark their jobs to determine which approach works best.