If you run a parallel (or even serial) job that does not use all the CPUs on each node, the number of processes and their distribution are correct, but on the first node all the processes appear to be pinned to a single CPU.
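One way to confirm this symptom is to look at the CPU affinity the kernel reports for each process. The sketch below assumes a Linux node with `/proc` and `pgrep` (procps) available. The function names `cpus_allowed` and `show_affinity`, and the example executable name `my_program`, are hypothetical.

```shell
#!/bin/sh
# Sketch (Linux only): report the CPU affinity of processes to spot the
# "everything pinned to one CPU on the first node" symptom.

# Print the CPU list one process may run on, as recorded by the kernel in
# /proc/PID/status (e.g. "0-27", or just "0" if it is pinned to CPU 0).
cpus_allowed() {
    awk '/^Cpus_allowed_list:/ {print $2}' "/proc/$1/status"
}

# Print "PID CPU-list" for every process owned by the current user whose
# name is exactly $1. pgrep matches the kernel's process name, which is
# truncated to 15 characters.
show_affinity() {
    for pid in $(pgrep -u "$(id -u)" -x "$1"); do
        printf '%s %s\n' "$pid" "$(cpus_allowed "$pid")"
    done
}
```

Run `show_affinity my_program` on the job's first node while the job is running. If every line reports the same single CPU (for example `0`), the processes are sharing one core; an unaffected run would normally show different CPUs or a wider range.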
On Owens, use of user-defined material (UMAT) scripts with Abaqus is limited as follows:
Abaqus 2017: runs correctly on single and multiple nodes
Abaqus 6.14 and 2016: runs correctly only on a single node
Resolved: We confirmed that this is an issue on the old versions only. The software page indicates this issue.
There is a bug in VASP 5.4.1 built with mvapich2/2.2 on Owens: a VASP job that runs out of memory crashes the Owens compute node(s). We will investigate monitoring for this type of job so that we can clean up after the job more efficiently and notify the user of the problem more quickly.
LAMMPS 14May16 on Owens can hang when using the velocity command. Inputs that hang on Owens work on Oakley and Ruby, and they also work with LAMMPS 31Mar17 on Owens. Here is an example failing input snippet:
velocity mobile create 298.0 111250 mom yes dist gaussian
Update: We think this is fixed. Please submit a ticket if you encounter further problems.
As a result of updates made during yesterday's downtime, software built with mvapich2/1.7 is failing with the error:
libibumad.so.2: cannot open shared object file: No such file or directory
We're working on fixing the problem.
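To check whether a particular executable is affected, you can ask the dynamic loader which of its shared libraries cannot be resolved. A minimal sketch, assuming a Linux system with `ldd`; the function name `missing_libs` and the executable `./my_program` are hypothetical.

```shell
#!/bin/sh
# Sketch: list the shared libraries a binary needs that the dynamic loader
# cannot find.
missing_libs() {
    # ldd prints "libname => not found" for each unresolved dependency;
    # "|| true" keeps the exit status 0 when nothing is missing.
    ldd "$1" | grep 'not found' || true
}
```

`missing_libs ./my_program` prints nothing when all libraries resolve; an affected binary would show a line such as `libibumad.so.2 => not found`.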
LAMMPS 14May16 was built with the USER-OMP package on Oakley, Ruby, and Owens, and by default it spawns too many OpenMP threads. As a workaround, lammps/14May16 batch scripts that do not use the USER-OMP package should set the OMP_NUM_THREADS environment variable to 1, e.g.:
export OMP_NUM_THREADS=1
for Bourne type shells and
setenv OMP_NUM_THREADS 1
for C type shells.
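For context, a complete job script applying this workaround might look like the sketch below. It assumes a Torque/PBS batch system and a Bash job shell; the job name, the resource request (one 28-core Owens node), the executable name `lammps`, and the input file `in.mysim` are placeholders, not values taken from this page. A C-shell job script would use `setenv OMP_NUM_THREADS 1` instead of `export`.

```shell
#!/bin/bash
#PBS -N lammps_no_omp
#PBS -l nodes=1:ppn=28
#PBS -l walltime=1:00:00
# Hypothetical lammps/14May16 job that does NOT use the USER-OMP package.
# Limit OpenMP to one thread per MPI rank so the USER-OMP build does not
# start extra threads and oversubscribe the cores.
export OMP_NUM_THREADS=1
module load lammps/14May16
cd "$PBS_O_WORKDIR"
# "lammps" and "in.mysim" are placeholder executable and input names.
mpiexec lammps < in.mysim
```

The variable is exported before `mpiexec` so that, assuming the MPI launcher passes the job's environment to every rank (the usual default), each rank's OpenMP runtime sees it at startup.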
Some MVAPICH2 MPI installations on Oakley, Ruby, and Owens, such as the default module mvapich2/2.2 as well as mvapich2/2.1, appear to have a bug that is triggered by certain programs. The symptoms are 1) the program hangs or 2) the program fails with an error related to Allreduce or Bcast.
Intel compiler versions 11 and later on Glenn do not support the -mkl compiler options. Contact firstname.lastname@example.org for workarounds.
Typical symptoms are:
../fft3d.h(184): catastrophic error: cannot open source file "mkl_dfti.h"
ld: cannot find -lmkl_intel_lp64
NAMD 2.11 precompiled binaries do not work. Please use NAMD 2.11 built from source, available via module namd/2.11.
The NAMD 2.11 issue involves changes to the charmrun command in Charm++.
A typical symptom is:
Batch scripts loading module lammps-7Dec15 should use the user's login shell or the Korn shell, e.g.:
#PBS -S /bin/ksh
There is a bug that causes the module load to fail if the job script specifies the Bash shell, i.e.:
#PBS -S /bin/bash
Alternatively, you can unload the mpi and intel-10.x modules before loading the lammps module.
An example failure is: