Software

libibumad.so.2 missing on Oakley

Update: We think this is fixed. Please submit a ticket if you encounter further problems.

As a result of updates made during yesterday's downtime, software built with mvapich2/1.7 is failing with the error:

libibumad.so.2: cannot open shared object file: No such file or directory

We're working on fixing the problem.
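To check whether a particular executable is affected, one option is to ask the dynamic linker which shared libraries it cannot resolve. The sketch below wraps the standard ldd command in a helper; the name missing_libs and the program path in the usage note are assumptions for illustration, not part of any installed module.

```shell
#!/bin/bash
# Hypothetical helper: print the shared libraries that the dynamic linker
# cannot resolve for the binary given as the first argument.
# Prints nothing when every library is found (or when ldd cannot read the file).
missing_libs() {
    ldd "$1" 2>/dev/null | grep "not found" || true
}
```

For an affected build, running missing_libs ./my_program (where ./my_program is a placeholder for your own executable) would be expected to list libibumad.so.2 as not found.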

Problems with LAMMPS 14May16

LAMMPS 14May16 was built with the USER-OMP package on Oakley, Ruby, and Owens. Its default behavior is to spawn too many OpenMP threads. Batch scripts that load lammps/14May16 but do not use the USER-OMP package should set the OMP_NUM_THREADS environment variable to 1 as a workaround, e.g.:
export OMP_NUM_THREADS=1
for Bourne-type shells and
setenv OMP_NUM_THREADS 1
for C-type shells.
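As a minimal sketch for Bourne-type shells, a batch script could apply the workaround and confirm it took effect before launching LAMMPS. The helper name check_omp_threads is hypothetical, and the actual LAMMPS launch line is omitted because it depends on your job.

```shell
#!/bin/bash
# Workaround from the notice above: one OpenMP thread per process.
export OMP_NUM_THREADS=1

# Hypothetical helper: confirm the setting before launching LAMMPS.
check_omp_threads() {
    if [ "${OMP_NUM_THREADS:-unset}" = "1" ]; then
        echo "OMP_NUM_THREADS=1"
    else
        echo "OMP_NUM_THREADS is ${OMP_NUM_THREADS:-unset}, expected 1" >&2
        return 1
    fi
}

check_omp_threads
```

Placing the export before the launch command matters, since processes started earlier in the script would not inherit the value.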

Problems with MVAPICH2

Some MVAPICH2 MPI installations on Oakley, Ruby, and Owens, such as the default module mvapich2/2.2 as well as mvapich2/2.1, appear to have a bug that is triggered by certain programs.  The symptoms are 1) the program hangs or 2) the program fails with an error related to Allreduce or Bcast.

Glenn module lammps-7Dec15 bug

Batch scripts loading module lammps-7Dec15 should use the user's login shell or the Korn shell, e.g.:

#PBS -S /bin/ksh

There is a bug that causes the module load to fail if the job script specifies the Bash shell, i.e.: 

#PBS -S /bin/bash

Alternatively, you can unload mpi and intel-10.x before loading the lammps module.

An example failure is:

Issue when loading multiple Fluent or ANSYS modules simultaneously

Due to the way our Fluent and ANSYS modules are configured, loading several instances of either module simultaneously will cause a cryptic error. This most commonly happens when several of a user's jobs start at the same time and all load the module at once. The error appears only when the modules are loaded at precisely the same time, which is rare for any single set of jobs but likely to occur eventually.

If you encounter this error you are not at fault.  Please resubmit the failed job(s).

Certain modules not accessible

Certain modules are not working for all clusters since the downtime.  We have reports specifically that Amber, Gaussian, and Turbomole are not working.  We are working to resolve the issue, but until then jobs using Amber, Gaussian, Turbomole, or other software with permissions restrictions will not work.
