OSC Configuration and Tools for PBS

This page describes the configuration changes OSC has made to OpenPBS on the Center's Linux clusters, along with some tools and scripts developed at OSC to make this combination easier to work with. They are provided in the hope that they will help other sites that run OpenPBS on Linux. Many of the tools presented here also work with PBS Pro and TORQUE.

Patches to OpenPBS source
These have moved to a separate page maintained by Pete Wyckoff, who does most of the source-level work on PBS at OSC.
mpiexec
This is a C program that should be compiled and installed in /usr/local/bin. It replaces mpirun, using PBS's task manager (TM) interface rather than rsh to spawn processes, and it sets up the environment that various MPI implementations need. The code also includes a patch to PBS that improves the task manager interface.
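mpiexec itself is C code built on the PBS TM library and needs a running PBS installation, so what follows is only a minimal Python sketch of the first decision any launcher makes: which host each MPI rank runs on. It assumes one rank per line of $PBS_NODEFILE, the file PBS writes with one hostname per allocated processor; mpiexec obtains its node list through the TM call tm_nodeinfo() instead. The function names are hypothetical.

```python
def read_nodefile(path):
    """Return the hostnames in a PBS node file, one entry per allocated processor."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def place_ranks(hosts, nprocs=None):
    """Map MPI ranks to hosts in node-file order: rank i runs on hosts[i].

    By default, start one rank per allocated processor.
    """
    if nprocs is None:
        nprocs = len(hosts)
    if nprocs > len(hosts):
        raise ValueError(
            f"requested {nprocs} processes but only {len(hosts)} processors allocated"
        )
    return [(rank, hosts[rank]) for rank in range(nprocs)]


def rsh_commands(placement, program):
    """Command lines an rsh-based mpirun would run: one remote shell per rank."""
    return [["rsh", host, program] for _, host in placement]
```

Inside a job, place_ranks(read_nodefile(os.environ["PBS_NODEFILE"])) gives the rank layout. Where an rsh-based mpirun would then start one remote shell per rank, as rsh_commands builds, mpiexec asks the pbs_mom daemons to start each task (tm_spawn() in the C API), so PBS can account for and clean up every process in the job.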
queue structure
This is the basic queue configuration we use on virtually all of our cluster systems. It has a default routing queue called "batch" that feeds into three execution queues: "serial", "parallel", and "dedicated".
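A PBS routing queue hands each job to the first queue in its route_destinations list that will accept it, typically judged by per-queue resources_min and resources_max limits. The page does not give the limits OSC uses, so this minimal Python sketch of the routing step uses hypothetical node-count limits (serial: one node, parallel: two or more) and leaves "dedicated" out of the route list on the assumption that users request it explicitly. The class and function names are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ExecQueue:
    """An execution queue with (hypothetical) limits on requested node count."""
    name: str
    min_nodes: Optional[int] = None  # like resources_min.nodect; None = no limit
    max_nodes: Optional[int] = None  # like resources_max.nodect; None = no limit

    def accepts(self, nodes: int) -> bool:
        if self.min_nodes is not None and nodes < self.min_nodes:
            return False
        if self.max_nodes is not None and nodes > self.max_nodes:
            return False
        return True


def route(nodes: int, destinations: Sequence[ExecQueue]) -> Optional[str]:
    """Return the first destination that accepts the job, as a PBS routing queue does."""
    for queue in destinations:
        if queue.accepts(nodes):
            return queue.name
    return None  # no destination will take the job


# Hypothetical route list for "batch"; "dedicated" is assumed to be requested explicitly.
BATCH_ROUTE = [ExecQueue("serial", max_nodes=1), ExecQueue("parallel", min_nodes=2)]
```

Because destinations are tried in order, the order of the list matters whenever two queues' limits overlap.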
PBS tools
These are a set of PBS utilities that OSC has developed over the years, including a distributed copy command (pbsdcp) and tools for statistical analysis and data mining of job information. Several of the tools rely on a parallel rsh wrapper called "all", and the data mining tools use a MySQL database back end.
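PBS records job events in daily accounting files under ${PBS_HOME}/server_priv/accounting, one line per event in the form date time;type;job id;attribute=value ..., and the "E" (job end) record carries fields such as user and resources_used.walltime. The page does not describe how OSC's analysis tools read these files, so this is a minimal Python sketch of the kind of first pass such a tool might make: parse the records and total walltime per user. It keeps results in memory instead of loading them into MySQL, and the function names are hypothetical.

```python
from collections import defaultdict


def parse_record(line):
    """Split an accounting line into (timestamp, record type, job id, attributes).

    Returns None for lines that do not have the four ';'-separated fields.
    Attributes are split on whitespace, so values containing spaces are not handled.
    """
    parts = line.rstrip("\n").split(";", 3)
    if len(parts) != 4:
        return None
    stamp, rtype, jobid, message = parts
    attrs = dict(item.split("=", 1) for item in message.split() if "=" in item)
    return stamp, rtype, jobid, attrs


def hms_to_seconds(value):
    """Convert an HH:MM:SS duration (hours may exceed 24) to seconds."""
    hours, minutes, seconds = (int(x) for x in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def walltime_by_user(lines):
    """Total resources_used.walltime per user over job-end ('E') records."""
    totals = defaultdict(int)
    for line in lines:
        record = parse_record(line)
        if record is None:
            continue
        _, rtype, _, attrs = record
        if rtype == "E" and "resources_used.walltime" in attrs:
            user = attrs.get("user", "unknown")
            totals[user] += hms_to_seconds(attrs["resources_used.walltime"])
    return dict(totals)
```

Passing an open accounting file to walltime_by_user works because it only iterates over lines; the same parsed attribute dictionaries could instead become rows in a database table.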
prologue and epilogue
These scripts are placed in ${PBS_HOME}/mom_priv and run as root by PBS before and after a job, respectively. In OSC's case, prologue creates a unique temporary directory on each node assigned to a job before the job begins to run, and epilogue deletes that directory after the job completes. (A separate temporary directory on each node is probably not as good as a high-performance parallel filesystem.)
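The create-then-delete pattern can be sketched in Python as below. This is not OSC's script: the /tmp/pbstmp.<jobid> naming scheme and the function names are hypothetical, and the sketch only handles the node it runs on, leaving out how the work reaches every node assigned to the job.

```python
import os
import re
import shutil

# Hypothetical layout: one directory per job named /tmp/pbstmp.<jobid>.
TMP_ROOT = "/tmp"
PREFIX = "pbstmp."


def job_tmpdir(jobid, root=TMP_ROOT):
    """Return the per-job directory path, rejecting job ids that could escape root."""
    if not re.fullmatch(r"[A-Za-z0-9._-]+", jobid):
        raise ValueError(f"unexpected job id: {jobid!r}")
    return os.path.join(root, PREFIX + jobid)


def prologue(jobid, user, group, root=TMP_ROOT):
    """Create the job's private directory and hand it to the job owner."""
    path = job_tmpdir(jobid, root)
    os.makedirs(path, mode=0o700)  # raises if it already exists: names must be unique
    if os.geteuid() == 0:  # PBS runs the real prologue as root
        shutil.chown(path, user=user, group=group)
    return path


def epilogue(jobid, root=TMP_ROOT):
    """Remove the job's directory and everything the job left in it."""
    path = job_tmpdir(jobid, root)
    shutil.rmtree(path, ignore_errors=True)
    return path
```

PBS passes the job id, user name, and group name as the first three arguments to both scripts, so a prologue wrapper would call prologue(sys.argv[1], sys.argv[2], sys.argv[3]) and an epilogue wrapper epilogue(sys.argv[1]). Deriving the directory name from the job id alone lets the epilogue find it without any saved state.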
tmpdir.sh and tmpdir.csh
These scripts are placed in /etc/profile.d or sourced directly by users' shells. They set the environment variable TMPDIR to the name of the temporary directory created by the prologue script above, mimicking the behavior of NQE on Cray and SGI systems.
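The real tmpdir.sh and tmpdir.csh are shell scripts; this minimal Python sketch only shows the logic they need, assuming the prologue names each directory /tmp/pbstmp.<jobid> (a hypothetical scheme that must match whatever the prologue actually uses). It relies on PBS_JOBID, which PBS sets in a job's environment.

```python
import os

# Hypothetical: must match the naming scheme used by the prologue.
TMP_ROOT = "/tmp"
PREFIX = "pbstmp."


def with_job_tmpdir(env):
    """Return a copy of env with TMPDIR pointing at the job's directory.

    Outside a PBS job (no PBS_JOBID) the environment is returned unchanged,
    so the same profile script is harmless in ordinary login shells.
    """
    env = dict(env)
    jobid = env.get("PBS_JOBID")
    if jobid:
        env["TMPDIR"] = os.path.join(TMP_ROOT, PREFIX + jobid)
    return env
```

Keying on PBS_JOBID is what lets one script serve both batch jobs and interactive logins; a production version might also check that the directory exists before exporting TMPDIR.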