OSC Configuration and Tools for PBS
This page describes the configuration changes OSC has made to
OpenPBS on the Center's
Linux clusters, as well as some tools and scripts developed at OSC
to make this combination easier to work with. They are provided
in the hope that they will be useful to other sites that
run OpenPBS on Linux. Many of the tools presented here
also work with PBS Pro and
TORQUE.
-
Patches to OpenPBS source
-
These have moved to a separate page maintained by
Pete Wyckoff, who does most of the source-level work on PBS at
OSC.
-
mpiexec
-
This is a C program that must be compiled and installed in
/usr/local/bin. It replaces mpirun, using PBS's task
manager (TM) interface rather than rsh to spawn processes. It also
sets up the environment required by various MPI implementations.
The code also includes a patch to PBS that improves the behavior of
the task manager interface.
-
queue structure
-
This is the basic queue configuration we use on virtually all of
our cluster systems. It has a default routing queue called
"batch" that feeds into three execution queues: "serial",
"parallel", and "dedicated".
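A PBS routing queue such as "batch" holds no jobs of its own: it hands each job to the first queue in its route_destinations list whose limits (resources_min and resources_max) accept it. The page does not give OSC's actual limits, so the minimal Python sketch below assumes hypothetical node-count ranges for the three execution queues purely to illustrate the first-match rule.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ExecQueue:
    """An execution queue with hypothetical node-count limits."""
    name: str
    min_nodes: int
    max_nodes: Optional[int]  # None means no upper limit

    def accepts(self, nodes: int) -> bool:
        # Stands in for a resources_min/resources_max check on node count.
        if nodes < self.min_nodes:
            return False
        return self.max_nodes is None or nodes <= self.max_nodes


# Hypothetical limits: OSC's real values are not given on this page.
DESTINATIONS = (
    ExecQueue("serial", 1, 1),
    ExecQueue("parallel", 2, 64),
    ExecQueue("dedicated", 65, None),
)


def route(nodes: int, destinations: Sequence[ExecQueue] = DESTINATIONS) -> str:
    """Return the first destination that accepts the job, as a routing queue does."""
    for queue in destinations:
        if queue.accepts(nodes):
            return queue.name
    raise ValueError(f"no execution queue accepts a {nodes}-node job")


if __name__ == "__main__":
    for n in (1, 8, 128):
        print(n, "->", route(n))
```

Because routing stops at the first match, the order of the destinations matters whenever their limits overlap.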
-
PBS tools
-
This is a set of PBS utilities that OSC has developed over the
years, including a distributed copy command (pbsdcp) and
tools for statistical analysis and data mining of job
information. Several of the tools rely on a parallel rsh wrapper
called all, and the data mining tools use a MySQL database back
end.
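The page does not describe how pbsdcp works internally. One plausible approach, sketched below in Python purely as an assumption, reads the file named by $PBS_NODEFILE (which lists a host once per processor assigned on it, so hosts repeat), keeps each host once, and builds one rcp command per host. The function names and the choice of rcp are hypothetical, and the sketch only prints the commands instead of running them.

```python
import os
import shlex
import sys
from typing import Iterable, List


def unique_hosts(lines: Iterable[str]) -> List[str]:
    """Return host names in first-seen order, dropping blanks and repeats."""
    seen = set()
    hosts = []
    for line in lines:
        host = line.strip()
        if host and host not in seen:
            seen.add(host)
            hosts.append(host)
    return hosts


def copy_commands(hosts: Iterable[str], src: str, dest: str) -> List[List[str]]:
    """Build one hypothetical rcp command per host (not executed here)."""
    return [["rcp", src, f"{host}:{dest}"] for host in hosts]


def main(argv: List[str]) -> int:
    if len(argv) != 3:
        print(f"usage: {argv[0]} SRC DEST", file=sys.stderr)
        return 2
    nodefile = os.environ.get("PBS_NODEFILE")
    if not nodefile:
        print("PBS_NODEFILE is not set; run inside a PBS job", file=sys.stderr)
        return 1
    with open(nodefile) as f:
        hosts = unique_hosts(f)
    for cmd in copy_commands(hosts, argv[1], argv[2]):
        # Dry run: print each command rather than spawning it.
        print(shlex.join(cmd))
    return 0


if __name__ == "__main__" and len(sys.argv) > 1:
    # Act only when given arguments; with none, define the helpers and stop.
    sys.exit(main(sys.argv))
```

A real tool would run the copies concurrently, which is the kind of fan-out a parallel rsh wrapper like all provides.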
-
prologue and epilogue
-
These scripts go in ${PBS_HOME}/mom_priv and are run as root by
PBS before and after each job, respectively. At OSC, prologue
creates a unique temporary directory on each node assigned to a
job before the job begins to run, and epilogue deletes
that directory after the job completes. (A separate temporary
directory on each node is probably not as good as a
high-performance parallel filesystem.)
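OSC's actual scripts are not reproduced here. The Python sketch below illustrates the create-then-delete idea for a single node only, assuming a hypothetical directory name of the form /tmp/pbstmp.<jobid>. It relies on PBS passing the job ID, user name, and group name as the first three arguments to both scripts. Repeating the work on every node of a job is not shown.

```python
import os
import shutil
import sys
from typing import Optional

# Hypothetical location and naming scheme for per-job scratch directories.
TMP_BASE = "/tmp"


def job_tmpdir(jobid: str, base: str = TMP_BASE) -> str:
    """Return the per-job directory path, e.g. /tmp/pbstmp.123.server."""
    if not jobid or "/" in jobid:
        raise ValueError(f"unsafe job id: {jobid!r}")
    return os.path.join(base, f"pbstmp.{jobid}")


def prologue(jobid: str, user: Optional[str] = None,
             group: Optional[str] = None, base: str = TMP_BASE) -> str:
    """Create the job's directory and, if a user is given, hand it to them."""
    path = job_tmpdir(jobid, base)
    os.makedirs(path, mode=0o700, exist_ok=True)
    if user is not None:
        # Changing ownership needs root, which is how PBS runs the prologue.
        shutil.chown(path, user=user, group=group)
    return path


def epilogue(jobid: str, base: str = TMP_BASE) -> None:
    """Remove the job's directory and anything the job left in it."""
    shutil.rmtree(job_tmpdir(jobid, base), ignore_errors=True)


if __name__ == "__main__" and len(sys.argv) >= 4:
    # PBS passes at least job ID, user, and group to both scripts.
    if os.path.basename(sys.argv[0]).startswith("epilogue"):
        epilogue(sys.argv[1])
    else:
        prologue(sys.argv[1], sys.argv[2], sys.argv[3])
    sys.exit(0)  # a nonzero prologue status would make PBS treat the job as failed
```

Installed under both names, the same file acts as prologue or epilogue depending on how it is invoked.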
-
tmpdir.sh and tmpdir.csh
-
These scripts can be placed in /etc/profile.d, sourced from users'
shell startup files, or both. They set the environment variable
TMPDIR to the name of the temporary directory created by the
prologue script above. This mimics the
behavior of NQE on Cray and SGI systems.
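The two scripts differ only in shell syntax. As a hedged illustration, the Python sketch below derives the value they would assign from PBS_JOBID (set by PBS inside every job) and emits the Bourne-shell and C-shell forms of the assignment. The /tmp/pbstmp.<jobid> naming is a hypothetical assumption and would have to match whatever the prologue creates. Outside a PBS job it emits nothing, so TMPDIR keeps its usual value.

```python
import os
import shlex
from typing import Mapping, Optional

# Hypothetical naming scheme; must match what the prologue creates.
TMP_BASE = "/tmp"


def job_tmpdir(env: Mapping[str, str], base: str = TMP_BASE) -> Optional[str]:
    """Return the per-job directory for a PBS job, or None outside a job."""
    jobid = env.get("PBS_JOBID")
    if not jobid:
        return None
    return os.path.join(base, f"pbstmp.{jobid}")


def tmpdir_assignment(env: Mapping[str, str], shell: str) -> str:
    """Return the line tmpdir.sh ("sh") or tmpdir.csh ("csh") would run."""
    path = job_tmpdir(env)
    if path is None:
        return ""  # not inside a PBS job: leave TMPDIR alone
    quoted = shlex.quote(path)
    if shell == "sh":
        return f"TMPDIR={quoted}; export TMPDIR"
    if shell == "csh":
        return f"setenv TMPDIR {quoted}"
    raise ValueError(f"unsupported shell: {shell}")


if __name__ == "__main__":
    print(tmpdir_assignment(os.environ, "sh"))
```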