Batch-Related Command Summary

This section summarizes two groups of batch-related commands: commands that are run on the login nodes to manage your jobs and commands that are run only inside a batch script. Only the most common options are described here.

Many of these commands are discussed in more detail elsewhere in this document. All have online manual pages (example: man qsub) unless otherwise noted.

In describing the usage of the commands we use square brackets [like this] to indicate optional arguments. The brackets are not part of the command.

Important note: The batch systems on Oakley and Glenn are entirely separate. Be sure to submit your jobs on a login node for the system you want them to run on. All monitoring of a queued or running job must also be done on that same system. Your job output, of course, will be visible from both systems.

Commands for managing your jobs

These commands are typically run from a login node to manage your batch jobs. The batch systems on Oakley and Glenn are completely separate, so the commands must be run on the system where the job is to be run.

qsub

The qsub command is used to submit a job to the batch system.

Usage | Description | Example
qsub [options] script | Submit a script for a batch job. The options list is rarely used but can augment or override the directives in the header lines of the script. | qsub sim.job
qsub -t array_request [options] script | Submit an array of jobs. | qsub -t 1-100 sim.job
qsub -I [options] | Submit an interactive batch job. | qsub -I -l nodes=1:ppn=
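
Most of the commands described below take a job ID, so it can be convenient to capture the ID that qsub prints at submission time. The following is a minimal sketch rather than an OSC-provided tool: the function name submit_and_get_id is hypothetical, and it assumes that qsub prints an identifier of the form number.servername on success, as TORQUE/PBS normally does.

```shell
#!/bin/sh
# submit_and_get_id SCRIPT
# Hypothetical helper: submit SCRIPT with qsub and print only the leading
# part of the job identifier (everything before the first dot).
# Assumption: on success qsub prints "<number>.<server>" on stdout.
submit_and_get_id() {
    full_id=$(qsub "$1") || return 1   # pass qsub failures back to the caller
    printf '%s\n' "${full_id%%.*}"     # strip the ".<server>" suffix
}
```

A possible use on a login node is jobid=$(submit_and_get_id sim.job), after which "$jobid" can be passed to qstat, qpeek, or qdel.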



qstat

The qstat command is used to display the status of batch jobs.

Usage | Description | Example
qstat | Display all jobs currently in the batch system. | qstat
qstat [-a] jobid | Display information about job jobid. The -a flag uses an alternate format. | qstat -a 123456
qstat -f jobid | Display full status information about job jobid. | qstat -f 123456
qstat -u username [-f] | Display information about all the jobs belonging to user username. | qstat -u usr1234
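
The full status listing is long, and often only the job state is of interest. The sketch below pulls that one field out of the qstat -f output. The function name job_state is hypothetical, and the code assumes the listing contains a line of the form "job_state = X", where X is a one-letter code such as Q (queued), R (running), or H (held).

```shell
#!/bin/sh
# job_state JOBID
# Hypothetical helper: print the one-letter state of job JOBID.
# Assumption: "qstat -f" prints an indented line "job_state = X".
job_state() {
    # -n suppresses normal output; p prints only the line where the
    # substitution succeeded, leaving just the state letter.
    qstat -f "$1" | sed -n 's/^[[:space:]]*job_state = //p'
}
```

For example, job_state 123456 would print R for a running job. If the job is unknown to the batch system, nothing is printed.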

qdel

The qdel command may be used to delete a queued or running job.

Usage | Description | Example
qdel jobid | Delete job jobid. | qdel 123456

qpeek

The qpeek command may be used to look at the output log file (stdout) or error log file (stderr) of a running job.

Usage | Description | Example
qpeek jobid | Display the current contents of the output log file (stdout) for job jobid. | qpeek 1234567
qpeek -e jobid | Display the current contents of the error log file (stderr) for job jobid. | qpeek -e 1234567
qpeek -h [-e] jobid | Display just the beginning (“head”) of the file. | qpeek -h 123456
qpeek -t [-e] jobid | Display just the end (“tail”) of the file. | qpeek -t 123456
qpeek -f [-e] jobid | Display the end of the file and keep listening (“tail -f”). | qpeek -f 123456

qalter

The qalter command may be used to modify the attributes of a queued (not running) job. Not all attributes can be altered.

Usage | Description | Example
qalter [option] jobid | Alter one or more attributes of a queued job. The options you can modify are a subset of the directives that can be used when submitting a job. | qalter -l mem=47gb 123456

qhold, qrls

The qhold command allows you to place a hold on a queued job. The job will be prevented from running until you release the hold with the qrls command.

Usage | Description | Example
qhold jobid | Place a user hold on job jobid. | qhold 123456
qrls jobid | Release a user hold previously placed on job jobid. | qrls 123456

showstart

The showstart command tries to estimate when a queued job will start running. It is extremely unreliable, often making large errors in either direction.

Usage | Description | Example
showstart jobid | Display estimate of start time. | showstart 123456

showq

The showq command lists jobs from the point of view of the Moab scheduler.

Usage | Description | Example
showq | List all jobs currently in the batch system. | showq
showq -i | List idle jobs that are eligible to run. | showq -i
showq -r | List running jobs. | showq -r
showq -b | List blocked jobs. | showq -b
showq -u username | List all jobs belonging to user username. | showq -u usr1234

pdsh or all

The pdsh (on Oakley) or all (on Glenn) command can be used to monitor a running job by executing a command on all the nodes assigned to the job and returning the results. It is primarily used with parallel jobs. The commands that are run should be quick and simple to avoid interfering with the job. Two useful commands used with pdsh or all are uptime, which displays system load, and free, which gives memory usage; see also the man pages for these commands.

Usage | Description | Example
pdsh -j jobid cmd | Run cmd on all the nodes on which jobid is running. Oakley only. | pdsh -j 123456 uptime
 | | pdsh -j 123456 free -m
all -j jobid cmd | Run cmd on all the nodes on which jobid is running. Glenn only. | all -j 123456 uptime
 | | all -j 123456 free -m
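
Because the command name differs between the two systems, a small wrapper can hide the difference in scripts that are used on both. This is only a sketch: the function name run_on_job_nodes is hypothetical, and it assumes that exactly one of pdsh or all is found on the login node where it runs.

```shell
#!/bin/sh
# run_on_job_nodes JOBID CMD [ARGS...]
# Hypothetical wrapper: run CMD on every node of job JOBID using whichever
# of pdsh (Oakley) or all (Glenn) is available.
# Assumption: only the command belonging to the current system is installed.
run_on_job_nodes() {
    jobid=$1
    shift                                   # the rest is the command to run
    if command -v pdsh >/dev/null 2>&1; then
        pdsh -j "$jobid" "$@"
    elif command -v all >/dev/null 2>&1; then
        all -j "$jobid" "$@"
    else
        echo "neither pdsh nor all was found on this system" >&2
        return 1
    fi
}
```

With this in place, run_on_job_nodes 123456 uptime behaves like the first example in each row of the table above.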

Commands used only inside a batch job

These commands can only be used inside a batch job.

mpiexec

Use the mpiexec command to run a parallel program or to run multiple processes simultaneously within a job. It is a replacement program for the script mpirun, which is part of the mpich package.

The OSC version of mpiexec is customized to work with our batch environment. There are other mpiexec programs in existence, but it is imperative that you use the one provided with our system.

Usage | Description | Example
mpiexec progname [args] | Run the executable program progname in parallel, with as many processes as there are processors (cores) assigned to the job (nodes*ppn). | mpiexec myprog
 | | mpiexec yourprog abc.dat 123
mpiexec -pernode progname [args] | Run only one process per node. | mpiexec -pernode myprog
mpiexec -npernode num progname [args] | Run the specified number of processes on each node. | mpiexec -npernode 3 myprog
mpiexec -tv [options] progname [args] | Run the program with the TotalView parallel debugger. | mpiexec -tv myprog
mpiexec -n num progname [args] | Run only the specified number of processes. (-n and -np are equivalent.) Does not spread processes out evenly across nodes. | mpiexec -n 3 myprog
mpiexec -np num progname [args] | |
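
To see how many processes each form of mpiexec would start, it helps to know the node count and the total core count of the job. The sketch below derives both from $PBS_NODEFILE, which is not described in this section. It assumes standard PBS behavior, where that file lists one host name per assigned core. The function name job_layout is hypothetical.

```shell
#!/bin/sh
# job_layout
# Hypothetical helper: report the number of nodes and the total number of
# processes (nodes*ppn) for the current job.
# Assumption: $PBS_NODEFILE holds one line per assigned core, so a job with
# nodes=2:ppn=4 has 8 lines naming 2 distinct hosts.
job_layout() {
    total=$(wc -l < "$PBS_NODEFILE")            # one line per core
    nodes=$(sort -u "$PBS_NODEFILE" | wc -l)    # distinct host names
    # $((...)) also strips any padding that wc adds on some systems.
    echo "nodes=$((nodes)) total_procs=$((total))"
}
```

Under these assumptions, plain mpiexec starts total_procs processes, mpiexec -pernode starts one per node (nodes in total), and mpiexec -npernode 3 starts 3 on each node.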

pbsdcp

The pbsdcp command is a distributed copy command for the PBS environment. It copies files to or from each node of the cluster assigned to your job. This is needed when copying files to directories which are not shared between nodes, such as $TMPDIR.

Options are -r for recursive and -p to preserve modification times and modes.

Usage | Description | Example
pbsdcp [-s] [options] srcfiles target | “Scatter”. Copy one or more files from shared storage to the target directory on each node (local storage). The -s flag is optional. | pbsdcp -s infile1 infile2 $TMPDIR
 | | pbsdcp model.* $TMPDIR
pbsdcp -g [options] srcfiles target | “Gather”. Copy the source files from each node to the shared target directory. Wildcards must be enclosed in quotes. | pbsdcp -g '$TMPDIR/outfile*' $PBS_O_WORKDIR

Note: In gather mode, if files on different nodes have the same name, they will overwrite each other. In the -g example above, the file names may have the form outfile001, outfile002, etc., with each node producing a different set of files.
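
The scatter and gather modes are typically used together around a parallel run. The following sketch shows one way the three steps might be combined inside a batch script. The function name stage_run_gather, the program name, and the outfile* naming pattern are illustrative assumptions, not part of pbsdcp itself.

```shell
#!/bin/sh
# stage_run_gather PROG INFILE...
# Hypothetical helper for use inside a batch job: copy the input files to
# node-local $TMPDIR on every node, run PROG in parallel, then collect the
# results into the directory the job was submitted from.
# Assumption: PROG writes files named outfile* into $TMPDIR on each node.
stage_run_gather() {
    prog=$1
    shift                                   # remaining arguments are inputs
    # Scatter: shared storage to $TMPDIR on each node.
    pbsdcp -s "$@" "$TMPDIR" || return 1
    # Run one process per assigned core.
    mpiexec "$prog" || return 1
    # Gather: the single quotes stop this shell from expanding $TMPDIR and
    # the wildcard, so pbsdcp receives the pattern unchanged.
    pbsdcp -g '$TMPDIR/outfile*' "$PBS_O_WORKDIR"
}
```

A call such as stage_run_gather ./myprog infile1 infile2 would then issue the same pbsdcp commands as the examples in the table, with the mpiexec run in between.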

ja

The ja command prints job accounting information from inside a PBS job. This includes aggregate CPU time, memory, virtual memory, and walltime. Note: The same information is available from qstat -f while the job is running.

Usage | Description | Example
ja | Print job accounting information inside a PBS job. | ja