parallel-command-processor

Introduction

There are many situations where the same serial program must be run many times with slightly different input. Parametric runs such as these typically either run sequentially in a single batch job, or are submitted as one batch job per parameter value (or some mix of the two). An alternative is to allocate a number of nodes/processors and use them to run a large number of serial processes for some period of time. The command parallel-command-processor (PCP) allows the execution of a large number of independent serial processes in parallel.

parallel-command-processor works as follows: in a parallel job with N processors allocated, the PCP manager process reads the first N-1 commands in the command stream and distributes them to the other N-1 processors. As processes complete, the PCP manager reads the next command in the stream and sends it to an idle processor core. Once the PCP manager runs out of commands, it waits for the remaining running processes to complete before shutting itself down.
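
The scheduling model described above can be tried out on a single machine, which is sometimes handy for checking a command file before submitting a job. The sketch below is only an analogue and not how parallel-command-processor itself is implemented: it assumes an xargs that supports -P (the GNU and BSD versions do), and the file name commands.txt and the out.* names are made up for illustration. It keeps at most three commands running and starts the next line whenever one finishes.

```shell
#!/bin/bash
# Local analogue of the manager/worker model (illustration only;
# this is NOT how parallel-command-processor is implemented).
# Assumptions: xargs supports -P; commands.txt and out.* are hypothetical names.

workdir=$(mktemp -d)
cd "$workdir" || exit 1

# One independent serial command per line, as in a PCP command file.
cat > commands.txt <<'EOF'
echo alpha > out.1
echo beta > out.2
echo gamma > out.3
echo delta > out.4
echo epsilon > out.5
EOF

# Run at most 3 commands at once. Each input line replaces CMD and is
# executed by its own "sh -c"; xargs starts the next line as soon as
# a running command exits, until the input is exhausted.
xargs -P 3 -I CMD sh -c CMD < commands.txt

# xargs returns only after every command has finished,
# so all output files exist at this point.
ls out.* | wc -l
```

Unlike PCP, this runs everything on one node, so it says nothing about how work is spread across the nodes of a job.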

 

Availability

parallel-command-processor is available to all users on Oakley, Ruby, and Owens.

 

Usage

Here is an interactive batch session that demonstrates the use of parallel-command-processor with a config file, pconf. pconf contains simple commands, one per line. The output of each command is redirected to its own file.

-bash-3.2$ qsub -I -l nodes=2:ppn=4 
-bash-3.2$ cd $PBS_O_WORKDIR 
-bash-3.2$ cp pconf $TMPDIR
-bash-3.2$ cd $TMPDIR
-bash-3.2$ cat pconf
ls / > 1 
ls $TMPDIR > 2 
ls $HOME > 3 
ls /usr/local/ > 4 
ls /tmp > 5 
ls /usr/src > 6 
ls /usr/local/src > 7
ls /usr/local/etc > 8 
hostname > 9 
uname -a > 10 
df > 11
-bash-3.2$ module load pcp
-bash-3.2$ mpiexec parallel-command-processor pconf
-bash-3.2$ pwd
/tmp/pbstmp.1371894 
-bash-3.2$ mpiexec -ppn=1 ls -l $TMPDIR 
854 total 16 
-rw------- 1 yzhang G-3040 1082 Feb 18 16:26 11
-rw------- 1 yzhang G-3040 1770 Feb 18 16:26 4 
-rw------- 1 yzhang G-3040 67 Feb 18 16:26 5
-rw------- 1 yzhang G-3040 32 Feb 18 16:26 6 
-rw------- 1 yzhang G-3040 0 Feb 18 16:26 7 
855 total 28
-rw------- 1 yzhang G-3040 199 Feb 18 16:26 1
-rw------- 1 yzhang G-3040 111 Feb 18 16:26 10
-rw------- 1 yzhang G-3040 12 Feb 18 16:26 2
-rw------- 1 yzhang G-3040 87 Feb 18 16:26 3 
-rw------- 1 yzhang G-3040 38 Feb 18 16:26 8
-rw------- 1 yzhang G-3040 20 Feb 18 16:26 9
-rw------- 1 yzhang G-3040 163 Feb 18 16:25 pconf 
-bash-3.2$ exit

As the command "mpiexec -ppn=1 ls -l $TMPDIR" shows, the output files are spread across the two nodes, each in that node's own $TMPDIR. In a batch file, pbsdcp can be used to distribute-copy the input files to $TMPDIR on all nodes of the job and to gather the output files once execution has completed. Working in $TMPDIR is important because running many processes in parallel directly against the user home directories can place a heavy load on them.

Here is a slightly more complex example showing the usage of parallel-command-processor and pbsdcp:

#PBS -l nodes=13:ppn=4 
#PBS -l walltime=1:00:00 
#PBS -S /bin/bash 
#PBS -N blast-PCP 
#PBS -j oe 
date

module load biosoftw 
module load blast

set -x

cd $PBS_O_WORKDIR 
pbsdcp query/query.fsa.* $TMPDIR 
pbsdcp db/rice.* $TMPDIR 
cd $TMPDIR

for i in $(seq 1 49)
do
      cmd="blastall -p blastn -d rice -i query.fsa.$i -o out.$i"
      echo "${cmd}" >> runblast
done

module load pcp
mpiexec parallel-command-processor runblast

mkdir $PBS_O_WORKDIR/output 
pbsdcp -g out.* $PBS_O_WORKDIR/output

date
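
The node request in the script above follows from the scheduling model: one of the allocated processes acts as the PCP manager, so only N-1 of them run commands. A small sketch of that arithmetic is given here. The helper names pcp_workers and pcp_rounds are hypothetical and not part of PCP, and the sketch assumes that mpiexec starts one process per allocated core (nodes times ppn).

```shell
#!/bin/bash
# Hypothetical sizing helpers; they are not part of parallel-command-processor.
# Assumption: mpiexec starts one process per allocated core (nodes * ppn).

# Number of worker processes: every allocated core except the one
# taken by the PCP manager.
pcp_workers() {
    local nodes=$1 ppn=$2
    echo $(( nodes * ppn - 1 ))
}

# Minimum number of "waves" needed to start every command,
# i.e. ceil(commands / workers) using integer arithmetic.
pcp_rounds() {
    local commands=$1 workers=$2
    echo $(( (commands + workers - 1) / workers ))
}

workers=$(pcp_workers 13 4)          # nodes=13:ppn=4, as in the script above
rounds=$(pcp_rounds 49 "$workers")   # 49 blastall commands
echo "workers=$workers rounds=$rounds"
```

With 13 nodes of 4 cores there are 51 workers for 49 commands, so every command can start immediately and no command waits for a free core. For the earlier interactive example (nodes=2:ppn=4, 11 commands) the same helpers give 7 workers and 2 rounds.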

Documentation

The parallel-command-processor command is documented as a man page: man parallel-command-processor

 
