Darshan

Darshan is a lightweight "scalable HPC I/O characterization tool". It profiles I/O by emitting log files to a consistent, system-wide log location for systems administrators, and it provides scripts that create summary PDFs characterizing the I/O of MPI-based programs.

Availability and Restrictions

Versions

The following versions of Darshan are available on OSC clusters:

Version  Owens
3.1.2    X

Access 

Darshan is available to all OSC users without restriction.

Usage on Owens

Setup on Owens

To configure your environment for Darshan on Owens, use the following command:

module load darshan
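
To check which Darshan versions are available before loading, you can use a standard Lmod query (not specific to Darshan):

module spider darshan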

Darshan is supported only with the following compiler and MPI implementation combinations:

gnu/4.8.5  mvapich2/2.2
gnu/4.8.5  mvapich2/2.2rc1
gnu/4.8.5  openmpi/1.10
intel/16.0.3  intelmpi/5.1.3
intel/16.0.3  mvapich2/2.2
intel/16.0.3  mvapich2/2.2rc1
intel/16.0.3  openmpi/1.10
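
For example, to use Darshan with one of the supported stacks, load the compiler and MPI modules first; this sketch assumes the intel/16.0.3 and mvapich2/2.2 combination listed above:

module load intel/16.0.3
module load mvapich2/2.2
module load darshan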

Batch Usage on Owens

Batch jobs can request multiple nodes/cores and compute time up to the limits of the OSC systems. Refer to Queues and Reservations for Owens, and Scheduling Policies and Limits, for more information.
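
As a minimal sketch, a single-node job on Owens (28 cores per node) with a hypothetical 30-minute walltime could request:

#PBS -l nodes=1:ppn=28
#PBS -l walltime=00:30:00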

If you have an MPI-based program, the syntax is as simple as:

# basic call to darshan 
mpiexec.darshan [args] ./my_mpi_program
# to show evidence that Darshan is working and to see internal timing
mpiexec.darshan.timing [args] ./my_mpi_program
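
For example, to profile a run on all 28 cores of an Owens node (my_mpi_program is a placeholder for your executable):

mpiexec.darshan -n 28 ./my_mpi_program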

An Example of Using Darshan with MPI-IO

Below is an example batch script (mpiio_with_darshan.qsub) for understanding MPI-IO; see this resource for a detailed explanation: http://beige.ucs.indiana.edu/I590/node29.html.  The example C programs have each MPI task write sequentially to the same file at different offsets.  A serial version (one processor writing to one file) is included and timed for comparison.  Because the files generated here are large scratch files, there is no need to retain them.

  1. Load the Darshan module
  2. Create a function as a build and run harness for each case
  3. Run the cases
  4. Generate PDF reports (using pdflatex and supporting files) from binary log files.
  5. Check job output and read PDF reports.
#!/bin/bash
#PBS -l nodes=1:ppn=28:pfsdir
#PBS -j oe
# one may need to perform 'shopt -s expand_aliases' inside the shell before calling this script to expand darshan aliases
shopt -s expand_aliases
ml r --quiet
module load darshan
ml  # list the loaded modules
function run_darshan() {
    COMPILER=mpicc
    MPIPROCS=$1
    PROGNAME=$2
    BLOCKSIZE=$3
    LOCATION=$4
    # copy the example C source into the target directory and build it there
    cp ${DARSHAN_EXAMPLE}/${PROGNAME}.c $LOCATION
    cd $LOCATION
    $COMPILER -DDEBUG -D_FILE_OFFSET_BITS=64 -D_LARGEFILE64_SOURCE -o $PROGNAME ${PROGNAME}.c
    # wrap the mpiexec.darshan alias in a function; 'eval' forces the alias to expand
    mpiexec_darshan() { eval mpiexec.darshan "$@"; }
    mpiexec_darshan -n $MPIPROCS ./$PROGNAME -f test -l $BLOCKSIZE
    du -sh test  # show the human-readable file size
    rm test      # we are not keeping the large file around
}
# run the parallel MPI-IO program, writing to the node-local temporary directory
run_darshan $PBS_NP mkrandpfile 256 $TMPDIR
# run the parallel MPI-IO program, writing to the parallel file system scratch location
run_darshan $PBS_NP mkrandpfile 256 $PFSDIR
# extract the numeric job ID from $PBS_JOBID (e.g. '123456.owens-batch...' -> '123456')
JOBID=$([[ $PBS_JOBID =~ ([0-9]*)\..* ]]; echo "${BASH_REMATCH[1]}")
# for each darshan log produced by this job, generate a PDF report
for log in $(ls $DARSHAN_LOG/$(date "+%Y/%-m/%-d")/${USER:0:8}_*_id${JOBID}*.darshan); do
    darshan-job-summary.pl $log
done
# copy the PDF reports back to the submission directory
cp *.pdf $PBS_O_WORKDIR

To run it via the batch system, submit the mpiio_with_darshan.qsub file with the following command:

qsub mpiio_with_darshan.qsub
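
If you prefer a plain-text dump of a log's counters instead of a PDF report, the darshan-util tools also include darshan-parser (the log path below is a placeholder):

darshan-parser /path/to/logfile.darshan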

Further Reading

http://www.mcs.anl.gov/research/projects/darshan/
