Supercomputing Environments

Using the Parallel File System on the OSC Clusters

One of the back-end services provided by OSC's mass storage environment is a parallel file system, intended for use as high-performance, high-capacity shared temporary space. The parallel file system is supported by sixteen dedicated storage nodes, each with two 2.4 GHz Pentium 4 processors, 4 GB of memory, two Fibre Channel interfaces, and Gigabit Ethernet and InfiniBand network interfaces. The aggregate capacity of this parallel file system is approximately 128 terabytes. The software used to create the parallel file system is PVFS2 from Clemson University and Argonne National Laboratory.

Getting Started

The parallel file system is currently accessible from selected nodes on the following systems:

On nodes where the parallel file system is accessible, it is mounted at /fs/pvfs. These nodes are identified to the PBS batch system by the node attribute pvfs. Files and directories on the parallel file system can be manipulated as on any other UNIX-style file system, so commands such as cd, mkdir, cp, and ls work as usual.

To access the parallel file system from a batch job, tell the batch system that you intend to use it by adding the pvfs attribute to your job's nodes= request:

    #PBS -l nodes=2:ppn=2:pvfs

In a batch job that requests the pvfs node attribute, an additional environment variable, $PFSDIR, is set. Like $TMPDIR, it names a directory that exists only for the duration of the job, but it resides on the parallel file system and is accessible from all the nodes in your job (whereas $TMPDIR is private to each node).

Using the Parallel File System for Serial Jobs

For serial jobs requiring large amounts (more than 50 GB) of scratch space, the parallel file system should be used in place of locally attached temporary space. In these cases, the job should use $PFSDIR instead of $TMPDIR as its working directory. Here is an example:

    #PBS -N bigfile
    #PBS -j oe
    #PBS -l nodes=1:ppn=2:pvfs
    #PBS -l walltime=10:00:00
    # Stage the input file onto the parallel file system
    cd myscience
    cp input.dat $PFSDIR
    # Run the program with $PFSDIR as its working directory
    cd $PFSDIR
    $HOME/myscience/bigfileapp
    # Copy the results back; $PFSDIR exists only for the duration of the job
    cp output.dat $HOME/myscience

For serial programs doing block (binary or unformatted) I/O to the parallel file system, transfer rates of up to 60 MB/s have been observed. For character I/O (e.g. Fortran formatted I/O or C printf()), transfer rates should be approximately 10-15 MB/s.

Using the Parallel File System for MPI Parallel Jobs

The MPI-2 specification includes a section on parallel I/O, and most MPI implementations (including the MPICH/ch_gm implementation used on OSC's clusters) implement that interface.
As a result, MPI programs on OSC's clusters can use the MPI parallel I/O interface (MPI_File_*()) to achieve higher I/O performance. The parallel file system is specifically tuned for this type of use. Here is an example of a parallel job using the parallel file system:

    #PBS -N mpi-io
    #PBS -j oe
    #PBS -l nodes=8:ppn=2:pvfs
    #PBS -l walltime=24:00:00
    cd $HOME/myscience
    # Copy the executable to $TMPDIR on each node in the job
    pbsdcp parallel-io-app $TMPDIR
    # Stage the input onto the parallel file system and run from there
    cp input.dat $PFSDIR
    cd $PFSDIR
    mpiexec $TMPDIR/parallel-io-app
    # Copy the results back before the job ends
    cp output.dat $HOME/myscience

Note that in this example, the executable run by the job is stored in $TMPDIR on each node, but the working directory for the program is $PFSDIR. Executables should not be stored on the parallel file system.

Caveats for Using the Parallel File System

Here are a few things to keep in mind when using the parallel file system:

Links to More Information

OSC's Science and Technology Support Group has developed a workshop on parallel I/O techniques, including the use of the MPI-2 parallel I/O interface. The MPI-2 parallel I/O interface is also discussed in the PACS Intermediate MPI asynchronous course. The PVFS2 website has links to several articles about the file system software.
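
The batch examples earlier on this page assume that $PFSDIR is set, which according to the text happens only when the job requests the pvfs node attribute. As a minimal sketch, a job script might guard against a missing $PFSDIR before changing into it. The function name choose_scratch_dir is hypothetical, and falling back to $TMPDIR is an assumption made for illustration, not documented OSC policy.

```shell
#!/bin/sh
# Hypothetical helper (not part of the OSC environment): print the
# directory a job should use as its scratch working directory.
choose_scratch_dir() {
    if [ -n "${PFSDIR:-}" ] && [ -d "$PFSDIR" ]; then
        # $PFSDIR is set only when the job requested the pvfs attribute.
        printf '%s\n' "$PFSDIR"
    elif [ -n "${TMPDIR:-}" ] && [ -d "$TMPDIR" ]; then
        # Assumed fallback: node-local scratch, private to each node.
        printf '%s\n' "$TMPDIR"
    else
        echo "no scratch directory available" >&2
        return 1
    fi
}
```

In a job script this could be used as `cd "$(choose_scratch_dir)" || exit 1`, so that the job stops early instead of writing large files into whatever directory it happened to start in.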
