HOWTO

Our HOWTO collection contains short tutorials that walk you through common (but potentially confusing) tasks that do not quite rise to the level of requiring more structured training materials. Items here may explain a procedure to follow or present a "best practices" formula that we think may be helpful.


HOWTO: Add python packages using the conda package manager

While our python installations come with many popular packages installed, you may come upon a case where you need an additional package that is not installed.  If the specific package you are looking for is available from Anaconda.org (formerly binstar.org), you can easily install it and its required dependencies by using the Conda package manager.

To be able to install a package using the conda package manager:

  • You must use an Anaconda distribution of python:
    • On Oakley, one of the following modules:
      • python/2.7.8, python/3.4.2
    • On Ruby, one of the following modules:
      • python/2.7.8, python/3.4.2
  • The package must be available through Anaconda.org (see the example check after this list)
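If you are not sure whether a package is available, a quick check (a minimal sketch; yt is simply the example package installed below, and conda search looks through the channels conda is configured to use) is:

module load python/2.7.8
conda search yt

If the search returns nothing, the package is not available to conda and will need to be obtained another way.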

 

Procedure

We will install the yt package to a local directory in this example.

Load proper python module

module load python/2.7.8

Clone python installation to local directory

conda create -n local --clone="$PYTHON_HOME"

Activate clone environment

source activate local

Install package

conda install yt
  • Replace yt with the name of the package you want to install, as listed by anaconda.org.
If there are errors at this step, you will need to resolve them before continuing.

Test python package

Now we will test our installed python package by loading it in python and checking its location to ensure we are using the correct version.

python -c "import yt;print yt.__file__"

Output:

/nfs/12/osu8968/.conda/envs/local/lib/python2.7/site-packages/yt/__init__.py
  • Replace both instances of yt with the name of the package you installed.
Remember, you will need to load the proper version of python before using your newly installed package.  Packages are only installed to one version of python.
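For example, in a later login session the sequence would look like this (following the steps above):

module load python/2.7.8
source activate local
python -c "import yt;print yt.__file__"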

HOWTO: Configure the MATLAB Parallel Computing Toolbox

Introduction

The MATLAB Parallel Computing Toolbox and Distributed Computing Server are designed to allow users to create and launch parallel MATLAB jobs on a cluster of compute nodes.  It also allows users to remotely connect to OSC resources, whether to run parallel jobs in MATLAB or to use toolboxes for which users own their own licenses.  This guide will explain the basics of how to configure your Parallel Computing Toolbox for OSC systems.

Versions

The following versions of the MATLAB Parallel Computing Toolbox are supported at OSC:

VERSION OAKLEY
R2013a X
R2013b X
R2014a X
R2014b X
R2015a X
R2015b X

Usage Overview

When you use the MATLAB Parallel Computing Toolbox, you have a MATLAB client session and one or more MATLAB workers.  The client session may run on your laptop/desktop computer ("remote client") or (for OSU users only) it may run on an OSC login, OnDemand, or compute node.  The MATLAB workers always run on the OSC cluster as part of a batch job.  In the client session you will run MATLAB commands to set up and run a batch job, and MATLAB submits the job for you.

This document describes how to perform a computation on a single worker (one processor) or on multiple workers using a script with a "parfor" loop.

Licensing issues

You must have a license for MATLAB and the MATLAB PCT to allow you to run the MATLAB client session.  OSC has licenses for the MATLAB Distributed Computing Server, which covers the MATLAB workers running on the cluster.

OSC is included in the OSU site license for MATLAB, so OSU users have the option of running the client session on an OSC machine.  You can manage and view your MATLAB licenses using the Mathworks License Center.

Remote client

If you run your MATLAB client session on your local computer, it is considered a remote client.  You will be able to use any toolboxes that you have a license for by submitting batch jobs from your MATLAB client.  The batch jobs you submit from your MATLAB client through the PCT will be able to utilize functions such as "parfor" and "spmd" to run your code in parallel, in addition to being able to run normal MATLAB code.

Client running on Oakley

Because of licensing issues this option applies only to OSU users.  Through the MATLAB installations on Oakley, you will be able to take advantage of the interactive capabilities in the MATLAB PCT, as well as submit batch jobs through the MATLAB client sessions.  OnDemand or VNC is needed to use this (the MATLAB PCT is GUI only at this time), but it will allow you to debug your parallel jobs using various tools in real time.  Jobs submitted through the client sessions on Oakley behave the same as jobs submitted by qsub, but can utilize the PCT functions such as "parfor" and "spmd".

Performance limitations

Parallel MATLAB on a cluster has a lot of overhead.  It may not give you the speedup you expect, especially if you're solving small problems.  Another consideration is that the MATLAB workers are single-threaded.  They don't take advantage of the multithreading built into many MATLAB functions.

Download and Install the Configuration Files

The first step is to download the necessary configuration files to the computer where you will be running the MATLAB client session.

    Click the link below to download the files.

    OSCMatlabPCT

    OSCMatlabPCT Configuration Resources -- Directory Structure

    The OSC MATLAB PCT Configuration package contains the following files and directories:

    OSCMatlabPCT (top-level directory)

    • config - A directory containing the cluster profiles and configuration functions for using the MATLAB PCT at  OSC.  Note: the only function that users should edit is the "addSubmitArgs" function, as this allows setting some additional batch options, such as walltime, email, and custom job name.  All the other functions have been specially prepared by OSC staff to allow MATLAB to work in harmony with the HPC system on Oakley.
    • launch - A directory containing two scripts, "client_session_script.m" for launching jobs and "reconnect_client_session.m" for reconnecting to a job.  These scripts illustrate key concepts of using the PCT when accessing our system remotely via the MATLAB client on your personal computer.  They also apply to submitting batch jobs from a MATLAB client running in OnDemand.  Both scripts are heavily commented and give usage details beyond what this document can provide.  Note that these scripts contain commands to be run in your client session; do not submit them using the batch command.
    • PCTtestfiles - A directory containing parfor use-cases.  The first, "eigtest", is an example of how to program an extremely simple entry function using parfor.  It simply computes the eigenvalues of multiple large, random matrices.  The second case is an example of how to run a parallel Simulink simulation using parfor.  The entry function is "paralleltestv2", which calls the function "parsim" inside a parfor loop.  The "parsim" script contains commands that initialize Simulink and run the simulation on each of the parfor workers.

    Import and Configure the Cluster Profile

    These configuration steps need to be done once before you use PCT.  (If you upgrade to a new version of MATLAB you'll have to reconfigure.)  There are also command line options for performing these tasks.

    1. After launching MATLAB, click the "Parallel" dropdown menu from the "Environment" menu and select "Manage Cluster Profiles".  At this time, a new window should open displaying the Cluster Profile Manager.  


    Image of MATLAB Parallel Menu


    2. In the Cluster Profile Manager window, click the "Import" button and locate the cluster profile files contained within the OSCMatlabPCT/config/clusterProfiles directory.

    Cluster profiles are named according to filesystem configuration and version.  Select the file which corresponds to your current version of MATLAB and your filesystem configuration.  If you're running remotely on your laptop/desktop computer select a "NonShared" version.  If you're running on Oakley, select a "Shared" version.  Click "Open" to proceed.  




    3. If you are using a shared filesystem configuration: No further modifications will need to be made to the cluster profile, and you can close the Cluster Profile Manager and continue with configuring your job. 

    If you are using a non-shared filesystem configuration: You will need to make some changes to your cluster profile before exiting the Cluster Profile Manager.  Click the "Edit" button to enable editing.




    4. In the editing window under "Submit Functions", you should see two entries -- IndependentSubmitFcn and CommunicatingSubmitFcnIntel.  These fields contain cell arrays with three values: a handle to the submit function, the hostname of the cluster, and the remote job storage location where results files and logs will be stored.  You only need to change the remote job storage location.  Use an absolute path to a location in your OSC home directory where you'd like the output of your job (or jobs) to be retained.  


    Image of MATLAB Cluster Profile Properties

    Here is an example of the full syntax:

    {@independentSubmitFcn, 'oakley.osc.edu', '/my/home/dir/MATLAB'}

    {@communicatingSubmitFcnIntel, 'oakley.osc.edu', '/my/home/dir/MATLAB'}

    Note:  If you try to validate your configuration, the last test will always fail for remote clients, even if the configuration is correct.


    5. Include the configuration files in your MATLAB path.  Click on "Set Path", then "Add Folder".  Locate and select the OSCMatlabPCT\config directory.  Click "Select Folder" and "Save".  (Alternatively you can use the "addpath" command in the MATLAB command window.)


    Configure and Run a Batch Job

    The primary way to utilize the MATLAB PCT at OSC is to submit batch jobs from the MATLAB client, either on your local PC or on Oakley via OnDemand.  These instructions assume that you have already configured your MATLAB client as described above.

    IMPORTANT: MATLAB treats the output directory specified with the submit functions as SCRATCH file space, and will clean up this directory after you have successfully retrieved your data.  However, in practice it has been observed that MATLAB sometimes cleans up this directory even if the commands are unsuccessful (such as in the case of a large file transfer).  To avoid data loss, please make sure your entry function/script copies any important output files to an alternate location for redundancy, or simply save your data to a directory of your choice.
    1. Write your script to be run on the cluster.  This can be just a serial script or it may contain parallel commands such as "parfor".  Some example scripts are provided in the directory OSCMatlabPCT/PCTtestfiles.  The example below uses "eigtest.m" as the script to be run.
    2. In your client session:  Using "client_session_script.m" as a guide, run the commands to connect to the cluster and launch a batch job.  We don't recommend that you run the script as written, at least at first.  Until you're familiar with using the PCT, simply copy commands out of the script and paste them into your MATLAB command window as needed.  There are two important functions involved in launching a job.  The command "parcluster" creates a cluster object in the MATLAB workspace, which is used to store information sent between the remote cluster and your MATLAB client.  The command "batch" begins an automated process that can connect to the Oakley cluster, submit a job to PBS, and initialize the MATLAB Distributed Computing Server (MDCS).  For more on the batch command, see the "client_session_script.m" file, and the MATLAB help.
    3. When your job has been successfully configured, you will be prompted for your username and password at OSC.  After entering this, your job will be submitted to the system, and assigned an OSC job ID.  Please note: as explained in the "client_session_script" example, MATLAB remembers your jobs on the cluster by the job index number, not the OSC job ID.  Also, if you are unable to qstat your job immediately after it is submitted, don't worry!  There seems to be some latency between the MATLAB MDCS and the batch system when it comes to exchanging job status info, but this should not affect your ability to track your job.  In fact, as shown in the example, there are several ways to get information about your jobs directly from your MATLAB client. After submission you can check your job's progress or retrieve results at any time (even after closing the MATLAB client), and you can continue using the MATLAB client for other work if you wish.
    4. Wait for your job to complete.  You may use the "wait" command in your client session or simply monitor your job's status with "qstat" on an Oakley login node to know when your job has completed.
    5. Reconnect after closing your MATLAB client session.  The script "reconnect_client_session.m" illustrates how to find your job information again if you restart your MATLAB client session.
    6. Retrieve your results.  The "load" or "fetchOutputs" command retrieves the results of your computations from your MATLAB working directory on Oakley after the job has finished.  They also report errors if your run failed.  "Load" is used for exporting the workspace from an entry script, and "fetchOutputs" is used for retrieving the specified output arguments of an entry function.  For large data, there is an important caveat: MATLAB does not utilize sftp when fetching your files, so large files might take a long time to retrieve.  Additionally, if your output arguments/workspace is over a certain size (around 2GB), using the "load" or "fetchOutputs" commands will give an "index out of range" error caused by the MDCS failing to save the output to file.  This is due to an internal limitation in the default version of MATLAB's ".mat" files.  However, you may work around this limitation by manually saving your workspace or arguments in your entry script, using the '-v7.3' switch in the "save" command.

    Specifying Additional PBS Options for Your Batch Job

    The submit functions are MATLAB functions which serve two primary purposes: to create the appropriate connections to the cluster where your job will run, and to procedurally generate and submit a shell script to the queue on Oakley.  Your nodes and cores (ppn) are determined by the number of MATLAB workers you request in your submit script; however, to change or add arguments or resource requests you must add the PBS commands to the string in the function "addSubmitArgs" (located in config).  The default arguments in the string are for a walltime of 1:00:00, a job name "default", and emails sent on abort, begin, and end of the job.  This function is called in the submit functions, so advanced users may modify how it is called there to streamline their workflow if multiple configurations are needed.

    Configure and Run an Interactive Job

    There are a couple of useful ways for OSU users to use the interactive features of the MATLAB PCT.  Because of our license terms, this option is available only to OSU faculty, staff, and students.

    Using OnDemand with a Shared Cluster Profile

    OSU users may take advantage of the MATLAB licenses on Oakley to submit interactive "parpool" (R2013b or later) or "matlabpool" (R2013a) jobs to the batch system.  These MATLAB PCT functions allow you to run parallel functions such as "parfor" or "spmd" directly from the command line or a script, and allow users to debug or interact with parallel jobs in real time.  OSC offers this functionality in MATLAB clients running through the OnDemand portal.  In this method, simply start an Oakley desktop session in OnDemand, open a terminal window, load the correct module, and start MATLAB.  Be sure the OSCMatlabPCT/config directory is in your path.  You will then import the correct shared cluster profile for your version of MATLAB, and use the "parpool" (R2013b or later) or "matlabpool" (R2013a) command to initialize an automatic parallel pool.  This might take a while to set up, as the pool request must go through the batch system like everything else, so be patient!  After a couple minutes or more (depending on the resource request), your pool will connect to the MATLAB workers, and you may begin running parallel commands.  This will allow simulation of how jobs will behave when submitted to the MDCS as a batch job.

    Using VNC with the "local" Cluster Profile

    Another way to run an interactive parallel job using the MATLAB PCT is to submit an interactive batch job using qsub, and run MATLAB on a compute node using VNC to utilize the GUI.  You can then import the "local" cluster profile, which runs the workers directly on the node.  Only one node can be used, but workers will run on multiple processors on the node.  This can be used in situations when using OnDemand is not ideal, but users should carefully follow the directions provided so as to not leave VNC processes behind on the node.  Directions for using VNC in a batch job are located at https://osc.edu/documentation/howto/use-vnc-in-a-batch-job.
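    As a rough sketch (the resource request shown here is only an example; follow the VNC directions linked above for the full procedure), an interactive batch session on one Oakley node could be requested with:

    qsub -I -X -l nodes=1:ppn=12 -l walltime=1:00:00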

    Additional Documentation

    Mathworks has written extensive documentation covering all of the public functions within the Parallel Computing Toolbox.  The PCT homepage is located at http://www.mathworks.com/products/parallel-computing/.

    For more information about how to construct and run an independent job, see the Mathworks documentation page "Program Independent Jobs" (http://www.mathworks.com/help/distcomp/program-independent-jobs.html).

    For more information about how to construct and run a communicating job, see the Mathworks documentation page "Program Communicating Jobs" (http://www.mathworks.com/help/distcomp/introduction.html).

    Errors

    If you have any errors related to the OSC MATLAB PCT configuration files, or have any other questions about utilizing the MATLAB PCT at OSC, please contact OSC Help with: your user ID, any relevant error messages, job ID(s) if applicable, and the version of MATLAB you are using.


    HOWTO: Identify users on a project account and check status

    An eligible principal investigator (PI) heads a project account and can authorize/remove user accounts under the project account (please check our Allocations and Accounts documentation for more details). This document shows you how to identify users on a project account and check the status of each user. 

    Identify Users on a Project Account

    If you know the project account

    If you know the project account (projectID), the following command will list all users on the project:

    getent group projectID
    

    The returned information is in the format of:

    projectID:*:gid: list of user IDs
    

    gid is the group identifier number unique for the project account projectID. 

    For example, the command  getent group PZS0530  lists all users on the project account PZS0530 as below:

    bash-4.1$ getent group PZS0530
    PZS0530:*:2959:yli,osc0539,elton,ananth,osc0413,dhudak,osc0695,ksaantha,changlee,buss,osc0414,osc0478,nsharma,kmanalo,nwadih,bsmith,...
    

    If you don't know the project account, but know the user account

    If you don't know the project account, but know the user account, userID, use the  groups  command to list all of the groups the userID belongs to:

    groups userID
    

    The returned information is in the format of:

    userID : list of groups
    

    where the first item after " userID : " is the primary group, i.e. the project account (projectID) which userID is on. Once you know the project account, use the command  getent group projectID  as discussed in the previous section to list all users on this project.

    For example, with the userID as xwang, the command groups xwang  returns the information as below:

    bash-4.1$ groups xwang
    xwang : oscgen lstc gaussian ...
    

    It lists all groups xwang belongs to, and oscgen is the project account which xwang is on. The command  getent group oscgen  lists all user accounts on the project account oscgen:

    bash-4.1$ getent group oscgen
    oscgen:*:200:jordan,jpu,mlewis,rmonroe,rmarshal,spears,sengupta,jtm,adraghi,lepage,ian,karin,remote,njustice,bedford,hjiang,tom,mudronja,elainep,gisadmin,airani,guilfoos,osc0498,osc0722,ngagnet,mfaerman,justinw,arya,mattm,echong,rahmed,jwright,...
    

    If you don't know either the project account or user account

    If you don't know either the project account or user account, you can use the  ldapsearch  command to get the user account based on the user's registration information such as name, email address, etc.

    Use the following command to list all of the user accounts associated with the name First Last:

    ldapsearch -x -LLL "(gecos=First Last)" | grep cn | awk '{print $2}'

    Use the following command to list all of the user accounts associated with the email address email@address:

    ldapsearch -x -LLL "(mail=email@address)" | grep cn | awk '{print $2}'

    For example, with the user's first name Summer and last name Wang, the command

    ldapsearch -x -LLL "(gecos=Summer Wang)" | grep cn | awk '{print $2}'

    returns the information as below:

    bash-4.1$ ldapsearch -x -LLL "(gecos=Summer Wang)" | grep cn | awk '{print $2}'
    xwang
    xwangnd
    ... 
    

    With the user's email address xwang@osc.edu, the command ldapsearch -x -LLL "(mail=xwang@osc.edu)" | grep cn | awk '{print $2}' returns the information as below:

    bash-4.1$ ldapsearch -x -LLL "(mail=xwang@osc.edu)" | grep cn | awk '{print $2}'
    xwang
    xwangnd
    ...
    

    Once you know the user account userID, follow the discussion in the previous section to get all user accounts on the project. Please contact OSC Help if you have any questions.

    Check the Status of a User

    Use the  finger  command to check the status of a user account userID as below:

    finger userID
    

    For example, if the userID is xwang, the command  finger xwang  will return:

    bash-4.1$ finger xwang
    Login: xwang                            Name: Summer Wang
    Directory: /nfs/17/xwang                Shell: /bin/bash
    On since Tue Apr 19 09:58 (EDT) on pts/190 from localhost:43.0
    Mail forwarded to xwang@osc.edu
    No mail.
    No Plan.
    
    • The home directory of xwang is on /nfs/17 ( Directory: /nfs/17/xwang )
    • The shell of xwang is bash ( Shell: /bin/bash ). If the information is Shell:/access/denied , it means this user account has been either archived or restricted. Please contact OSC Help if you'd like to reactivate this user account.
    • xwang@osc.edu is the email associated with the user account xwang; that is, all OSC emails related to the account xwang will be sent to xwang@osc.edu ( Mail forwarded to xwang@osc.edu ). Please contact OSC Help if the email address associated with this user account has changed, to ensure important notifications/messages/reminders from OSC are received in a timely manner.

    Check the Usage and Quota of a User's Home Directory/Project's Project Space

    All users see their file system usage statistics when logging in, like so:

    As of 2016 Apr 20 04:02 userid userID on /nfs/nn used XGB of quota 500GB and Y files of quota 1000000 files

    The information is from the file /usr/local/quotas/quota_report.txt, which is updated daily. Some users may see multiple lines associated with their userid, as well as information on project space usage and quota, if there is one. The usage and quota of the home directory of userID is given by the line including the file server your home directory is on (for more information, please visit Home Directories), while the others (generated due to file copies) can be safely ignored.

    You can check any user's home directory or a project's project space usage and quota by running:

    grep <userID OR projectID> /usr/local/quotas/quota_report.txt
    

    HOWTO: Install Local R Packages

    This document shows you the steps to install R packages locally without root access on OSC's Oakley cluster. 

    R comes with a single library,  $R_HOME/library , which contains the standard and recommended packages. This is usually in a system location; on the Oakley cluster, it is  /usr/local/R/3.0.1/lib64/R/library . R also has a default directory where users can install their own R packages. On the Oakley cluster, it is ~/R/x86_64-unknown-linux-gnu-library/3.0 if the default R-3.0.1 module is loaded. This directory doesn't exist by default. The first time a user installs an R package, R will ask the user if s/he wants to use the default location and, if yes, will create the directory.

    A Simple Example

    First you need to load the module for R:

    module load R
    

    On Oakley, the default R module is version 3.0.1.

    Then fire up an R session:

    R
    

    To install package lattice, use this command inside R:

    > install.packages("lattice", repos="http://cran.r-project.org")
    

    It gives a warning: 

    Warning in install.packages("lattice") :
    'lib = "/usr/local/R/3.0.1/lib64/R/library"' is not writable
    Would you like to create a personal library
    ~/R/x86_64-unknown-linux-gnu-library/3.0
    to install packages into?  (y/n) 
    

    Answer y , and it will create the directory and install the package there.

    Setting the Local R Library Path

    If you want to use another location rather than the default location, for example, ~/local/R_libs/ ,  you need to create the directory first:

    mkdir ~/local/R_libs
    

    Then type the following command inside R:

    > install.packages("lattice", repos="http://cran.r-project.org", lib="~/local/R_libs/")
    

    It is a bit of a burden having to type the long library path every time. To avoid doing that, you can create a file .Renviron in your home directory and add the following line to the file:

    R_LIBS=~/local/R_libs/
    

    Whenever R is started, the directory ~/local/R_libs/ is added to the list of places to look for R packages and so:

    > install.packages("lattice", repos="http://cran.r-project.org")
    

    will have the same effect as the previous install.packages() command. 
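    The same library location can also be used from a shell or batch session by setting the R_LIBS environment variable before starting R; a minimal sketch (the script name my_analysis.R is hypothetical):

    module load R
    export R_LIBS=~/local/R_libs/
    Rscript my_analysis.R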

    To see the directories where R searches for libraries, use the command:

    >.libPaths();
    
    

    Setting The Repository

    When you install an R package, you are asked which repository R should use. To set the repository and avoid having to specify this at every package install, create a file .Rprofile in your home directory. This is the startup code for R. Add the following lines to the file:

    cat(".Rprofile: Setting R repository:")
    repo = getOption("repos") 
    # set up the server from which you will download the package.
    repo["CRAN"] = "http://cran.case.edu" 
    options(repos = repo)
    rm(repo)
    

    Now you only need to do 

    > install.packages("lattice")
    

    That will download the package lattice from http://cran.case.edu and install it in ~/local/R_libs .

    Updating Packages

    Running  update.packages()  inside an R session is the simplest way to ensure that all the packages in your local R library are up to date. It downloads the list of available packages and their current versions, compares them with those installed, and offers to fetch and install any that have later versions in the repositories.

    Removing packages

    > remove.packages("lattice") inside an R session to remove package lattice . An even easier way is just to go into the directory ~/local/R_libs and remove the directory lattice from there.

     

    References

    Add-on packages in R installation guide (http://cran.r-project.org/doc/manuals/R-admin.pdf)


    HOWTO: Install your own Perl modules

    While we provide a number of Perl modules, you may need a module we do not provide. If it is a commonly used module, or one that is particularly difficult to compile, you can contact OSC Help for assistance, but we have provided an example below showing how to build and install your own Perl modules. Note, these instructions use "bash" shell syntax; this is our default shell, but if you are using something else (csh, tcsh, etc), some of the syntax may be different.

    CPAN Minus

    CPAN, the Comprehensive Perl Archive Network, is the primary source for publishing and fetching the latest modules and libraries for the Perl programming language. The default method for installing Perl modules, the "CPAN Shell", provides users with a great deal of power and flexibility but at the cost of a complex configuration and inelegant default setup.

     

    Setting Up CPAN Minus

    To use CPAN Minus, we must first load it, if it hasn't already been loaded.

    module load cpanminus
    

    Next, in order to use cpanminus, you will need to run the following command only ONCE:

    perl -I /usr/local/cpanminus/perl5/lib/perl5 -Mlocal::lib
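    Depending on how your shell picks up the local::lib settings, the environment variables that local::lib prints may also need to be set in your current session; one common pattern (an assumption about this setup, not an OSC-specific requirement) is to eval its output:

    eval $(perl -I /usr/local/cpanminus/perl5/lib/perl5 -Mlocal::lib)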
    

     

    Using CPAN Minus

    In most cases, using CPAN Minus to install modules is as simple as issuing a command in the following form:

    cpanm [Module::Name]
    

    For example, below are three Perl module installations:

    cpanm Math::CDF
    cpanm Set::IntervalTree
    cpanm DB_File
    

     

    Testing Perl Modules

    To test that a Perl module can be imported, here are some examples:

    perl -e "require Math::CDF"
    perl -e "require Set::IntervallTree"
    perl -e "require DB_File"
    

    The modules are installed correctly if no output is printed.

    What Local Modules are Installed in my Account?

    To show the local modules you have installed in your user account:

    perldoc perllocal
    

    Resetting Module Collection

    If you should ever want to start over with your perl module collection, delete the following folders:

    rm -r ~/perl5 
    rm -r ~/.cpanm
    

     


    HOWTO: Install your own python modules

    While we provide a number of Python modules, you may need a module we do not provide. If it is a commonly used module, or one that is particularly difficult to compile, you can contact OSC Help for assistance, but we have provided an example below showing how to build and install your own Python modules, and make them available inside of Python. Note, these instructions use "bash" shell syntax; this is our default shell, but if you are using something else (csh, tcsh, etc), some of the syntax may be different.

    Gather your materials

    First, you need to collect up what you need in order to do the installation. To keep things tidy, we will do all of our work in $HOME/local/src. You should make this directory now.

    mkdir -p $HOME/local/src

    Now, we will need to download the source code for the module we want to install. In our example, we will use "NumExpr", a module we already provide in the system version of Python. You can either download the file to your desktop, and then upload it to OSC, or directly download it using the wget utility (if you know the URL for the file).

    cd ~/local/src
    wget http://numexpr.googlecode.com/files/numexpr-2.0.1.tar.gz

    Now, extract the downloaded file. In this case, since it's a "tar.gz" format, we can use tar to decompress and extract the contents.

    tar xvfz numexpr-2.0.1.tar.gz

    You can delete the downloaded archive now, if you wish, or leave it around should you want to start the installation from scratch.

    Build it!

    Environment

    To build the module, we will want to first create a temporary environment variable to aid in installation. We'll call it "INSTALL_DIR".

    export INSTALL_DIR=${HOME}/local/numexpr/2.0.1

    I am following, roughly, the convention we use at the system level. This allows us to easily install new versions of software without risking breaking anything that uses older versions. We have specified a folder for the program (numexpr), and for the version (2.0.1). Now, to be consistent with python installations, we're going to create a second temporary environment variable, which will contain the actual installation location.

    export TREE=${INSTALL_DIR}/lib/python2.7/site-packages

    Now, make the directory tree.

    mkdir -p $TREE

    Compile

    To compile the module, we should switch to the GNU compilers. The system installation of Python was compiled with the GNU compilers, and this will help avoid any unnecessary complications. We will also load the Python module, if it hasn't already been loaded.

    module swap intel gnu
    module load python

    Now, build it. This step may vary a bit, depending on the module you are compiling. You can execute python setup.py --help to see what options are available. Since we are overriding the install path to one that we can write to, and that fits our management plan, we need to use the --prefix option.

    python setup.py install --prefix=$INSTALL_DIR

    Make it usable

    At this point, the module is compiled and installed in ~/local/numexpr/2.0.1/lib/python2.7/site-packages. Occasionally, some files will be installed in ~/local/numexpr/2.0.1/bin as well. To ensure Python can locate these files, we need to modify our environment.

    Manual

    The most immediate way - but the one that must be repeated every time you wish to use the module - is to manually modify your environment. If files are installed in the "bin" directory, you'll need to add it to your path. As before, these examples are for bash, and may have to be modified for other shells. Also, you will have to modify the directories to match your install location.

    export PATH=$PATH:~/local/numexpr/2.0.1/bin

    And, for the python libraries:

    export PYTHONPATH=$PYTHONPATH:~/local/numexpr/2.0.1/lib/python2.7/site-packages
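    A quick way to confirm that Python now finds the local install (mirroring the conda test earlier in this collection) is to check where the module is loaded from:

    python -c "import numexpr; print numexpr.__file__"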

    Hardcode it

    We don't really recommend this option, as it is less flexible, and can cause conflicts with system software. But, if you want, you can modify your .bashrc (or similar file, depending on your shell) to set these environment variables automatically. Be extra careful; making a mistake in .bashrc (or similar) can destroy your login environment in a way that will require a system administrator to fix. To do this, you can copy the lines above modifying $PATH and $PYTHONPATH into .bashrc. Remember - test them interactively first! If you destroy your shell interactively, the fix is as simple as logging out and then logging back in. If you break your login environment, you'll have to get our help to fix it.

    Make a module (recommended!)

    This is the most complicated option, but it is also the most flexible, as you can have multiple versions of this particular software installed, and specify at run-time which one to use. This is incredibly useful if a major feature changes that would break old code, for example. You can see our tutorial on writing modules here, but the important variables to modify are, again, $PATH and $PYTHONPATH. You should specify the complete path to your home directory here, and not rely on any shortcuts like ~ or $HOME.  Below is a modulefile written in Lua:

    If you are following the tutorial on writing modules, you will want to place this file in $HOME/local/share/modulefiles/numexpr/2.0.1.lua:

    -- This is a Lua modulefile, this file 2.0.1.lua can be located anywhere
    -- But if you are following a local modulefile location convention, we place them in
    -- $HOME/local/share/modulefiles/
    -- For numexpr we place it in $HOME/local/share/modulefiles/numexpr/2.0.1.lua
    
    -- This finds your home directory
    local homedir = os.getenv("HOME")
    
    prepend_path("PYTHONPATH", 
        pathJoin(homedir, "/local/numexpr/2.0.1/lib/python2.7/site-packages"))
    prepend_path(homedir, "local/numexpr/2.0.1/bin"))

     

    Once your module is created (again, see the guide), you can use your python module simply by loading the software module you created.

    module use $HOME/local/share/modulefiles/
    module load numexpr/2.0.1

    HOWTO: Locally Installing Software

    Sometimes the best way to get access to a piece of software on the HPC systems is to install it yourself as a "local install". This document will walk you through the OSC-recommended procedure for maintaining local installs in your home directory or project space.

    NOTE: Throughout this document we'll assume you're installing into your home directory, but you can follow the steps below in any directory for which you have read/write permissions.
    This document assumes you are familiar with the process of building software using "configure" or via editing makefiles, and only provides best practices for installing in your home directory.

    Getting Started

    Before installing your software, you should first prepare a place for it to live. We recommend the following directory structure, which you should create in the top-level of your home directory:

        local
        |-- src
        |-- share
            `-- lmodfiles
    

    This structure is how OSC organizes the software we provide. Each directory serves a specific purpose:

    • local - Gathers all the files related to your local installs into one directory, rather than cluttering your home directory. Applications will be installed into this directory with the format "appname/version". This allows you to easily store multiple versions of a particular software install if necessary.
    • local/src - Stores the installers -- generally source directories -- for your software. Also, stores the compressed archives ("tarballs") of your installers; useful if you want to reinstall later using different build options.
    • local/share/lmodfiles - The standard place to store module files, which will allow you to dynamically add or remove locally installed applications from your environment.

    You can create this structure with one command.

    NOTE: Ensure $HOME is the full path of your home directory. You can identify this from the command line with the command echo $HOME.

    After navigating to where you want to create the directory structure, run:

        mkdir -p $HOME/local/src $HOME/local/share/lmodfiles
    

    Installing Software

    Now that you have your directory structure created, you can install your software. For demonstration purposes, we will install a local copy of Git.

    First, we need to get the source code onto the HPC filesystem. The easiest thing to do is find a download link, copy it, and use the wget tool to download it on the HPC. We'll download this into $HOME/local/src:

        cd $HOME/local/src
        wget https://github.com/git/git/archive/v2.9.0.tar.gz
    

    Now extract the tar file:

        tar zxvf v2.9.0.tar.gz
    

    Next, we'll go into the source directory and build the program. Consult your application's documentation to determine how to install into $HOME/local/"software_name"/"version". Replace "software_name" with the software's name and "version" with the version you are installing, as demonstrated below. In this case, we'll use the configure tool's --prefix option to specify the install location.

    You'll also want to specify a few variables to help make your application more compatible with our systems. We recommend specifying that you wish to use the Intel compilers and that you want to link the Intel libraries statically. This will prevent you from having to have the Intel module loaded in order to use your program. To accomplish this, add CC=icc CFLAGS=-static-intel to the end of your invocation of configure. If your application does not use configure, you can generally still set these variables somewhere in its Makefile or build script.

    Then, we can build Git using the following commands:

        cd git-2.9.0
        autoconf # this creates the configure file
        ./configure --prefix=$HOME/local/git/2.9.0 CC=icc CFLAGS=-static-intel
        make && make install
    

    Your application should now be fully installed. However, before you can use it you will need to add the installation's directories to your path. To do this, you will need to create a module.

    Creating a Module

    Modules allow you to dynamically alter your environment to define environment variables and bring executables, libraries, and other features into your shell's search paths.

    We will be using the filename 2.9.0.lua ("version".lua). A simple Lua module for our Git installation would be:

    -- Local Variables
    local name = "git"
    local version = "2.9.0"
    
    -- Locate Home Directory
    local homedir = os.getenv("HOME")
    local root = pathJoin(homedir, "local", name, version)
    
    -- Set Basic Paths
    prepend_path("PATH", pathJoin(root, "bin"))
    prepend_path("MANPATH", pathJoin(root, "share/man"))
    

    NOTE: For future module files, copy our sample modulefile from ~support/doc/modules/sample_module.lua. This module file follows the recommended design patterns laid out above and includes samples of many common module operations.

    Oakley, Ruby, and Owens use a Lua based module system. However, there is another module system based in TCL that will not be discussed in this HOWTO.  NOTE: TCL is cross-compatible and is converted to Lua when loaded. More documentation is available at https://www.tacc.utexas.edu/research-development/tacc-projects/lmod/ or by executing module help.

    Initializing Modules

    Any module file you create should be saved into your local lmodfiles directory ($HOME/local/share/lmodfiles). To prepare for future software installations, create a subdirectory within lmodfiles named after your software and add one module file to that directory for each version of the software installed.

    In the case of our Git example, you should create the directory $HOME/local/share/lmodfiles/git and create a module file within that directory named 2.9.0.lua.

    To make this module usable, you need to tell lmod where to look for it. You can do this by issuing the command module use $HOME/local/share/lmodfiles in our example. You can see this change by performing module avail. This will allow you to load your software using either module load git or module load git/2.9.0.

    NOTE: module use $HOME/local/share/lmodfiles and module load "software_name" need to be entered into the command line every time you enter a new session on the system.

    If you install another version later on (let's say version 2.9.1) and want to create a module file for it, you need to make sure you call it 2.9.1.lua. When loading Git, lmod will automatically load the newer version. If you need to go back to an older version, you can do so by specifying the version you want: module load git/2.9.0.

    To make sure you have the correct module file loaded, type which git , which should give the output "~/local/git/2.9.0/bin/git" (NOTE: ~ is equivalent to $HOME).

    To make sure the software was installed correctly and that the module is working, type git --version , which should give the output "git version 2.9.0".
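    Putting those checks together, a typical verification session looks like this:

    module use $HOME/local/share/lmodfiles
    module load git/2.9.0
    which git        # expect ~/local/git/2.9.0/bin/git
    git --version    # expect git version 2.9.0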

    Further Reading

    For more information about modules, be sure to read the webpage indicated at the end of module help. If you have any questions about modules or local installations, feel free to contact the OSC Help Desk at oschelp@osc.edu.


    HOWTO: Reduce Disk Space Usage

    This "how to" will demonstrate how to lower ones' disk space usage.  The following procedures can be applied to all of OSC's file systems.

    We recommend users regularly check their data usage and clean out old data that is no longer needed.

    Users who need assistance lowering their data usage can contact OSC Help.

    Preventing Excessive Data Usage Before It Starts

    Users should ensure that their jobs are written in such a way that temporary data is not saved to permanent file systems, such as the project space file system or their home directory.  
     

    If your job copies data from the scratch file system or its node's local disk ($TMPDIR) back to a permanent file system, such as the project space file system or a home directory, you should ensure you are only copying the files you will need later.
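    For example, rather than copying everything out of $TMPDIR at the end of a job, copy back only the results you need (the file names here are purely illustrative):

    # copy only the final results, not intermediate scratch files
    cp $TMPDIR/results.dat $TMPDIR/run_summary.log ~/project/results/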

    Identifying Old and Large Data

    The following commands will help you identify old data using the find command.

    find commands may produce an excessive amount of output.  To terminate the command while it is running, press  CTRL + C .

    Find all files in a directory that have not been accessed in the past 100 days:

    This command will recursively search the user's home directory and give a detailed listing of all files not accessed in the past 100 days.

    The last access time atime is updated when a file is opened by any operation, including grep , cat , head , sort , etc.

    find ~ -atime +100 -exec ls -l {} \;
    
    • To search a different directory replace ~ with the path you wish to search.  A period . can be used to search the current directory.
    • To view files not accessed over a different time span, replace 100 with your desired number of days.
    • To view the total size in bytes of all the files found by find , you can add  | awk '{s+=$5} END {print "Total SIZE (bytes): " s}'  to the end of the command:
    find ~ -atime +100 -exec ls -l {} \;| awk '{s+=$5} END {print "Total SIZE (bytes): " s}'
    

    Find all files in a directory that have not been modified in the past 100 days:

    This command will recursively search the user's home directory and give a detailed listing of all files not modified in the past 100 days.

    The last modified time mtime is updated when a file's contents are updated or saved.  Viewing a file will not update the last modified time.

    find ~ -mtime +100 -exec ls -l {} \; 
    
    • To search a different directory replace  ~  with the path you wish to search.  A period  .  can be used to search the current directory.
    • To view files not modified over a different time span, replace  100  with your desired number of days.
    • To view the total size in bytes of all the files found by find , you can add  | awk '{s+=$5} END {print "Total SIZE (bytes): " s}'  to the end of the command:
    find ~ -mtime +100 -exec ls -l {} \;| awk '{s+=$5} END {print "Total SIZE (bytes): " s}'
    

    List files larger than a specified size:

    Adding the  -size +<size>  option and argument to the find command allows you to view only files larger than a certain size.  This option and argument can be added to any other find command.

    For example, to view all files in a user's home directory that are larger than 1GB:

    find ~ -size +1G -exec ls -l {} \;
    

    Deleting Identified Data

    CAUTION: Be careful when deleting files.  Be sure your command will do what you want before running it.  Extra caution should be used when deleting files from a file system that is not backed up, such as the scratch file system.

    If you no longer need the old data, you can delete it using the rm command.

    If you need to delete a whole directory tree (a directory and all of its subcontents, including other directories), you can use the rm -R command.  

    For example, the following command will delete the data directory in a user's home directory:

    rm -R ~/data
    

    If you would like to be prompted for confirmation before deleting every file, use the -i option.

    rm -Ri ~/data 
    

    Enter y or n when prompted.  Simply pressing Enter will default to n .

    Deleting files found by find

    rm can be combined with any find command to delete the files found.  The syntax for doing so is:

    find <location> <other find options> -exec rm -i {} \;
    

    Where <other find options> can include one or more of the options  -atime <time> , -mtime <time> , and -size <size> .

    The following command would find all files in the ~/data directory larger than 1GB that have not been accessed in the past 100 days, and then prompt for confirmation to delete each file:

    find ~/data -atime +100 -size +1G -exec rm -i {} \;
    

    If you are absolutely sure the files identified by find are okay to delete you can remove the -i option to rm and you will not be prompted.  Extreme caution should be used when doing so!

    Archiving Data

    If you still need the data but do not plan on needing the data in the immediate future, contact OSC Help to discuss moving the data to an archive file system.  Requests for data to be moved to the archive file system should be larger than 1TB.

    Compressing

    If you need the data but do not access the data frequently, you should compress the data using tar or gzip.
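    For example, a directory can be packed into a single compressed archive with tar (remove the original only after verifying the archive):

    tar -czvf data_archive.tar.gz ~/data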

    Moving Data to a Local File System

    If you have the space available locally, you can transfer your data there using sftp or Globus.

    Globus is recommended for large transfers.

    The OnDemand File application should not be used for transfers larger than 1GB.


    HOWTO: Submit multiple jobs using parameters

    Often users want to submit a large number of jobs all at once, each using different parameters.  These parameters could be anything, including the path of a data file or different input values for a program.  This how-to will show you how you can do this using a simple python script, a CSV file, and a template script.  You will need to adapt this advice for your own situation.

     

    Consider the following batch script:

    #PBS -l nodes=1:ppn=12
    #PBS -l walltime=80:00:00
    #PBS -N week42_data8
    
    # Copy input data to the node's fast local disk 
    cp ~/week42/data/source1/data8.in $TMPDIR
    cd $TMPDIR
    
    # Run the analysis 
    full_analysis data8.in data8.out
    
    # Copy results to proper folder
    cp  data8.out ~/week42/results

    Let's say you need to submit 100 of these jobs on a weekly basis.  Each job uses a different data file as input.  You receive data from two different sources, and thus your data is located within two different folders.  All of the jobs from one week need to store their results in a single weekly results folder.  The output file name is based upon the input file name.

    Creating a Template Script

    As you can see, this job follows a general template.  There are three main parameters that change in each job:

    1. The week 
      • Used as part of the job name
      • Used to find the proper data file to copy to the node's local disk
      • Used to copy the results to the correct folder
    2. The data source
      • Used to find the proper data file to copy to the node's local disk
    3. The data file's name
      • Used as part of the job name
      • Used to find the proper data file to copy to the node's local disk
      • Used to specify both the input and output file to the program full_analysis
      • Used to copy the results to the correct folder

    If we replace these parameters with variables, prefixed by a dollar sign $ and surrounded by curly braces { }, we get the following template script:

    #PBS -l nodes=1:ppn=12
    #PBS -l walltime=80:00:00
    #PBS -N ${WEEK}_${DATA}
    
    # Copy input data to the node's fast local disk 
    cp ~/${WEEK}/data/${SOURCE}/${DATA}.in $TMPDIR
    cd $TMPDIR
    
    # Run the analysis 
    full_analysis ${DATA}.in ${DATA}.out
    
    # Copy results to proper folder
    cp  ${DATA}.out ~/${WEEK}/results
    

    Automating Job Submission

    We can now use qsub's -v option to pass parameters to our template script.  The format for passing parameters is:

    qsub -v par_name=par_value[,par_name=par_value...] script.sh

    Submitting 100 jobs using the qsub -v option manually does not make our task much easier than modifying and submitting each job one by one.  To complete our task we need to automate the submission of our jobs.  We will do this by using a python script that submits our jobs using parameters it reads from a CSV file.

    Note that python was chosen for this task for its general ease of use and understandability -- if you feel more comfortable using another scripting language, feel free to interpret/translate this python code for your own use.

    Here is the script that submits the jobs using the parameters:

    #!/usr/bin/env python
    import csv, subprocess
    
    parameter_file_full_path = "/nfs/12/user0123/week42/job_params.csv"
    
    with open(parameter_file_full_path, "rb") as csvfile:
        reader = csv.reader(csvfile)
        for job in reader:
            qsub_command = """qsub -v WEEK={0},SOURCE={1},DATA={2} template_1.pbs""".format(*job)
    
            #print qsub_command # Uncomment this line when testing to view the qsub command
    
            # Comment the following 3 lines when testing to prevent jobs from being submitted
            exit_status = subprocess.call(qsub_command, shell=True)
            if exit_status != 0:  # Check to make sure the job submitted
                print "Job {0} failed to submit".format(qsub_command)
    print "Done submitting jobs!"

    This script will open the CSV file specified by the variable parameter_file_full_path and step through the file line by line, submitting a job for each line using the lines values.  If the qsub command returns a non-zero exit code, usually indicating it was not submitted, we will print this out to the display.  The jobs will be submitted using the general format:

    qsub -v WEEK=WEEK_VALUE,SOURCE=SOURCE_VALUE,DATA=DATA_VALUE template_1.pbs

    Where WEEK_VALUE, SOURCE_VALUE, and DATA_VALUE are the first, second, and third comma separated values in the CSV file's current row, and template_1.pbs is the name of our template script.

    Creating a CSV File

    We now need to create a CSV file with parameters for each job.  This can be done with a regular text editor or using a spreadsheet editor such as Excel.  By default you should use commas as your delimiter.

    Here is our CSV file with parameters:

    week42,source1,data1
    week42,source1,data2
    week42,source1,data3
    ...
    week42,source2,data98
    week42,source2,data99
    week42,source2,data100
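    Such a file can be written by hand, exported from a spreadsheet, or generated with a short shell loop; for example (the split between source1 and source2 below is purely illustrative):

    for i in $(seq 1 50);   do echo "week42,source1,data$i"; done >  job_params.csv
    for i in $(seq 51 100); do echo "week42,source2,data$i"; done >> job_params.csv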
    

    The submit script would read in the first row of this CSV file and form and submit the following qsub command:

    qsub -v WEEK=week42,SOURCE=source1,DATA=data1 template_1.pbs

    Submitting Jobs

    Once all the above is done all you need to do to submit your jobs is make sure the CSV file is populated with the proper parameters and run the automatic submission script.  

    Before submitting a large number of jobs for the first time using this method, it is recommended you test out your implementation with a small number of jobs.

    HOWTO: Transfer files using Globus Connect


    Globus Connect is a reliable, high-performance file transfer platform allowing users to transfer large amounts of data seamlessly between systems.  It aims to make transfers a "click-and-forget" process by setting up configuration details in the background and automating fault recovery.  

    Globus can be used for both file transfers between OSC and:

    • A computing institution with Globus installed (check with your site provider for availability) or
    • A personal computer (known as a personal endpoint)

    Users transferring between OSC and another computing institution with Globus installed do not need to install Globus Connect Personal, and can skip to Usage.

    More on how Globus works can be found on the Globus "How It Works" page.

    If you are looking to transfer smaller files you can utilize OnDemand's file transfer capabilities, or use an SFTP client to connect to sftp.osc.edu. Our general recommendation is that for small files - measured in MB to several hundred MB - you use OnDemand or SFTP. You can continue to use SFTP and get reasonable performance up to file sizes of several GB. For transfers of several GB or larger, you should consider using Globus Online.

    Install Globus Connect Personal

    To use Globus to transfer from a personal computer, you will need to install the Globus Connect Personal client on your computer following the steps below. Those transferring between OSC and another computing institution can skip to Usage.

    1. Sign up for a free Globus account
    2. Download the Globus Connect Personal Client 
      • Click "Manage Endpoints" under the "Manage Transfers/Data" menu
      • Click "add Globus Connect" on the top-right of the page
      • Choose a unique name for your endpoint and generate the setup key
      • Download the Globus Connect client for your operating system
    3. Install Globus Connect Personal Client
      • Windows
        1. Run the Installer
        2. Copy-Paste the setup key to complete the installation
  • Mac
    1. Mount your drives
    2. Copy the Globus Client to your Applications folder
    3. Start the Globus Client, and enter the provided setup key
  • Linux
    1. Un-tar the .tgz file with the command tar -zxvf
    2. Run globusconnect, found within the unzipped directory
    3. Copy-Paste the setup key when prompted
    4. (Optional) Changing directories accessible to Globus

By default, Globus only adds certain default folders to the list of files and directories it can access. To change/add/remove files and directories from this list:

    Windows

    1. Start Globus Connect Personal
    2. Go to Tools -> Options
    • Add directories/files using the  "+" button
    • Remove directories/files using the "-" button
    • Revert to the default accessible directories/files using the "Reset to Defaults" button
    • Any changes you make are not made permanent until you press the "Save" button

    ​​Mac

    1. Start Globus Connect Personal
    2. Go to Preferences -> Access
    • Add directories/files using the  "+" button
    • Remove directories/files using the "-" button
    • Revert to the default accessible directories/files using the "Reset to Defaults" button
    • Any changes you make are not made permanent until you press the "Save" button

    Usage

    1. Login to Globus and navigate to the "start transfer" page under the "Manage Transfers" menu
    2. Enter your end point in one of the boxes
      • If transferring to a computer with Globus Connect Personal installed, this will be the unique name chosen during installation
    3. Enter osc#gcs (OSC Globus Connect Server) as the other endpoint
  • If this is your first time connecting to the OSC Globus Connect Server, enter your OSC username and password for authentication
    4. You can now transfer files and directories both ways by selecting them and pressing the arrow indicating which way you'd like to transfer
    If you are doing a large transfer you should transfer to/from the parallel file system or project space for best performance.
    Once a transfer has begun, you do not need to keep the Globus webpage up, but you will need to make sure the Globus Connect Personal Client is running on your computer until it has completed.  If the transfer is interrupted for any reason, Globus will attempt to re-initiate the transfer automatically.

    HOWTO: Add InCommon Authentication to Globus

    (OPTIONAL) Adding InCommon Authentication 

Adding InCommon authentication to your Globus account allows you to log in to Globus Online using your university credentials.  With this set up, you can put your Globus username and password away for safekeeping and instead use your university username and password to log in.  If you are already logged in to your university authentication system, logging in to Globus can be as simple as two clicks.

To use this feature, your university needs to be an InCommon participant.  Some Ohio universities active in InCommon include: Ohio State University, Case Western University, Columbus State Community College, Miami University, Ohio Northern University, Ohio University, University of Findlay, University of Dayton, and many more.

For a complete list, visit https://incommon.org/participants/.

    To add InCommon Authentication:
    1. Login to Globus Online
    2. Go to "Manage Identities" under your username
    3. Click "Add External Identity"
      • Choose a name for the Identity Settings. 
      • Choose InCommon / CILogon from the drop down menu
    4. On the next page, choose your University / Identity Provider
    • Click "Remember this selection"
    • Click "Log on"
    • You may be prompted to login to your university authentication system if you are not already

    When you go to login next, click "alternative login" and then "InCommon / CILogon".  Select your university on the next page, and login using your university credentials.  Globus will remember this preference, and automatically prompt you to login using your university authentication next time.

    HOWTO: Use NFSv4 ACL

This document shows you how to use the NFSv4 ACL permissions system. An ACL is a list of permissions associated with a file or directory. These permissions allow you to restrict access to a certain file or directory by user or group. NFSv4 ACLs provide more specific options than the typical POSIX read/write/execute permissions used in most systems.

    Understanding NFSv4 ACL

    This is an example of an NFSv4 ACL

A::user@nfsdomain.org:rxtncy
The following sections break down this example from left to right and provide more usage options.

     

    ACE Type

    The 'A' in the example is known as the ace type. The 'A' denotes "Allow" meaning this ACL is allowing the user or group to perform actions requiring permissions. Anything that is not explicitly allowed is denied by default.

Note: 'D' can denote a Deny ACE. While this is a valid option, this ACE type is not recommended since any permission that is not explicitly granted is automatically denied, meaning Deny ACEs can be redundant and complicated.

     

    ACE Flags

The above example could also include a flag, as shown below:

    A:d:user@osc.edu:rxtncy
    

The 'd' used above is called an inheritance flag. It causes the ACL set on this directory to be automatically established on any new subdirectories. Inheritance flags only work on directories, not files. Multiple inheritance flags can be used in combination or omitted entirely. Examples of inheritance flags are listed below:

Flag Name Function
d directory-inherit New subdirectories will have the same ACE
f file-inherit New files will have the same ACE minus the inheritance flags
n no-propagate-inherit New subdirectories will inherit the ACE minus the inheritance flags
i inherit-only New files and subdirectories will have this ACE but the ACE for the directory with the flag is null
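For example, an ACE that should also be applied to new files and new subdirectories created under the directory could combine the file-inherit and directory-inherit flags (a sketch using the same principal and permissions as the earlier example):

A:fd:user@nfsdomain.org:rxtncy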

     

    ACE Principal

The 'user@nfsdomain.org' is a principal. The principal denotes who the ACE grants access to. Principals can be the following:

    • A named user
    • Special principals
      • OWNER@
      • GROUP@
      • EVERYONE@
    • A group
      • Note: When the principal is a group, you need to add a group flag, 'g', as shown in the below example
      • A:g:group@osc.edu:rxtncy
        

     

    ACE Permissions

The 'rxtncy' are the permissions the ACE is allowing. Permissions can be used in combination with each other. A list of permissions and what they do can be found below:

    Permission Function
    r read-data (files) / list-directory (directories)
    w write-data (files) / create-file (directories)
    a append-data (files) / create-subdirectory (directories)
x execute (files) / change-directory (directories)
    d delete the file/directory
    D delete-child : remove a file or subdirectory from the given directory (directories only)
    t read the attributes of the file/directory
    T write the attribute of the file/directory
    n read the named attributes of the file/directory
    N write the named attributes of the file/directory
    c read the file/directory ACL
    C write the file/directory ACL
    o change ownership of the file/directory

     

Note: Aliases such as 'R', 'W', and 'X' can be used as permissions. These work similarly to POSIX Read/Write/Execute. More detail can be found below.

    Alias Name Expansion
    R Read rntcy
W Write watTNcCy (with D added for directory ACEs)
    X Execute xtcy

     

    Using NFSv4 ACL

    This section will show you how to set, modify, and view ACLs

     

    Set and Modify ACLs

    To set an ACE use this command:

    nfs4_setfacl [OPTIONS] COMMAND file
    

    To modify an ACE, use this command:

    nfs4_editfacl [OPTIONS] file
    

    Where file is the name of your file or directory. More information on Options and Commands can be found below.

     

    Options

Options can be used in combination or omitted entirely. A list of options is shown below:

    Option Name Function
    -R recursive Applies ACE to a directory's files and subdirectories
    -L logical Used with -R, follows symbolic links
    -P physical Used with -R, skips symbolic links

     

    Commands

    Commands are only used when first setting an ACE. Commands and their uses are listed below.
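As a sketch of the most commonly used commands (assuming the standard nfs4_setfacl interface; consult man nfs4_setfacl on the system for the authoritative list, and note that the ACE strings below are only examples):

nfs4_setfacl -a "A::alice@nfsdomain.org:RX" file    # add an ACE to the existing ACL
nfs4_setfacl -x "A::alice@nfsdomain.org:RX" file    # remove the matching ACE
nfs4_setfacl -s "A::OWNER@:RWX" file                # replace the entire ACL with the given ACE(s)
nfs4_setfacl -e file                                # edit the ACL interactively (same as nfs4_editfacl)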

     

    View ACLs

    To view ACLs, use the following command:

    nfs4_getfacl file
    

    Where file is your file or directory

     


    HOWTO: Use VNC in a batch job

    SSHing directly to a compute node at OSC - even if that node has been assigned to you in a current batch job - and starting VNC is an "unsafe" thing to do. When your batch job ends (and the node is assigned to other users), stray processes will be left behind and negatively impact other users. However, it is possible to use VNC on compute nodes safely.

If your work is too big for the regular login nodes but is not a very large, very intensive computation (for example, you do not expect to saturate all of the cores on a machine for a significant portion of the time you have the application open, such as when you are using a GUI to set up a problem for a longer non-interactive compute job), OnDemand is a much easier way to access a desktop.

    The examples below are for Oakley.
     

    To use vncviewer on Oakley, load version 1.2 of turbovnc with the command module load turbovnc/1.2
    To use vncserver on Oakley, load version 1.1 of turbovnc with the command module load turbovnc/1.1

    Starting your VNC server

    Step one is to create your VNC server inside a batch job.

    Option 1: Interactive

    The preferred method is to start an interactive job, requesting an entire node, and then once your job starts, you can start the VNC server.

    qsub -I -l nodes=1:ppn=12:gpus=2:vis

    This command requests an entire GPU node, and tells the batch system you wish to use the GPUs for visualization. This will ensure that the X11 server can access the GPU for acceleration. In this example, I have not specified a duration, which will then default to 1 hour.

    module load virtualgl
    module load turbovnc/1.1

    Then start your VNC server. (The first time you run this command, it may ask you for a password - this is to secure your VNC session from unauthorized connections. Set it to whatever password you desire. We recommend a strong password.)

    vncserver

    The output of this command is important: it tells you where to point your client to access your desktop. Specifically, we need both the host name (before the :), and the screen (after the :).

    New 'X' desktop is n0302.ten.osc.edu:1

    Option 2: Batch

This option is less convenient because it is slightly more difficult to get the hostname and screen. However, by submitting a non-interactive batch job, you can step away and have the system email you when your desktop is ready to be connected to. More importantly, if your SSH connection to OSC is unstable or intermittent, you do not run the risk of being disconnected during an interactive session and having your VNC server terminated. In general, it is recommended you only use this option if running via an interactive session is not feasible.

In order to start a VNC session non-interactively, you can submit the following script to the scheduler using qsub (adjusting your walltime to what you need):

#PBS -l nodes=1:ppn=12:gpus=2:vis
#PBS -l walltime=00:15:00
#PBS -m b
#PBS -N VNCjob
#PBS -j oe

module load virtualgl
module load turbovnc/1.1

# Start the VNC server; its output (including the hostname and screen) goes to the job output file
vncserver

sleep 100

# Find the Xvnc process started in this job's session and keep the job alive until it exits
vncpid=`pgrep -s 0 Xvnc`

while [ -e /proc/$vncpid ]; do sleep 0.1; done

    This script will send you an email when your job has started, which includes the hostname.

    PBS Job Id: 935621.oak-batch.osc.edu
    Job Name:   VNCjob
    Exec host:  n0282/11+n0282/10+n0282/9+n0282/8+n0282/7+n0282/6+n0282/5+n0282/4+n0282/3+n0282/2+n0282/1+n0282/0
    Begun execution

    The screen is virtually always "1", unless someone else started a VNC server on that node outside of the batch system. You can verify the output of the vncserver command by using qpeek on a login node:

    qpeek jobid

    Where "jobid" is the batch system job number, for example, "935621".

     

    Connecting to your VNC server

    Because the compute nodes of our clusters are not directly accessible, you must log in to one of the login nodes and allow your VNC client to "tunnel" through SSH to the compute node. The specific method of doing so may vary depending on your client software.

    Linux/MacOS

    Option 1: Manually create an SSH tunnel 

    I will be providing the basic command line syntax, which works on Linux and MacOS. You would issue this in a new terminal window on your local machine, creating a new connection to Oakley.

    ssh -L 5901:n0302.ten.osc.edu:5901 guilfoos@oakley.osc.edu

    Open your VNC client, and connect to "localhost:1" - this will tunnel to the correct node on Oakley.
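If you are using the vncviewer client from the turbovnc module mentioned above in a terminal, connecting through the tunnel could look like the following (a sketch; any VNC client pointed at localhost:1 works the same way):

vncviewer localhost:1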

    Option 2: Use your VNC software to tunnel 

    This example uses Chicken of the VNC, a MacOS VNC client.

    The default window that comes up for Chicken requires the host to connect to, the screen (or port) number, and optionally allows you to specify a host to tunnel through via SSH. This screenshot shows a proper configuration for the output of vncserver shown above. Substitute your host, screen, and username as appropriate.

    When you click [Connect], you will be prompted for your HPC password (to establish the tunnel, provided you did not input it into the "password" box on this dialog), and then (if you set one), for your VNC password. If your passwords are correct, the desktop will display in your client.

     

    Windows

    This example shows how to create a SSH tunnel through your ssh client.  We will be using Putty in this example, but these steps are applicable to most SSH clients.

First, make sure you have X11 forwarding enabled in your SSH client.

Next, open the port forwarding/tunnels settings and enter the hostname and screen you got earlier in the destination field.  You will need to add 5900 to the screen number when specifying the destination port here (for the example above, the destination would be n0302.ten.osc.edu:5901).  Some clients may have separate boxes for the destination hostname and port.

For the source port, pick a number between 11 and 99 and add 5900 to it.  The number between 11 and 99 will be the display you connect to in your VNC client.  For example, if you pick 44, the source port is 5944 and you will later connect your VNC client to localhost:44.

Make sure to add the forwarded port, and save the changes you've made before exiting the configuration window.

    PuTTY Tunnel Configuration Settings

    Now start a SSH session to the respective cluster your vncserver is running on.  The port forwarding will automatically happen in the background.  Closing this SSH session will close the forwarded port; leave the session open as long as you want to use VNC.

    Now start a VNC client.  TurboVNC has been tested with our systems and is recommended.  Enter localhost:[port], replacing [port] with the port between 11-99 you chose earlier.

    New TurboVNC Connection

    If you've set up a VNC password you will be prompted for it now.  A desktop display should pop up now if everything is configured correctly.

Killing a VNC session

    Occasionally you may make a mistake and start a VNC server on a login node or somewhere else you did not want to.  In this case it is important to know how to properly kill your VNC server so no processes are left behind.

    The command syntax to kill a VNC session is:

    vncserver -kill :[screen]

    In the example above, screen would be 1.
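For example, to kill the server started on screen 1 in the example above:

vncserver -kill :1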

    You need to make sure you are on the same node you spawned the VNC server on when running this command.


    HOWTO: Use an Externally Hosted License

    Many software packages require a license.  These licenses are usually made available via a license server, which allows software to check out necessary licenses.  In this document external refers to a license server that is not hosted inside OSC.

    If you have such a software license server set up using a license manager, such as FlexNet, this guide will instruct you on the necessary steps to connect to and use the licenses at OSC.

    Users who wish to host their software licenses inside OSC should consult OSC Help.

     

    You are responsible for ensuring you are following your software license terms.  Please ensure your terms allow you to use the license at OSC before beginning this process!

    Introduction

    Broadly speaking, there are two different ways in which the external license server's network may be configured.  These differ by whether the license server is directly externally reachable or if it sits behind a private internal network with a port forwarding firewall.  

    If your license server sits behind a private internal network with a port forwarding firewall you will need to take additional steps to allow the connection from our systems to the license server to be properly routed. 

    License Server is Directly Externally Reachable

    Diagram showing network setup of license server that is directly externally reachable

     

    License Server is Behind Port Forwarding Firewall

    Diagram showing network setup of license server that is behind a port forwarding firewall

     

    Unsure?

    If you are unsure about which category your situation falls under contact your local IT administrator.

    Configure Remote Firewall

In order for connections from OSC to reach the license server, the license server's firewall will need to be configured.  All outbound network traffic from OSC's compute nodes is routed through a network address translation host (NAT) or two backup servers.

    The license server should be configured to allow connections from the following IP addresses to the SERVER:PORT where the license server is running:

    • nat.osc.edu (192.157.5.13)
    • 192.148.248.35
    • 192.148.248.186

    Confirm Configuration

The firewall settings should be verified by attempting to connect to the license server from the compute environment using telnet.

Get on to a compute node by requesting a short, small, interactive job and test the connection using telnet:

    telnet <License Server IP Address> <Port#>

    (Recommended) Restrict Access to IPs/Usernames

    It is also recommended to restrict accessibility using the remote license server's access control mechanisms, such as limiting access to particular usernames in the options.dat file used with FlexNet-based license servers.

    For FlexNet tools, you can add the following line to your options.dat file, one for each user.

    INCLUDEALL USER <OSC username>
    

If you have a large number of users to give access to, you may want to define a group using GROUP within the options.dat file and give access to that whole group using INCLUDEALL GROUP <group name>.
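A sketch of what this could look like in options.dat (the group name and usernames below are placeholders):

GROUP osc_users osc1234 osc5678 osc9012
INCLUDEALL GROUP osc_users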

    Users who use other license managers should consult the license manager's documentation.

    Modify Job Environment to Point at License Server

The software must now be told to contact the license server for its licenses.  The exact method of doing so can vary between software packages, but most use an environment variable that specifies the license server IP address and port number to use.

For example, LS-DYNA uses the environment variables LSTC_LICENSE and LSTC_LICENSE_SERVER to know where to look for the license.  The following lines would be added to a job script to tell LS-DYNA to use licenses from port 2345 on server 1.2.3.4:

    setenv LSTC_LICENSE network
    setenv LSTC_LICENSE_SERVER 2345@1.2.3.4
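The lines above use csh syntax; if your job script uses bash instead, the equivalent would be:

export LSTC_LICENSE=network
export LSTC_LICENSE_SERVER=2345@1.2.3.4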

    License Server is Behind Port Forwarding Firewall

    If the license server is behind a port forwarding firewall, and has a different IP address from the IP address of the firewall, additional steps must be taken to allow connections to be properly routed within the license server's internal network.  

    1. Use the license server's fully qualified domain name in the environment variables instead of the IP address.
    2. Contact OSC Help to have the firewall IP address mapped to the fully qualified domain name.

     

    Software Specific Details

    The following outlines details particular to a specific software package.  

    ANSYS

    Uses the following environment variables:

    
    ANSYSLI_SERVERS=<port>@<IP>
    ANSYSLMD_LICENSE=<port>@<IP>
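As an illustration only, reusing the hypothetical server from the LS-DYNA example above (port 2345 on 1.2.3.4), a bash job script would set these variables as:

export ANSYSLI_SERVERS=2345@1.2.3.4
export ANSYSLMD_LICENSE=2345@1.2.3.4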

     

    HOWTO: Use ulimit command to set soft limits

This document shows you how to set soft limits using the ulimit command.

    The ulimit command sets or reports user process resource limits. The default limits are defined and applied when a new user is added to the system. Limits are categorized as either soft or hard. With the ulimit command, you can change your soft limits for the current shell environment, up to the maximum set by the hard limits. You must have root user authority to change resource hard limits.

    Syntax

    ulimit [-H] [-S] [-a] [-c] [-d] [-f] [-m] [-n] [-s] [-t] [Limit]
Flags and their descriptions:

    -a

    Lists all of the current resource limits

    -c

    Specifies the size of core dumps, in number of 512-byte blocks

    -d

    Specifies the size of the data area, in number of K bytes

    -f

    Sets the file size limit in blocks when the Limit parameter is used, or reports the file size limit if no parameter is specified. The -f flag is the default

    -H

    Specifies that the hard limit for the given resource is set. If you have root user authority, you can increase the hard limit. Anyone can decrease it

    -m

    Specifies the size of physical memory, in number of K bytes

    -n

    Specifies the limit on the number of file descriptors a process may have

    -s

    Specifies the stack size, in number of K bytes

    -S

    Specifies that the soft limit for the given resource is set. A soft limit can be increased up to the value of the hard limit. If neither the -H nor -S flags are specified, the limit applies to both

    -t

    Specifies the number of seconds to be used by each process

    The limit for a specified resource is set when the Limit parameter is specified. The value of the Limit parameter can be a number in the unit specified with each resource, or the value "unlimited". For example, to set the file size limit to 51,200 bytes, use:

    ulimit -f 100

    To set the size of core dumps to unlimited, use:

ulimit -c unlimited
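To confirm the change, you can list all of the current resource limits (using the -a flag described above):

ulimit -a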

How to change ulimit for an MPI program

The ulimit command affects the current shell environment. When an MPI program is started, it does not spawn in the current shell, so you have to use mpiexec to start a wrapper script that sets the limit if you want the limit applied to each process. Below is how you set the limit for each process (we use ulimit -c unlimited to allow unlimited core dumps, as an example):

1. Prepare your batch job script named "myjob" as below (here, we request 2 nodes for 5 hours on the Oakley cluster):
#!/bin/bash
#PBS -l nodes=2:ppn=12
#PBS -l walltime=5:00:00
#PBS ...

cd $PBS_O_WORKDIR
...
mpiexec ./test1
...
2. Prepare the wrapper script named "test1" as below:
#!/bin/bash
ulimit -c unlimited
.....(your own program)
3. Submit the job: qsub myjob