Welcome to OSC! If you are new to supercomputing, new to OSC, or simply interested in getting an account (if you don't already have one), we have some resources to help you.
This guide was created for new users of OSC.
It explains how to use OSC from the very beginning of the process, from creating an account right up to using resources at OSC.
The first step is to make sure you have an OSC account.
There are multiple ways to create an account.
One can create an account at MyOSC, or be invited to use OSC via a registration email.
Make sure to select the PI checkbox if you are a PI at your institution and want to start your own project at OSC.
Only users with PI status are able to create a project. See how to request PI status in manage profile information. Follow the instructions in creating projects and budgets to create a new project.
Once a project is created, the PI should add themselves to it, along with anyone else they want to permit to use OSC resources under their project.
Refer to adding/inviting users to a project for details on how to do this.
If there is already a project that you would like to reuse, follow the same instructions as found in creating projects and budgets, but skip to the budget creation section.
If there are questions about cost, refer to service costs.
Generally, an Ohio academic PI can create a budget for $1,000 on a project and use the annual $1,000 credit offered to Ohio academic PIs. Review service cost terms for explanations of budgets and credits at OSC.
See the complete MyOSC documentation, our Client Portal, here. The OSCusage command can also provide useful details.
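For example, running the command below from a terminal on an OSC login node prints a usage summary for the projects you belong to (shown only as a sketch; the exact output format may change over time):

$ OSCusage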
OSC supports classrooms by making it simpler for students to use OSC resources through a customizable OnDemand interface at class.osc.edu.
Visit OSC classroom resource guide and contact oschelp@osc.edu if you want to discuss the options there.
As a reminder, there will be no charges for classroom projects.
There are a few options for transferring files between OSC and other systems.
Using the OnDemand file explorer is the quickest option to get started. Just log in to ondemand.osc.edu and click 'File Explorer' from the navigation bar at the top of the page. From there one can upload/download files and directories.
This is a simple option, but for files or directories that are very large, it may not be best. See other options below in this case.
Local software can be used to connect to OSC for downloading and uploading files.
There are quite a few options for this, and OSC does not have a preference for which one you use.
The general guidance for all of them is to connect to host sftp.osc.edu using port 22.
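As a quick sketch of what that looks like with a command-line SFTP client (replace "username" with your OSC username; the file names are placeholders, and put uploads while get downloads):

$ sftp username@sftp.osc.edu
sftp> put local-data.csv
sftp> get results.txt
sftp> exit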
Using Globus is recommended for users that frequently need to transfer many large files/dirs.
We have documentation detailing how to connect to our OSC endpoint in Globus and how to set up a local endpoint on your machine with Globus.
Storage larger than the standard quota offered by home directories can be requested for a project.
On the project details page, submit a "Request Storage Change" and a ticket will be created for OSC staff to create the project space quota.
Finally, after the above setup, one can start using OSC resources. Usually some setup needs to be performed before you can really start using OSC, such as creating a custom environment, gaining access to preinstalled software, or installing software to your home directory that is not already available.
The best place to start is by visiting ondemand.osc.edu, logging in, and starting an interactive desktop session.
Look for the navigation bar at the top of the page and select Interactive Apps, then Owens Desktop.
Notice that there are a lot of fields, but the most important ones, for now, are cores and the number of hours.
Try using only a single core at first, until you are more familiar with the system and can decide when more cores will be needed.
If there is specific software in the Interactive Apps list that you want to use, then go ahead and start a session with it. Just remember to change the cores to one until you understand what you need.
A terminal session can also be started in OnDemand by clicking Clusters then Owens Shell Access.
In this terminal you can run the commands covered in the sections below on environment setup and software use and installation.
Python and R are two of the common programming languages for which users need an environment set up.
See add python packages with conda or R software for details.
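As a rough sketch of the conda workflow from a terminal session (the module name miniconda3, the environment name, and the package list are only examples; check the OSC software pages for the modules actually installed):

$ module load miniconda3
$ conda create -n my-project-env numpy pandas
$ source activate my-project-env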
There are other options, so please browse the OSC software listing.
All the software already available at OSC can be found in the software listing.
Each page has some information on how to use the software from a command line. If you are unfamiliar with the command line in Linux, then try reviewing some Linux tutorials.
For now, try to get comfortable with moving to different directories on the filesystem, creating and editing files, and using the module commands from the software pages.
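For example, these are the standard module commands you will see on the software pages (the package name python is only a placeholder; substitute the module you actually need):

$ module avail           # list modules available to load
$ module spider python   # search for modules matching a name
$ module load python     # add a module to your environment
$ module list            # show the modules currently loaded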
Software not already installed on OSC systems can be installed locally to one's home directory without admin privileges. Try reviewing locally installing software at OSC.
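A very common pattern for a local install is to build the software with your home directory as the install prefix. This is only a generic sketch (the package name and build steps are placeholders); the locally installing software documentation covers the details:

$ tar -xzf some-package.tar.gz
$ cd some-package
$ ./configure --prefix=$HOME/local
$ make
$ make install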
After getting set up at OSC and understanding the use of interactive sessions, you should start looking into how to use the batch system to run your software programmatically.
The benefits of the batch system are that a user can submit what we call a job (a request to reserve resources) and have the job execute from start to finish without any interaction by the user.
A good place to start is by reviewing job scripts.
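As a minimal sketch, in the same PBS style used in the batch tutorial later in this guide (the resource values, job name, and file name are placeholders), a job script is just a text file of scheduler directives followed by the commands you want to run:

#PBS -l walltime=00:10:00
#PBS -l nodes=1:ppn=1
#PBS -N my-first-job
cd $PBS_O_WORKDIR
echo Job running on `hostname`

You would then submit it with "qsub my-first-job.pbs" and check on it with "qstat -u username".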
OnDemand provides a convenient method for editing and submitting jobs in the job composer.
It can be used by logging into ondemand.osc.edu and clicking Jobs at the top and then Job Composer. A short help message should be shown on basic usage.
OSC offers periodic training both at our facility and at universities across the state on a variety of topics. Additionally, we will partner with other organizations to enable our users to access additional training resources.
We are currently in the process of updating our training strategy and documents. If you are interested in having us come to your campus to provide training, please contact OSC Help. You can also contact us if there is a specific training need you would like to see us address.
To get an introduction to HPC, see our HPC Basics page.
To learn more about using the command line, see our UNIX Basics page.
For detailed instructions on how to perform tasks on our systems, check out HOWTO articles.
Before contacting OSC Help, please check to see if your question is answered in either the FAQ or the Knowledge Base. Many of the questions asked by both new and experienced OSC users are answered on these web pages.
If you still cannot solve your problem, please do not hesitate to contact OSC Help:
Toll-Free: (800) 686-6472
Local: (614) 292-1800
Email: oschelp@osc.edu
Submit your issue online
Schedule virtual consultation
Basic and advanced support is available Monday through Friday, 9 a.m.–5 p.m., except OSU holidays.
We recommend following HPCNotices on Twitter to get up-to-the-minute information on system outages and important operations-related updates.
HPC, or High Performance Computing, generally refers to aggregating computing resources together in order to perform more computing operations at once.
Using HPC is a little different from running programs on your desktop. When you log in you’ll be connected to one of the system’s “login nodes”. These nodes serve as a staging area for you to marshal your data and submit jobs to the batch scheduler. Your job will then wait in a queue along with other researchers' jobs. Once the resources it requires become available, the batch scheduler will then run your job on a subset of our hundreds of “compute nodes”. You can see the overall structure in the diagram below.
An important point about the diagram above is that OSC clusters are a collection of shared, finite resources. When you connect to the login nodes, you are sharing their resources (CPU cycles, memory, disk space, network bandwidth, etc.) with a few dozen other researchers. The same is true of the file servers when you access your home or project directories, and can even be true of the compute nodes.
For most day-to-day activities you should not have to worry about this, and we take precautions to limit the impact that others might have on your experience. That said, there are a few use cases that are worth watching out for:
The login nodes should only be used for light computation; any CPU- or memory-intensive operations should be done using the batch system. A good rule of thumb is that if you wouldn't want to run a task on your personal desktop because it would slow down other applications, you shouldn't run it on the login nodes. (See also: Interactive Jobs.)
I/O-intensive jobs should copy their files to fast, temporary storage, such as the local storage allocated to jobs or the Scratch parallel filesystem.
When running memory-intensive or potentially unstable jobs, we highly recommend requesting whole nodes. By doing so you prevent other users' jobs from being impacted by your job.
If you request partial nodes, be sure to consider the amount of memory available per core. (See: HPC Hardware.) If you need more memory, request more cores. It is perfectly acceptable to leave cores idle in this situation; memory is just as valuable a resource as processors.
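For example, in the PBS-style directives used elsewhere in this guide, the difference between a partial-node and a whole-node request looks roughly like this (the per-node core count varies by cluster, so check the HPC Hardware page; 28 is used purely as an illustration):

#PBS -l nodes=1:ppn=1     # partial node: 1 core and a proportional share of the node's memory
#PBS -l nodes=1:ppn=28    # whole node: all cores, and therefore all of the node's memory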
In general, we just encourage our users to remember that what you do may affect other researchers on the system. If you think something you want to do or try might interfere with the work of others, we highly recommend that you contact us at oschelp@osc.edu.
There are two ways to connect to our systems. The traditional way will require you to install some software locally on your machine, including an SSH client, SFTP client, and optionally an X Windows server. The alternative is to use our zero-client web portal, OnDemand.
OnDemand is our "one stop shop" for access to our High Performance Computing resources. With OnDemand, you can upload and download files, create, edit, submit, and monitor jobs, run GUI applications, and connect via SSH, all via a web browser, with no client software to install and configure.
You can access OnDemand by pointing a web browser to ondemand.osc.edu. Documentation is available here. Any newer version of a common web browser should be sufficient to connect.
In order to use our systems, you'll need two main pieces of software: an SFTP client and an SSH client.
SFTP ("SSH File Transfer Protocol") clients allow you to transfer files between your workstation and our shared filesystem in a secure manner. We recommend the following applications:
SSH ("Secure Shell") clients allow you to open a command-line-based "terminal session" with our clusters. We recommend the following options:
A third, optional piece of software you might want to install is an X Windows server, which will be necessary if you want to run graphical, windowed applications like MATLAB. We recommend the following X Windows servers:
In addition, for Windows users, you can use OSC Connect, which is a native Windows application developed by OSC to provide a launcher for secure file transfer, VNC, terminal, and web based services, as well as preconfigured management of secure tunnel connections. See this page for more information on OSC Connect.
The primary way you'll interact with the OSC clusters is through the SSH terminal. See our supercomputing environments for the hostnames of our current clusters. You should not need to do anything special beyond entering the hostname.
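For example, from a command-line SSH client the connection looks like this (replace "username" with your OSC username; pitzer.osc.edu is shown only as one of the cluster hostnames):

$ ssh username@pitzer.osc.edu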
Once you've established an SSH connection, you will be presented with some informational text about the cluster you've connected to followed by a UNIX command prompt. For a brief discussion of UNIX command prompts and what you can do with them, see the next section of this guide.
To transfer files, use your preferred SFTP client to connect to:
sftp.osc.edu
You may see a warning message including an SSH key fingerprint. Verify that the fingerprint in the message matches one of the SSH key fingerprints listed here, then type yes.
Since process times are limited on the login nodes, trying to transfer large files directly to pitzer.osc.edu or other login nodes may terminate partway through. The sftp.osc.edu host is specially configured to avoid this issue, so we recommend it for all your file transfers.
See our Firewall and Proxy Settings page for information on how to configure your firewall to allow connection to and from OSC.
With an X Windows server you will be able to run graphical applications on our clusters that display on your workstation. To do this, you will need to launch your X Windows server before connecting to our systems. Then, when setting up your SSH connection, you will need to be sure to enable "X11 Forwarding".
For users of the command-line ssh client, you can do this by adding the "-X" option. For example, the command below will connect to the Pitzer cluster with X11 forwarding:
$ ssh -X username@pitzer.osc.edu
If you are connecting with PuTTY, the checkbox to enable X11 forwarding can be found in the connections pane under "Connections → SSH → X11".
For other SSH clients, consult their documentation to determine how to enable X11 forwarding.
The Ohio Supercomputer Center provides services to clients from a variety of types of organizations. The methods for gaining access to the systems are different between Ohio academic institutions and everyone else.
Primarily, our users are Ohio-based and academic, and the vast majority of our resources will continue to be consumed by Ohio-based academic users. See the "Ohio Academic Fee Model FAQ" section on our service costs webpage.
Other users (business, non-Ohio academic, nonprofit, hospital, etc.) interested in using Center resources may purchase services at a set rate available on our price list. Expert consulting support is also available.
For users interested in gaining access to larger resources, please contact OSC Help. We can assist you in applying for resources at an NSF or XSEDE site.
Once a project has been created, the PI can create accounts for users by adding them through the client portal. Existing users can also be added. More information can be found on the Project Menu documentation page.
If an academic PI wants a new project or to update the budget balance on an existing project(s), please see our creating projects and budget documentation.
We provide special classroom projects for this purpose and at no cost. You may use the client portal after creating an account. The request will need to include a syllabus or a similar document.
Please contact us in order to discuss options for using OSC resources.
Use of computing resources and services at OSC is subject to the Ohio Supercomputer Center (OSC) Code of Ethics for Academic Users. Ohio Academic Clients are eligible for highly subsidized access to OSC resources, with fees only accruing after a credit provided is exhausted. Clients from an Ohio academic institution that expect to use more than the credit should consult with their institution on the proper guidance for requesting approval to be charged for usage. See the academic fee structure FAQ page for more information.
Eligible principal investigators (PIs) at Ohio academic institutions are able to request projects at OSC, but should also consult with their institution before incurring charges. In order to be an eligible PI at OSC, you must be eligible to hold PI status at your college, university, or research organization administered by an Ohio academic institution (i.e., be a full-time, permanent academic researcher or tenure-track faculty member at an Ohio college or university). Students, post-doctoral fellows, visiting scientists, and others who wish to use the facilities may be authorized users on projects headed by an eligible PI. Once a PI has received their project information, they can manage users for the project. Principal Investigators of OSC projects are responsible for updating their authorized user list, their outside funding sources, and their publications and presentations that cite OSC. All of these tasks can be accomplished using the client portal. Please review the documentation for more information. PIs are also responsible for monitoring their project's budget (balance) and for requesting a new budget (balance) before going negative, as projects with negative balances are restricted.
OSC's online project request system, part of our Client Portal, leads you through the process step by step. Before you begin to fill in the application form, especially if you are new to the process, look at the academic fee structure page. You can save a partially completed project request for later use.
If you need assistance, please contact OSC Help.
Researchers from businesses, non-Ohio academic institutions, nonprofits, hospitals, or other organizations (which do not need to be based in Ohio) who wish to use OSC's resources should complete the Other Client request form available here. All clients not affiliated with and approved by an Ohio academic institution must sign a service agreement, provide a $500 deposit, and pay for resource usage per a standard price list.
OSC will provide a letter of commitment users can include with their account proposals for outside funding, such as from the Department of Energy, National Institutes of Health, National Science Foundation (limited to standard text, per NSF policy), etc. This letter details OSC's commitment to supporting research efforts of its users and the facilities and platforms we provide our users. [Note: This letter does not waive the normal OSC budget process; it merely states that OSC is willing to support such research.] The information users must provide for the letter is:
Send e-mail with your request for the commitment letter to OSC Help or submit online. We will prepare a draft for your approval and then we will send you the final PDF for your proposal submission. Please allow at least two working days for this service.
Letters of support may be subject to strict and specific guidelines, and may not be accepted by your funding agency.
If you need a letter of support, please see above "Letter of Commitment for Outside Funding Proposals".
Researchers requiring additional computing resources should consider applying for allocations at National Science Foundation Centers. For more information, please write to oschelp@osc.edu, and your inquiry will be directed to the appropriate staff member.
We require that you cite OSC in any publications or reports that result from projects supported by our services.
OSC HPC resources use an operating system called "Linux", which is a UNIX-based operating system, first released on 5 October 1991. Linux is by a wide margin the most popular operating system choice for supercomputing, with over 90% of the Top 500 list running some variant of it. In fact, many common devices run Linux variant operating systems, including game consoles, tablets, routers, and even Android-based smartphones.
While Linux supports desktop graphical user interface configurations (as does OSC), in most cases file manipulation will be done via the command line. Since all jobs run in batch are non-interactive, they by definition will not allow the use of GUIs. Thus, we strongly suggest new users become comfortable with basic command-line operations, so that they can learn to write scripts to submit to the scheduler that will behave as intended. We have provided some tutorials explaining basics from moving about the file system, to extracting archives, to modifying your environment, that are available for self-paced learning.
This tutorial teaches you about the linux command line and shows you some useful commands. It also shows you how to get help in linux by using the man and apropos commands.
This tutorial guides you through the process of creating and submitting a batch script on one of our compute clusters. This is a linux tutorial which uses batch scripting as an example, not a tutorial on writing batch scripts. The primary goal is not to teach you about batch scripting, but for you to become familiar with certain linux commands that can be used either in a batch script or at the command line. There are other pages on the OSC web site that go into the details of submitting a job with a batch script.
This tutorial shows you some handy time-saving shortcuts in linux. Once you have a good understanding of how the command line works, you will want to learn how to work more efficiently.
This tutorial shows you how to download tar (tape archive) files from the internet and how to deal with large directory trees of files.
This tutorial teaches you about the linux command line and shows you some useful commands. It also shows you how to get help in linux by using the man and apropos commands.
For more training and practice using the command line, you can find many great tutorials. Here are a few:
https://www.learnenough.com/command-line-tutorial
https://cvw.cac.cornell.edu/Linux/
http://www.ee.surrey.ac.uk/Teaching/Unix/
https://www.udacity.com/course/linux-command-line-basics--ud595
More Advanced:
http://moo.nac.uci.edu/~hjm/How_Programs_Work_On_Linux.html
Prerequisites: None.
Unix is an operating system that comes with several application programs. Other examples of operating systems are Microsoft Windows, Apple OS and Google's Android. An operating system is the program running on a computer (or a smartphone) that allows the user to interact with the machine -- to manage files and folders, perform queries and launch applications. In graphical operating systems, like Windows, you interact with the machine mainly with the mouse. You click on icons or make selections from the menus. The Unix that runs on OSC clusters gives you a command line interface. That is, the way you tell the operating system what you want to do is by typing a command at the prompt and hitting return. To create a new folder you type mkdir. To copy a file from one folder to another, you type cp. And to launch an application program, say the editor emacs, you type the name of the application. While this may seem old-fashioned, you will find that once you master some simple concepts and commands you are able to do what you need to do efficiently and that you have enough flexibility to customize the processes that you use on OSC clusters to suit your needs.
What are some common tasks you will perform on OSC clusters? Probably the most common scenario is that you want to run some of the software we have installed on our clusters. You may have your own input files that will be processed by an application program. The application may generate output files which you need to organize. You will probably have to create a job script so that you can execute the application in batch mode. To perform these tasks, you need to develop a few different skills. Another possibility is that you are not just a user of the software installed on our clusters but a developer of your own software -- or maybe you are making some modifications to an application program so you need to be able to build the modified version and run it. In this scenario you need many of the same skills plus some others. This tutorial shows you the basics of working with the Unix command line. Other tutorials go into more depth to help you learn more advanced skills.
You can think of Unix as consisting of two parts -- the kernel and the shell. The kernel is the guts of the Unix operating system -- the core software running on a machine that performs the infrastructure tasks like making sure multiple users can work at the same time. You don't need to know anything about the kernel for the purposes of this tutorial. The shell is the program that interprets the commands you enter at the command prompt. There are several different flavors of Unix shells -- Bourne, Korn, Cshell, TCshell and Bash. There are some differences in how you do things in the different shells, but they are not major and they shouldn't show up in this tutorial. However, in the interest of simplicity, this tutorial will assume you are using the Bash shell. This is the default shell for OSC users. Unless you do something to change that, you will be running the Bash shell when you log onto Owens or Pitzer.
The first thing you need to do is log onto one of the OSC clusters, Owens or Pitzer. If you do not know how to do this, you can find help at the OSC home page. If you are connecting from a Windows system, you need to download and set up the OSC Starter Kit which you can find here. If you are connecting from a Mac or Linux system, you will use ssh. To get more information about using ssh, go to the OSC home page, hold your cursor over the "Supercomputing" menu in the main blue menu bar and select "FAQ." This should help you get started. Once you are logged in look for the last thing displayed in the terminal window. It should be something like
-bash-3.2$
with a block cursor after it. This is the command prompt -- it's where you will see the commands you type in echoed to the screen. In this tutorial, we will abbreviate the command prompt with just the dollar sign - $. The first thing you will want to know is how to log off. You can log off of the cluster by typing "exit" then typing the <Enter> key at the command prompt:
$ exit <Enter>
For the rest of this tutorial, when commands are shown, the <Enter> will be omitted, but you must always enter <Enter> to tell the shell to execute the command you just typed.
So let's try typing a few commands at the prompt (remember to type the <Enter> key after the command):
$ date
$ cal
$ finger
$ who
$ whoami
$ finger -l
That last command is finger followed by a space then a minus sign then the lower case L. Is it obvious what these commands do? Shortly you will learn how to get information about what each command does and how you can make it behave in different ways. You should notice the difference between "finger" and "finger -l" -- these two commands seem to do similar things (they give information about the users who are logged in to the system) but they print the information in different formats. Try the two commands again and examine the output. Note that you can use the scroll bar on your terminal window to look at text that has scrolled off the screen.
The "man" command is how you find out information about what a command does. Type the following command:
$ man
It's kind of a smart-alecky answer you get back, but at least you learn that "man" is short for "manual" and that the purpose is to print the manual page for a command. Before we start looking at manual pages, you need to know something about the way Unix displays them. It does not just print the manual page and return you to the command prompt -- it puts you into a mode where you are interactively viewing the manual page. At the bottom of the page you should see a colon (:) instead of the usual command prompt (-bash-3.2$). You can move around in the man page by typing things at the colon. To exit the man page, you need to type a "q" followed by <Enter>. So try that first. Type
$ man finger
then at the colon of the man page type
: q
You do not have to type <Enter> after the "q" (this is different from the shell prompt.) You should be back at the shell prompt now. Now let's go through the man page a bit. Once again, type
$ man finger
Now instead of just quitting, let's look at the contents of the man page. The entire man page is probably not displayed in your terminal. To scroll up or down, use the arrow keys or the <Page Up> and <Page Down> keys of the keyboard. The <Enter> and <Space> keys also scroll. Remember that "q" will quit out of the man page and get you back to the shell prompt.
The first thing you see is a section with the heading "NAME" which displays the name of the command and a short summary of what it does. Then there is a section called "SYNOPSIS" which shows the syntax of the command. In this case you should see
SYNOPSIS
     finger [-lmsp] [user ...] [user@host ...]
Remember how "finger" and "finger -l" gave different output? The [-lmsp] tells you that you can use one of those four letters as a command option -- i.e., a way of modifying the way the command works. In the "DESCRIPTION" section of the man page you will see a longer description of the command and an explanation of the options. Anything shown in the command synopsis which is contained within square brackets ([ ]) is optional. That's why it is ok to type "finger" with no options and no user. What about "user" -- what is that? To see what that means, quit out of the man page and type the following at the command prompt:
$ whoami
Let's say your username is osu0000. Then the result of the "whoami" command is osu0000. Now enter the following command (but replace osu0000 with your username):
$ finger osu0000
You should get information about yourself and no other users. You can also enter any of the usernames that are output when you enter the "finger" command by itself. The user names are in the leftmost column of output. Now try
$ finger -l osu0000
$ finger -lp osu0000
$ finger -s osu0000 osu0001
For the last command, use your username and some other username that shows up in the output of the "finger" command with no arguments.
Note that a unix command consists of three parts: the command itself, the options, and the arguments.
You don't necessarily have to enter an argument (as you saw with the "finger" command) but sometimes a command makes no sense without an argument so you must enter one -- you saw this with the "man" command. Try typing
$ man man
and looking briefly at the output. One thing to notice is the synopsis -- there are a lot of possible options for the "man" command, but the last thing shown in the command synopsis is "name ..." -- notice that "name" is not contained in square brackets. This is because it is not optional -- you must enter at least one name. What happens if you enter two names?
$ man man finger
The first thing that happens is you get the man page for the "man" command. What happens when you quit out of the man page? You should now get the man page for the "finger" command. If you quit out of this one you will be back at the shell prompt.
You can "pipe" the output of one command to another. First, let's learn about the "more" command:
$ man more
Read the "DESCRIPTION" section -- it says that more is used to page through text that doesn't fit on one screen. It also recommends that the "less" command is more powerful. Ok, so let's learn about the "less" command:
$ man less
You see from the description that "less" also allows you to examine text one screenful at a time. Does this sound familiar? The "man" command actually uses the "less" command to display its output. But you can use the "less" command yourself. If you have a long text file named "foo.txt" you could type
$ less foo.txt
and you would be able to examine the contents of the file one screen at a time. But you can also use "less" to help you look at the output of a command that prints more than one screenful of output. Try this:
$ finger | less
That's "finger" followed by a space followed by the vertical bar (shifted backslash on most keyboards) followed by a space followed by "less" followed by <Enter>. You should now be looking at the output of the "finger" command in an interactive fashion, just as you were looking at man pages. Remember, to scroll use the arrow keys, the <Page Up> and <Page Down> keys, the <Enter> key or the space bar; and to quit, type "q".
Now try the following (but remember to replace "osu0000" with your actual username):
$ finger | grep osu0000
The "grep" command is Unix's command for searching. Here you are telling Unix to search the output of the "finger" command for the text "osu0000" (or whatever your username is.)
If you try to pipe the output of one command to a second command and the second is a command which works with no arguments, you won't get what you expect. Try
$ whoami | finger
You see that it does not give the same output as
$ finger osu0000
(assuming "whoami" returns osu0000.)
In this case what you can do is the following:
$ finger `whoami`
That's "finger" space backquote "whoami" backquote. The backquote key is to the left of the number 1 key on a standard keyboard.
Enter the following command:
$ man apropos
As you can see, the apropos command searches descriptions of commands and finds commands whose descriptions match the keyword you entered as the argument. That means it outputs a list of commands that have something to do with the keyword you entered. Try this
$ apropos
Ok, you need to enter an argument for the "apropos" command.
So try
$ apropos calendar
Now you see that among the results are two commands -- "cal" and "difftime" -- that have something to do with the keyword "calendar."
This tutorial guides you through the process of creating and submitting a batch script on one of our compute clusters. This is a linux tutorial which uses batch scripting as an example, not a tutorial on writing batch scripts. The primary goal is not to teach you about batch scripting, but for you to become familiar with certain linux commands. There are other pages on the OSC web site that go into the details of submitting a job with a batch script.
When you first log in to our clusters, you are in your home directory. For the purposes of this illustration, we will pretend you are user osu0001 and your project code is PRJ0001, but when you try out commands you must use your own username and project code.
$ pwd
/users/PRJ0001/osu0001
$ touch foo1
$ touch foo2
$ ls
$ ls -l
$ ls -lt
$ ls -ltr

The "touch" command just creates an empty file with the name you give it.

What about the options "-l", "-lt" or "-ltr"? You noticed the difference in the output between just the "ls" command and the "ls -l" command. Options are specified with a "-" (minus sign) followed by a single letter, and "ls -ltr" is actually specifying three options to the ls command.

l: I want to see the output in long format -- one file per line with some interesting information about each file
t: sort the display of files by when they were last modified, most-recently modified first
r: reverse the order of display (combined with -t this displays the most-recently modified file last -- it should be BatchTutorial in this case.)

I like using "ls -ltr" because I find it convenient to see the most recently modified file at the end of the list.

$ mkdir BatchTutorial
$ ls -ltr
The "mkdir" command makes a new directory with the name you give it. This is a subfolder of the current working directory. The current working directory is where your current focus is in the hierarchy of directories. The 'pwd' command shows you are in your home directory:

$ pwd
/users/PRJ0001/osu0001

$ cd BatchTutorial
$ pwd

What does 'pwd' show now? "cd" is short for "change directory" -- think of it as moving you into a different place in the hierarchy of directories. Now do

$ cd ..
$ pwd
Try the following:
$ echo where am I?
$ echo I am in `pwd`
$ echo my home directory is $HOME
$ echo HOME
$ echo this directory contains `ls -l`
These examples show what the echo command does and how to do some interesting things with it. The `pwd` means the result of issuing the command pwd. HOME is an example of an environment variable. These are strings that stand for other strings. HOME is defined when you log in to a unix system. $HOME means the string the variable HOME stands for. Notice that the result of "echo HOME" does not do the substitution. Also notice that the last example shows things don't always get formatted the way you would like.
Some more commands to try:
$ cal
$ cal > foo3
$ cat foo3
$ whoami
$ date
Using the ">" after a command puts the output of the command into a file with the name you specify. The "cat" command prints the contents of a file to the screen.
Two very important UNIX commands are the cp and mv commands. Assume you have a file called foo3 in your current directory created by the "cal > foo3" command. Suppose you want to make a copy of foo3 called foo4. You would do this with the following command:
$ cp foo3 foo4
$ ls -ltr
Now suppose you want to rename the file 'foo4' to 'foo5'. You do this with:
$ mv foo4 foo5
$ ls -ltr
'mv' is short for 'move' and it is used for renaming files. It can also be used to move a file to a different directory.
$ mkdir CalDir
$ mv foo5 CalDir
$ ls
$ ls CalDir
Notice that if you give a directory with the "ls" command it shows you what is in that directory rather than the current working directory.
Now try the following:
$ ls CalDir
$ cd CalDir
$ ls
$ cd ..
$ cp foo3 CalDir
$ ls CalDir
Notice that you can use the "cp" command to copy a file to a different directory -- the copy will have the same name as the original file. What if you forget to do the mkdir first?
$ cp foo3 FooDir
Now what happens when you do the following:
$ ls FooDir
$ cd FooDir
$ cat CalDir
$ cat FooDir
$ ls -ltr
CalDir is a directory, but FooDir is a regular file. You can tell this by the "d" that shows up in the string of letters when you do the "ls -ltr". That's what happens when you try to cp or mv a file to a directory that doesn't exist -- a file gets created with the target name. You can imagine a scenario in which you run a program and want to copy the resulting files to a directory called Output but you forget to create the directory first -- this is a fairly common mistake.
Before we move on to creating a batch script, you need to know more about environment variables. An environment variable is a word that stands for some other text. We have already seen an example of this with the variable HOME. Try this:
$ MY_ENV_VAR="something I would rather not type over and over"
$ echo MY_ENV_VAR
$ echo $MY_ENV_VAR
$ echo "MY_ENV_VAR stands for $MY_ENV_VAR"
You define an environment variable by assigning some text to it with the equals sign. That's what the first line above does. When you use '$' followed by the name of your environment variable in a command line, UNIX makes the substitution. If you forget the '$' the substitution will not be made.
There are some environment variables that come pre-defined when you log in. Try using 'echo' to see the values of the following variables: HOME, HOSTNAME, SHELL, TERM, PATH.
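For example, to print each of those variables:

$ echo $HOME
$ echo $HOSTNAME
$ echo $SHELL
$ echo $TERM
$ echo $PATH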
Now you are ready to use some of this unix knowledge to create and run a script.
Before we create a batch script and submit it to a compute node, we will do something a bit simpler. We will create a regular script file that will be run on the login node. A script is just a file that consists of unix commands that will run when you execute the script file. It is a way of gathering together a bunch of commands that you want to execute all at once. You can do some very powerful things with scripting to automate tasks that are tedious to do by hand, but we are just going to create a script that contains a few commands we could easily type in. This is to help you understand what is happening when you submit a batch script to run on a compute node.
Use a text editor to create a file named "tutorial.sh" which contains the following text (note that with emacs or nano you can use the mouse to select text and then paste it into the editor with the middle mouse button):
$ nano tutorial.sh
echo ----
echo Job started at `date`
echo ----
echo This job is working on node `hostname`
SH_WORKDIR=`pwd`
echo working directory is $SH_WORKDIR
echo ----
echo The contents of $SH_WORKDIR
ls -ltr
echo
echo ----
echo
echo creating a file in SH_WORKDIR
whoami > whoami-sh-workdir
SH_TMPDIR=${SH_WORKDIR}/sh-temp
mkdir $SH_TMPDIR
cd $SH_TMPDIR
echo ----
echo TMPDIR IS `pwd`
echo ----
echo wait for 12 seconds
sleep 12
echo ----
echo creating a file in SH_TMPDIR
whoami > whoami-sh-tmpdir
# copy the file back to the output subdirectory
cp ${SH_TMPDIR}/whoami-sh-tmpdir ${SH_WORKDIR}/output
cd $SH_WORKDIR
echo ----
echo Job ended at `date`
To run it:
$ chmod u+x tutorial.sh
$ ./tutorial.sh
Look at the output created on the screen and the changes in your directory to see what the script did.
Use your favorite text editor to create a file called tutorial.pbs in the BatchTutorial directory which has the following contents (remember, you can use the mouse to cut and paste text):
#PBS -l walltime=00:02:00
#PBS -l nodes=1:ppn=1
#PBS -N foobar
#PBS -j oe
#PBS -r n
echo ----
echo Job started at `date`
echo ----
echo This job is working on compute node `cat $PBS_NODEFILE`
cd $PBS_O_WORKDIR
echo show what PBS_O_WORKDIR is
echo PBS_O_WORKDIR IS `pwd`
echo ----
echo The contents of PBS_O_WORKDIR:
ls -ltr
echo
echo ----
echo
echo creating a file in PBS_O_WORKDIR
whoami > whoami-pbs-o-workdir
cd $TMPDIR
echo ----
echo TMPDIR IS `pwd`
echo ----
echo wait for 42 seconds
sleep 42
echo ----
echo creating a file in TMPDIR
whoami > whoami-tmpdir
# copy the file back to the output subdirectory
pbsdcp -g $TMPDIR/whoami-tmpdir $PBS_O_WORKDIR/output
echo ----
echo Job ended at `date`
$ qsub tutorial.pbs
Use qstat -u [username] to check on the progress of your job. If you see something like this, your job is waiting in the queue (note the "Q" in the "S" status column):

$ qstat -u osu0001

                                                                    Req'd  Req'd    Elap
Job ID             Username    Queue    Jobname          SessID  NDS  TSK  Memory  Time  S  Time
------------------ ----------- -------- ---------------- ------ ----- ---- ------ ----- - -----
458842.oak-batch   osu0001     serial   foobar              --      1    1    --  00:02 Q    --

If you see something like this, your job is running (note the "R" in the status column):

                                                                    Req'd  Req'd    Elap
Job ID             Username    Queue    Jobname          SessID  NDS  TSK  Memory  Time  S  Time
------------------ ----------- -------- ---------------- ------ ----- ---- ------ ----- - -----
458842.oak-batch   osu0001     serial   foobar            26276     1    1    --  00:02 R    --
When the output of the qstat command is empty, the job is done. Then try:

$ ls -ltr
$ cat foobar.oNNNNNN

where NNNNNN is your job id.
The name of the script file (tutorial.pbs) has nothing to do with the name of the output file.
Examine the contents of the output file foobar.oNNNNNN carefully. You should be able to see the results of some of the commands you put in tutorial.pbs. It also shows you the values of the variables PBS_NODEFILE, PBS_O_WORKDIR and TMPDIR. These variables exist only while your job is running. Try
$ echo $PBS_O_WORKDIR
and you will see it is no longer defined. $PBS_NODEFILE is a file which contains a list of all the nodes your job is running on. Because this script has the line
#PBS -l nodes=1:ppn=1
the contents of $PBS_NODEFILE is the name of a single compute node.
Notice that $TMPDIR is /tmp/pbstmp.NNNNNN (again, NNNNNN is the id number for this job.) Try
$ ls /tmp/pbstmp.NNNNNN
Why doesn't this directory exist? Because it is a directory on the compute node, not on the login node. Each machine in the cluster has its own /tmp directory and they do not contain the same files and subdirectories. The /users directories are shared by all the nodes (login or compute) but each node has its own /tmp directory (as well as other unshared directories.)
Start off with the following:
$ mkdir TarTutorial
$ cd TarTutorial
$ wget http://www.mmm.ucar.edu/wrf/src/WRFDAV3.1.tar.gz
$ ls -ltr
The third command will take a while because it is downloading a file from the internet. The file is called a "tarball" or a "gzipped tarball". TAR is an old unix short name for "tape archive" but a tar file is a file that contains a bunch of other files. If you have to move a bunch of files from one place to another, a good way to do it is to pack them into a tar file, move the tar file where you want it then unpack the files at the destination. A tar file usually has the extension ".tar". What about the ".gz"? This means the tar file has been further compressed with the program gzip -- this makes it a lot smaller.
After step 1 your working directory should be ~/TarTutorial and there should be a file called WRFDAV3.1.tar.gz in it.
Now do this:
$ gunzip WRFDAV3.1.tar.gz
$ ls -ltr
You should now have a file called WRFDAV3.1.tar which should be quite a bit larger in size than WRFDAV3.1.tar.gz -- this is because it has been uncompressed by the "gunzip" command which is the opposite of the "gzip" command.
Now do this:
$ tar -xvf WRFDAV3.1.tar
$ ls -ltr
You should see a lot of filenames go by on the screen and when the first command is done and you issue the ls command you should see two things -- WRFDAV3.1.tar is still there but there is also a directory called WRFDA. You can look at the contents of this directory and navigate around in the directory tree to see what is in there. The options on the "tar" command have the following meanings (you can do a "man tar" to get all the options):
x: extract the contents of the tar file
v: be verbose, i.e. show what is happening on the screen
f: the name of the file which follows the "f" option is the tar file to expand.
Another thing you can do is see how much space is being taken up by the files. Make sure TarTutorial is your working directory then issue the following command:
$ du .
Remember that "." (dot) means the current working directory. The "du" command means "disk usage" -- it shows you how much space is being used by every file and directory in the directory tree. It ends up with the highest level files and directories. You might prefer to do
$ du -h .
$ ls -ltrh
Adding the "-h" option to these commands puts the file sizes in human-readable format -- you should get a size of 66M for the tar file -- that's 66 megabytes -- and "du" should print a size of 77M next to ./WRFDA.
Now, make your own tar file from the WRFDA directory tree:
$ tar -cf mywrf.tar WRFDA
$ ls -ltrh
You have created a tar from all the files in the WRFDA directory. The options given to the "tar" command have the following meanings:
c: create a tar file
f: give it the name which follows the "f" option
The files WRFDAV3.1.tar and mywrf.tar contain the same files. Now compress the tar file you made:
$ gzip mywrf.tar
$ ls -ltrh
You should see a file called mywrf.tar.gz which is smaller than WRFDAV3.1.tar.
You don't want to leave all these files lying around. So delete them
$ rm WRFDAV3.1.tar
$ rm mywrf.tar.gz
$ rm WRFDA
Oops! You can't remove the directory. You need to use the "rmdir" command:
$ rmdir WRFDA
Oh no! That doesn't work on a directory that's not empty. So are you stuck with all those files? Maybe you can do this:
$ cd WRFDA
$ rm *
$ cd ..
$ rmdir WRFDA
That won't work either because there are some subdirectories in WRFDA and "rm *" won't remove them. Do you have to work your way to all the leaves at the bottom of the directory tree and remove files then come back up and remove directories? No, there is a simpler way:
$ rm -Rf WRFDA
This will get rid of the entire directory tree. The options have the following meanings:
R: recursively remove all files and directories
f: force; i.e., just remove everything without asking for confirmation
I encourage you to do
$ man rm
and check out all the options. Or some of them -- there are quite a few.
This tutorial shows you some handy time-saving shortcuts in linux. Once you have a good understanding of how the command line works, you will want to learn how to work more efficiently.
Prerequisites: Linux command line fundamentals.
Note: even if you know how to use the up arrow in linux, you need to enter the commands in this section because they are used in the following sections. So to begin this tutorial, go to your home directory and create a new directory called Shortcuts:
$ cd
$ mkdir Shortcuts
$ cd Shortcuts
(If a directory or file named "Shortcuts" already exists, name it something else.)
Imagine typing in a long linux command and making a typo. This is one of the frustrating things about a command line interface -- you have to retype the command, correcting the typo this time. Or what if you have to type several similar commands -- wouldn't it be nice to have a way to recall a previous command, make a few changes, and enter the new command? This is what the up arrow is for.
Try the following:
$ cd ..
$ cd ShortCuts    (type a capital C)
Linux should tell you there is no directory with that name.
Now type the up arrow key -- the previous command you entered shows up on the command line, and you can use the left arrow to move the cursor just after the capital C, hit Backspace, and type a lower case c. Note you can also position the cursor before the capital C and hit Delete to get rid of it.
Once you have changed the capital C to a lower case c you can hit Return to enter the command -- you do not have to move the cursor to the end of the line.
Now hit the up arrow key a few times, then hit the down arrow key and notice what happens. Play around with this until you get a good feel for what is happening.
Linux maintains a history of commands you have entered. Using the up and down arrow keys, you can recall previously-entered commands to the command line, edit them and re-issue them.
Note that in addition to the left and right arrow keys you can use the Home and End keys to move to the beginning or end of the command line. Also, if you hold down the Ctrl key when you type an arrow key, the cursor will move by an entire word instead of a single character -- this is useful in many situations and works in many editors.
Let's use this to create a directory hierarchy and a few files. Start in the Shortcuts directory and enter the following commands, using the arrow keys to simplify your job:
$ mkdir directory1
$ mkdir directory1/directory2
$ mkdir directory1/directory2/directory3
$ cd directory1/directory2/directory3    (remember the Home key and the Ctrl key with left and right arrows)
$ hostname > file1
$ whoami > file2
$ mkdir directory4
$ cal > directory4/file3
Linux has short, cryptic command names to save you typing -- but it is still a command line interface, and that means you interact with the operating system by typing in commands. File names can be long, directory hierarchies can be deep, and this can mean you have to type a lot to specify the file you want or to change the current working directory. Not only that, but you have to remember the names of files and directories you type in. The TAB key gives you a way to enter commands with less typing and less memorization.
Go back to the Shortcuts directory:
$ cd
$ cd Shortcuts
Now enter the following:
$ hostname > file1
$ cal > file2
$ whoami > different-file
$ date > other-file
$ cal > folio5
Now type the following, without hitting the Return key:
$ cat oth <Tab>
What happened? Linux completed the name "other-file" for you! The Tab key is your way of telling Linux to finish the current word you are typing, if possible. Because there is only one file in the directory whose name begins with "oth", when you hit the Tab key Linux is able to complete the name.
Hit Return (if you haven't already) to enter the cat command. Now try
$ cat d <Tab>
As you would expect, Linux completes the name "different-file".
What if you enter
$ cat fi <Tab>
Notice Linux completes as much of the name as possible. You can now enter a "1" or a "2" to finish it off.
But what if you forget what the options are? What if you can't remember if you created "file1" and "file2" or if you created "fileA" and "fileB"?
With the command line showing this:
$ cat file
hit the Tab key twice. Aha! Linux shows you the possible choices for completing the word.
Try
$ cat f <Tab>
The Tab will not add anything -- the command line will still read
$ cat f
Now type the letter o followed by a Tab -- once you add the o there is only one possible completion -- "folio5".
Now enter the following:
$ cat directory1/directory2/directory3/directory4/file3
That's kind of painful to type.
Now type the following without entering Return:
$ ls dir <Tab>
Nice! As you would expect, Linux completes the name of the directory for you. This is because there is only one file in the Shortcuts directory whose name begins with "dir".
Hit Return and Linux will tell you that directory1 contains directory2.
Now type this:
$ ls dir <Tab>
and before you hit return type another d followed by another Tab. Your command line should now look like this:
$ ls directory1/directory2/
If you hit Return, Linux will tell you that directory2 contains directory3.
Now try this:
$ ls dir <Tab>
then type another d followed by <Tab> then another d followed by tab. Don't hit Return yet. Your command line should look like this:
$ ls directory1/directory2/directory3/
Don't hit Return yet. Now type the letter f followed by a Tab. What do you think should happen?
Hitting the up arrow key is a nice way to recall previously-used commands, but it can get tedious if you are trying to recall a command you entered a while ago -- hitting the same key 30 times is a good way to make yourself feel like an automaton. Fortunately, linux offers a couple of other ways to recall previous commands that can be useful.
Go back to the Shortcuts directory
$ cd ~/Shortcuts
and enter the following:
$ hostname
$ cal
$ date
$ whoami
Now enter this:
$ !c
and hit return.
What happened? Now try
$ !h
and hit return.
The exclamation point ("bang" to Americans, "shriek" to some Englishmen I've worked with) is a way of telling linux you want to recall the last command which matches the text you type after it. So "!c" means recall the last command that starts with the letter c, the "cal" command in this case. You can enter more than one character after the exclamation point in order to distinguish between commands. For example if you enter
$ cd ~/Shortcuts
$ cat file1
$ cal
$ !c
the last command will redo the "cal" command. But if you enter
$ cat file1
$ cal
$ !cat
the last command re-executes the "cat" command.
One problem with using the exclamation point to recall a previous command is that you can feel blind -- you don't get any confirmation about exactly which command you are recalling until it has executed. Sometimes you just aren't sure what you need to type after the exclamation point to get the command you want.
Typing Ctrl-r (that's holding down the Ctrl key and typing a lower case r) is another way to repeat previous commands without having to type the whole command, and it's much more flexible than the bang. The "r" is for "reverse search" and what happens is this. After you type Ctrl-r, start typing the beginning of a previously entered command -- linux will search, in reverse order, for commands that match what you type. To see it in action, type in the following commands (but don't hit <Enter> after the last one):
$ cd ~/Shortcuts
$ cat file1
$ cat folio5
$ cal
$ Ctrl-r cat
You should see the following on your command line:
(reverse-i-search)`cat': cat folio5
Try playing with this now. Type in " fi" (that's a space, an "f" and an "i") -- did the command shown at the prompt change? Now hit backspace four times.
Now enter a right or left arrow key and you will find yourself editing the matching command. This is one you have to play around with a bit before you understand exactly what it is doing. So go ahead and play with it.
Now type
$ history
and hit return.
Cool, huh? You get to see all the commands you have entered (probably a maximum of 1000.) You can also do something like
$ history | grep cal
to get all the commands with the word "cal" in them. You can use the mouse to cut and paste a previous command, or you can recall it by number with the exclamation point:
$ !874
re-executes the command number 874 in your history.
For more information about what you can do to recall previous commands, check out http://www.thegeekstuff.com/2011/08/bash-history-expansion/
I am just including this because to me it is a fun piece of linux trivia. I don't find it particularly useful. Type
$ cat file1
and hit <Return>. Now hit the up arrow key to recall this command and hit the left arrow key twice so the cursor is on the "e" of "file1". Now hit Ctrl-t (again, hold down the control key and type a lower case t.) What just happened? Try hitting Ctrl-t a couple more times. That's right -- it transposes two characters in the command line -- the one the cursor is on and the one to its left. Also, it moves the cursor to the right. Frankly, it takes me more time to think about what is going to happen if I type Ctrl-t than it takes me to delete some characters and retype them in the correct order. But somewhere out there is a linux black belt who gets extra productivity out of this shortcut.
Another nice feature of linux is the alias command. If there is a command you enter a lot you can define a short name for it. For example, we have been typing "cat folio5" a lot in this tutorial. You must be getting sick of typing "cat folio5". So enter the following:
$ alias cf5='cat folio5'
Now type
$ cf5
and hit return. Nice -- you now have a personal shortcut for "cat folio5". I use this for the ssh commands:
$ alias gogl='ssh -Y jeisenl@pitzer.osc.edu'
I put this in the .bash_aliases file on my laptop so that it is always available to me.
Classroom projects will not be billed under the Ohio academic fee structure; all fees will be fully discounted at the time of billing.
Please submit a new project request for a classroom project. You will request a $500 budget. If an additional budget is needed or you want to re-use your project code, you can apply through MyOSC or contact us at OSCHelp. We require a class syllabus; this will be uploaded on the last screen before you submit the request.
During setup, OSC staff test accounts may be added to the project for troubleshooting purposes.
We suggest that students consider connecting to our OnDemand portal to access the HPC resources. All production supercomputing resources can be accessed via that website, without having to worry about client configuration. We have a guide for new students to help them figure out the basics of using OSC.
class.osc.edu
We currently have two production clusters, Pitzer and Owens, both with NVIDIA GPUs available. All systems have "debug" queues, during typical business hours, that allow small jobs of less than 1 hour to start much more quickly than they might otherwise.
If you need to reserve access to particular resources, please contact OSC Help, preferably with at least two weeks lead time, so that we can put in the required reservations to ensure resources are available during lab or class times.
We have a list of supported software, including sample batch scripts, in our documentation. If you have specific needs that we can help with, let OSC Help know.
If you are using Rstudio, please see this webpage.
If you are using Jupyter, please see the page Using Jupyter for Classroom.
Our classroom information guide will instruct you on how to get students added to your project using our client portal. For more information, see the documentation. You must also add your username as an authorized user.
We can provide you with project space to have the students submit assignments through our systems. Please ask about this service and see our how-to. We typically grant 1-5 TB for classroom projects.
Help can be found by contacting OSC Help weekdays, 9AM to 5PM (1-800-686-6472 or 614-292-1800).
Fill out a request online.
We update our web pages to show relevant events at the center (including training) and system notices on our main page (osc.edu). We also provide important information in the “message of the day” (visible when you log in). Furthermore, you can receive notices by following @HPCNotices on Twitter.
FAQ: http://www.osc.edu/supercomputing/faq
Main Supercomputing pages: http://www.osc.edu/supercomputing/
We provide the instructor (PI of the classroom project) with information on how to get students added to the classroom project. Usually, invites will be sent to new OSC users; existing OSC users will be automatically added to the project if they have an OSC account which matches the email information provided by the PI.
Another way to be added to a classroom project is to use the project code and a valid project access number provided by the PI. You can check our user management page for more information.
You can manage your OSC account with the OSC client portal (my.osc.edu).
If your class uses a custom R or Jupyter environment at OSC, please connect to class.osc.edu. If you do not see your class there, we suggest connecting to ondemand.osc.edu.
You can log into class.osc.edu or ondemand.osc.edu using either your OSC HPC Credentials or Third-Party Credentials. See this OnDemand page for more information.
There are a few different ways of transferring files between OSC storage and your local computer. We suggest using the OnDemand File App if you are new to Linux and looking to transfer smaller-sized files - measured in MB to several hundred MB. For larger files, please use an SFTP client to connect to sftp.osc.edu, or use Globus.
We have a guide for new users to help them figure out the basics of using OSC; included are basics on getting connected, HPC system structure, file transfers, and batch systems.
FAQ: http://www.osc.edu/supercomputing/faq
Main Supercomputing pages: http://www.osc.edu/supercomputing/
Help can be found by contacting OSC Help weekdays, 9AM to 5PM (1-800-686-6472 or 614-292-1800).
Fill out a request online.
OSC provides an isolated and custom Jupyter environment for each classroom project that requires Jupyter Notebook or JupyterLab.
The instructor must apply for a classroom project that is unique to the course. More details on the classroom project can be found in our classroom project guide. Once we get the information, we will provide you with a project ID and a course ID (which is commonly the course ID provided by the instructor plus a school code, e.g. MATH_2530_OU). The instructor can set up a Jupyter environment for the course using this information (see below). The Jupyter environment will be tied to the project ID.
The instructor can set up a Jupyter environment for the course once the project space is initialized. Run the setup script with the project ID and course ID:
~support/classroom/tools/setup_jupyter_classroom /fs/ess/project_ID course_ID
If the Jupyter environment is created successfully, please inform us so we can update you when your class is ready at class.osc.edu.
You may need to upgrade Jupyter kernels to the latest stable version to address a security vulnerability or to try out new features. Please run the upgrade script with the project ID and course ID:
~support/classroom/tools/upgrade_jupyter_classroom /fs/ess/project_ID course_ID
When your class is ready, launch your class session at class.osc.edu. Then open a notebook and use pip install to install packages.
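For example, in a notebook cell (the package name here is only an illustration, not something your course necessarily needs):
!pip install seaborn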
To enable or install an nbextension, please use --sys-prefix to install into the classroom Jupyter environment, e.g.
!jupyter contrib nbextension install --sys-prefix
Please do not use --user, which installs to your home directory and could mess up the Jupyter environment.
To install a labextension, simply click the Extension Manager icon in the sidebar.
By default, this Jupyter environment is an isolated Python environment. Anyone who launches python from this environment can only access packages installed inside it, unless PYTHONPATH is used. The instructor can change this by setting include-system-site-packages = true in /fs/ess/project_ID/course_ID/jupyter/pyvenv.cfg. This allows students to access packages in the home directory ~/.local/lib/pythonX.X/site-packages and to install packages via pip install --user.
When a class session starts, we create a classroom workspace under the instructor's and students' home space, $HOME/osc_classes/course_ID, and launch Jupyter in that workspace. The workspace root / will appear in the landing page (Files), but everything can be found under $HOME/osc_classes/course_ID on the OSC system.
The instructor can upload class material to /fs/ess/project_ID/course_ID/materials. When a student launches a Jupyter session, the directory will be copied to the student's workspace $HOME/osc_classes/course_ID. The student will see the materials directory on the landing page. The PI can add files to the material source directory; new files will be copied to the destination every time a new Jupyter session starts. However, if the PI modifies existing files, the changes won't be copied because those files were copied before. Therefore we recommend renaming a file after updating it so that it will be copied.
The instructor and TAs can access a student's workspace with limited permissions. First, the instructor sends us a request with the information, including the instructor's and TAs' OSC accounts. After a student launches a class session, you can access known files and directories in the student's workspace. For example, you cannot list the contents of the student's workspace:
ls /users/PZS1234/student1/osc_classes/course_ID
ls: cannot open directory /users/PZS1234/student1/osc_classes/course_ID: Permission denied
but you can access a known file or directory in the workspace:
ls /users/PZS1234/student1/osc_classes/course_ID/homework
OSC provides an isolated and custom R environment for each classroom project that requires Rstudio. The interface can be accessed at class.osc.edu. Before using this interface, please apply for a classroom project account that is unique to the course. More details on the classroom project can be found here. The custom R environment for the course will be tied to this project ID. Please inform us if you have additional requirements for the class. Once we get the information, we will provide you with a course_ID (which is commonly the course ID provided by the instructor plus a school code, e.g. MATH2530_OU) and add your course to the server with a class module created using the course_ID. After logging in to the class.osc.edu server, you will see several apps listed. Pick Rstudio Server; that will take you to the Rstudio job submission page. Please pick your course from the drop-down menu under Class materials and the number of hours needed.
Clicking on Launch will submit the Rstudio job to the scheduler, and you will see the Connect to Rstudio server option when the resource is ready for the job. Each Rstudio launch will run on 1 core on the Owens machine with 4 GB of memory.
Rstudio will open up in a new tab with a custom and isolated environment that is set up through a container-based solution. This will create a folder under $HOME/osc_classes/course_ID for each user. Please note that inside Rstudio, you won't be able to access any files other than class materials. However, you can access the class directory outside of Rstudio to upload or download files.
You can quit an Rstudio session by clicking on File in the top tabs, then on Quit. This will only quit the session; the resource you requested is still held until the walltime limit is reached. To release the resource, please click on DELETE in the Rstudio launch page.
The PI can store and share materials like data, scripts, and R packages with the class. We will set up a project space for the project ID of the course. This project space will be created under /fs/ess/project_ID. Once the project space is ready, please log in to Owens or Pitzer with the PI account of the classroom project and run the following script with the project ID and course ID. This will create a folder named after the course_ID under the project space, with two subfolders under it: 1) Rpkgs and 2) materials.
~support/classroom/tools/setup_rstudio_classroom /fs/ess/project_ID course_ID
Once the class module is ready, the PI can access the course at class.osc.edu under the Rstudio job submission page. The PI can launch the course environment and install R packages for the class. After launching Rstudio, please run .libPaths() as follows:
> .libPaths()
[1] "/users/PZS0680/soottikkal/osc_classes/OSCWORKSHOP/R" "/fs/ess/PZS0687/OSCWORKSHOP/Rpkgs"
[3] "/usr/local/R/gnu/9.1/3.6.3/site/pkgs" "/usr/local/R/gnu/9.1/3.6.3/lib64/R/library"
Here you will see four R library paths. The last two are system R library paths and are accessible to all OSC users. OSC installs a number of popular R packages at the site location. You can check available packages with the library() command. The first path is the personal R library of each user in the course environment and is not shared with students. The second library path is accessible to all students of the course (e.g., /fs/ess/PZS0687/OSCWORKSHOP/Rpkgs). The PI should install R packages in this library to share them with the class. As a precaution, it is a good idea to remove the PI's personal R library from .libPaths() before installing packages, as follows. Please note that this step only needs to be done when the PI is preparing course materials.
> .libPaths(.libPaths()[-1])
> .libPaths()
[1] "/fs/ess/PZS0687/OSCWORKSHOP/Rpkgs" "/usr/local/R/gnu/9.1/3.6.3/site/pkgs"
[3] "/usr/local/R/gnu/9.1/3.6.3/lib64/R/library"
Now there is only one writable R library path, so all packages will be installed into this library path and shared with all users.
The PI can install all packages required for the class using the install.packages() function. Once the installation is complete, students will have access to all those packages. Please note that students can also install their own packages; those packages will be installed into their personal library in the class environment, i.e., the first path listed under .libPaths().
The PI can share materials like data, scripts, and rmd files stored in /fs/ess/project_ID/course_ID/materials with students. When a student launches an Rstudio session, the directory will be copied to the student's workspace $HOME/osc_classes/course_ID (the destination). Please inform us if you want to use a source directory other than /fs/ess/project_ID/course_ID/materials. The student will see the materials directory on the landing page. The PI can add files to the material source directory; new files will be copied to the destination every time a new Rstudio session starts. However, if the PI modifies existing files, the changes won't be copied because those files were copied before. Therefore we recommend renaming a file after updating it so that it will be copied.
There are several different ways to copy materials manually from a directory to students' workspaces:
1. On the class.osc.edu server, click on Files from the top tabs, then on the $HOME directory. From the top right, click on Go to, enter the storage path (e.g., /fs/ess/PZS0687/) in the box, and press OK. This will open up the storage path and users can copy files. Open the class folder from the $HOME tree shown on the left and paste files there. All files copied to $HOME/osc_classes/course_ID will appear in the Rstudio file browser.
2. On the class.osc.edu server, click on Clusters from the top tabs, then on Owens Shell Access. This will open up a terminal on Owens where students can enter Unix commands for copying, e.g.: cp -r /fs/ess/PZS0687/OSCWORKSHOP/materials $HOME/osc_classes/course_ID. Note that $HOME/osc_classes/course_ID will be created only after launching an Rstudio instance at least once.
3. Students can upload materials from their local computers using the upload tab located in the File browser of Rstudio. This assumes they have already downloaded the materials to their computers.
Please reach out to oschelp@osc.edu if you have any questions.
You can install nbgrader in a notebook:
!pip install nbgrader
!jupyter nbextension install --sys-prefix --py nbgrader --overwrite
!jupyter nbextension enable --sys-prefix --py nbgrader
!jupyter serverextension enable --sys-prefix --py nbgrader
To check the installed extensions, run
!jupyter nbextension list
There are six enabled extensions
In order to upload and collect assignments, nbgrader requires an exchange directory with write permissions for everyone. For example, to create a directory in project space, run:
%%bash
mkdir -p /fs/ess/projectID/courseID/exchange
chmod a+wx /fs/ess/projectID/courseID/exchange
Then get your course ID for configuration. In a notebook, run:
%%bash
echo $OSC_CLASS_ID
Finally, create the nbgrader configuration at the root of the workspace. In a notebook, run:
%%file nbgrader_config.py
c = get_config()
c.CourseDirectory.course_id = "courseID"  # it must be the value of $OSC_CLASS_ID
c.Exchange.root = "/fs/ess/projectID/courseID/exchange"
c.Exchange.timezone = 'EST'
Once the file is created, you can launch a new Jupyter session and start creating assignments. For using nbgrader, please refer to the nbgrader documentation.
To let students access the assignments, students need to have the following configuration file in the root of their workspace:
%%file nbgrader_config.py
c = get_config()
c.Exchange.root = "/fs/ess/projectID/courseID/exchange"
Our HOWTO collection contains short tutorials that help you step through some of the common (but potentially confusing) tasks users may need to accomplish, that do not quite rise to the level of requiring more structured training materials. Items here may explain a procedure to follow, or present a "best practices" formula that we think may be helpful.
The XDMoD tool at xdmod.osc.edu can be used to get an overview of how closely the requested time of jobs matches the elapsed time of jobs.
One way of specifying a time request is:
#SBATCH --time=xx:xx:xx
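As a concrete illustration (the value here is arbitrary), a job that expects to run for roughly two hours might request:
#SBATCH --time=02:00:00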
The elapsed time is how long the job ran for before completing. This can be obtained using the sacct command.
$ sacct -u <username> --format=jobid,account,elapsed
It is important to understand that the requested time is used when scheduling a submitted job. If a job requests a time that is much more than the expected elapsed time, then it may take longer to start because the resources need to be allocated for the time that the job requests even if the job only uses a small portion of that requested time.
This allows one to view the requested time accuracy for an individual job, but XDMoD can be used to do this for jobs submitted over a time range.
First, log in to xdmod.osc.edu; see this page for more instructions:
https://www.osc.edu/supercomputing/knowledge-base/xdmod_tool
Then, navigate to the Metric Explorer tab.
Look for the Metric Catalog on the left side of the page and expand the SUPREMM options. Select Wall Hours: Requested: Per Job and group by None.
This will now show the average time requested.
The actual time data can be added by navigating to Add Data -> SUPREMM -> Wall Hours: Per Job.
This will open a new window titled Data Series Definition, where some parameters can be changed before showing the new data. In order to easily distinguish between elapsed and requested time, change the Display Type to Bar, then click Add to view the new data.
Now there is a line which shows the average requested time of jobs, and bars which depict the average elapsed time of jobs. Essentially, the closer the bar is to the line, without intersecting the line, the more accurate the time prediction. If the bar intersects the line, then it may indicate that there was not enough time requested for a job to complete, but remember that these values are averages.
One can also view more detailed information about these jobs by clicking a data point and using the Show raw data option.
While our Python installations come with many popular packages installed, you may come upon a case in which you need an additional package that is not installed. If the specific package you are looking for is available from anaconda.org (formerly binstar.org), you can easily install it and its required dependencies by using the conda package manager.
The following steps are an example of how to set up a Python environment and install packages to a local directory using conda. We use the name local for the environment, but you may use any other name.
We have python and miniconda3 modules. The python modules are based on the Anaconda package manager, and the miniconda3 module is based on the Miniconda package manager. The python modules are typically recommended when you use Python in a standard environment that we provide. However, if you want to create your own Python environment, we recommend using the miniconda3 module, since you can start with a minimal configuration.
module load miniconda3
Three alternative create commands are listed. These cover the most common cases.
The following will create a minimal Python installation without any extraneous packages:
conda create -n local
If you want to clone the full base Python environment from the system, you may use the following create command:
conda create -n local --clone base
You can augment the command above by listing specific packages you would like installed into the environment. For example, the following will create a minimal Python installation with only the specified packages (in this case, numpy and babel):
conda create -n local numpy babel
By default, conda will install the newest versions of the packages it can find. Specific versions can be specified by adding =<version> after the package name. For example, the following will create a Python installation with Python version 2.7 and NumPy version 1.16:
conda create -n local python=2.7 numpy=1.16
To verify that a clone has been created, use the command
conda info -e
For additional conda command documentation see https://docs.conda.io/projects/conda/en/latest/commands.html#conda-general-commands
Before the created environment can be used, it must be activated.
For the bash shell:
source activate local
During the conda create step, you may see a message from the installer that you can use the conda activate command to activate the environment. Please don't use the conda activate command, because it will try to update your shell configuration file and may cause other issues. Instead, please use the source activate command as we suggest above.
On newer versions of Anaconda on the Owens cluster, you may also need to remove the following packages before trying to install your specific packages:
conda remove conda-build
conda remove conda-env
To install additional packages, use the conda install command. For example, to install the yt package:
conda install yt
By default, conda will install the newest version of the package that it can find. Specific versions can be specified by adding =<version> after the package name. For example, to install version 1.16 of the NumPy package:
conda install numpy=1.16
If you need to install packages with pip, then you can install pip in your virtual environment by running:
conda install pip
Then, you can install packages with pip:
pip install PACKAGE
Now we will test our installed Python package by loading it in Python and checking its location to ensure we are using the correct version. For example, to test that NumPy is installed correctly, run
python -c "from __future__ import print_function; import numpy; print(numpy.__file__)"
and verify that the output generally matches
$HOME/.conda/envs/local/lib/python3.6/site-packages/numpy/__init__.py
To test installations of other packages, replace all instances of numpy with the name of the package you installed.
If the method using conda above is not working, or if you prefer, you can consider installing Python packages from the source. Please read HOWTO: install your own Python packages.
See the comparison to these package management tools here:
https://docs.conda.io/projects/conda/en/latest/commands.html#conda-vs-pip-vs-virtualenv-commands
pip installations are also supported:
module load python
module list  # check which python you just loaded
pip install --user --upgrade PACKAGE  # where PACKAGE is a valid package name
Note that the default installation prefix is set to the system path, where OSC users cannot install packages. With the option --user, the prefix is set to $HOME/.local, where lib, bin, and other top-level folders for the installed packages are placed. Finally, the option --upgrade will upgrade existing packages to the newest available version.
The one issue with this approach is portability with multiple Python modules. If you plan to stick with a single Python module, then this should not be an issue. However, if you commonly switch between different Python versions, then be aware of the potential trouble in using the same installation location for all Python versions.
Typically, you can install packages with the methods shown in the Install packages section above, but in some cases where the conda package installations have no source from conda channels or have dependency issues, you may consider using pip in an isolated Python virtual environment.
To create an isolated virtual environment:
module reset
python3 -m venv --without-pip $HOME/venv/mytest --prompt "local"
source $HOME/venv/mytest/bin/activate
(local) curl https://bootstrap.pypa.io/get-pip.py | python  # get the newest version of pip
(local) deactivate
where we use the path $HOME/venv/mytest and the name local for the environment, but you may use any other path and name.
To activate and deactivate the virtual environment:
source $HOME/venv/mytest/bin/activate
(local) deactivate
To install packages:
source $HOME/venv/mytest/bin/activate
(local) pip install PACKAGE
You don't need the --user option within the virtual environment.
Conda Test Drive: https://conda.io/docs/test-drive.html
This documentation describes how to install the tensorflow package locally in your $HOME space.
Load python module
module load python/3.6-conda5.2
Three alternative create commands are listed. These cover the most common cases:
conda create -n local --clone="$PYTHON_HOME"
This will clone the entire python installation to ~/envs/local directory. The process will take several minutes.
conda create -n local
This will create a local python installation without any packages. If you need a small number of packages, you may choose this option.
conda create -n local python={version} anaconda
If you would like to install a specific version of python, you can specify it with the "python" option. For example, you can use "python=3.6" for version 3.6.
To verify that a clone has been created, use the command
conda info -e
For additional conda command documentation see https://conda.io/docs/commands.html
For the bash shell:
source activate local
On newer versions of Anaconda on the Owens cluster you may also need to perform the removal of the following packages before trying to install your specific packages:
conda remove conda-build
conda remove conda-env
Install the latest version of tensorflow that is gpu compatible.
pip install tensorflow-gpu
Now we will test tensorflow package by loading it in python and checking its location to ensure we are using the correct version.
python -c "import tensorflow; print(tensorflow.__file__)"
Output:
$HOME/.conda/envs/local/lib/python2.7/site-packages/tensorflow/__init__.py
If the method using conda above is not working or if you prefer, you can consider installing python modules from the source. Please read HOWTO: install your own python modules.
While we provide a number of Python packages, you may need a package we do not provide. If it is a commonly used package or one that is particularly difficult to compile, you can contact OSC Help for assistance. We also have provided an example below showing how to build and install your own Python packages and make them available inside of Python. These instructions use "bash" shell syntax, which is our default shell. If you are using something else (csh, tcsh, etc), some of the syntax may be different.
First, you need to collect what you need in order to perform the installation. We will do all of our work in $HOME/local/src. You should make this directory now.
mkdir -p $HOME/local/src
Next, we will need to download the source code for the package we want to install. In our example, we will use "NumExpr," a package we already provide in the system version of Python. You can either download the file to your desktop and then upload it to OSC, or directly download it using the wget utility (if you know the URL for the file).
cd ~/local/src
wget http://numexpr.googlecode.com/files/numexpr-2.0.1.tar.gz
Next, extract the downloaded file. In this case, since it's a "tar.gz" format, we can use tar to decompress and extract the contents.
tar xvfz numexpr-2.0.1.tar.gz
You can delete the downloaded archive now or keep it should you want to start the installation from scratch.
To build the package, we will want to first create a temporary environment variable to aid in installation. We'll call it INSTALL_DIR.
export INSTALL_DIR=${HOME}/local/numexpr/2.0.1
We are roughly following the convention we use at the system level. This allows us to easily install new versions of software without risking breaking anything that uses older versions. We have specified a folder for the program (numexpr), and for the version (2.0.1). To be consistent with Python installations, we will create a second temporary environment variable that will contain the actual installation location.
export TREE=${INSTALL_DIR}/lib/python2.7/site-packages
Next, make the directory tree.
mkdir -p $TREE
To compile the package, we should switch to the GNU compilers. The system installation of Python was compiled with the GNU compilers, and this will help avoid any unnecessary complications. We will also load the Python package, if it hasn't already been loaded.
module swap intel gnu
module load python
Next, build it. This step may vary a bit, depending on the package you are compiling. You can execute python setup.py --help to see what options are available. Since we are overriding the install path to one that we can write to and that fits our management plan, we need to use the --prefix option.
python setup.py install --prefix=$INSTALL_DIR
At this point, the package is compiled and installed in ~/local/numexpr/2.0.1/lib/python2.7/site-packages. Occasionally, some files will be installed in ~/local/numexpr/2.0.1/bin as well. To ensure Python can locate these files, we need to modify our environment.
The most immediate way -- but the one that must be repeated every time you wish to use the package -- is to manually modify your environment. If files are installed in the "bin" directory, you'll need to add it to your path. As before, these examples are for bash, and may have to be modified for other shells. Also, you will have to modify the directories to match your install location.
export PATH=$PATH:~/local/numexpr/2.0.1/bin
And for the Python libraries:
export PYTHONPATH=$PYTHONPATH:~/local/numexpr/2.0.1/lib/python2.7/site-packages
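As a quick sanity check (a sketch based on the numexpr example above), you can confirm that Python now picks up the local install:
python -c "import numexpr; print(numexpr.__file__)"
# should print a path under ~/local/numexpr/2.0.1/lib/python2.7/site-packages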
We don't recommend this option, as it is less flexible and can cause conflicts with system software. But if you want, you can modify your .bashrc (or similar file, depending on your shell) to set these environment variables automatically. Be extra careful; making a mistake in .bashrc (or similar) can destroy your login environment in a way that will require a system administrator to fix. To do this, you can copy the lines above modifying $PATH and $PYTHONPATH into .bashrc. Remember to test them interactively first. If you destroy your shell interactively, the fix is as simple as logging out and then logging back in. If you break your login environment, you'll have to get our help to fix it.
This is the most complicated option, but it is also the most flexible, as you can have multiple versions of this particular software installed and specify at run-time which one to use. This is incredibly useful if a major feature changes that would break old code, for example. You can see our tutorial on writing modules here, but the important variables to modify are, again, $PATH and $PYTHONPATH. You should specify the complete path to your home directory here and not rely on any shortcuts like ~ or $HOME. Below is a modulefile written in Lua:
If you are following the tutorial on writing modules, you will want to place this file in $HOME/local/share/modulefiles/numexpr/2.0.1.lua:
-- This is a Lua modulefile; this file 2.0.1.lua can be located anywhere.
-- But if you are following a local modulefile location convention, we place them in
-- $HOME/local/share/modulefiles/
-- For numexpr we place it in $HOME/local/share/modulefiles/numexpr/2.0.1.lua

-- This finds your home directory
local homedir = os.getenv("HOME")
prepend_path("PYTHONPATH", pathJoin(homedir, "local/numexpr/2.0.1/lib/python2.7/site-packages"))
prepend_path("PATH", pathJoin(homedir, "local/numexpr/2.0.1/bin"))
Once your module is created (again, see the guide), you can use your Python package simply by loading the software module you created.
module use $HOME/local/share/modulefiles/
module load numexpr/2.0.1
This page outlines ways to generate and view performance data for your program using tools available at OSC.
This section describes how to use performance tools from Intel. Make sure that you have an Intel module loaded to use these tools.
Intel VTune is a tool to generate profile data for your application. Generating profile data with Intel VTune typically involves three steps:
You need executables with debugging information to view source code line detail: re-compile your code with a -g option added among the other appropriate compiler options. For example:
mpicc wave.c -o wave -g -O3
Profiles are normally generated in a batch job. To generate a VTune profile for an MPI program:
mpiexec <mpi args> amplxe-cl <vtune args> <program> <program args>
where <mpi args> represents arguments to be passed to mpiexec, <program> is the executable to be run, <vtune args> represents arguments to be passed to the VTune executable amplxe-cl, and <program args> represents arguments passed to your program.
For example, if you normally run your program with mpiexec -n 12 wave_c, you would use
mpiexec -n 12 amplxe-cl -collect hotspots -result-dir r001hs wave_c
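A minimal sketch of a batch script around this command (the job name, task count, walltime, and project code are placeholders you would adjust for your own work):
#!/bin/bash
#SBATCH --job-name=vtune_profile
#SBATCH --ntasks=12
#SBATCH --time=01:00:00
#SBATCH --account=PZS0712    # replace with your own project code

module load intel    # make amplxe-cl available
mpiexec -n 12 amplxe-cl -collect hotspots -result-dir r001hs wave_c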
To profile a non-MPI program:
amplxe-cl <vtune args> <program> <program args>
The profile data is saved in a .map file in your current directory.
As a result of this step, a subdirectory that contains the profile data files is created in your current directory. The subdirectory name is based on the -result-dir argument and the node id, for example, r001hs.o0674.ten.osc.edu.
3. Analyze your profile data.
You can open the profile data using the VTune GUI in interactive mode. For example:
amplxe-gui r001hs.o0674.ten.osc.edu
One should use an OnDemand VDI (Virtual Desktop Interface) or have X11 forwarding enabled (see Setting up X Windows). Note that X11 forwarding can be distractingly slow for interactive applications.
Intel Trace Analyzer and Collector (ITAC) is a tool to generate trace data for your application. Generating trace data with Intel ITAC typically involves three steps:
You need to compile your executable with the -tcollect option added among the other appropriate compiler options to insert instrumentation probes calling the ITAC API. For example:
mpicc wave.c -o wave -tcollect -O3
mpiexec -trace <mpi args> <program> <program args>
For example, if you normally run your program with mpiexec -n 12 wave_c, you would use
mpiexec -trace -n 12 wave_c
As a result of this step, .anc, .f, .msg, .dcl, .stf, and .proc files will be generated in your current directory.
You will need to use traceanalyzer to view the trace data. To open Trace Analyzer:
traceanalyzer /path/to/<stf file>
where the base name of the .stf file will be the name of your executable.
One should use an OnDemand VDI (Virtual Desktop Interface) or have X11 forwarding enabled (see Setting up X Windows) to view the trace data. Note that X11 forwarding can be distractingly slow for interactive applications.
Intel's Application Performance Snapshot (APS) is a tool that provides a summary of your application's performance. Profiling HPC software with Intel APS typically involves four steps:
Regular executables can be profiled with Intel APS, but source code line detail will not be available. You need executables with debugging information to view source code line detail: re-compile your code with a -g option added among the other appropriate compiler options. For example:
mpicc wave.c -o wave -g -O3
Profiles are normally generated in a batch job. To generate profile data for an MPI program, run your program under aps:
mpiexec <mpi args> aps <program> <program args>
where <mpi args> represents arguments to be passed to mpiexec, <program> is the executable to be run, and <program args> represents arguments passed to your program.
For example, if you normally run your program with mpiexec -n 12 wave_c, you would use
mpiexec -n 12 aps wave_c
To profile a non-MPI program:
aps <program> <program args>
The profile data is saved in a subdirectory in your current directory. The directory name is based on the date and time, for example, aps_result_YYYYMMDD/.
To generate the html profile file from the result subdirectory:
aps --report=./aps_result_YYYYMMDD
to create the file aps_report_YYYYMMDD_HHMMSS.html.
You can open the profile data file using a web browser on your local desktop computer. This option typically offers the best performance.
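One way to copy the report to your local machine (a sketch; the username and file name are placeholders, and any SFTP client will do) is:
sftp username@sftp.osc.edu
sftp> get aps_report_YYYYMMDD_HHMMSS.html
sftp> exit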
This section describes how to use performance tools from ARM.
Instructions for how to use MAP are available here.
Instructions for how to use DDT are available here.
Instructions for how to use Performance Reports are available here.
This section describes how to use other performance tools.
Rice University's HPC Toolkit is a collection of performance tools. Instructions for how to use it at OSC are available here.
TAU Commander is a user interface for University of Oregon's TAU Performance System. Instructions for how to use it at OSC are available here.
To connect to OSC services, a secure tunnel to a session is required. This can be done relatively simply in OSX and Linux by using the SSH functionality built into the system, but Windows users have had to configure and use third party applications like PuTTY or Java to access secure resources at OSC. OSC Connect is a native windows application written in C# and compiled for .NET 2.0, providing preconfigured management of secure tunnel connections for Windows users, as well as providing a launcher for secure file transfer, VNC, terminal, and web based services.
OSC Connect is supported on Windows versions from Windows XP through Windows 10.
We've created a brief video:
To get started, click OSCConnect.exe and download. Use "Save link as" to download this file to a folder of your choice. Double-click the OSCConnect.exe icon to run the application. In the current state, OSC Connect is entirely deployed by a single executable file; no further installation is required.
The application bundles the following tools:
- plink.exe is the command-line version of PuTTY used by the application to create the secure connection to OSC resources.
- putty.exe is the GUI application of PuTTY used to provide terminal emulation remote console connections to OSC resources.
- vncviewer.exe is the VNC viewer client used to view a remote desktop session.
- WinSCP.exe is the SFTP client used for file transfer.
Once your connections to OSC services as well as the OSC Connect app are closed, the temporary folder named "ConnectFiles" will be removed automatically.
After you double-click the OSCConnect.exe icon, the application graphical user interface is shown as below:
Network Status: it indicates which OSC cluster you will be connected to. The option can be changed in "Settings".
Settings: it provides several configuration options to modify the behavior of the application.
Connection Settings: use this dropdown to select the default host/cluster. Selecting a server here will change the endpoint for tunneling, sftp connections, console connections, and connectivity checking.
System Settings:
Detect Clipboard Activity: when this option is enabled, the application will detect valid data on the Windows clipboard and populate the application. (Default: Off)
Check for New Versions: when this option is enabled, the application will check for version updates. (Default: On)
Automation Settings:
Save User Credentials: when this option is enabled, it allows the application to remember the user when the application is reopened. This saves the user credentials to the user settings using DPAPI encryption. Passwords can be decrypted only by the current Windows user account. (Default: Off)
Launch Tunnel On Import: when this option is enabled, the tunnel will automatically connect when the application detects a valid clipboard string and the user credentials have been entered. (Default: On)
VNC Settings
After you provide your OSC Credentials, i.e. your OSC HPC username and password, more functionalities are available as shown below:
In addition, a Session Type option is provided so that advanced users can connect to a running session manually.
The OSC Connect application can be used to connect to a running session launched through OSC OnDemand.
A running OnDemand session provides a link of the form osc://xxxxx. osc://xxxxx is a custom URI scheme that is registered when you launch the application. Simply click the link to populate the configuration information and connect to your running session. If OSCConnect.exe is not running when you click the URI, the OSC Connect application will pop up. Enter your OSC HPC username and password, and you will be able to connect to the session by clicking the "Connect" button.
Note that you need to run OSCConnect.exe at least once before you use it to connect to a running session. The initial launch will add a key to your user registry that initializes the URI scheme.
I've clicked the osc:// link and nothing happened.
Be sure to run OSCConnect.exe at least once. The initial launch will add a key to your user registry that initializes the URI scheme. If you move or rename the OSCConnect.exe file, you will need to run the application again manually to update the path in the handler.
I've received the error "Unable to open helper application. The protocol specified in this address is not valid."
This issue appears in some earlier versions of Internet Explorer when attempting to launch the application from a temporary location. Download and run the OSCConnect.exe application, being sure to save the file to a non-temporary location.
This article focuses on debugging strategies for C/C++ codes, but many are applicable to other languages as well.
This approach is a great starting point. Say you have written some code, and it does not do what you expect it to do. You have stared at it for a few minutes, but you cannot seem to spot the problem.
Try explaining what the problem is to a rubber duck. Then, walk the rubber duck through your code, line by line, telling it what it does. Don’t have a rubber duck? Any inanimate object will do (or even an animate one if you can grab a friend).
It sounds silly, but rubber duck debugging helps you to get out of your head, and hopefully look at your code from a new perspective. Saying what your code does (or is supposed to do) out loud has a good chance of revealing where your understanding might not be as good as you think it is.
You’ve written a whole bunch of new code. It takes some inputs, chugs along for a while, and then creates some outputs. Somewhere along this process, something goes wrong. You know this because the output is not at all what you expected. Unfortunately, you have no idea where things are going wrong in the code.
This might be a good time to try out printf() debugging. It’s as simple as its name implies: simply add (more) printf() statements to your code. You’ve likely seen this being used. It’s the name given to the infamous ‘printf(“here”);’ calls used to verify that a particular codepath is indeed taken.
Consider printing out arguments and return values to key functions. Or, the results or summary statistics from large calculations. These values can be used as “sanity checks” to ensure that up until that point in the code, everything is going as expected.
Assertion calls, such as "assert(...)", can also be used for a similar purpose. However, the positive feedback you get from print statements is often helpful when you're debugging. Seeing a valid result printed in standard out or a log file tells you positively that at least something is working correctly.
Debuggers are tools that can be used to interactively (or with scripts) debug your code. A fairly common debugger for C and C++ codes is gdb. Many guides exist online for using gdb with your code.
OSC systems also provide the ARM DDT debugger. This debugger is designed for use with HPC codes and is arguably easier to use than gdb. It can be used to debug MPI programs as well.
Debuggers allow you to interact with the program while it is running. You can do things like read and write variable values, or check to see if/when certain functions are called.
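As a quick, generic illustration (not OSC-specific; the file names are placeholders), you could rebuild with debugging symbols and step through the program in gdb:
gcc -g -O0 myprog.c -o myprog    # include debug info and keep optimization low
gdb ./myprog                     # inside gdb, use 'break', 'run', 'next', and 'print'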
Okay, this one isn’t exactly a debugging strategy. It’s a method to catch bugs early, and even prevent the addition of bugs. Writing a test suite for your code that’s easy to run (and ideally fast) lets you test new changes to ensure they don’t break existing functionality.
There are lots of different philosophies on testing software. Too many to cover here. Here’s two concepts that are worth looking into: unit testing and system testing.
The idea behind unit testing is writing tests for small “units” of code. These are often functions or classes. If you know that the small pieces that make up your code work, then you’ll have more confidence in the overall assembled program. There’s an added architecture benefit here too. Writing code that is testable in the first place often results in code that’s broken up into separate logical pieces (google “separation of concerns”). This makes your code more modular and less “spaghetti-like”. Your code will be easier to modify and understand.
The second concept – system testing – involves writing tests that run your entire program. These often take longer than unit tests, but have the added benefit that they’ll let you know whether or not your entire program still works after introducing a new change.
When writing tests (both system and unit tests), it’s often helpful to include a couple different inputs. Occasionally a program may work just fine for one input, but fail horribly with another input.
Maybe your code takes a couple hours (or longer…) to run. There’s a bug in it, but every time you try to fix it, you have to wait a few hours to see if the fix worked. This is driving you crazy.
A possible approach to make your life easier is to try to make a Minimal, Reproducible Example (see this stackoverflow page for information).
Try to extract just the code that fails, from your program, and also its inputs. Wrap this up into a separate program. This allows you to run just the code that failed, hopefully greatly reducing the time it takes to test out fixes to the problem.
Once you have this example, can you make it smaller? Maybe take out some code that’s not needed to reproduce the bug, or shrink the input even further? Doing this might help you solve the problem.
In December 2021 OSC updated its firewall to enhance security. As a result, SSH sessions are being closed more quickly than they used to be. It is very easy to modify your SSH options in the client you use to connect to OSC to keep your connection open.
In ~/.ssh/config (use the command touch ~/.ssh/config to create it if there is no existing one), you can set 3 options:
TCPKeepAlive=no
ServerAliveInterval=60
ServerAliveCountMax=5
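If you prefer to scope these options to OSC hosts only, a minimal sketch of a ~/.ssh/config stanza looks like this (the Host pattern is just an example; ssh accepts both the "Option value" and "Option=value" forms):
Host *.osc.edu
    TCPKeepAlive no
    ServerAliveInterval 60
    ServerAliveCountMax 5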
Please refer to your SSH client documentation for how to set these options in your client.
An eligible principal investigator (PI) heads a project account and can authorize/remove user accounts under the project account (please check our Allocations and Accounts documentation for more details). This document shows you how to identify users on a project account and check the status of each user.
If the project account (projectID) is known, the OSCgetent command will list all users on the project:
$ OSCgetent group projectID
The returned information is in the format of:
projectID:*:gid: list of user IDs
gid is the group identifier number unique for the project account projectID.
For example, the command OSCgetent group PZS0712 lists all users on the project account PZS0712 as below:
$ OSCgetent group PZS0712
PZS0712:*:5513:amarcum,guilfoos,hhamblin,kcahill,xwang
Multiple groups can also be queried at once.
For example, the command OSCgetent group PZS0712 PZS0726 lists all users on both PZS0712 and PZS0726:
PZS0712:*:5513:amarcum,guilfoos,hhamblin,kcahill,xwang
PZS0726:*:6129:amarcum,kkappel
Details on a project can also be obtained along with the user list using the OSCfinger command.
$ OSCfinger -g projectID
This returns:
Group: projectID
GID: XXXX
Status: 'active/restricted/etc'
Type: XX
Principal Investigator: 'PI email'
Admins: NA
Members: 'list of users'
Category: NA
Institution: 'affiliated institution'
Description: 'short description'
---
If the project account is not known but the username is known, use the OSCfinger command to list all of the groups the user belongs to:
OSCfinger username
The returned information is in the format of:
Login: username
Name: First Last
Directory: home directory path
Shell: /bin/bash
E-mail: user's email address
Primary Group: user's primary project
Groups: list of projects and other groups user is in
Password Changed: date password was last changed
Password Expires: date password expires
Login Disabled: TRUE/FALSE
Password Expired: TRUE/FALSE
Current Logins: displays if the user is currently logged in and from where/when
For example, with the username amarcum, the command OSCfinger amarcum returns the information as below:
$ OSCfinger amarcum
Login: amarcum
Name: Antonio Marcum
Directory: /users/PZS0712/amarcum
Shell: /bin/bash
E-mail: amarcum@osc.edu
Primary Group: PZS0712
Groups: sts,ruby,l2supprt,oscall,clntstf,oscstaff,clntall,PZS0712,PZS0726
Password Changed: May 12 2019 15:47 (calculated)
Password Expires: Aug 11 2019 12:05 AM
Login Disabled: FALSE
Password Expired: FALSE
Current Logins: On since Mar 07 2019 12:12 on pts/14 from pitzer-login01.hpc.osc.edu
----
If the project account or username is not known, use the OSCfinger command with the '-e' flag to get the user account based on the user's name.
Use the following command to list all of the user accounts associated with a First and Last name:
$ OSCfinger -e 'First Last'
For example, with the user's first name as Summer and last name as Wang, the command OSCfinger -e 'Summer Wang' returns the information as below:
$ OSCfinger -e 'Summer Wang'
Login: xwang
Name: Summer Wang
Directory: /users/oscgen/xwang
Shell: /bin/bash
E-mail: xwang@osc.edu
Primary Group: PZS0712
Groups: amber,abaqus,GaussC,comsol,foampro,sts,awsmdev,awesim,ruby,matlab,aasheats,mars,ansysflu,wrigley,lgfuel,l2supprt,fsl,oscall,clntstf,oscstaff,singadm,clntall,dhgremot,fsurfer,PZS0530,PCON0003,PZS0680,PMIU0149,PZS0712,PAS1448
Password Changed: Jan 08 2019 11:41
Password Expires: Jul 08 2019 12:05 AM
Login Disabled: FALSE
Password Expired: FALSE
---
Once you know the user account username, follow the discussions in the previous section identify users on a project to get all user accounts on the project. Please contact OSC Help if you have any questions.
Use the OSCfinger command to check the status of a user account as below:
OSCfinger username
For example, if the username is xwang, the command OSCfinger xwang will return:
$ OSCfinger xwang
Login: xwang
Name: Summer Wang
Directory: /users/oscgen/xwang
Shell: /bin/bash
E-mail: xwang@osc.edu
Primary Group: PZS0712
Groups: amber,abaqus,GaussC,comsol,foampro,sts,awsmdev,awesim,ruby,matlab,aasheats,mars,ansysflu,wrigley,lgfuel,l2supprt,fsl,oscall,clntstf,oscstaff,singadm,clntall,dhgremot,fsurfer,PZS0530,PCON0003,PZS0680,PMIU0149,PZS0712,PAS1448
Password Changed: Jan 08 2019 11:41
Password Expires: Jul 08 2019 12:05 AM
Login Disabled: FALSE
Password Expired: FALSE
---
The output shows the user's home directory (e.g., Directory: /users/oscgen/xwang) and login shell (e.g., Shell: /bin/bash). If the information is Shell: /access/denied, it means this user account has been either archived or restricted. Please contact OSC Help if you'd like to reactivate this user account.
The E-mail field shows where OSC messages are delivered (e.g., mail forwarded to xwang@osc.edu). Please contact OSC Help if the email address associated with this user account has been changed, to ensure important notifications/messages/reminders from OSC may be received in a timely manner.
All users see their file system usage statistics when logging in, like so:
As of 2018-01-25T04:02:23.749853 userid userID on /users/projectID used XGB of quota 500GB and Y files of quota 1000000 files
The information is from the file /users/reporting/storage/quota/*_quota.txt , which is updated twice a day. Some users may see multiple lines associated with a username, as well as information on project space usage and quota of their Primary project, if there is one. The usage and quota of the home directory of a username is provided by the line including the file server your home directory is on (for more information, please visit Home Directories), while others (generated due to file copy) can be safely ignored.
You can check any user's home directory or a project's project space usage and quota by running:
grep -h 'userID' /users/reporting/storage/quota/*_quota.txt
(Replace 'userID' with a username or a projectID, depending on what you want to check.)
Here is an example of project PZS0712:
$ grep -h PZS0712 /users/reporting/storage/quota/*_quota.txt
As of 2019-03-07T13:55:01.000000 project/group PZS0712 on /fs/project used 262 GiB of quota 2048 GiB and 166987 files of quota 200000 files
As of 2019-03-07T13:55:01.000000 userid xwang on /fs/project/PZS0712 used 0 GiB of quota 0 GiB and 21 files of quota 0 files
As of 2019-03-07T13:55:01.000000 userid dheisterberg on /fs/project/PZS0712 used 262 GiB of quota 0 GiB and 166961 files of quota 0 files
As of 2019-03-07T13:55:01.000000 userid amarcum on /fs/project/PZS0712 used 0 GiB of quota 0 GiB and 2 files of quota 0 files
As of 2019-03-07T13:55:01.000000 userid root on /fs/project/PZS0712 used 0 GiB of quota 0 GiB and 2 files of quota 0 files
As of 2019-03-07T13:55:01.000000 userid guilfoos on /fs/project/PZS0712 used 0 GiB of quota 0 GiB and 1 files of quota 0 files
As of 2019-03-07T13:51:23.000000 userid amarcum on /users/PZS0712 used 399.86 MiB of quota 500 GiB and 8710 files of quota 1000000 files
Here is an example for username amarcum:
$ grep -h amarcum /users/reporting/storage/quota/*_quota.txt
As of 2019-03-07T13:55:01.000000 userid amarcum on /fs/project/PZS0712 used 0 GiB of quota 0 GiB and 2 files of quota 0 files
As of 2019-03-07T13:56:39.000000 userid amarcum on /users/PZS0645 used 4.00 KiB of quota 500 GiB and 1 files of quota 1000000 files
As of 2019-03-07T13:56:39.000000 userid amarcum on /users/PZS0712 used 399.86 MiB of quota 500 GiB and 8710 files of quota 1000000 files
The OSCusage command can provide detailed information about computational usage for a given project and user.
See the OSCusage command page for details.
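For instance, a minimal invocation is simply the command itself, which reports usage for your project(s); see the OSCusage command page for the available options:
$ OSCusage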
If you need to use a MATLAB toolbox that is not provided through our installations, you can follow these instructions; if you have any difficulties, you can contact OSC Help for assistance.
First, we recommend making a new directory within your home directory in order to keep everything organized. You can use the unix command "mkdir" to make a new directory.
Now you can download the toolbox either to your desktop, and then upload it to OSC, or directly download it using the "wget" utility (if you know the URL for the file).
Now you can extract the downloaded file.
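A rough sketch of these steps from a login node (the directory name, URL, and archive name are placeholders, not a real toolbox):
mkdir ~/matlab_toolboxes                      # keep toolboxes organized in one place
cd ~/matlab_toolboxes
wget https://example.com/mytoolbox.tar.gz     # placeholder URL for the toolbox archive
tar xvfz mytoolbox.tar.gz                     # extract the downloaded file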
There are two methods on how to add the MATLAB toolbox path.
Method 1: Load up the Matlab GUI and click on "Set Path" and "Add folder"
Method 2: Use the "addpath" function in your script. More information on the function can be found here: https://www.mathworks.com/help/matlab/ref/addpath.html
Please refer to the instructions given alongside the toolbox. They should contain instructions on how to run the toolbox.
While we provide a number of Perl modules, you may need a module we do not provide. If it is a commonly used module, or one that is particularly difficult to compile, you can contact OSC Help for assistance, but we have provided an example below showing how to build and install your own Perl modules. Note, these instructions use "bash" shell syntax; this is our default shell, but if you are using something else (csh, tcsh, etc), some of the syntax may be different.
CPAN, the Comprehensive Perl Archive Network, is the primary source for publishing and fetching the latest modules and libraries for the Perl programming language. The default method for installing Perl modules, the "CPAN Shell", provides users with a great deal of power and flexibility, but at the cost of a complex configuration and an inelegant default setup.
To use CPAN Minus, we must first load it, if it hasn't already been loaded. Note that this is not necessary if you loaded a version of Perl with the module load command.
module load cpanminus
Next, in order to use cpanminus, you will need to run the following command only ONCE:
perl -I $CPANMINUS_INC -Mlocal::lib
In most cases, using CPAN Minus to install modules is as simple as issuing a command in the following form:
cpanm [Module::Name]
For example, below are three examples of installing perl modules:
cpanm Math::CDF
cpanm Set::IntervalTree
cpanm DB_File
To test a perl module import, here are some examples below:
perl -e "require Math::CDF"
perl -e "require Set::IntervalTree"
perl -e "require DB_File"
The modules are installed correctly if no output is printed.
To show the local modules you have installed in your user account:
perldoc perllocal
Resetting Module Collection
If you should ever want to start over with your perl module collection, delete the following folders:
rm -r ~/perl5
rm -r ~/.cpanm
Sometimes the best way to get access to a piece of software on the HPC systems is to install it yourself as a "local install". This document will walk you through the OSC-recommended procedure for maintaining local installs in your home directory or project space. The majority of this document describes the process of "manually" building and installing your software. We also show a partially automated approach through the use of a bash script in the Install Script section near the end.
Before installing your software, you should first prepare a place for it to live. We recommend the following directory structure, which you should create in the top-level of your home directory:
local
|-- src
`-- share
    `-- lmodfiles
This structure is analogous to how OSC organizes the software we provide. Each directory serves a specific purpose:
local
- Gathers all the files related to your local installs into one directory, rather than cluttering your home directory. Applications will be installed into this directory with the format "appname/version". This allows you to easily store multiple versions of a particular software install if necessary.
local/src
- Stores the installers -- generally source directories -- for your software. Also stores the compressed archives ("tarballs") of your installers; useful if you want to reinstall later using different build options.
local/share/lmodfiles
- The standard place to store module files, which will allow you to dynamically add or remove locally installed applications from your environment.
You can create this structure with one command:
mkdir -p $HOME/local/src $HOME/local/share/lmodfiles
(NOTE: $HOME is defined by the shell as the full path of your home directory. You can view it from the command line with the command echo $HOME.)
Now that you have your directory structure created, you can install your software. For demonstration purposes, we will install a local copy of Git.
First, we need to get the source code onto the HPC filesystem. The easiest thing to do is find a download link, copy it, and use the wget tool to download it on the HPC. We'll download this into $HOME/local/src:
cd $HOME/local/src
wget https://github.com/git/git/archive/v2.9.0.tar.gz
Now extract the tar file:
tar zxvf v2.9.0.tar.gz
Next, we'll go into the source directory and build the program. Consult your application's documentation to determine how to install into $HOME/local/"software_name"/"version". Replace "software_name" with the software's name and "version" with the version you are installing, as demonstrated below. In this case, we'll use the configure tool's --prefix option to specify the install location.
You'll also want to specify a few variables to help make your application more compatible with our systems. We recommend specifying that you wish to use the Intel compilers and that you want to link the Intel libraries statically. This will prevent you from having to have the Intel module loaded in order to use your program. To accomplish this, add CC=icc CFLAGS=-static-intel to the end of your invocation of configure. If your application does not use configure, you can generally still set these variables somewhere in its Makefile or build script.
Then, we can build Git using the following commands:
cd git-2.9.0
autoconf # this creates the configure file
./configure --prefix=$HOME/local/git/2.9.0 CC=icc CFLAGS=-static-intel
make && make install
Your application should now be fully installed. However, before you can use it you will need to add the installation's directories to your path. To do this, you will need to create a module.
Modules allow you to dynamically alter your environment to define environment variables and bring executables, libraries, and other features into your shell's search paths.
We can use the mkmod script to create a simple Lua module for the Git installation:
module load mkmod
create_module.sh git 2.9.0 $HOME/local/git/2.9.0
It will create the module file $HOME/local/share/lmodfiles/git/2.9.0.lua. Please note that by default our mkmod script only creates module files that define some basic environment variables: PATH, LD_LIBRARY_PATH, MANPATH, and GIT_HOME. These default variables may not cover all paths desired. We can overwrite these defaults in this way:
module load mkmod
TOPDIR_LDPATH_LIST="lib:lib64" \
TOPDIR_PATH_LIST="bin:exe" \
create_module.sh git 2.9.0 $HOME/local/git/2.9.0
This adds $GIT_HOME/bin and $GIT_HOME/exe to PATH, and $GIT_HOME/lib and $GIT_HOME/lib64 to LD_LIBRARY_PATH.
We can also add other variables by using ENV1, ENV2, and more. For example, suppose we want to change the default editor to vim for Git:
module load mkmod
ENV1="GIT_EDITOR=vim" \
create_module.sh git 2.9.0 $HOME/local/git/2.9.0
We will be using the filename 2.9.0.lua ("version".lua). A simple Lua module for our Git installation would be:
-- Local Variables
local name = "git"
local version = "2.9.0"

-- Locate Home Directory
local homedir = os.getenv("HOME")
local root = pathJoin(homedir, "local", name, version)

-- Set Basic Paths
prepend_path("PATH", pathJoin(root, "bin"))
prepend_path("LD_LIBRARY_PATH", root .. "/lib")
prepend_path("LIBRARY_PATH", root .. "/lib")
prepend_path("INCLUDE", root .. "/include")
prepend_path("CPATH", root .. "/include")
prepend_path("PKG_CONFIG_PATH", root .. "/lib/pkgconfig")
prepend_path("MANPATH", root .. "/share/man")
NOTE: For future module files, copy our sample modulefile from ~support/doc/modules/sample_module.lua. This module file follows the recommended design patterns laid out above and includes samples of many common module operations.
Any module file you create should be saved into your local lmodfiles directory ($HOME/local/share/lmodfiles). To prepare for future software installations, create a subdirectory within lmodfiles named after your software and add one module file to that directory for each version of the software installed.
In the case of our Git example, you should create the directory $HOME/local/share/lmodfiles/git and create a module file within that directory named 2.9.0.lua, for example:
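One way to do this, starting from the sample modulefile mentioned in the note above (then edit the copied file so it matches the paths shown in the example module):
mkdir -p $HOME/local/share/lmodfiles/git
cp ~support/doc/modules/sample_module.lua $HOME/local/share/lmodfiles/git/2.9.0.lua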
To make this module usable, you need to tell Lmod where to look for it. You can do this by issuing the command module use $HOME/local/share/lmodfiles in our example. You can see this change by running module avail. This will allow you to load your software using either module load git or module load git/2.9.0, as in the commands below.
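Putting these commands together for the Git example:
module use $HOME/local/share/lmodfiles
module avail git
module load git/2.9.0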
module use $HOME/local/share/lmodfiles and module load "software_name" need to be entered into the command line every time you enter a new session on the system. If you install another version later on (let's say version 2.9.1) and want to create a module file for it, you need to make sure you call it 2.9.1.lua. When loading Git, Lmod will automatically load the newer version. If you need to go back to an older version, you can do so by specifying the version you want: module load git/2.9.0.
To make sure you have the correct module file loaded, type which git, which should print "~/local/git/2.9.0/bin/git" (NOTE: ~ is equivalent to $HOME).
To make sure the software was installed correctly and that the module is working, type git --version, which should print "git version 2.9.0".
Simplified versions of the scripts used to manage the central OSC software installations are provided at ~support/share/install-script. The idea is that you provide the minimal commands needed to obtain, compile, and install the software (usually some variation on wget, tar, ./configure, make, and make install) in a script, which then sources an OSC-maintained template that provides all of the "boilerplate" commands to create and manage a directory structure similar to that outlined in the Getting Started section above. You can copy an example install script from ~support/share/install-script/install-osc_sample.sh and follow the notes in that script, as well as in ~support/share/install-script/README.md, to modify it to install software of your choosing.
The install script places module files under $HOME/osc_apps/lmodfiles, so you will need to run module use $HOME/osc_apps/lmodfiles and module load [software-name] every time you enter a new session on the system and want to use the software that you have installed. For more information about modules, be sure to read the webpage indicated at the end of module help. If you have any questions about modules or local installations, feel free to contact the OSC Help Desk at oschelp@osc.edu.
An ACL (access control list) is a list of permissions associated with a file or directory. These permissions allow you to restrict access to a certain file or directory by user or group.
OSC supports NFSv4 ACL on our home directory and POSIX ACL on our project and scratch file systems. Please see the how to use NFSv4 ACL for home directory ACL management and how to use POSIX ACL for managing ACLs in project and scratch file systems.
This document shows you how to use the NFSv4 ACL permissions system. An ACL (access control list) is a list of permissions associated with a file or directory. These permissions allow you to restrict access to a certain file or directory by user or group. NFSv4 ACLs provide more specific options than typical POSIX read/write/execute permissions used in most systems.
These commands are useful for managing ACLs in directories under /users/<project-code>.
This is an example of an NFSv4 ACL:
A::user@nfsdomain.org:rxtncy
The following sections will break down this example from left to right and provide more usage options.
The 'A' in the example is known as the ACE (access control entry) type. The 'A' denotes "Allow" meaning this ACL is allowing the user or group to perform actions requiring permissions. Anything that is not explicitly allowed is denied by default.
The above example could also include a distinction known as a flag, shown below:
A:d:user@osc.edu:rxtncy
The 'd' used above is called an inheritance flag. This makes it so the ACL set on this directory will be automatically established on any new subdirectories. Inheritance flags only work on directories and not files. Multiple inheritance flags can be used in combination or omitted entirely. Examples of inheritance flags are listed below:
Flag | Name | Function |
---|---|---|
d | directory-inherit | New subdirectories will have the same ACE |
f | file-inherit | New files will have the same ACE minus the inheritance flags |
n | no-propagate-inherit | New subdirectories will inherit the ACE minus the inheritance flags |
i | inherit-only | New files and subdirectories will have this ACE but the ACE for the directory with the flag is null |
The 'user@nfsdomain.org' is a principal. The principal denotes who the ACE grants access to. A principal can be a specific user, a group (marked with the 'g' flag, as in the example below), or a special principal such as OWNER@, GROUP@, or EVERYONE@. For example, to allow a group:
A:g:group@osc.edu:rxtncy
The 'rxtncy' are the permissions the ACE is allowing. Permissions can be used in combination with each other. A list of permissions and what they do can be found below:
Permission | Function |
---|---|
r | read-data (files) / list-directory (directories) |
w | write-data (files) / create-file (directories) |
a | append-data (files) / create-subdirectory (directories) |
x | execute (files) / change-directory (directories) |
d | delete the file/directory |
D | delete-child : remove a file or subdirectory from the given directory (directories only) |
t | read the attributes of the file/directory |
T | write the attribute of the file/directory |
n | read the named attributes of the file/directory |
N | write the named attributes of the file/directory |
c | read the file/directory ACL |
C | write the file/directory ACL |
o | change ownership of the file/directory |
Note: Aliases such as 'R', 'W', and 'X' can be used as permissions. These work similarly to POSIX Read/Write/Execute. More detail can be found below.
Alias | Name | Expansion |
---|---|---|
R | Read | rntcy |
W | Write | watTNcCy (with D added to directory ACEs) |
X | Execute | xtcy |
This section will show you how to set, modify, and view ACLs
To set an ACE use this command:
nfs4_setfacl [OPTIONS] COMMAND file
To modify an ACE, use this command:
nfs4_editfacl [OPTIONS] file
Where file is the name of your file or directory. More information on Options and Commands can be found below.
Commands are only used when first setting an ACE. Commands and their uses are listed below.
COMMAND | FUNCTION |
---|---|
-a acl_spec [index] | add ACL entries in acl_spec at index (DEFAULT: 1) |
-x acl_spec | index | remove ACL entries or entry-at-index from ACL |
-A file [index] | read ACL entries to add from file |
-X file | read ACL entries to remove from file |
-s acl_spec | set ACL to acl_spec (replaces existing ACL) |
-S file | read ACL entries to set from file |
-m from_ace to_ace | modify in-place: replace 'from_ace' with 'to_ace' |
Options can be used in combination or omitted entirely. A list of options is shown below:
OPTION | NAME | FUNCTION |
---|---|---|
-R | recursive | Applies ACE to a directory's files and subdirectories |
-L | logical | Used with -R, follows symbolic links |
-P | physical | Used with -R, skips symbolic links |
To view ACLs, use the following command:
nfs4_getfacl file
Where file is your file or directory
First, make the top level of your home directory executable by the group:
nfs4_setfacl -a A:g:<group>@osc.edu:X $HOME
Next, create a new folder to store the shared data:
mkdir share_group
Move any existing data to be shared into this folder:
mv <src> ~/share_group
Apply the ACL to all current files and dirs under ~/share_group, and set the ACL so that new files created there will automatically have the proper group permissions:
nfs4_setfacl -R -a A:dfg:<group>@osc.edu:RX ~/share_group
One can also specify the ACL entries in a single file, then apply that file, to avoid duplicate entries and keep the ACL entries consistent:
$ cat << EOF > ~/group_acl.txt
A:fdg:clntstf@osc.edu:rxtncy
A::OWNER@:rwaDxtTnNcCy
A:g:GROUP@:tcy
A::EVERYONE@:rxtncy
EOF
$ nfs4_setfacl -R -S ~/group_acl.txt ~/share_group
Assume that you want to share a directory (e.g. data) and its files and subdirectories, but it is not currently readable by other users:
> ls -ld /users/PAA1234/john/data
drwxr-x--- 3 john PAA1234 4096 Nov 21 11:59 /users/PAA1234/john/data
As before, give the user execute permission on $HOME:
> nfs4_setfacl -a A::userid@osc.edu:X $HOME
Set an ACL on the directory 'data' to allow a specific user access:
> cd /users/PAA1234/john
> nfs4_setfacl -R -a A:df:userid@osc.edu:RX data
or to allow a specific group access:
> cd /users/PAA1234/john
> nfs4_setfacl -R -a A:dfg:groupname@osc.edu:RX data
You can repeat the above commands to add more users or groups.
Sometimes one wishes to share their entire home dir with a particular group. Care should be taken to share only folders that contain data, and not any hidden dirs such as the ~/.ssh dir, which should always be readable only by the user that owns it. Assign group read permissions only to non-hidden dirs; one possible way of doing this is sketched below.
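A minimal sketch of one way to do this, using <group> as a placeholder for the group name just as in the earlier examples (the shell glob $HOME/*/ matches only non-hidden directories):
for dir in $HOME/*/; do nfs4_setfacl -R -a A:dfg:<group>@osc.edu:RX "$dir"; done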
After sharing an entire home dir with a group, you can still create a single share folder with the previous instructions to share different data with a different group only. So, all non-hidden dirs in your home dir would be readable by group_a, but a new folder named 'group_b_share' can be created and its acl altered to only share its contents with group_b.
Please contact oschelp@osc.edu if there are any questions.
This document shows you how to use the POSIX ACL permissions system. An ACL (access control list) is a list of permissions associated with a file or directory. These permissions allow you to restrict access to a certain file or directory by user or group.
These commands are useful for project and scratch dirs located in /fs/project, /fs/scratch, and /fs/ess.
An example of a basic POSIX ACL would look like this:
# file: foo.txt
# owner: tellison
# group: PZSXXXX
user::rw-
group::r--
other::r--
The first three lines list basic information about the file/directory in question: the file name, the primary owner/creator of the file, and the primary group that has permissions on the file. The following three lines show the file access permissions for the primary user, the primary group, and any other users. POSIX ACLs use the basic rwx permissions, explained in the following table:
Permission | Explanation |
---|---|
r | Read permission |
w | Write permission |
x | Execute permission |
This section will show you how to set and view ACLs, using the setfacl and getfacl commands.
The getfacl command displays a file or directory's ACL. This command is used as follows:
$ getfacl [OPTION] file
Where file is the file or directory you are trying to view. Common options include:
Flag | Description |
---|---|
-a/--access | Display file access control list only |
-d/--default | Display default access control list only (only primary access), which determines the default permissions of any files/directories created in this directory |
-R/--recursive | Display ACLs for subdirectories |
-p/--absolute-names | Don't strip leading '/' in pathnames |
A simple getfacl call would look like the following:
$ getfacl foo.txt
# file: foo.txt
# owner: user
# group: PZSXXXX
user::rw-
group::r--
other::r--
A recursive getfacl call through subdirectories will list each subdirectory's ACL separately:
$ getfacl -R foo/
# file: foo/
# owner: user
# group: PZSXXXX
user::rwx
group::r-x
other::r-x

# file: foo//foo.txt
# owner: user
# group: PZSXXXX
user::rwx
group::---
other::---

# file: foo//bar
# owner: user
# group: PZSXXXX
user::rwx
group::---
other::---

# file: foo//bar/foobar.py
# owner: user
# group: PZSXXXX
user::rwx
group::---
other::---
The setfacl command allows you to set a file or directory's ACL. This command is used as follows:
$ setfacl [OPTION] COMMAND file
Where file is the file or directory you are trying to modify.
setfacl takes several commands to modify a file or directory's ACL:
Command | Function |
---|---|
-m/--modify=acl | modify the current ACL(s) of files. Use as follows: setfacl -m u/g:user/group:r/w/x file |
-M/--modify-file=file | read ACL entries to modify from a file. Use as follows: setfacl -M file_with_acl_permissions file_to_modify |
-x/--remove=acl | remove entries from the ACL(s) of files. Use as follows: setfacl -x u/g:user/group:r/w/x file |
-X/--remove-file=file | read ACL entries to remove from a file. Use as follows: setfacl -X file_with_acl_permissions file_to_modify |
-b/--remove-all | Remove all extended ACL permissions |
Common option flags for setfacl are as follows:
Option | Function |
---|---|
-R/--recursive | Recurse through subdirectories |
-d/--default | Apply modifications to default ACLs |
--test | test ACL modifications (ACLs are not modified) |
You can set a specific user's access privileges using the following:
setfacl -m u:username:-wx foo.txt
Similarly, a group's access privileges can be set using the following:
setfacl -m g:PZSXXXX:rw- foo.txt
You can remove a specific user's access using the following
setfacl -x user:username foo.txt
Grant a user recursive read access to a dir and all files/dirs under it (notice that the capital 'X' is used to provide execute permissions only to dirs and not files):
setfacl -R -m u:username:r-X shared-dir
Set a dir so that any newly created files or dirs under it will inherit the parent dir's ACL:
setfacl -d -m u:username:r-X shared-dir
This HOWTO will demonstrate how to lower one's disk space usage. The following procedures can be applied to all of OSC's file systems.
We recommend users regularly check their data usage and clean out old data that is no longer needed.
Users who need assistance lowering their data usage can contact OSC Help.
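For example, a generic way to see how much space the top-level items in your home directory use (this can take a while on large directories):
du -sh $HOME/* | sort -h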
Users should ensure that their jobs are written in such a way that temporary data is not saved to permanent file systems, such as the project space file system or their home directory.
If your job copies data from the scratch file system or its node's local disk ($TMPDIR) back to a permanent file system, such as the project space file system or a home directory (/users/PXX####/xxx####/), you should ensure you are only copying the files you will need later.
The following commands will help you identify old data using the find command.
Note: find commands may produce an excessive amount of output. To terminate the command while it is running, press CTRL + C.
This command will recursively search the user's home directory and give a detailed listing of all files not accessed in the past 100 days.
The last access time, atime, is updated when a file is opened by any operation, including grep, cat, head, sort, etc.
find ~ -atime +100 -exec ls -l {} \;
Replace ~ with the path you wish to search; a period (.) can be used to search the current directory. Replace 100 with your desired number of days. To view the total size of the files found by find, you can add | awk '{s+=$5} END {print "Total SIZE (bytes): " s}' to the end of the command:
find ~ -atime +100 -exec ls -l {} \; | awk '{s+=$5} END {print "Total SIZE (bytes): " s}'
This command will recursively search the user's home directory and give a detailed listing of all files not modified in the past 100 days.
The last modified time, mtime, is updated when a file's contents are updated or saved. Viewing a file will not update the last modified time.
find ~ -mtime +100 -exec ls -l {} \;
Replace ~ with the path you wish to search; a period (.) can be used to search the current directory. Replace 100 with your desired number of days. To view the total size of the files found by find, you can add | awk '{s+=$5} END {print "Total SIZE (bytes): " s}' to the end of the command:
find ~ -mtime +100 -exec ls -l {} \; | awk '{s+=$5} END {print "Total SIZE (bytes): " s}'
Adding the -size <size> option and argument to the find command allows you to only view files larger than a certain size. This option and argument can be added to any other find command.
For example, to view all files in a user's home directory that are larger than 1GB:
find ~ -size +1G -exec ls -l {} \;
If you no longer need the old data, you can delete it using the rm command.
If you need to delete a whole directory tree (a directory and all of its subcontents, including other directories), you can use the rm -R command.
For example, the following command will delete the data directory in a user's home directory:
rm -R ~/data
If you would like to be prompted for confirmation before deleting every file, use the -i option:
rm -Ri ~/data
Enter y or n when prompted. Simply pressing Enter will default to n.
The rm command can be combined with any find command to delete the files found. The syntax for doing so is:
find <location> <other find options> -exec rm -i {} \;
Where <other find options> can include one or more of the options -atime <time>, -mtime <time>, and -size <size>.
The following command would find all files in the ~/data directory larger than 1G that have not been accessed in the past 100 days, and then prompt for confirmation before deleting each file:
find ~/data -atime +100 -size +1G -exec rm -i {} \;
If you are absolutely sure the files identified by find are okay to delete, you can remove the -i option to rm and you will not be prompted. Extreme caution should be used when doing so!
If you still need the data but do not plan on needing the data in the immediate future, contact OSC Help to discuss moving the data to an archive file system. Requests for data to be moved to the archive file system should be larger than 1TB.
If you need the data but do not access the data frequently, you should compress the data using tar or gzip.
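For example, a directory can be bundled and compressed into a single archive with tar; the directory and archive names here are placeholders:
tar -czvf data_archive.tar.gz ~/data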
If you have the space available locally you can transfer your data there using sftp or Globus.
Globus is recommended for large transfers.
The OnDemand File application should not be used for transfers larger than 1GB.
This page outlines a way a professor can set up a file submission system at OSC for his/her classroom project.
After connecting to the OSC system, the professor runs submit_prepare as:
$ /users/PZS0645/support/bin/submit_prepare
Follow the instructions and provide the needed information (name of the assignment, TA username if appropriate, a size limit if not the default 1000MB per student, and whether or not you want email notification of each submission). It will create a designated directory where students submit their assignments, as well as generate a submit script that students use to submit homework to OSC, both of which are located in the directory specified by the professor.
If you want to create multiple directories for different assignments, simply run the same command again, specifying a different assignment name:
$ /users/PZS0645/support/bin/submit_prepare
The PI can also enforce the deadline by simply changing the permission of the submission directory or renaming the submission directory at the deadline.
(Only works on Owens.) One way is to use the at command, following the steps below:
Use the at command to specify the deadline:
at [TIME]
where TIME is formatted as HH:MM AM/PM MM/DD/YY. For example:
at 2:30 PM 08/21/2017
Then enter the command that should run at that time:
$ chmod 700 [DIRECTORY]
where DIRECTORY is the assignment folder to be closed off. The permission of DIRECTORY will be changed to 700 at 2:30 PM on August 21, 2017. After that, a student will get an error message when trying to submit an assignment to this directory.
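Putting these steps together, a complete session might look like the following sketch (the directory path is a placeholder; press Ctrl+D after entering the command to schedule it):
$ at 2:30 PM 08/21/2017
at> chmod 700 /path/to/assignment1
at> <EOT>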
A student should create one directory which includes all the files he/she wants to submit before running this script to submit his/her assignment. Also, the previous submission of the same assignment from the student will be replaced by the new submission.
To submit the assignment, the student runs submit after connecting to the OSC system:
$ /path/to/directory/from/professor/submit
Follow the instructions. It will allow students to submit an assignment to the designated directory specified by the professor and send a confirmation email, or return an error message.
Often users want to submit a large number of jobs all at once, with each job using different parameters. These parameters could be anything, including the path of a data file or different input values for a program. This how-to will show you how you can do this using a simple Python script, a CSV file, and a template script. You will need to adapt this advice for your own situation.
Consider the following batch script:
#!/bin/bash
#SBATCH --ntasks-per-node=2
#SBATCH --time=1:00:00
#SBATCH --job-name=week42_data8

# Copy input data to the node's fast local disk
cp ~/week42/data/source1/data8.in $TMPDIR
cd $TMPDIR

# Run the analysis
full_analysis data8.in data8.out

# Copy results to proper folder
cp data8.out ~/week42/results
Let's say you need to submit 100 of these jobs on a weekly basis. Each job uses a different data file as input. You receive data from two different sources, and so your data is located within two different folders. All of the jobs from one week need to store their results in a single weekly results folder. The output file name is based upon the input file name.
As you can see, this job follows a general template. There are three main parameters that change in each job: the week, the data source, and the name of the data file passed to full_analysis.
If we replace these parameters with variables, prefixed by the dollar sign $ and surrounded by curly braces { }, we get the following template script:
#!/bin/bash
#SBATCH --ntasks-per-node=2
#SBATCH --time=1:00:00

# Copy input data to the node's fast local disk
cp ~/${WEEK}/data/${SOURCE}/${DATA}.in $TMPDIR
cd $TMPDIR

# Run the analysis
full_analysis ${DATA}.in ${DATA}.out

# Copy results to proper folder
cp ${DATA}.out ~/${WEEK}/results
We can now use the sbatch --export option to pass parameters to our template script. The format for passing parameters is:
sbatch --job-name=name --export=var_name=value[,var_name=value...]
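For example, a single job could be submitted by hand with something like the following (here job.sh is the saved template script, the same name used in the automated examples below):
sbatch --job-name=week42_source1_data8 --export=WEEK=week42,SOURCE=source1,DATA=data8 job.sh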
Submitting 100 jobs using the sbatch --export option manually does not make our task much easier than modifying and submitting each job one by one. To complete our task we need to automate the submission of our jobs. We will do this by using a Python script that submits our jobs using parameters it reads from a CSV file.
Note that python was chosen for this task for its general ease of use and understandability -- if you feel more comfortable using another scripting language feel free to interpret/translate this python code for your own use.
The script for submitting multiple jobs using parameters can be found at ~support/share/misc/submit_jobs.py
Use the following command to run a test with the examples already created, replacing <your-proj-code> with a project you are a member of to charge jobs to:
~support/share/misc/submit_jobs.py -t ~support/share/misc/submit_jobs_examples/job_template2.sh WEEK,SOURCE,DATA ~support/share/misc/submit_jobs_examples/parameters_example2.csv <your-proj-code>
This script will open the CSV file and step through the file line by line, submitting a job for each line using the line's values. If the submit command returns a non-zero exit code, usually indicating it was not submitted, we will print this out to the display. The jobs will be submitted using the general format (using the example WEEK,SOURCE,DATA environment variables):
sbatch -A <project-account> -o ~/x/job_logs/x_y_z.job_log --job-name=x_y_z --export=WEEK=x,SOURCE=y,DATA=z job.sh
Where x, y and z are determined by the values in the CSV parameter file. Below we relate x to week, y to source and z to data.
We now need to create a CSV file with parameters for each job. This can be done with a regular text editor or using a spreadsheet editor such as Excel. By default you should use commas as your delimiter.
Here is our CSV file with parameters:
week42,source1,data1
week42,source1,data2
week42,source1,data3
...
week42,source2,data98
week42,source2,data99
week42,source2,data100
The submit script would read in the first row of this CSV file and form and execute the command:
sbatch -A <project-account> -o week42/job_logs/week42_source1_data1.job_log --job-name=week42_source1_data1 --export=WEEK=week42,SOURCE=source1,DATA=data1 job.sh
Once all the above is done, all you need to do to submit your jobs is to make sure the CSV file is populated with the proper parameters and run the automatic submission script with the right flags.
Try using submit_jobs.py --help for an explanation:
$ ~support/share/misc/submit_jobs.py --help
usage: submit_jobs.py [-h] [-t] jobscript parameter_names job_parameters_file account

Automatically submit jobs using a csv file; examples in ~support/share/misc/submit_jobs_examples/

positional arguments:
  jobscript            job script to use
  parameter_names      comma separated list of names for each parameter
  job_parameters_file  csv parameter file to use
  account              project account to charge jobs to

optional arguments:
  -h, --help           show this help message and exit
  -t, --test           test script without submitting jobs
Use the -t flag as well to check the submit commands. It is a good idea to copy the ~support/share/misc/submit_jobs.py file and modify it for unique use cases.
Contact oschelp@osc.edu and OSC staff can assist if there are questions using the default script or adjusting the script for unique use cases.
This tutorial goes over techniques to tune the performance of your application. Keep in mind that correctness of results, code readability/maintainability, and portability to future systems are more important than performance. Factors that can affect performance include compiler options, vectorization, memory access patterns, and parallelization, all of which are covered in the sections below.
We will be using code based on the HPCCG miniapp from Mantevo. It performs the Conjugate Gradient (CG) method on a 3D chimney domain. CG is an iterative algorithm to numerically approximate the solution to a system of linear equations.
Run code with:
srun -n <numprocs> ./test_HPCCG nx ny nz
where nx, ny, nz are the number of nodes in the x, y, and z dimension on each processor.
First start an interactive Pitzer Desktop session with OnDemand.
You need to load intel 19.0.5 and mvapich2 2.3.3:
module load intel/19.0.5 mvapich2/2.3.3
Then clone the repository:
git clone https://code.osu.edu/khuvis.1/performance_handson.git
Debuggers let you execute your program one line at a time, inspect variable values, stop your programming at a particular line, and open a core file after the program crashes.
For debugging, use the -g flag and remove optimization or set it to -O0. For example:
icc -g -O0 -o mycode mycode.c
gcc -g -O0 -o mycode mycode.c
To see compiler warnings and diagnostic options:
icc -help diag
man gcc
ARM DDT is a commercial debugger produced by ARM. It can be loaded on all OSC clusters:
module load arm-ddt
To run a non-MPI program from the command line:
ddt --offline --no-mpi ./mycode [args]
To run an MPI program from the command line:
ddt --offline -np num_procs ./mycode [args]
Compile and run the code:
make
srun -n 2 ./test_HPCCG 150 150 150
You should have received the following error message at the end of the program output:
===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 308893 RUNNING AT p0200
=   EXIT CODE: 11
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions
Set compiler flags -O0 -g to CPP_OPT_FLAGS in Makefile. Then recompile and run with ARM DDT:
make clean; make
module load arm-ddt
ddt -np 2 ./test_HPCCG 150 150 150
When DDT stops on the segmentation fault, the stack is in the YAML_Element::~YAML_Element function of YAML_Element.cpp. Looking at this function, we see that the loop stops at children.size() instead of children.size()-1. So, line 13 should be changed from
for(size_t i=0; i<=children.size(); i++) {
to
for(size_t i=0; i<children.size(); i++) {
On Pitzer, there are 40 cores per node (20 cores per socket and 2 sockets per node). There is support for AVX512, with a vector length of 8 double or 16 single precision values and fused multiply-add. (There is hardware support for 4 threads per core, but it is currently not enabled on OSC systems.)
There are three cache levels on Pitzer, and the statistics are shown in the table below:
Cache level | Size (KB) | Latency (cycles) | Max BW (bytes/cycle) | Sustained BW (bytes/cycle) |
---|---|---|---|---|
L1 DCU | 32 | 4-6 | 192 | 133 |
L2 MLC | 1024 | 14 | 64 | 52 |
L3 LLC | 28160 | 50-70 | 16 | 15 |
Never do heavy I/O in your home directory. Home directories are for long-term storage, not scratch files.
One option for I/O intensive jobs is to use the local disk on a compute node. Stage files to and from your home directory into $TMPDIR using the pbsdcp command (e.g. pbsdcp file1 file2 $TMPDIR), and execute the program in $TMPDIR.
Another option is to use the scratch file system ($PFSDIR). This is faster than other file systems, good for parallel jobs, and may be faster than local disk.
For more information about OSC's file system, click here.
For example batch scripts showing the use of $TMPDIR and $PFSDIR, click here.
For more information about Pitzer, click here.
FLOPS stands for "floating point operations per second." Pitzer has a theoretical maximum of 720 teraflops. With the LINPACK benchmark, which solves a dense system of linear equations, it achieves 543 teraflops. With the STREAM benchmark, which measures sustainable memory bandwidth and the corresponding computation rate for vector kernels, it achieves copy: 299095.01 MB/s, scale: 298741.01 MB/s, add: 331719.18 MB/s, and triad: 331712.19 MB/s. Application performance is typically much less than peak/sustained performance since applications usually do not take full advantage of all hardware features.
You can time a program using the /usr/bin/time command. It gives results for user time (CPU time spent running your program), system time (CPU time spent by your program in system calls), and elapsed time (wallclock). It also shows % CPU, which is (user + system) / elapsed, as well as memory, pagefault, swap, and I/O statistics.
/usr/bin/time j3
5415.03user 13.75system 1:30:29elapsed 99%CPU \
(0avgtext+0avgdata 0maxresident)k \
0inputs+0outputs (255major+509333minor)pagefaults 0 swaps
You can also time portions of your code:
 | C/C++ | Fortran 77/90 | MPI (C/C++/Fortran) |
---|---|---|---|
Wallclock | time(2), difftime(3), getrusage(2) | SYSTEM_CLOCK(2) | MPI_Wtime(3) |
CPU | times(2) | DTIME(3), ETIME(3) | X |
A profiler can show you whether code is compute-bound, memory-bound, or communication bound. Also, it shows how well the code uses available resources and how much time is spent in different parts of your code. OSC has the following profiling tools: ARM Performance Reports, ARM MAP, Intel VTune, Intel Trace Analyzer and Collector (ITAC), Intel Advisor, TAU Commander, and HPCToolkit.
For profiling, use the -g flag and specify the same optimization level that you would normally use with -On. For example:
icc -g -O3 -o mycode mycode.c
ARM PR works on precompiled binaries, so the -g flag is not needed. It gives a summary of your code's performance that you can view with a browser.
For a non-MPI program:
module load arm-pr
perf-report --no-mpi ./mycode [args]
For an MPI program:
module load arm-pr
perf-report --np num_procs ./mycode [args]
Interpreting this profile requires some expertise. It gives details about your code's performance. You can view and explore the resulting profile using an ARM client.
For a non-MPI program:
module load arm-map
map --no-mpi ./mycode [args]
For an MPI program:
module load arm-map
map --np num_procs ./mycode [args]
For more information about ARM Tools, view OSC resources or visit ARM's website.
ITAC is a graphical tool for profiling MPI code (Intel MPI).
To use:
module load intelmpi # then compile (-g) code
mpiexec -trace ./mycode
View and explore the results using a GUI with traceanalyzer:
traceanalyzer <mycode>.stf
HPC software is traditionally written in Fortran or C/C++. OSC supports several compiler families. Intel (icc, icpc, ifort) usually gives the fastest code on Intel architecture. Portland Group (PGI - pgcc, pgc++, pgf90) is good for GPU programming with OpenACC. GNU (gcc, g++, gfortran) is open source and universally available.
Compiler options are easy to use and let you control aspects of the optimization. Keep in mind that different compilers have different values for options. For all compilers, any highly optimized builds, such as those employing the options herein, should be thoroughly validated for correctness.
Examples of compiler optimizations include vectorization, inlining, and loop transformations. Good compiler flags to try first are the recommended options listed in the tables below.
Faster operations are sometimes less accurate. For Intel compilers, fast math is default with -O2 and -O3. If you have a problem, use -fp-model precise. For GNU compilers, precise math is default with -O2 and -O3. If you want faster performance, use -ffast-math.
Inlining is replacing a subroutine or function call with the actual body of the subprogram. It eliminates overhead of calling the subprogram and allows for more loop optimizations. Inlining for one source file is typically automatic with -O2 and -O3.
Options for Intel compilers are shown below. Don't use -fast for MPI programs with Intel compilers. Use the same compiler command to link for -ipo with separate compilation. Many other optimization options can be found in the man pages. The recommended options are -O3 -xHost. An example is ifort -O3 program.f90.
-fast | Common optimizations |
-On | Set optimization level (0, 1, 2, 3) |
-ipo | Interprocedural optimization, multiple files |
-O3 | Loop transforms |
-xHost | Use highest instruction set available |
-parallel | Loop auto-parallelization |
Options for PGI compilers are shown below. Use the same compiler command to link for -Mipa with separate compilation. Many other optimization options can be found in the man pages. The recommended option is -fast. An example is pgf90 -fast program.f90.
-fast | Common optimizations |
-On | Set optimization level (0, 1, 2, 3, 4) |
-Mipa | Interprocedural optimization |
-Mconcur | Loop auto-parallelization |
Options for GNU compilers are shown below. Many other optimization options can be found in the man pages. The recommended options are -O3 -ffast-math. An example is gfortran -O3 program.f90.
-On | Set optimization level (0, 1, 2, 3) |
N/A for separate compilation | Interprocedural optimization |
-O3 | Loop transforms |
-ffast-math | Possibly unsafe floating point optimizations |
-march=native | Use highest instruction set available |
Compile and run with different compiler options:
time srun -n 2 ./test_HPCCG 150 150 150
Using the optimal compiler flags, get an overview of the bottlenecks in the code with the ARM performance report:
module load arm-pr
perf-report -np 2 ./test_HPCCG 150 150 150
On Pitzer, sample times were:
Compiler Option | Runtime (seconds) |
---|---|
-g | 129 |
-O0 -g | 129 |
-O1 -g | 74 |
-O2 -g | 74 |
-O3 -g | 74 |
The performance report shows that the code is compute-bound.
Compiler optimization reports let you understand how well the compiler is doing at optimizing your code and what parts of your code need work. They are generated at compile time and describe what optimizations were applied at various points in the source code. The report may tell you why optimizations could not be performed.
For Intel compilers, use -qopt-report; the report is written to a file.
For Portland Group compilers, use -Minfo; the report is written to stderr.
For GNU compilers, use -fopt-info; the report is written to stderr by default.
A sample output is:
LOOP BEGIN at laplace-good.f(10,7)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at laplace-good.f(11,10)
<Peeled loop for vectorization>
LOOP END
LOOP BEGIN at laplace-good.f(11,10)
remark #15300: LOOP WAS VECTORIZED
LOOP END
LOOP BEGIN at laplace-good.f(11,10)
<Remainder loop for vectorization>
remark #15301: REMAINDER LOOP WAS VECTORIZED
LOOP END
LOOP BEGIN at laplace-good.f(11,10)
<Remainder loop for vectorization>
LOOP END
LOOP END
Add the compiler flag -qopt-report=5 and recompile to view an optimization report.
Code is structured to operate on arrays of operands. Vector instructions are built into the processor. On Pitzer, the vector length is 16 single or 8 double precision. The following is a vectorizable loop:
do i = 1,N
   a(i) = b(i) + x(1) * c(i)
end do
Some things that can inhibit vectorization are loop-carried data dependencies, function calls inside the loop, and non-unit-stride or indirect memory access.
Use ARM MAP to identify the most expensive parts of the code.
module load arm-map map -np 2 ./test_HPCCG 150 150 150
Check the optimization report previously generated by the compiler (with -qopt-report=5) to see if any of the loops in the regions of the code are not being vectorized. Modify the code to enable vectorization and rerun the code.
Map shows that the most expensive segment of the code is lines 83-84 of HPC_sparsemv.cpp:
for (int j=0; j< cur_nnz; j++)
    y[i] += cur_vals[j]*x[cur_inds[j]];
The optimization report confirms that the loop was not vectorized due to a dependence on y.
Accumulating into a temporary variable instead of y[i] should enable vectorization:
double sum = 0.0;
for (int j=0; j< cur_nnz; j++)
    sum += cur_vals[j]*x[cur_inds[j]];
y[i] = sum;
Recompiling and rerunning with this change reduces the runtime from 74 seconds to 63 seconds.
Memory access is often the most important factor in your code's performance. Loops that work with arrays should use a stride of one whenever possible. C and C++ are row-major (store elements consecutively by row in 2D arrays), so the first array index should be the outermost loop and the last array index should be the innermost loop. Fortran is column-major, so the reverse is true. You can get factor of 3 or 4 speedup just by using unit stride. Avoid using arrays of derived data types, structs, or classes. For example, use structs of arrays instead of arrays of structures.
Efficient cache usage is important. Cache lines are 8 words (64 bytes) of consecutive memory. The entire cache line is loaded when a piece of data is fetched.
The code below is a good example. 2 cache lines are used for every 8 loop iterations, and it is unit stride:
real*8 a(N), b(N)
do i = 1,N
a(i) = a(i) + b(i)
end do
! 2 cache lines:
! a(1), a(2), a(3) ... a(8)
! b(1), b(2), b(3) ... b(8)
The code below is a bad example. 1 cache line is loaded for each loop iteration, and it is not unit stride:
TYPE :: node
real*8 a, b, c, d, w, x, y, z
END TYPE node
TYPE(node) :: s(N)
do i = 1, N
s(i)%a = s(i)%a + s(i)%b
end do
! cache line:
! a(1), b(1), c(1), d(1), w(1), x(1), y(1), z(1)
Look again at the most expensive parts of the code using ARM MAP:
module load arm-map map -np 2 ./test_HPCCG 150 150 150
Look for any inefficient memory access patterns. Modify the code to improve memory access patterns and rerun the code. Do these changes improve performance?
Lines 110-148 of generate_matrix.cpp are nested loops:
for (int ix=0; ix<nx; ix++) {
  for (int iy=0; iy<ny; iy++) {
    for (int iz=0; iz<nz; iz++) {
      int curlocalrow = iz*nx*ny+iy*nx+ix;
      int currow = start_row+iz*nx*ny+iy*nx+ix;
      int nnzrow = 0;
      (*A)->ptr_to_vals_in_row[curlocalrow] = curvalptr;
      (*A)->ptr_to_inds_in_row[curlocalrow] = curindptr;
      . . .
    }
  }
}
The arrays are accessed in a manner such that consecutive values of ix are accessed in order. However, our loops are ordered so that ix is the outer loop. We can reorder the loops so that ix is iterated in the inner loop:
for (int iz=0; iz<nz; iz++) {
  for (int iy=0; iy<ny; iy++) {
    for (int ix=0; ix<nx; ix++) {
      . . .
    }
  }
}
This reduces the runtime from 63 seconds to 22 seconds.
OpenMP is a shared-memory, threaded parallel programming model. It is a portable standard with a set of compiler directives and a library of support functions. It is supported in compilers by Intel, Portland Group, GNU, and Cray.
The following are parallel loop execution examples in Fortran and C. The inner loop vectorizes while the outer loop executes on multiple threads:
PROGRAM omploop
  INTEGER, PARAMETER :: N = 1000
  INTEGER i, j
  REAL, DIMENSION(N, N) :: a, b, c, x
  ... ! Initialize arrays
  !$OMP PARALLEL DO
  do j = 1, N
    do i = 1, N
      a(i, j) = b(i, j) + x(i, j) * c(i, j)
    end do
  end do
  !$OMP END PARALLEL DO
END PROGRAM omploop
int main()
{
  int N = 1000;
  float *a, *b, *c, *x;
  ... // Allocate and initialize arrays
  #pragma omp parallel for
  for (int i = 0; i < N; i++) {
    for (int j = 0; j < N; j++) {
      a[i*N+j] = b[i*N+j] + x[i*N+j] * c[i*N+j];
    }
  }
}
You can add an option to compile a program with OpenMP.
For Intel compilers, add the -qopenmp option. For example, ifort -qopenmp ompex.f90 -o ompex.
For GNU compilers, add the -fopenmp option. For example, gcc -fopenmp ompex.c -o ompex.
For Portland group compilers, add the -mp option. For example, pgf90 -mp ompex.f90 -o ompex.
To run an OpenMP program, request multiple processors through Slurm (e.g. -N 1 -n 40) and set the OMP_NUM_THREADS environment variable (the default is to use all available cores). For best performance, run at most one thread per core.
An example script is:
#!/bin/bash
#SBATCH -J omploop
#SBATCH -N 1
#SBATCH -n 40
#SBATCH -t 1:00

export OMP_NUM_THREADS=40
/usr/bin/time ./omploop
For more information, visit http://www.openmp.org, the OpenMP Application Program Interface, and self-paced tutorials. OSC will host an XSEDE OpenMP workshop on November 5, 2019.
MPI stands for message passing interface for when multiple processes run on one or more nodes. MPI has functions for point-to-point communication (e.g. MPI_Send, MPI_Recv). It also provides a number of functions for typical collective communication patterns, including MPI_Bcast (broadcasts value from root process to all other processes), MPI_Reduce (reduces values on all processes to a single value on a root process), MPI_Allreduce (reduces value on all processes to a single value and distributes the result back to all processes), MPI_Gather (gathers together values from a group of processes to a root process), and MPI_Alltoall (sends data from all processes to all processes).
A simple MPI program is:
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("Hello from node %d of %d\n", rank, size);
    MPI_Finalize();
    return(0);
}
MPI implementations available at OSC are mvapich2, Intel MPI (only for Intel compilers), and OpenMPI.
MPI programs can be compiled with MPI compiler wrappers (mpicc, mpicxx, mpif90). They accept the same arguments as the compilers they wrap. For example, mpicc -o hello hello.c.
MPI programs must run in batch only. Debugging runs may be done with interactive batch jobs. srun automatically determines the execution nodes from Slurm:
#!/bin/bash
#SBATCH -J mpi_hello
#SBATCH -N 2
#SBATCH --ntasks-per-node=40
#SBATCH -t 1:00

cd $PBS_O_WORKDIR
srun ./hello
For more information about MPI, visit MPI Forum and MPI: A Message-Passing Interface Standard. OSC will host an XSEDE MPI workshop on September 3-4, 2019. Self-paced tutorials are available here.
Use ITAC to get a timeline of the run of the code.
module load intelmpi
LD_PRELOAD=libVT.so \
mpiexec -trace -np 40 ./test_HPCCG 150 150 150
traceanalyzer <stf_file>
Look at the Event Timeline (under Charts). Do you see any communication patterns that could be replaced by a single MPI command?
Looking at the Event Timeline, we see that a large part of the runtime is spent in the following communication pattern: MPI_Barrier, MPI_Send/MPI_Recv, MPI_Barrier. We also see that during this communication rank 0 is sending data to all other ranks. We should be able to replace all of these MPI calls with a single call to MPI_Bcast.
The relevant code is in lines 82-89 of ddot.cpp:
MPI_Barrier(MPI_COMM_WORLD);
if (rank == 0) {
    for (int dst_rank = 1; dst_rank < size; dst_rank++) {
        MPI_Send(&global_result, 1, MPI_DOUBLE, dst_rank, 1, MPI_COMM_WORLD);
    }
}
if (rank != 0)
    MPI_Recv(&global_result, 1, MPI_DOUBLE, 0, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
MPI_Barrier(MPI_COMM_WORLD);
and can be replaced with:
MPI_Bcast(&global_result, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);
Although many of the tools we already mentioned can also be used with interpreted languages, most interpreted languages such as Python and R have their own profiling tools.
Since they are still running on the same hardware, the performance considerations for interpreted languages are very similar to those for compiled languages.
One of Python's most common profiling tools is cProfile. The simplest way to use cProfile is to add several arguments to your Python call so that an ordered list of the time spent in all functions called during execution is printed. For instance, if a program is typically run with the command:
python ./mycode.py
replace that with
python -m cProfile -s time ./mycode.py
Here is a sample output from this profiler:
See Python's documentation for more details on how to use cProfile.
One of the most popular profilers for R is profvis. It is not available by default with R, so it will need to be installed locally before its first use and loaded into your environment prior to each use. To profile your code, just pass the way you would usually call your code as the argument to profvis:
$ R
> install.packages('profvis')
> library('profvis')
> profvis({source('mycode.R')})
Here is a sample output from profvis:
More information on profvis is available here.
First, enter the Python/ subdirectory of the code containing the Python script ns.py. Profile this code with cProfile to determine the most expensive functions of the code. Next, rerun and profile with array as an argument to ns.py. Which version runs faster? Can you determine why it runs faster?
Execute the following commands:
python -m cProfile -s time ./ns.py
python -m cProfile -s time ./ns.py array
In the original code, 66 seconds out of 68 seconds are spent in presPoissPeriodic. When the array argument is passed, the time spent in this function is approximately 1 second and the total runtime goes down to about 2 seconds.
The speedup comes from the vectorization of the main computation in the body of presPoissPeriodic by replacing nested for loops with a single operation on whole arrays.
Now, enter the R/ subdirectory of the code containing the R script lu.R. Make sure that you have the R module loaded. First, run the code with profvis without any additional arguments and then again with frmt="matrix".
Which version of the code runs faster? Can you tell why it runs faster based on the profile?
Runtime for the default version is 28 seconds while the runtime when frmt="matrix" is 20 seconds.
Here is the profile with default arguments:
And here is the profile with frmt="matrix":
We can see that most of the time is being spent in lu_decomposition. The difference, however, is that the dataframe version seems to have a much higher overhead associated with accessing elements of the dataframe. On the other hand, the profile of the matrix version seems to be much flatter with fewer functions being called during LU decomposition. This reduction in overhead by using a matrix instead of a dataframe results in the better performance.
This article discusses memory tuning strategies for VASP.
Typically the first approach for memory sensitive VASP issues is to tweak the data distribution (via NCORE or NPAR). The information at https://www.vasp.at/wiki/index.php/NPAR covers a variety of machines. OSC has fast communications via Infiniband.
Performance and memory consumption are dependent on the simulation model. So we recommend a series of benchmarks varying the number of nodes and NCORE. The recommended initial value for NCORE is the processor count per node which is the ntasks-per-node value in Slurm (the ppn value in PBS). Of course, if this benchmarking is intractable then one must reexamine the model. For general points see: https://www.vasp.at/wiki/index.php/Memory_requirements and https://www.vasp.at/wiki/index.php/Not_enough_memory And of course one should start small and incrementally improve or scale up one's model.
Using the key parameters with respect to memory scaling listed at the VASP memory requirements page, one can rationalize VASP memory usage. The general approach is to study working calculations and then apply that understanding to scaled-up or failing calculations. This might help one identify if a calculation is close to a node's memory limit and happens to cross over the limit for reasons that might be out of one's control, in which case one might need to switch to higher memory nodes.
Here is an example of rationalizing memory consumption. Extract from a simulation output the key parameters:
Dimension of arrays:
  k-points           NKPTS =     18   k-points in BZ     NKDIM =     18
  number of bands    NBANDS =  1344
  total plane-waves  NPLWV = 752640
  ...
  dimension x,y,z  NGXF= 160  NGYF= 168  NGZF= 224
  support grid     NGXF= 320  NGYF= 336  NGZF= 448
This yields 273 GB of memory, NKDIM*NBANDS*NPLWV*16 + 4*(NGXF/2+1)*NGYF*NGZF*16, according to
https://www.vasp.at/wiki/index.php/Memory_requirements
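As a quick sanity check, the same arithmetic can be reproduced with shell arithmetic using the values from the output above (the result is in bytes):
echo $(( 18*1344*752640*16 + 4*(320/2+1)*336*448*16 ))
# prints 292876910592, i.e. roughly 273 GB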
This estimate should be compared to actual memory reports. See for example XDMoD and Grafana. Note that most application software has an overhead in the ballpark of ten to twenty percent. In addition, disk caching can consume significant memory. Thus, one must adjust the memory estimate upward. It can then be compared to the available memory per cluster and per cluster node type.
rclone is a tool that can be used to upload and download files to cloud storage (like Microsoft OneDrive or BuckeyeBox) from the command line. It's shipped as a standalone binary, but requires some user configuration before use. On this page, we provide instructions on how to use rclone to upload data to OneDrive. For instructions with other cloud storage providers, check the rclone online documentation.
Before configuration, please first log into OSC OnDemand and request a Pitzer VDI session. Walltime of 1 hour should be sufficient to finish the configuration.
Once the session is ready, open a terminal. In the terminal, run the command
rclone config
It prompts you with a series of questions:
- It shows "No remotes found -- make a new one" or lists available remotes you made before.
- Answer "n" for "New remote".
Create an empty hello.txt file and upload it to OneDrive using 'rclone copy' as below in a terminal:
touch hello.txt
rclone copy hello.txt OneDrive:/test
This creates a top-level directory in OneDrive called 'test' if it does not already exist, and uploads the file hello.txt to it.
To verify that the upload was successful, you can either log in to OneDrive in a web browser to check the file, or use the rclone ls command in the terminal:
rclone ls OneDrive:/test
Be careful when running rclone ls on a large directory, because it is recursive. You can add a '--max-depth 1' flag to stop the recursion.
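For example, to list only the top level of the test directory used above:
rclone ls --max-depth 1 OneDrive:/test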
AddressSanitizer is a tool developed by Google to detect memory access errors such as use-after-free and memory leaks. It is built into GCC versions >= 4.8 and can be used on both C and C++ codes. AddressSanitizer uses runtime instrumentation to track memory allocations, which means you must build your code with AddressSanitizer to take advantage of its features.
There is extensive documentation on the AddressSanitizer Github Wiki.
Memory leaks can increase the total memory used by your program. It's important to properly free memory when it's no longer required. For small programs, losing a few bytes here and there may not seem like a big deal. However, for long running programs that use gigabytes of memory, avoiding memory leaks becomes increasingly vital. If your program fails to free the memory it uses when it no longer needs it, it can run out of memory, resulting in early termination of the application. AddressSanitizer can help detect these memory leaks.
Additionally, AddressSanitizer can detect use-after-free bugs. A use-after-free bug occurs when a program tries to read or write to memory that has already been freed. This is undefined behavior and can lead to corrupted data, incorrect results, and even program crashes.
We need to use gcc to build our code, so we'll load the gcc module:
module load gnu/9.1.0
The "-fsanitize=address" flag is used to tell the compiler to add AddressSanitizer.
Additionally, due to some environmental configuration settings on OSC systems, we must also statically link against Asan. This is done using the "-static-libasan" flag.
It's helpful to compile the code with debug symbols. AddressSanitizer will print line numbers if debug symbols are present. To do this, add the "-g" flag. Additionally, the "-fno-omit-frame-pointer" flag may be helpful if you find that your stack traces do not look quite correct.
In one command, this looks like:
gcc main.c -o main -fsanitize=address -static-libasan -g
Or, splitting into separate compiling and linking stages:
gcc -c main.c -fsanitize=address -g
gcc main.o -o main -fsanitize=address -static-libasan
Notice that both the compilation and linking steps require the "-fsanitize=address" flag, but only the linking step requires "-static-libasan". If your build system is more complex, it might make sense to put these flags in the CFLAGS and LDFLAGS environment variables.
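A minimal sketch of that approach, assuming a simple Makefile or direct gcc invocations that pick up these variables:

# put the AddressSanitizer flags in the conventional build variables
export CFLAGS="-fsanitize=address -g -fno-omit-frame-pointer"
export LDFLAGS="-fsanitize=address -static-libasan"
gcc $CFLAGS -c main.c          # compile step
gcc $LDFLAGS main.o -o main    # link step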
And that's it!
First, let's look at a program that has no memory leaks (noleak.c):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, const char *argv[])
{
    char *s = malloc(100);
    strcpy(s, "Hello world!");
    printf("string is: %s\n", s);
    free(s);
    return 0;
}
To build this we run:
gcc noleak.c -o noleak -fsanitize=address -static-libasan -g
And, the output we get after running it:
string is: Hello world!
That looks correct! Since there are no memory leaks in this program, AddressSanitizer did not print anything. But, what happens if there are leaks?
Let's look at the above program again, but this time, remove the free call (leak.c):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, const char *argv[])
{
    char *s = malloc(100);
    strcpy(s, "Hello world!");
    printf("string is: %s\n", s);
    return 0;
}
Then, to build:
gcc leak.c -o leak -fsanitize=address -static-libasan
And the output:
string is: Hello world!
=================================================================
==235624==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 100 byte(s) in 1 object(s) allocated from:
    #0 0x4eaaa8 in __interceptor_malloc ../../.././libsanitizer/asan/asan_malloc_linux.cc:144
    #1 0x5283dd in main /users/PZS0710/edanish/test/asan/leak.c:6
    #2 0x2b0c29909544 in __libc_start_main (/lib64/libc.so.6+0x22544)

SUMMARY: AddressSanitizer: 100 byte(s) leaked in 1 allocation(s).
This is a leak report from AddressSanitizer. It detected that 100 bytes were allocated, but never freed. Looking at the stack trace that it provides, we can see that the memory was allocated on line 6 in leak.c
Say we found the above leak in our code, and we wanted to fix it. We need to add a call to free. But, what if we add it in the wrong spot?
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, const char *argv[])
{
    char *s = malloc(100);
    free(s);
    strcpy(s, "Hello world!");
    printf("string is: %s\n", s);
    return 0;
}
The above (uaf.c) is clearly wrong. Albeit a contrived example, the allocated memory, pointed to by "s", was written to and read from after it was freed.
To Build:
gcc uaf.c -o uaf -fsanitize=address -static-libasan
Building it and running it, we get the following report from AddressSanitizer:
================================================================= ==244157==ERROR: AddressSanitizer: heap-use-after-free on address 0x60b0000000f0 at pc 0x00000047a560 bp 0x7ffcdf0d59f0 sp 0x7ffcdf0d51a0 WRITE of size 13 at 0x60b0000000f0 thread T0 #0 0x47a55f in __interceptor_memcpy ../../.././libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:790 #1 0x528403 in main /users/PZS0710/edanish/test/asan/uaf.c:8 #2 0x2b47dd204544 in __libc_start_main (/lib64/libc.so.6+0x22544) #3 0x405f5c (/users/PZS0710/edanish/test/asan/uaf+0x405f5c) 0x60b0000000f0 is located 0 bytes inside of 100-byte region [0x60b0000000f0,0x60b000000154) freed by thread T0 here: #0 0x4ea6f7 in __interceptor_free ../../.././libsanitizer/asan/asan_malloc_linux.cc:122 #1 0x5283ed in main /users/PZS0710/edanish/test/asan/uaf.c:7 #2 0x2b47dd204544 in __libc_start_main (/lib64/libc.so.6+0x22544) previously allocated by thread T0 here: #0 0x4eaaa8 in __interceptor_malloc ../../.././libsanitizer/asan/asan_malloc_linux.cc:144 #1 0x5283dd in main /users/PZS0710/edanish/test/asan/uaf.c:6 #2 0x2b47dd204544 in __libc_start_main (/lib64/libc.so.6+0x22544) SUMMARY: AddressSanitizer: heap-use-after-free ../../.././libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:790 in __interceptor_memcpy Shadow bytes around the buggy address: 0x0c167fff7fc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0c167fff7fd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0c167fff7fe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0c167fff7ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0c167fff8000: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd =>0x0c167fff8010: fd fd fd fd fd fa fa fa fa fa fa fa fa fa[fd]fd 0x0c167fff8020: fd fd fd fd fd fd fd fd fd fd fd fa fa fa fa fa 0x0c167fff8030: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x0c167fff8040: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x0c167fff8050: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x0c167fff8060: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa Shadow byte legend (one shadow byte represents 8 application bytes): Addressable: 00 Partially addressable: 01 02 03 04 05 06 07 Heap left redzone: fa Freed heap region: fd Stack left redzone: f1 Stack mid redzone: f2 Stack right redzone: f3 Stack after return: f5 Stack use after scope: f8 Global redzone: f9 Global init order: f6 Poisoned by user: f7 Container overflow: fc Array cookie: ac Intra object redzone: bb ASan internal: fe Left alloca redzone: ca Right alloca redzone: cb Shadow gap: cc ==244157==ABORTING
This is a bit intimidating. It looks like there's a lot going on here, but it's not as bad as it looks. Starting at the top, we see what AddressSanitizer detected. In this case, a "WRITE" of 13 bytes (from our strcpy). Immediately below that, we get a stack trace of where the write occurred. This tells us that the write occurred on line 8 in uaf.c in the function called "main".
Next, AddressSanitizer reports where the memory was located. We can ignore this for now, but depending on your use case, it could be helpful information.
Two key pieces of information follow. AddressSanitizer tells us where the memory was freed (the "freed by thread T0 here" section), giving us another stack trace indicating the memory was freed on line 7. Then, it reports where it was originally allocated ("previously allocated by thread T0 here:"), line 6 in uaf.c.
This is likely enough information to start to debug the issue. The rest of the report provides details about how the memory is laid out, and exactly which addresses were accessed/written to. You probably won't need to pay too much attention to this section. It's a bit "down in the weeds" for most use cases.
AddressSanitizer can also detect heap overflows. Consider the following code (overflow.c):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, const char *argv[])
{
    // whoops, forgot c strings are null-terminated
    // and not enough memory was allocated for the copy
    char *s = malloc(12);
    strcpy(s, "Hello world!");
    printf("string is: %s\n", s);
    free(s);
    return 0;
}
The "Hello world!" string is 13 characters long including the null terminator, but we've only allocated 12 bytes, so the strcpy above will overflow the buffer that was allocated. To build this:
gcc overflow.c -o overflow -fsanitize=address -static-libasan -g -Wall
Then, running it, we get the following report from AddressSanitizer:
==168232==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60200000003c at pc 0x000000423454 bp 0x7ffdd58700e0 sp 0x7ffdd586f890 WRITE of size 13 at 0x60200000003c thread T0 #0 0x423453 in __interceptor_memcpy /apps_src/gnu/8.4.0/src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:737 #1 0x5097c9 in main /users/PZS0710/edanish/test/asan/overflow.c:8 #2 0x2ad93cbd7544 in __libc_start_main (/lib64/libc.so.6+0x22544) #3 0x405d7b (/users/PZS0710/edanish/test/asan/overflow+0x405d7b) 0x60200000003c is located 0 bytes to the right of 12-byte region [0x602000000030,0x60200000003c) allocated by thread T0 here: #0 0x4cd5d0 in __interceptor_malloc /apps_src/gnu/8.4.0/src/libsanitizer/asan/asan_malloc_linux.cc:86 #1 0x5097af in main /users/PZS0710/edanish/test/asan/overflow.c:7 #2 0x2ad93cbd7544 in __libc_start_main (/lib64/libc.so.6+0x22544) SUMMARY: AddressSanitizer: heap-buffer-overflow /apps_src/gnu/8.4.0/src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:737 in __interceptor_memcpy Shadow bytes around the buggy address: 0x0c047fff7fb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0c047fff7fc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0c047fff7fd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0c047fff7fe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0c047fff7ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 =>0x0c047fff8000: fa fa 00 fa fa fa 00[04]fa fa fa fa fa fa fa fa 0x0c047fff8010: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x0c047fff8020: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x0c047fff8030: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x0c047fff8040: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x0c047fff8050: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa Shadow byte legend (one shadow byte represents 8 application bytes): Addressable: 00 Partially addressable: 01 02 03 04 05 06 07 Heap left redzone: fa Freed heap region: fd Stack left redzone: f1 Stack mid redzone: f2 Stack right redzone: f3 Stack after return: f5 Stack use after scope: f8 Global redzone: f9 Global init order: f6 Poisoned by user: f7 Container overflow: fc Array cookie: ac Intra object redzone: bb ASan internal: fe Left alloca redzone: ca Right alloca redzone: cb ==168232==ABORTING
This is similar to the use-after-free report we looked at above. It tells us that a heap buffer overflow occurred, then goes on to report where the write happened and where the memory was originally allocated. Again, the rest of this report describes the layout of the heap and probably isn't too important for your use case.
AddressSanitizer can be used on C++ codes as well. Consider the following (bad_delete.cxx):
#include <iostream>
#include <cstring>

int main(int argc, const char *argv[])
{
    char *cstr = new char[100];
    strcpy(cstr, "Hello World");
    std::cout << cstr << std::endl;
    delete cstr;
    return 0;
}
What's the problem here? The memory pointed to by "cstr" was allocated with new[]. An array allocation must be deleted with the delete[] operator, not "delete".
To build this code, just use g++ instead of gcc:
g++ bad_delete.cxx -o bad_delete -fsanitize=address -static-libasan -g
And running it, we get the following output:
Hello World
=================================================================
==257438==ERROR: AddressSanitizer: alloc-dealloc-mismatch (operator new [] vs operator delete) on 0x60b000000040
    #0 0x4d0a78 in operator delete(void*, unsigned long) /apps_src/gnu/8.4.0/src/libsanitizer/asan/asan_new_delete.cc:151
    #1 0x509ea8 in main /users/PZS0710/edanish/test/asan/bad_delete.cxx:9
    #2 0x2b8232878544 in __libc_start_main (/lib64/libc.so.6+0x22544)
    #3 0x40642b (/users/PZS0710/edanish/test/asan/bad_delete+0x40642b)

0x60b000000040 is located 0 bytes inside of 100-byte region [0x60b000000040,0x60b0000000a4)
allocated by thread T0 here:
    #0 0x4cf840 in operator new[](unsigned long) /apps_src/gnu/8.4.0/src/libsanitizer/asan/asan_new_delete.cc:93
    #1 0x509e5f in main /users/PZS0710/edanish/test/asan/bad_delete.cxx:5
    #2 0x2b8232878544 in __libc_start_main (/lib64/libc.so.6+0x22544)

SUMMARY: AddressSanitizer: alloc-dealloc-mismatch /apps_src/gnu/8.4.0/src/libsanitizer/asan/asan_new_delete.cc:151 in operator delete(void*, unsigned long)
==257438==HINT: if you don't care about these errors you may set ASAN_OPTIONS=alloc_dealloc_mismatch=0
==257438==ABORTING
This is similar to the other AddressSanitizer outputs we've looked at. This time, it tells us there's a mismatch between new and delete. It prints a stack trace for where the delete occurred (line 9) and also a stack trace for where the allocation occurred (line 5).
The documentation states:
This tool is very fast. The average slowdown of the instrumented program is ~2x
AddressSanitizer is much faster than tools that do similar analysis such as valgrind. This allows for usage on HPC codes.
However, if you find that AddressSanitizer is too slow for your code, there are compiler flags that can be used to disable it for specific functions. This way, you can use address sanitizer on cooler parts of your code, while manually auditing the hot paths.
The compiler directive to skip analyzing functions is:
__attribute__((no_sanitize_address))
It is possible to utilize Cron and the OSCusage command to send regular usage reports via email.
It is easy to create Cron jobs on the Owens and Pitzer clusters at OSC. Cron is a Linux utility which allows the user to schedule a command or script to run automatically at a specific date and time. A cron job is the task that is scheduled.
Shell scripts run as a cron job are usually used to update and modify files or databases; however, they can perform other tasks, for example a cron job can send an email notification.
To use what cron has to offer, here is a summary of the crontab command and its options:
Usage:
 crontab [options] file
 crontab [options]
 crontab -n [hostname]

Options:
 -u  define user
 -e  edit user's crontab
 -l  list user's crontab
 -r  delete user's crontab
 -i  prompt before deleting
 -n  set host in cluster to run users' crontabs
 -c  get host in cluster to run users' crontabs
 -s  selinux context
 -x  enable debugging
To list your existing cron jobs:
crontab -l

To create or edit your crontab:
crontab -e

Each crontab entry follows the format:
MIN HOUR DOM MON DOW CMD

For example, to have a command's output emailed to you:
* * * * * {cmd} | mail -s "title of the email notification" {your email}

The following entry emails an OSCusage report at 15:12 every day, redirecting any remaining output and errors to a file:
12 15 * * * /opt/osc/bin/OSCusage | mail -s "OSC usage on $(date)" {your email} > /path/to/file/for/stdout/and/stderr 2>&1
$ /opt/osc/bin/OSCusage --help
usage: OSCusage.py [-h] [-u USER]
                   [-s {opt,pitzer,glenn,bale,oak,oakley,owens,ruby}] [-A]
                   [-P PROJECT] [-q] [-H] [-r] [-n] [-v]
                   [start_date] [end_date]

positional arguments:
  start_date            start date (default: 2020-04-23)
  end_date              end date (default: 2020-04-24)

optional arguments:
  -h, --help            show this help message and exit
  -u USER, --user USER  username to run as. Be sure to include -P or -A. (default: kalattar)
  -s {opt,pitzer,glenn,bale,oak,oakley,owens,ruby}, --system {opt,pitzer,glenn,bale,oak,oakley,owens,ruby}
  -A                    Show all
  -P PROJECT, --project PROJECT
                        project to query (default: PZS0715)
  -q                    show user data
  -H                    show hours
  -r                    show raw
  -n                    show job ID
  -v                    do not summarize
To query usage for a specific date, or for a date range:
OSCusage 2018-01-24
OSCusage 2018-01-24 2018-01-25
ps aux | grep crontab
kill {PID}
crontab -e
It is now possible to run Docker and Singularity containers on the Owens and Pitzer clusters at OSC. Single-node jobs are currently supported, including GPU jobs; MPI jobs are planned for the future.
From the Docker website: "A container image is a lightweight, stand-alone, executable package of a piece of software that includes everything needed to run it: code, runtime, system tools, system libraries, settings."
This document will describe how to run Docker and Singularity containers on the Owens and Pitzer clusters. You can use containers from Docker Hub, Sylabs Cloud, Singularity Hub, or any other source. As examples we will use hello-world from Singularity Hub and ubuntu from Docker Hub.
If you encounter any error, check out Known Issues on using Apptainer/Singularity at OSC. If the issue can not be resolved, please contact OSC help.
The most up-to-date help on Apptainer/Singularity comes from the command itself.
singularity help
User guides and examples can be found in Apptainer documents.
No setup is required. You can use Apptainer/Singularity directly on all clusters.
A Singularity container is a single file with a .sif
extension.
You can simply download ("pull") a container from a hub. Popular hubs are Docker Hub and Singularity Hub. You can go there and search whether they have a container that meets your needs. Docker Hub has more containers and may be more up to date, but it serves a much wider community than just HPC. Singularity Hub is aimed at HPC, but the number of available containers is smaller. Additionally, there are domain and vendor repositories such as biocontainers and NVIDIA HPC containers that may have relevant containers.
Pull from the 7.2.0 branch of the gcc repository on Docker Hub. The 7.2.0 is called a tag.
singularity pull docker://gcc:7.2.0
Filename: gcc_7.2.0.sif
Pull an Ubuntu container from Docker Hub.
singularity pull docker://ubuntu:18.04
Filename: ubuntu_18.04.sif
Pull the singularityhub/hello-world container from Singularity Hub. Since no tag is specified, it pulls from the master branch of the repository.
singularity pull shub://singularityhub/hello-world
Filename: hello-world_latest.sif
Downloading containers from the hubs is not the only way to get one. You can, for example, get a copy from your colleague's computer or directory. If you would like to create your own container, you can start from the user guide below. If you have any questions, please contact OSC Help.
There are four ways to run a container under Apptainer/Singularity.
You can do this either in a batch job or on a login node.
We note that the operating system on Owens is Red Hat:
[owens-login01]$ cat /etc/os-release
NAME="Red Hat Enterprise Linux Server"
VERSION="7.5 (Maipo)"
ID="rhel"
[..more..]
In the examples below we will often check the operating system to show that we are really inside a container.
If you simply run the container image it will execute the container’s runscript.
Example: Run singularityhub/hello-world
Note that this container returns you to your native OS after you run it.
[owens-login01]$ ./hello-world_latest.sif
Tacotacotaco
The Singularity “run” sub-command does the same thing as running a container directly as described above. That is, it executes the container’s runscript.
Example: Run a container from a local file
[owens-login01]$ singularity run hello-world_latest.sif
Tacotacotaco
Example: Run a container from a hub without explicitly downloading it
[owens-login01]$ singularity run shub://singularityhub/hello-world
INFO: Downloading shub image
Progress |===================================| 100.0%
Tacotacotaco
The Singularity “exec” sub-command lets you execute an arbitrary command within your container instead of just the runscript.
Example: Find out what operating system the singularityhub/hello-world
container uses
[owens-login01]$ singularity exec hello-world_latest.sif cat /etc/os-release
NAME="Ubuntu"
VERSION="14.04.5 LTS, Trusty Tahr"
ID=ubuntu
[..more..]
The Singularity “shell” sub-command invokes an interactive shell within a container.
Example: Run an Ubuntu shell. Note the “Singularity” prompt within the shell.
[owens-login01 ~]$ singularity shell ubuntu_18.04.sif
Singularity ubuntu_18.04.sif:~> cat /etc/os-release
NAME="Ubuntu"
VERSION="18.04 LTS (Bionic Beaver)"
ID=ubuntu
[.. more ..]
Singularity ubuntu_18.04.sif:~> exit
exit
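The sub-commands shown above can also be used inside a batch job. Below is a minimal sketch of a Slurm job script; the job name, walltime, and the account code PXX1234 are placeholders, and the container file is assumed to already be in the submission directory:

#!/bin/bash
#SBATCH --job-name=container-test
#SBATCH --nodes=1
#SBATCH --time=00:10:00
#SBATCH --account=PXX1234   # replace with your own project code

# No module load is required; Apptainer/Singularity is available on all clusters.
singularity exec hello-world_latest.sif cat /etc/os-release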
When you use a container you run within the container’s environment. The directories available to you by default from the host environment are
/fs/project
/fs/scratch
/tmp
You can review our Available File Systems page for more details about our file system access policy.
If you run the container within a job, you will have the usual access to the $PFSDIR environment variable if you add the node attribute "pfsdir" to the job request (nodes=XX:ppn=XX:pfsdir). You can access most of our file systems from a container without any special treatment.
If you have a GPU-enabled container you can easily run it on Owens or Pitzer just by adding the --nv
flag to the singularity exec or run command. The example below comes from the "exec" command section of Singularity User Guide. It runs a TensorFlow example using a GPU on Owens. (Output has been omitted from the example for brevity.)
[owens-login01]$ sinteractive -n 28 -g 1
...
[o0756]$ git clone https://github.com/tensorflow/models.git
[o0756]$ singularity exec --nv docker://tensorflow/tensorflow:latest-gpu \
    python ./models/tutorials/image/mnist/convolutional.py
In some cases it may be necessary to bind the CUDA_HOME path and add $CUDA_HOME/lib64
to the shared library search path:
[owens-login01]$ sinteractive -n 28 -g 1
...
[o0756]$ module load cuda
[o0756]$ export APPTAINER_BINDPATH=$CUDA_HOME
[o0756]$ export APPTAINERENV_LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CUDA_HOME/lib64
[o0756]$ singularity exec --nv my_container mycmd
If you want to create or modify a container, you need root-like privilege, which is not available to users on any OSC system. Alternatively, you can create a container on a hub or a local computer and then pull/upload it to an OSC system.
JupyterLab stores the main build of JupyterLab, with associated data including extensions, in the Application Directory. The default Application Directory is the JupyterLab installation directory, which is read-only for OSC users. Unlike Jupyter Notebook, JupyterLab cannot accommodate multiple paths for extensions management. Therefore we set the user's home directory as the Application Directory, to allow users to manage extensions.
After launching a JupyterLab session, open a notebook and run
!jupyter lab path
Check whether your home directory is set as the Application Directory:
Application directory:   /users/PXX1234/user/.jupyter/lab/3.0
User Settings directory: /users/PXX1234/user/.jupyter/lab/user-settings
Workspaces directory: /users/PXX1234/user/ondemand/data/sys/dashboard/batch_connect/dev/bc_osc_jupyter/output/f2a4f918-b18c-4d2a-88bc-4f4e1bdfe03e
If the home directory is NOT set, try removing the corresponding directory, e.g. if you are using JupyterLab 2.2, remove the entire directory $HOME/.jupyter/lab/2.2
and re-launch JupyterLab 2.2.
If this is your first time using extensions, or your extensions were installed with a different Jupyter version or on a different cluster, you will need to run
!jupyter lab build
to initialize the JupyterLab application.
To manage and install extensions, simply click the Extension Manager icon in the side bar:
Globus is a cloud-based service designed to let users move, share, and discover research data via a single interface, regardless of the data's location, size, or number of files.
Globus was developed and is maintained at the University of Chicago and is used extensively at supercomputer centers and major research facilities.
Globus is available as a free service that any user can access. More on how Globus works can be found on the Globus "How It Works" page.
Further Reading
Globus is a reliable, high-performance file transfer platform allowing users to transfer large amounts of data seamlessly between systems. It aims to make transfers a "click-and-forget" process by setting up configuration details in the background and automating fault recovery.
Globus can be used for both file transfers between OSC and:
Users transferring between OSC and another computing institution with Globus installed do not need to install Globus Connect Personal, and can skip this page.
To use Globus to transfer from a personal computer, you will need to install the Globus Connect Personal client on your computer following the steps below. Those transferring between OSC and another computing institution can skip to Usage.
Watch How to Install Globus Personal
tar -zxvf
Run globusconnect, found within the unzipped directory.
By default, Globus will only add certain default folders to the list of files and directories accessible by Globus. To change/add/remove files and directories from this list:
On Linux, the list of accessible directories is stored in the ~/.globusonline/lta/config-paths file. This is a plain text file, with each line corresponding to the configuration of a particular directory path you wish to make accessible. Each line consists of 3 comma-separated fields, as below:
<path1>,<sharing flag>,<R/W flag>
<path2>,<sharing flag>,<R/W flag>
<path3>,<sharing flag>,<R/W flag>
...
Path: an absolute directory/path to be permitted. A leading tilde "~" can be used to represent the home directory of the user that runs globusconnectpersonal.
Sharing Flag: it controls sharing, with a value of "1" allowing sharing for the path and a value of "0" disallowing sharing for the path.
R/W Flag: it determines whether the path will be accessible read-only or read/write, with a "1" permitting read/write access and a "0" marking the path as read-only.
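For example, a hypothetical config-paths file that makes the home directory shareable and read/write, and exposes an additional data directory (a made-up path) read-only with sharing disabled, could look like this:

~/,1,1
/data/archive,0,0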
After modifying the ~/.globusonline/lta/config-paths file, you must stop and restart Globus Connect Personal before the changes take effect, as below:
$ ./globusconnectpersonal -stop
$ ./globusconnectpersonal -start &
Globus is a reliable, high-performance file transfer platform allowing users to transfer large amounts of data seamlessly between systems. It aims to make transfers a "click-and-forget" process by setting up configuration details in the background and automating fault recovery.
Globus can be used for both file transfers between OSC and:
Users transferring between OSC and another computing institution with Globus installed do not need to install Globus Connect Personal.
Our general recommendation is to use OnDemand or SFTP (host: sftp.osc.edu) for small files, measured in MB to several hundred MB. You can continue to use SFTP and get reasonable performance up to file sizes of several GB. For transfers of several GB or larger, you should consider using Globus.
We provide instructions on how to transfer data on this page. If you would like to share your data with your collaborators, please see this page.
Watch How to Transfer Files Using Globus
Log in to Globus via this page. Select 'Ohio Supercomputer Center (OSC)' as the identity provider. You will be redirected to the page below. Provide your OSC HPC credentials:
Storage area | Endpoint name |
---|---|
OSC's home directory | OSC $HOME |
OSC's project directory | OSC /fs/project |
OSC's scratch directory | OSC /fs/scratch |
OSC's ess storage | OSC /fs/ess |
AWS S3 storage | OSC S3 |
Globus Connect Server allows OSC users to share data with collaborators who do not have an OSC HPC account (the collaborator does need to sign up for a free Globus account, though). The advantage of data sharing via Globus is that you do not have to move your data in order to share it. You can select directory paths to be securely shared with your collaborator, and grant them read-only or read-write access.
Watch How to Share Files Using Globus
Storage area | Endpoint name |
---|---|
OSC's home directory | OSC $HOME |
OSC's project directory | OSC /fs/project |
OSC's scratch directory | OSC /fs/scratch |
OSC's ess storage | OSC /fs/ess |
AWS S3 storage | OSC S3 |
Globus Connect Server v5 allows OSC clients to connect to Amazon S3. Please follow the steps below:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "AllBuckets",
"Effect": "Allow",
"Action": [
"s3:ListAllMyBuckets",
"s3:GetBucketLocation"
],
"Resource": "*"
},
{
"Sid": "Bucket",
"Effect": "Allow",
"Action": [
"s3:ListBucket"
],
"Resource": "arn:aws:s3:::osc-globus-test"
},
{
"Sid": "Objects",
"Effect": "Allow",
"Action": [
"s3:DeleteObject",
"s3:GetObject",
"s3:PutObject"
],
"Resource": "arn:aws:s3:::osc-globus-test/*"
}
]
}
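As a hedged illustration (not part of OSC's official steps), if you manage the IAM user from the AWS CLI, a policy like the one above could be attached with a command along these lines; the user name, policy name, and file name are placeholders:

aws iam put-user-policy \
    --user-name globus-osc-user \
    --policy-name osc-globus-s3-access \
    --policy-document file://osc-globus-policy.json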
Adding InCommon authentication to your Globus account allows you to log in to Globus Online using your university credentials. With this process you can store your Globus username and password for safekeeping, and instead use your university username and password to log in. If you are already logged in to your university authentication system, logging in to Globus can be as simple as two clicks.
To use this feature, your university needs to be an InCommon participant. Some Ohio universities active in InCommon include: Ohio State University, Case Western University, Columbus State Community College, Miami University, Ohio Northern University, Ohio University, University of Findlay, University of Dayton, and many more.
For a complete list, visit https://incommon.org/participants/ .
When you go to login next, click "alternative login" and then "InCommon / CILogon". Select your university on the next page, and login using your university credentials. Globus will remember this preference, and automatically prompt you to login using your university authentication next time.
OSC clients who are affiliated with Ohio State can deploy their own endpoint on a server using OSU subscriptions. Please follow the steps below:
SSHing directly to a compute node at OSC - even if that node has been assigned to you in a current batch job - and starting VNC is an "unsafe" thing to do. When your batch job ends (and the node is assigned to other users), stray processes will be left behind and negatively impact other users. However, it is possible to use VNC on compute nodes safely.
The examples below are for Pitzer. If you use other systems, please see this page for supported versions of TurboVNC on our systems.
Step one is to create your VNC server inside a batch job.
The preferred method is to start an interactive job requesting a GPU node; once your job starts, you can start the VNC server.
salloc --nodes=1 --ntasks-per-node=40 --gpus-per-node=1 --gres=vis --constraint=40core srun --pty /bin/bash
This command requests an entire GPU node, and tells the batch system you wish to use the GPUs for visualization. This will ensure that the X11 server can access the GPU for acceleration. In this example, I have not specified a duration, which will then default to 1 hour.
module load virtualgl
module load turbovnc
Then start your VNC server. (The first time you run this command, it may ask you for a password - this is to secure your VNC session from unauthorized connections. Set it to whatever password you desire. We recommend a strong password.)
vncserver
To change your VNC password later, use the vncpasswd command.
The output of the vncserver command is important: it tells you where to point your client to access your desktop. Specifically, we need both the host name (before the :) and the screen (after the :).
New 'X' desktop is p0302.ten.osc.edu:1
Because the compute nodes of our clusters are not directly accessible, you must log in to one of the login nodes and allow your VNC client to "tunnel" through SSH to the compute node. The specific method of doing so may vary depending on your client software.
The port assigned to the vncserver will be needed. It is usually 5900 + <display_number>. e.g.
New 'X' desktop is p0302.ten.osc.edu:1
would use port 5901.
I will be providing the basic command line syntax, which works on Linux and MacOS. You would issue this in a new terminal window on your local machine, creating a new connection to Pitzer.
ssh -L <port>:<node_hostname>.ten.osc.edu:<port> <username>@pitzer.osc.edu
The above command establishes a proper ssh connection for the vnc client to use for tunneling to the node.
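For instance, using the example output above (host p0302, display 1, port 5901) and a hypothetical username, the command would look like:

ssh -L 5901:p0302.ten.osc.edu:5901 myusername@pitzer.osc.edu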
Open your VNC client, and connect to localhost:<screen_number>
, which will tunnel to the correct node on Pitzer.
This example uses Chicken of the VNC, a MacOS VNC client, connecting to a vncserver started on host p0302 with port 5901 and display 1.
The default window that comes up for Chicken requires the host to connect to, the screen (or port) number, and optionally allows you to specify a host to tunnel through via SSH. This screenshot shows a proper configuration for the output of vncserver shown above. Substitute your host, screen, and username as appropriate.
When you click [Connect], you will be prompted for your HPC password (to establish the tunnel, provided you did not input it into the "password" box on this dialog), and then (if you set one), for your VNC password. If your passwords are correct, the desktop will display in your client.
This example shows how to create a SSH tunnel through your ssh client. We will be using Putty in this example, but these steps are applicable to most SSH clients.
First, make sure you have x11 forwarding enabled in your SSH client.
Next, open up the port forwarding/tunnels settings and enter the hostname and port you got earlier in the destination field. You will need to add 5900 to the port number when specifying it here. Some clients may have separate boxes for the destination hostname and port.
For source port, pick a number between 11-99 and add 5900 to it. This number between 11-99 will be the port you connect to in your VNC client.
Make sure to add the forwarded port, and save the changes you've made before exiting the configuration window.
Now start a SSH session to the respective cluster your vncserver is running on. The port forwarding will automatically happen in the background. Closing this SSH session will close the forwarded port; leave the session open as long as you want to use VNC.
Now start a VNC client. TurboVNC has been tested with our systems and is recommended. Enter localhost:[port], replacing [port] with the port between 11-99 you chose earlier.
If you've set up a VNC password you will be prompted for it now. A desktop display should pop up now if everything is configured correctly.
Occasionally you may make a mistake and start a VNC server on a login node or somewhere else you did not want to. In this case it is important to know how to properly kill your VNC server so no processes are left behind.
The command syntax to kill a VNC session is:
vncserver -kill :[screen]
In the example above, screen would be 1.
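For the session started above, that would be:

vncserver -kill :1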
You need to make sure you are on the same node you spawned the VNC server on when running this command.
The IPython kernel for a Conda/virtual environment* must be installed on Jupyter prior to use.
To perform the kernel installation, users should load their preferred version of Python (use module spider python to view the available versions):
module load python
Then run one of the following commands, depending on how your Conda/virtual environment was created. Make sure to replace MYENV with the name of your conda environment or the path to the environment.
if the Conda environment is created via conda create -n MYENV
~support/classroom/tools/create_jupyter_kernel conda MYENV
if the Conda environment is created via conda create -p /path/to/MYENV
~support/classroom/tools/create_jupyter_kernel conda /path/to/MYENV
if the Python virtual environment is created via python3 -m venv /path/to/MYENV
~support/classroom/tools/create_jupyter_kernel venv /path/to/MYENV
According to the JupyterLab documentation, the debugger requires ipykernel >= 6. Please create your own kernel with conda using the following commands:
$ module load miniconda $ conda create -n jupyterlab-debugger -c conda-forge "ipykernel>=6" xeus-python $ ~support/classroom/tools/create_jupyter_kernel conda jupyterlab-debugger
You should see a kernelspec 'conda_jupyterlab-debugger' created in your home directory. Once the debugger kernel is created, you can use it:
1. go to OnDemand
2. request a JupyterLab app with kernel 3
3. open a notebook with the debugger kernel.
4. you can enable debug mode at the upper-right corner of the notebook
If the environment is rebuilt or renamed, users may want to erase any custom Jupyter kernel installations.
rm -rf ~/.local/share/jupyter/kernels/${MYENV}
If the create_jupyter_kernel
script does not work for you, try the following steps to manually install kernel:
# change to the proper version of python
module load python
# replace with the name of conda env
MYENV=useful-project-name
# Activate your conda/virtual environment
## For Conda environment
source activate $MYENV
# ONLY if you created venv instead of conda env
## For Python Virtual environment
source /path/to/$MYENV/bin/activate
# Install Jupyter kernel
python -m ipykernel install --user --name $MYENV --display-name "Python ($MYENV)"
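As an optional check (not part of the original steps), you can list the registered kernels to confirm the new kernelspec appears:

jupyter kernelspec list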
Many software packages require a license. These licenses are usually made available via a license server, which allows software to check out necessary licenses. In this document external refers to a license server that is not hosted inside OSC.
If you have such a software license server set up using a license manager, such as FlexNet, this guide will instruct you on the necessary steps to connect to and use the licenses at OSC.
Users who wish to host their software licenses inside OSC should consult OSC Help.
Broadly speaking, there are two different ways in which the external license server's network may be configured. These differ by whether the license server is directly externally reachable or if it sits behind a private internal network with a port forwarding firewall.
If your license server sits behind a private internal network with a port forwarding firewall you will need to take additional steps to allow the connection from our systems to the license server to be properly routed.
If you are unsure about which category your situation falls under contact your local IT administrator.
In order for connections from OSC to reach the license server, the license server's firewall will need to be configured. All outbound network traffic from all of OSC's compute nodes is routed through a network address translation host (NAT).
The license server should be configured to allow connections from nat.osc.edu including the following IP addresses to the SERVER:PORT where the license server is running:
A typical FlexNet-based license server uses two ports: one is the server port and the other is the daemon port, and the firewall should be configured for both ports. A typical license file looks, for example, like this:
SERVER licXXX.osc.edu 0050XXXXX5C 28000 VENDOR {license name} port=28001
In this example, "28000" is the server port, and "28001" is the daemon port. The daemon port is not mandatory if you use it on a local network, however it becomes necessary if you want to use it outside of your local network. So, please make sure you declared the daemon port in the license file and configured the firewall for the port.
The firewall settings should be verified by attempting to connect to the license server from the compute environment using telnet.
Get on to a compute node by requesting a short, small, interactive job and test the connection using telnet:
telnet <License Server IP Address> <Port#>
It is also recommended to restrict accessibility using the remote license server's access control mechanisms, such as limiting access to particular usernames in the options.dat file used with FlexNet-based license servers.
For FlexNet tools, you can add the following line to your options.dat file, one for each user.
INCLUDEALL USER <OSC username>
If you have a large number of users to give access to you may want to define a group using GROUP
within the options.dat file and give access to that whole group using INCLUDEALL GROUP <group name>
.
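As a hedged sketch (the group name and usernames are placeholders), the relevant options.dat lines might look like:

GROUP osc_users osc_user1 osc_user2 osc_user3
INCLUDEALL GROUP osc_users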
Users who use other license managers should consult the license manager's documentation.
The software must now be told to contact the license server for its licenses. The exact method of doing so can vary between software packages, but most use an environment variable that specifies the license server IP address and port number to use.
For example, LS-DYNA uses the environment variables LSTC_LICENSE and LSTC_LICENSE_SERVER to know where to look for the license. The following lines would be added to a job script to tell LS-DYNA to use licenses from port 2345 on server 1.2.3.4, if you use bash:
export LSTC_LICENSE=network
export LSTC_LICENSE_SERVER=2345@1.2.3.4
or, if you use csh:
setenv LSTC_LICENSE network
setenv LSTC_LICENSE_SERVER 2345@1.2.3.4
If the license server is behind a port forwarding firewall, and has a different IP address from the IP address of the firewall, additional steps must be taken to allow connections to be properly routed within the license server's internal network.
The following outlines details particular to a specific software package.
Uses the following environment variables:
ANSYSLI_SERVERS=<port>@<IP>
ANSYSLMD_LICENSE_FILE=<port>@<IP>
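For example, in a bash job script these could be set as follows (the port and IP address are placeholders; use the values for your own license server):

export ANSYSLI_SERVERS=2345@1.2.3.4
export ANSYSLMD_LICENSE_FILE=2345@1.2.3.4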
If your license server is behind a port forwarding firewall and you cannot use a fully qualified domain name in the license file, you can add ANSYSLI_EXTERNAL_IP={external IP address} to ansyslmd.ini on the license server.
This document shows you how to set soft limits using the ulimit
command.
The ulimit
command sets or reports user process resource limits. The default limits are defined and applied when a new user is added to the system. Limits are categorized as either soft or hard. With the ulimit
command, you can change your soft limits for the current shell environment, up to the maximum set by the hard limits. You must have root user authority to change resource hard limits.
ulimit [-HSTabcdefilmnpqrstuvx [Limit]]
Flag | Description |
---|---|
-H | Specifies that the hard limit for the given resource is set. If you have root user authority, you can increase the hard limit. Anyone can decrease it |
-S | Specifies that the soft limit for the given resource is set. A soft limit can be increased up to the value of the hard limit. If neither the -H nor -S flags are specified, the limit applies to both |
-a | Lists all of the current resource limits |
-b | The maximum socket buffer size |
-c | The maximum size of core files created |
-d | The maximum size of a process's data segment |
-e | The maximum scheduling priority ("nice") |
-f | The maximum size of files written by the shell and its children |
-i | The maximum number of pending signals |
-l | The maximum size that may be locked into memory |
-m | The maximum resident set size (many systems do not honor this limit) |
-n | The maximum number of open file descriptors (most systems do not allow this value to be set) |
-p | The pipe size in 512-byte blocks (this may not be set) |
-q | The maximum number of bytes in POSIX message queues |
-r | The maximum real-time scheduling priority |
-s | The maximum stack size |
-t | The maximum amount of cpu time in seconds |
-u | The maximum number of processes available to a single user |
-v | The maximum amount of virtual memory available to the shell and, on some systems, to its children |
-x | The maximum number of file locks |
-T | The maximum number of threads |
The limit for a specified resource is set when the Limit parameter is specified. The value of the Limit parameter can be a number in the unit specified with each resource, or the value "unlimited." For example, to set the file size limit to 51,200 bytes, use:
ulimit -f 100
To set the size of core dumps to unlimited, use:
ulimit -c unlimited
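As a further illustration (the file descriptor limit is just an arbitrary example), you can inspect the hard and soft limits and adjust a soft limit within the hard limit like this:

ulimit -Hn        # show the hard limit on open file descriptors
ulimit -Sn        # show the current soft limit
ulimit -Sn 2048   # set the soft limit, up to the hard limit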
The ulimit
command affects the current shell environment. When an MPI program is started, it does not spawn in the current shell; you have to use srun to start a wrapper script that sets the limit if you want the limit to apply to each process. Below is how you set the limit for each process (we use ulimit -c unlimited to allow unlimited core dumps, as an example):
#!/bin/bash
#SBATCH --ntasks=2
#SBATCH --time=5:00:00
#SBATCH ...
...
srun ./test1
...
#!/bin/bash
ulimit -c unlimited
.....(your own program)
sbatch myjob
The Ohio Supercomputer Center provides High Performance Computing resources and expertise to academic researchers across the State of Ohio. Any paper citing this document has utilized OSC to conduct research on our production services. OSC is a member of the Ohio Technology Consortium, a division of the Ohio Department of Higher Education.
OSC services can be cited by visiting the documentation for the service in question and finding the "Citation" page (located in the menu to the side).
HPC systems currently in production use can be found here: https://www.osc.edu/supercomputing/hpc
Decommissioned HPC systems can be found here: https://www.osc.edu/supercomputing/hpc/decommissioned.
Please refer to our branding webpage.
We prefer that you cite OSC when using our services, using the following information, taking into account the appropriate citation style guidelines. For your convenience, we have included the citation information in BibTeX and EndNote formats.
Ohio Supercomputer Center. 1987. Ohio Supercomputer Center. Columbus OH: Ohio Supercomputer Center. http://osc.edu/ark:/19495/f5s1ph73.
BibTeX:
@misc{OhioSupercomputerCenter1987,
  ark    = {ark:/19495/f5s1ph73},
  url    = {http://osc.edu/ark:/19495/f5s1ph73},
  year   = {1987},
  author = {Ohio Supercomputer Center},
  title  = {Ohio Supercomputer Center}
}
EndNote:
%0 Generic
%T Ohio Supercomputer Center
%A Ohio Supercomputer Center
%R ark:/19495/f5s1ph73
%U http://osc.edu/ark:/19495/f5s1ph73
%D 1987
Here is an .ris file to better suit your needs. Please change the import option to .ris.
Recorded on January 13, 2021.
[Slide: "An introduction to OSC services, hardwware, and environment"] So we're going to do an introduction to OSC services, hardware and environment today. My name is Kate Cahill. As I said, I'm the education and training specialist at OSC, and I've been doing this for several years. My background is chemistry. So I was a computational chemist and I was working as a postdoc at Ohio State. And then I started working at OSC.
[Slide: "Outline"] So today, we're going to cover just a brief overview of what what OSC is and some high performance computing concepts for new users to keep in mind. We'll go over the hardware that OSC offers to our users, how to get a new… how to get started as a new user or a new API with a new project, the user environment on our clusters, some information about the software that's available at OSC Systems, a brief introduction to batch processing – talking about how to submit jobs to our batch system – and then, like I said, we'll switch over to a browser and look at our OnDemand portal.
[Slide: "What is the Ohio Supercomputer Center?"] So what is the Ohio Supercomputer Center?
[Slide: "About OSC"] Well, OSC has been around since 1987, and we're part of the Ohio Department of Higher Education, so we're actually a state agency and we're a statewide high performance computing and computational science resource for all universities in Ohio. So we provide the high performance computing resources as well as expertise in doing different kinds of computational science for all higher ed institutions, and also commercial entities in Ohio as well can use us.
[Slide: "Service Catalog"] And so here are some of the services that OSC provides. You probably are aware of our clusters, so our main services are cluster computing, but we also provide different types of research data storage for different data needs, education services, education and training services – like workshops and support for educa…, you know, using HPC and education in different ways. We have a Web Software Development team that works to make our portal more efficient and more effective for our users, and our Scientific Software Development team, and they work on keeping our software on our on our clusters up-to-date and effective and also to support users doing different things with the software that we have.
[Slide: "Client Services"] So you can see kind of what we've been up to in the past year. How many universities [27] have been active with us. We have 47 companies as well that that do computation using our our resources. We also have [41] universities outside of Ohio that also contract to use our resources. Over four thousand [4,419] clients, we had 64 courses that used HPC resources through us, several hundred [701] new projects and active projects over the past year, as well as several [30] training opportunities [462 trainees]. And then we also track publications, so we had almost two hundred [195] publications that cited (that used) OSC as part of their research.
[Slide: "Fields of Study"] And you can see here [a pie graph], this is just sort of a breakdown of the information that we have about the areas of science and technology that use OSC resources. So this is kind of a breakdown by but the information that we have – we may not have this for all of our jobs – but for the jobs that we do know this information, you can see that the large majority of our jobs are in natural sciences with chemistry and biology being the large ones and then engineering and technology of different types. But we have jobs that run for many different domains, so this is just a snapshot.
[Slide: "OSC Classroom Usage"] And then here, some details [a bar chart] about OSC classroom usage, so this is number of classes on the left by institution that have used OSC in the past year [19 universities overall in 2020], and then on the right this is the number of students per institution [7,600 students in 2020]. So you can see we have a lot of different types of courses at different size institutions all around Ohio.
[Slide: "HPC Concepts"] And so now I'm going to switch over to talk about the high performance computing concepts that can be important to sort of cover quickly, particularly for people who have never used the high performance computing research before. It's just good to have these concepts kind of clear at the beginning.
[Slide: "Why Use HPC?"] And so there are many different reasons why people need to use high performance computing. It serves a lot of different types of calculations. And certainly in recent years a lot of new types of calculations have been developed. So data intensive or machine learning type applications are new and growing areas. But from the user perspective, you could come to to use HPC resources because the job that you are running on your your laptop or your PC is just taking too long. So you may have an analysis that takes several days or a week or more to run on your personal computer. And you want to see if if a high performance computing resource will speed that process up for you. And so there are a lot of different types of hardware you can take advantage of to change how long your calculations might take or the calculations you may want to… you can run one on your PC, but you may want to run one hundred of these calculations, so you just need more processing ability. And then there's a question of data size. So if the data that you were working with or generating is too large to be stored or accessed effectively on a on a personal-computer-size device, you need more memory or storage to work with that. And so those resources are also part of what HPC can do. And for some people, it's that there's a specific software package that they can have access to or that will run effectively on an HPC system that we can support or we can help you work with if we don't support it ourselves.
[Slide: "What is the difference between your laptop and a supercomputer?"] And it's important to kind of just recognize the difference between using an HPC resource and using a laptop. So just from a basic perspective, your laptop has a certain number of cores. It has a certain number of memory, a certain amount of memory and storage. A supercomputer will be ten thousand times or more that processing and data analysis ability. Another feature of a supercomputer is that it's a remote system. So you're not going to you're not going to connect with it directly. You're not going to sit at a terminal and just type right into the supercomputer. You have to connect with it over a network. And so that changes how you interact with it. You need to learn the tools of how to connect remotely, and the type of network you use can change your experience. So if you have a slower network for some reason, because you're away from your office or traveling, that can change how you can interact with the system in some ways – it depends. And the other point to make is that this is a shared resource, so at any given time you'll have hundreds of users all accessing the system, and so that means that we have to use the system in a certain way so that everybody can share it effectively and that there's not any kind of slowdown or lag because you are not able to… because somebody is using the system improperly. So we'll talk about some of the things you have to keep in mind to be a good citizen of HPC.
[Slide: "HPC Terminology"] And then some terminology that's important to keep in mind also, it's just nice to have this clear, so we talk about a "compute node" on a on a cluster and that is about equivalent to high end personal computer or desktop computer. So that would have a certain number of cores in it and a certain amount of memory and storage. That's kind of a unit of a supercomputer. The "cluster" is all of those nodes networked together. And so that is… that's why we call it a "cluster", because it's a group of nodes. And that's also what we call a "supercomputer". And an "individual core": that's the CPU unit. So most computers these days have multiple cores, so you can talk about "quad core" or something like that to refer to a four-core system. And you'll see that in a supercomputer there are many, many cores in each node. And you do have to know the specifics of that so you can request your job correctly. And then the other term it's useful to to be aware of is the GPU, which stands for "graphical processing unit". And that's another type of parallel processing unit that's available on supercomputers. It's just a separate multicore processor. And depending on the type of calculation you want to do or the software that you want to access, it may change – it may affect – whether GPS are going to be useful to you, but if you can use them, they do provide a great deal of speed up to parallel calculations.
[Slide: "Memory"] And then just to talk very briefly about memory. Memory is – the definition of memory that we're using is – it is a space to hold data that is being calculated on, so actively being used, as well as the instructions that you're giving to the machine to do your calculation. So it's just a place to kind of manage data that's being actively accessed for your calculation. And there's different types of memory that you have to be aware of when you're working with a supercomputer. So on a single node, if you think of a single node as like a single PC, you have your processor and your memory and your storage all together. So we consider that shared memory because it's shared across all the processors. Once you start working with multiple nodes, then you have memory that is distributed across those nodes, so the memory that's on Node 1 is separate from the memory that's on Node 2, and so you have to be aware of where your data is being kept. Is it accessible to all the the the CPUs that are going to do your calculations? These are just some considerations. As you as you work with the system and develop your jobs, you'll see how to take advantage of that and how to make sure that you're accounting for that in your computational instructions. And each core will have – each core of a node would have – an associated amount of memory. And so that can vary with different types of hardware. So you'll see we have some standard nodes and then we have some large memory nodes that'll provide just greater memory capacity for jobs that require that.
[Slide: "Storage"] Storage is – so we take memory and storage is the other option – storage is storing data for longer-term usage. So it's data that's not necessarily being actively-calculated, used in a calculation. And there's different types of storage for different needs. And so OSC has several different types of storage from storage on the compute node that you access during your job to longer-term project storage to be available when needed: scratch storage to longer-term or archive storage. And so all those have different features that can optimize for those different uses. And as a user, you'll have access to most of those.
[Slide: "Structure of a Supercomputer"] So here is just a sort of quick overview [diagram of the parts of the supercomputer system] of what a supercomputer – how all the components of a supercomputer…. So as a user, you're going to access this system remotely. So you're going to use some tool to do that, either a terminal window or web portal, and you're going to log into the system. And when you do that, you're going to access these specialized nodes called "login nodes". And so these are not the main part of the system. These are just where you access the system directly to manage your files or to submit jobs and read your output and things like that. And these are shared nodes. This is where you'd have to be aware of any calculations you try and run, any memory you try and use. It can affect other users if you try and use too much of the resources on the login nodes. And then the main part of the cluster are the compute nodes, and so you can see they're represented here is these small boxes that are all linked together. So that is individual nodes making up a cluster. And there are different types of compute nodes. That's why you'll see that some of them are larger or different colors would indicate different features. And I'll show you the specifics of that when we talk about our specific hardware. And so those are all networked together. And then we have our different data storage areas and those are all accessible through the the entire system. So it's just a general overview of just a supercomputer, nonspecific.
[Slide: "Hardware Overview"] So, and now I can move on to what we what we have at OSC available specifically for our our our current clusters, but I'll just stop for a minute and see if anybody has any questions so far. So I see a question, "Has OSC been involved in COVID-19 research?" We certainly have. We've been a host to some COVID-19 data platforms so that people can store data and present information about COVID-19 data to the community. We've also supported research efforts by different research groups around the state. So we've had a separate effort to kind of support researchers who are doing COVID-19 research, so that we can give them specific attention. And you see that Wilbur shared a link [not displayed] to some of the activities that we've been doing to support people doing research on COVID-19.
I see another question about signing in through the terminal on the computer using SSH. So it depends on what your credentials are, I guess. You're probably using just the terminal app, so that should work. If you can post your question again, Rohan, to everyone or to Wilbur directly, he'll see if he can troubleshoot that for you, because it should work: SSH is one of the main ways to access the supercomputers.
[Slide: "Owens Compute Nodes"] So now I'm going to go on to our clusters and the details of our hardware, so I'm going to start – we have two clusters right now, Owens and Pitzer – so I'm going to start talking about Owens. So we've had Owens longer and Owens has 648 standard nodes, and each of those nodes has 28 cores, and there's one 128 GB of memory on each of those nodes. So as a user you can take advantage of a single node or you could request multiple nodes, but you'll have access to to 28 cores per node and 128 GB of memory.
[Slide: "Owens Data Analytics Nodes"] And then Owens also has data analytics nodes – so these are large memory nodes – there are 16 total available on Owens, and each of those nodes has 48 cores per node, just so you have a much greater parallelism per node. And then the memory available on each node is 1.5 TB. So these are made specifically for jobs that require more memory than is available on our standard compute nodes.
[Slide: "Owens GPU Nodes"] And Owens also has GPU nodes, so I mentioned the graphical processing units, so there's 160 standard compute nodes that also have a GPU available. And so that is the same 28 cores per node, 128 GB of memory, and then you also have access to a GPU on each node.
[Slide: "Owens Cluster Specifications"] And here's all that information put together, and you can find this schematic and the technical details of the Owen's hardware on our website, I put the link here, but you can just go under "Services" > "Cluster Computing", and then when you click "Owens", and you can see all these details. And I mentioned I will put these slides up on our event page so you can access them and get all these links that are in here.
[Slide: "Pitzer Cluster Specifications"] And so our other cluster is Pitzer, and so Pitzer is a newer cluster and it's also just had an expansion. So this this first view of Pitzer is the original Pitzer just so it's a little less difficult to see. So Pitzer similarly has standard compute nodes. There are 224 standard nodes that have a total of 40 cores per node and each node has 192 GB of memory. There's also GPU nodes on on the original Pitzer and those are slightly different different GPU's, but for similar purpose. So those are standard compute nodes, again, with the GPU available. And then for huge memory nodes – and these are set up differently – there's 80 cores per node and there's 3 TB of memory. So we found that people using Owen's actually needed more memory on the large memory node. So we expanded to these nodes. Now, Pitzer is had a very recent expansion. So now you can see it's it's sort of like two clusters in one. So we have our original setup with the compute nodes, GPU nodes, and huge memory nodes. And now we've added another set of compute nodes, large memory nodes, and GPU nodes. And so these would be newer, newer hardware, slightly different set up. You can see on the – so the expansion's on the bottom side – that we have 340 standard nodes, and each of those nodes has 48 cores per node and with the GPU nodes are the same standard node plus now two GPUs per node. And so there's other other features on the on the new Pitzer cluster. So we have a lot of new software, I mean new hardware, available with the with the Pitzer expansion.
[Slide: "Login Nodes"] And just to remind you, as I mentioned, when you log into the system you're working on the login nodes – you're not accessing the compute nodes directly there – and so on the login nodes, you have to be aware of not interrupting anybody else's work. So the loggin nodes are mainly for file editing, and managing your files, and setting up your your files and your jobs to submit to the batch system so that you can access the compute nodes. There are hard limits of activity on the login nodes: 20 minutes of CPU time. So if you start a process of any kind, like compiling a small file or something, if it runs for 20 minutes, the system will stop it. And you only have access to 1 GB of memory. So you really can't do large computing there. And if you do try and do that, it'll slow down other people's ability to access the system and, you know, the system will cut you off. So just trying to avoid doing any serious computing on the login nodes – it's really just for setting up.
[Slide: "Data Storage Systems"] So now I'll just go over the data storage systems that we have. So, as I mentioned, we have several different file systems. I'm going to focus on the four that you'll probably interact with the most. So we have a home directory. And so as a user on our system, everybody has a home directory, storage space. And you can… this is the main place for you to store your files. It is backed up daily. Generally as a general research user you would have 500 GB of storage space in your home directory and you can use your tilde username as a reference if you are familiar with Linux paths. There's also the project directory. And so this is a space that, as a whole project, so PI and all the users can share an extra storage space together. That's the project storage space, and that's available by request, so if your project finds that the individual home directory space isn't sufficient for your work, you can request project storage space as well. And that will have a reference – that will have a path that references your project number – so you will have to be aware of what your project number is and I'll talk about what that means. And then the scratch storage space. This is temporary storage. It's not backed up. And this is available to all users, but it's expected that you'll only use it for the time of your active jobs that you're running with, with this, with whatever you're storing on the scratch system. So you're not leaving it there indefinitely. If you need it for a few weeks that's what it's there for. It's optimized for very large files. So if you need to do calculations on large files without shifting them to the compute nodes, you can use scratch for that. But this is, as I said, not backed up. And there is a purge on scratch to keep it from getting too full. So after 90 or 120 days, any files that are unused but still there will get removed. And then the fourth file system we refer to is the temp directory. And so the reference there is the variable that you can use to to reference it, and that's the storage that's on the compute nodes. So when you're running a job, you have access to this to a compute node or multiple compute nodes. You have access to the storage on that compute node, but only during your job. So you can transfer files there and then you can read and write all on the compute node. And that's the recommended way to use those so that everything is contained, and you're not relying on the network too much during your job. And then so you can read and write to the the temp directory on the compute node. And then when your job ends, you, you have everything copied back to your home directory so that you have your results, because as soon as your job ends that storage is no longer available to you and it'll be wiped. So that's only available for that window when your job is running. And again, there's a link here where you can see more detail about the available filesystems. And again, I'll be posting these slides on the event page so you all can have access to it. And so this is just the detail of those four different file systems with the quota for each of them. As you can see, home and project directories are backed up and not purged. Scratch is not backed up and there is a purge for that. And then compute is only available while your job is running.
[Slide: "Getting Started at OSC"] So that was kind of all the details about our hardware. Now I'm going to talk about getting started at OSC with a new project or a new account and what you need to know is a new user. So let me know if you have any questions so far.
[Slide: "Who can get an OSC project?"] So academic projects are available at OSC to PIs (principal investigators) who are full-time faculty members or research scientists at an Ohio academic institution. Generally, that's the main person who runs the project. And then once a PI has a project, they can authorize whomever they like to work on that project with them. So they can invite students, postdocs, collaborators from other institutions, other research scientists to come and be on your project. But the PI is the manager of it. We also have classroom projects, as I mentioned. If you're teaching a course and you want to have access to OSC resources, those can be separate projects from your research project. And then commercial organizations can also have projects. And that's through purchasing time through our sales group.
[Slide: "Accounts and Projects at OSC"] And so just to be clear about how OSC uses the term, "project" and "account", a "project", as I said, is headed by a principal investigator and can include whatever other users that principal investigator chooses. And that's kind of the tool we use for managing resources is at the project level. So a project will have some some amount of resources available to use and all the users kind of charge against those resources and that's how we give access to the project directory (to software access) is usually at a project level. An "account" is related to a single user. So this is this is where you have your username and password to access the HPC systems. And this is connected to a single email. So we ask that you don't share accounts, that each person has a separate account, just so that we can always communicate with whoever is using that account, in case we see some activity that we need to question or we have a suggestion for optimizing your jobs. We always can communicate with with the person using the account. And you might be on multiple projects, but you'll have one account that can access all of those projects.
[Slide: "Usage Charges"] And I see a question about classroom usage: "If teaching a university class that has a large number of students, what happens if the initial resources are used up before the semester is finished?" And, yeah, that happens. So we do start with an initial project allocation for a classroom project at $500. And so that's for whatever compute and storage a class would need. But yeah, I mean, if you have a large course or you're expecting to to have a lot of calculations, you think that might not be enough. We can always provide more during the semester. So that's fully subsidized. And so the way we manage usage, we charge for core-hours, GPU-hours, and terabyte-months for storage, and so your project has an initial dollar balance and then whatever services you're using get charged against that. And we have some information on our website about what that looks like – these these prices are still subsidized by the state, so they're they're definitely cheaper than a commercial option – so we're still trying to be kind of a more reasonable option than than just going out and purchasing compute resources in the cloud or something like that.
[Slide: "Ohio Academic Projects"] And so for Ohio, academic project standard projects receive a $1000 grant annually to cover OSC services. So as a P.I., you can request and receive every year $1000 of OSC services and use that for whatever works for you. So that can be for compute, for storage, for whatever services we charge for. And you can make sure that you have budgeted so you don't – your students don't – accidentally run up a large bill so that there's a limit on the usage on the project. If you have a certain… if you want to stay within that thousand or you want to stay within whatever your budget is. And as I mentioned, classroom projects are fully subsidized. And this can all be managed at our client portal, which is separate from our – so we have set up several different websites – so osc.edu is our main website that has our documentation and information about our resources. My.osc.edu is our client portal, and that is where you can log in as a user and add people to your project, change your budget, request a classroom project, check on usage in your project and things like that, change your shell, actually.
[Slide: "Client Portal – my.osc.edu"] Here's just a view of the dashboard of the client portal so you can see what your active projects are on the bottom and then you have usage by project, usage by system just so you can keep an eye if you're not an active user, but you're managing some students, you can see what's going on, on your project using the snapshot. But the client portals where you make all those requests about budget and things like that.
[Slide: 'Statewide Users Group - SUG'] And we have a statewide users group, which is open to anybody who uses our systems: you can come give advice and help us develop our services. We have two current committees, the Software and Activities Committee and the Hardware and Operations Committee, so you can attend and learn about our plans and make suggestions for what you think OSC should do in the future. We usually have two in-person meetings at OSC that usually include a research fair with posters and flash talks. For the past couple of sessions we've been doing the SUG meeting virtually, so it's been kind of pared down. We're having a virtual SUG in March where you can attend meetings, get an OSC update, and things like that, all virtually. But hopefully once we're back to in-person events, we can do the poster sessions and everything again.
[Slide: "Citing OSC"] And then there is a little information on our website about citing OSC, so definitely if you're going to publish anything, we would love to have a citation so we can, you can track those and see what people are doing.
[Slide: "User Environment"] And so now I'll move on to actually accessing the system as a User and Working with the Environment. So, a pause for a minute and give people a chance to ask questions.
[Slide: "Linux Operating System"] All right, so. Our user environment, we use the Linux operating system, which is widely used in HPC. Linux, is mostly accessed through command line and so that's a way to interact with them, with our clusters. You can choose your shells. So Bash is the default shell, but if you have a preference for other types of Linux shells, you can change your shell and you have to do that at the client portal so that would be the m y.OSC.edu website. You can change your shell there. It's open source software. There's a lot of tutorials available online to kind of learn the basics of doing some simple commands at the command line for Linux. Also, on our website, we have some links to some recommended Linux tutorials. But I'm also going to talk to you about using our On-Demand portal where you may not have to do so much with the command line.
[Slide: " Connecting to an OSC Cluster"] But so the main ways to connect to our clusters, the classic way is using SSH., which means Secure Shell. And so, if you have a terminal program on your PC already, you would open your terminal window and then type in SSH and then your user ID since this is your a HPC account ID and then at @onkosc.edu. So, if you're connecting to ONZGUO's, then you would use ONZGUO's, or switch that to Pitzer and so you just put that command on the command line and that would send a communication to the cluster, and then you'd have to put in your password and you can access the clusters that way. The other option is to use a web-based portal. So, we have our On Demand portal that you can log into and access the clusters directly through a web browser so you don't have to be as familiar with the terminal or download any particular software to do that. And if you are going to use any software that has a graphical user interface, then you would want to use an X 11 forwarding setup which takes. There's some information on our website if you need to do that. But when you log into the terminal, you'd need to change your command to SSH dash capital X to turn on that forwarding so you can run your graphical interface.
[Slide: "O SC OnDemand"] So, our OnDemand portal has several features, as I said, it's web based, so you're not working necessarily at a terminal. You don't have to download any other software because it's available on any web browser and you just need to be able to to use your username and password to log in. And then you have access to all of our resources and this includes file management, submitting jobs, looking at some graphical interfaces. So, we have some apps where you can run tools like Matlab or our Jupiter Notebook's, Abacas, ANSYS console. There is also terminal access so you can access our clusters through a terminal, through our portal as well. You can see some of the details about that on our main website under OnDemand. You can look at some of the features, but I'll go through them in a little bit.
[Slide: "Transferring Files to and from the Cluster"] So file transfers, there's several different methods you can use to, to manage your files and transfer files from your local PC over to the cluster. So, there's a tool in Linux called SFTP or SCP that you can use to transfer from the command line and those work on Linux and Mac. If you use a Windows machine, there's a software called FileZilla that can be used as well. That'll do some file transfers for you and you can do that at the General SSH login node. We also have a file transfer server so if you're transferring something large and you want to do it this way, you would connect to the SFTP.OSC.edu location and transfer your files that way. On OnDemand, you can transfer smaller files, so up to five gigabytes you can drag and drop using our file management tool and I'll show you what that looks like. And then we also have GLOBUS, which is another web based tool that can also handle large file transfers. So, that can be really useful for large files or for a group of files you want to transfer all at once. GLOBUS is a nice way to do that, kind of in the background and we have information about how you can set up that that tool on our website as well so that's what that link is at the bottom.
[Slide: "Using and Running Software at OSC"] So now I'm going to talk about more about our software at OSC, and I see a question about about versions of Git that are installed on our website. So, I mean, our software team manages software installs and so generally they keep us updated with the latest version available. So. I would say if you send an email to OSC.help and find out why, if there's a newer version of GIT, that might be more useful to you. Also with open source tools, you can also install your own versions as well in your local directory and so we'll go over some of that in this section.
[Slide: "Software Mantained by OSC"] So, there's over one hundred and forty five software packages maintained at OSC and so for any software that you want to use, definitely go to our main website, OSC.edu and you can look under resources and in the top menus, under resources, available software and you'll see a list of all of the available software and you can browse that list or you can just do a search for the software you're interested in. You definitely want to start with the software page of whatever software you want to use, because that will have the details about what versions are available on what system, how to get access to the software, if it has to be requested for some in some cases and it also gives you examples of how to use it on the different systems so, this can be really helpful to just sort of getting started and getting comfortable using the software there.
[Slide: "Third party applications"] So, some of the general programing software that's available, we have different types of compilers and debuggers and profiler's, so if you want to optimize your code, there are some tools to do that. MPI Library, Java, Python, R. Python and R being very, very popular tools. And there's lots of different versions and packages and you can also install locally, install your own packages for R and Python.
[Slide: "Third party applications"] Parallel programing software, MPI, libraries of various kinds, OpenMP, CUDA for GPU programing and OpenCL and OpenACC.
[Slide: "Acess to Licensed Software"] And so software licensing obviously is a really complicated area, but generally we try and license, get our software license for general academic usage. Sometimes software requires a signed license agreement and so that's why you have to check the software page at OSC just to make sure what the access is, because some software you may just need to request access or you may need to sign a license agreement before we can let you use that. And those details are all on each software page.
[Slide: "OSC doesn't have the software you need?"] And if we don't have the software you need, you can certainly request it so you can certainly send us some information and say I think the software is really useful and there's a whole bunch of people in my department who would use this if you if you supported it. And we can consider adding some software to our system. If the software you want to use is open source, you can install it yourself in your home directory and so that's available so that you can install it for yourself or in your group and and just manage it locally in your home directory. And the link at the bottom of the slide here is a how to of how you would install it yourself and if you have a license for software that we support that you want to use, we can help you use that license at OSC. So that's something you can ask. Just send an email to OSC.help and we can help you with that.
[Slide: "Loading Software Environment"] And so once you are on the cluster and you want to access some software, we use software modules to manage software so this is just a way that we can manage the software and then you can have access to it and make your environment work for the tools you want to use, just using these commands. So, and so, some of the main commands you want to use when you first log in, you can do a module list command and that will show you what software modules you have installed. So, there's always a set of default modules that get installed for everybody. So, you can see how we have Intel and just some general modules that --- just to set up your environment initially, you can always change those. And then if you want to search for a module, you can do a module spider with a keyword or a module avail and see what software modules are available for a certain software package you're interested in. And they'll be different versions, different modules for different versions and so when you want to add software to your environment, you use a module load and then the name of that software module. And if you're not specific about the version, there'll be a default version. That'll get loaded. You can also remove a software package from your environment by doing module unload and then the name of that software package and then you can change versions and that command is module swap where you swap out one version of software for a different version. And you just have to be careful about putting in the right versions you're interested in.
[Slide: "Batch Processing"] And I will point out that you do have to do that in your job, your batch script as well, so you want your job to have the right, the right software environment as well.
So now I just want to give you a quick overview of what it looks like to submit jobs to the batch system, so we'll talk about batch processing. Any questions so far? I see there have been some questions in the chat and those have been getting answered, so that's good.
[Slide: "Why do supercomputers use queuing?"] So, talking about batch processing, we use batch processing or queuing in supercomputers because we have a lot of resources and we need to use them in the most efficient way possible. So, we have one hundred users all wanting to run jobs on our cluster. We have to make sure that the, the available resources are used as efficiently as possible and so that everybody gets their work done faster, even if individually we have to wait for the queue to start our job. And so the batch system has a scheduler and a manager so you submit your job to the queue and the scheduler keeps track of all the jobs that are submitted. And once the resource manager has the resources available, your job will move into into actively using the compute nodes. And then once your job completes, then you, you will have the results back, copy back to your home directory. And so it's the most efficient way to run a cluster and just make sure that that everybody gets their work done in a timely manner. But that means that you have to make all of your requests to the system in a job script and put all the relevant details in there so that your job runs effectively and you get the results you need.
[Slide: "Steps for Running a Job on the Compute Nodes"] And so the steps that you need to go through to run a job on the compute nodes, we need to make a batch script or a job script and I'll show you what that looks like and the information you have to put in those then you submit that to the queue. The job waits in the queue. When the resources become available, the job runs and then once the job is completed, your results will be available. Copy back to your home directory.
[Slide: "S pecifying Resources in a Job Script"] And so the resource requests you have to make in a job script can include the number of nodes, the number of compute nodes you need to use, the number of cores per node that you want to use. And if you want to use GPU's, you can specify a memory, but it's not required. So, memory is allocated proportionally to your request so if like on Pitzer standard, the standard nodes of the original Pitzer have 40 cores, so if you requested one node and 20 cores on Pitzer, you would get half the memory of that node. And so you can think your memory request as being implicit in your core request. So, if you are running, if you're going to run a job that you want to use maybe 10 cores, but you're going to need the full amount of memory on that node, you should request the whole node that request all 40 cores so that you have access to all of that memory as well. The other thing you have to request is wall time so if you think your job will take an hour to run, you might want to request a job for two hours just because you want to overestimate so that if your job doesn't quite finish in one hour, it doesn't get stopped because, well, time is a hard limit. You want to overestimate slightly, though, so a smaller job, you can overestimate a little more, but a larger job, if you think your job's going to take twenty four hours, you might want to request 30 or something like that just to give it a little extra time, but not so much that your job will take longer to wait in the queue if you over request your wall time.
[Slide: "Specifying Resources in a Job Script"] A question in the chat: if a job is submitted to a single node but requests less than the full number of cores, can a second job run on the same node and use the other cores, and how does the memory get shared? Yes, this is something you have to consider when you request a job. If you request one node and half the processors on that node, another job could run on that node at the same time. The memory allocation will be relative to the number of cores, but if a job tries to use more memory than it's supposed to, you can get some sort of negative interaction between those two jobs as far as what memory is available. So that can be a problem. That's why I say that something else you'll get a sense of as you run your jobs is how much memory is needed to run your job efficiently, and whether you just want to request the full node so that you have access to all that memory. As long as your request matches the resources that you need, you shouldn't run into any problems, but it can take a bit of a trial-and-error process to get there. The next thing you need to include in your job script is your project number. This is the project code; it usually starts with a P and has a four-digit number at the end. You need to include that in your job script so that the job can be accounted for in your project. And some software requires a specific license request as well, so that has to be in the job script too. We'll show you what a job script looks like.
[Slide: "Batch Changes at OSC"] It is important to note that OSC has just switched over from a Torque/Moab scheduler, a resource manager, to SLURM, our scheduler and resource manager. So, we've just made this change a couple of months ago and so before this, if users were using Torque/Moab, they would use PBS scripts. This is just a PBS based job script. So, the variables look different, but the activity is the same between the old batch system and the new batch system. It's just the terminology has changed. But we have a compatibility layer active so if you are already running jobs on our system, using the PBS scripts, there generally will still work. But we we expect that anybody who's just starting out now will use SLURM as their main way to submit jobs. So, that's what we're going to talk about as far as job scripts from now on and current users, your job script should still work or you can start switching over to SLURM. We do have some information sessions monthly. The next one is scheduled for January 27. That's on the event calendar. So, you can always sign up for that to get more information about how to use SLURM at OSC. And then we have the link to sort of just general SLURM information that can give you the keywords and commands that you might want to know. But we'll go over some of that right now.
[Slide: "Sample SLURM Batch Script"] So here is a sample batch script, and this isn't using SLURM. And so in a batch script at the top, you always have your resource requests. So SLURM requires that you put in your, your shell statement right at the top. So, that first line is required and then you have all your SBatch requests. So, this is all the resource requests we talk about all the time. Number of nodes. Number of cores per node, which SLURMS calls tasks per node, you have to give the job a name and then you use the 'line account.' That's what SLURMS calls a project. So that's where you put your project code. And so that's just a general project code. You want to replace that with your specific project code and then the rest of the the job script or all the commands you need to run to manage your files and to run your calculations and then to get your results back at the end. So, you want to set up your software environment. As I said, you need to make sure that the software is accessible by your job. So, you want to load in the software modules that you need and then copy your input files over to the computer directory. So, that's the temp directory and then you want to run your job. So, here we're doing an MPI compile and then running that code and then copying back the results to your working directory so you have your results at the end of the job. So, this all becomes a text file and this is your job script and you give this text file a name and that's your job script name.
[Slide: "Submit & Manage Batch Jobs"] And so when you have your job script, then you use the command SBatch and the name of that job script and that submits it to the queue. And once you submitted a job successfully to the queue, you'll see on the bottom of this slide, you'll see a line that is submitted batch job, and then a code. That's your job ID. So, that's how the queue is going to label your job. So, if you happen to make a mistake and you want to cancel your job, you need that job ID and then you can use the command, S-cancel and that code to cancel that specific job. And then if you in general, just want to see what your jobs are doing in the queue, you can use the command S-queue with the flag dash you and then your username, and that'll give you information about all your jobs that are in the queue. And you can see their status, whether they're waiting, whether they're on hold or they're running. And when they've completed, they'll no longer be in the queue syou can see that they're gone. You can also put a job on hold if you wanted to, in case you submitted a bunch of jobs and one of them is dependent on another one finishing. You can put that on hold and then release it using the control hold and release commands and so that's how you manage, submit jobs, manage them and check on the Q.
[Slide: "Scheduling Policies and Limits"] We do have policies and limits on our scheduling just to manage the size of jobs that can get submitted generally to our clusters. So, we have a walltime limit for single node jobs that we call serial jobs, and that's one hundred and sixty eight hours. So, that's for a standard single node job. You can have up to a week of walltime for jobs that use more than one node. We have a ninety six hour limit and that's, those are called, we call them parallel jobs and then per user, you can get a total limit of running jobs of one hundred and twenty eight or two thousand forty processor cores in use at once or you can have a thousand jobs in the batch system all together. So, if you happen to be submitting a lot of jobs at once, you can have one hundred and twenty eight running and up to a thousand waiting in the queue and then group limits as well are similar. And that's just so that no one user, no one group can be submitting enough jobs to take over the whole system. And these are, these are the general guidelines we use. These are limits but if it happens to be a reason why you need to do a larger job or a longer job, you can always request that as well. We can accommodate different, different types of jobs. But these are just the standard limits that we have.
[Slide: "Waiting for Your Job to Run"] And so when you submit your job, it's sometimes hard to know how long it will take. It's always a question of system loads. Generally, our systems are pretty busy, but generally things we try and keep an eye on how fast things are going through so we hope to keep so you don't have to wait too long. But it's also about what resources you request. So, that's why you want to make sure that you're walltime isn't unnecessarily long or your core request isn't too large, isn't larger than you need. It can be as large as it needs to be. But you want to try and make it as reasonable as possible so that your job doesn't wait longer to start and then if you're requesting specific resources like memory or GPU use, that can change how long your job will take. But yeah, it's sometimes hard to tell. So, and it may vary.
[Slide: "Interactive Batch Jobs"] Another type of job that we do support is interactive jobs, and those are still submitted through the batch system, so the same resource limits and they're useful for debugging or just working on something to test out some stuff just in case to try and before you submit to a general batch job. Our system isn't optimized for interactive batch jobs, so you still have to wait for that job to start and you have to be there when it starts to actually take advantage of the resources. So, it can be a little difficult. And so, yeah, most jobs are fairly, start fairly soon, but sometimes you have to wait a little while. And at the bottom of the slide you can see an example of how to submit a batch of interactive jobs to the back system. You do have to make the same requests of a number of nodes, a number of cores and walltime.
[Slide: "Batch Queues"] And it's good to know that we have the two different clusters and their batch queues are completely separate, so you do have to make sure that you know which cluster you're submitting to. And then there are also a few nodes in each system that are reserved for short jobs, for debugging and testing things out so if you make a small, you can make a smaller request and use the debug nodes.
[Slide: "Parallel Computing"] And just in general, with parallel computing, if you can take advantage of multiple nodes, that's a good thing just because you want to use the resources effectively and get your work done efficiently as possible, but it can take some time to make that actually efficient. So, just keep in mind that there could be other considerations beyond just asking for more cores or more nodes. And so there are tools like multi-threading and MPI that can allow you to take advantage of multiple nodes and multiple cores, but it can vary. And so it's something you're going to have to figure out as you go along and we can help with specific questions or you can read the documentation that we have.
[Slide: "To Take Advantage of Parallel Computing"] And so always check with the software that you're going to use to see how it takes advantage of parallel computing and make sure that you have the right input for your jobs to do those things effectively.
[Slide: "https://ondemand.osc.edu/"] And so that was the main part of the presentation.
[Slide: "Resources to Get Your Questions Answered"] I'll just say we have some more resources for other ways to check on questions that may come up for you. So, we have on our main site FAQ'S that have a lot of important information. Our how-tos are really important for doing certain activities like installing software, installing Python in our packages and things like that, setting up GLOBUS. So, those are all on our website under 'how to.' We do have some tutorial materials, if you're interested. If you want to get started using our system and follow a tutorial for submitting jobs, we have a tutorial material available on GitHub. We also have office hours every other Tuesday that you can sign up for. Those are available. You can get the link to those from our event page and and sign up for a time to talk to us through your office hours. We also have a question forum tool called Ask.ci, that we're a member of, and that's a larger cyberinfrastructure resource. But we have a section on there. Another place you can have discussions and ask questions. If you want to get updated information about our system, we have a Twitter feed, HPC notices where we post any information about problems or downtimes. Also, when you log into the system, you'll see the message of the day, which will give you any new information about any changes to our systems we need our users to know.
OSC has created custom versions of some commands to make them more useful to OSC users.
OSCfinger is a command developed at OSC for use on OSC's systems and is similar to the standard finger command. It allows various account information to be viewed.
Owens | Pitzer
---|---
X | X
OSCfinger takes the following options and parameters.
$ OSCfinger -h
usage: OSCfinger.py [-h] [-e] [-g] USER

positional arguments:
  USER

optional arguments:
  -h, --help   show this help message and exit
  -e           Extend search to include gecos/full name (user) or category/institution (group)
  -g, --group  Query group instead of users

Query user:
    OSCfinger foobar

Query by first or last name:
    OSCfinger -e Foo
    OSCfinger -e Bar

Query group:
    OSCfinger -g PZS0001

Query group by category or insitituion:
    OSCfinger -e -g OSC
The OSCfinger command can be used to view account information given a username.
$ OSCfinger jsmith
Login: xxx
Name: John Smith
Directory: xxx
Shell: /bin/bash
E-mail: xxx
Primary Group: PPP1234
Groups:
The OSCfinger command can also reveal details about a project using the -g flag.
$ OSCfinger -g PPP1234
Group: PPP1234
GID: 1234
Status: ACTIVE
Type: Academic
Principal Investigator: xxx
Admins: NA
Members: xxx
Category: NA
Institution: OHIO SUPERCOMPUTER CENTER
Description: xxx
---
If the username is not known, a lookup can be initiated using the -e flag. This example shows a lookup by first and last name.
$ OSCfinger -e "John Smith" Login: jsmith Name: John Smith Directory: xxx Shell: /bin/bash E-mail: NA Primary Group: PPP1234 Groups: xxx Password Changed: Jul 04 1776 15:47 (calculated) Password Expires: Aug 21 1778 12:05 AM Login Disabled: FALSE Password Expired: FALSE ---
One can also look up users with only the last name:
$ OSCfinger -e smith
Login: jsmith
Name: John Smith
Directory: xxx
Shell: /bin/bash
E-mail: NA
Primary Group: PPP1234
Groups:
---
Login: asmith
Name: Anne Smith
Directory: xxx
Shell: /bin/bash
E-mail: xxx
Primary Group: xxx
Groups:
---
The first name alone can also be used, but many accounts are likely to be returned.
$ OSCfinger -e John
Login: jsmith
Name: John Smith
Directory: xxx
Shell: /bin/bash
E-mail: xxx
Primary Group: PPP1234
Groups:
---
Login: xxx
Name: John XXX
Directory: xxx
Shell: /bin/bash
E-mail: xxx
Primary Group: xxx
Groups:
---
Login: xxx
Name: John XXX
Directory: xxx
Shell: /bin/ksh
E-mail: xxx
Primary Group: xxx
Groups:
---
...(more accounts below)...
In a Slurm environment, the OSCfinger command shows some additional information:
$ OSCfinger jsmith
Login: xxx
Name: John Smith
Directory: xxx
Shell: /bin/bash
E-mail: xxx
Primary Group: PPP1234
Groups:
SLURM Enabled: TRUE
SLURM Clusters: pitzer
SLURM Accounts: PPP1234, PPP4321
SLURM Default Account: PPP1234
It's important to note that the Slurm default account will be used if an account is not specified at job submission.
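For example, to charge a job to one of the other listed Slurm accounts rather than the default, the account can be given explicitly at submission; the script name below is made up:

$ sbatch --account=PPP4321 my_job_script.sh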
OSCgetent is a command developed at OSC for use on OSC's systems and is similar to the standard getent command. It lets one view group information.
Owens | Pitzer
---|---
X | X
OSCgetent takes the following options and parameters.
$ OSCgetent -h
usage: OSCgetent.py [-h] {group} [name [name ...]]

positional arguments:
  {group}
  name

optional arguments:
  -h, --help  show this help message and exit

Query group:
    OSCgetent.py group PZS0708

Query multiple groups:
    OSCgetent.py group PZS0708 PZS0709
The OSCgetent command can be used to view group(s) members:
$ OSCgetent group PZS0712 PZS0712:*:5513:amarcum,amarcumtest,amarcumtest2,guilfoos,hhamblin,kcahill,xwang
$ OSCgetent group PZS0712 PZS0708
PZS0708:*:5509:djohnson,ewahl,kearley,kyriacou,linli,soottikkal,tdockendorf,troy
PZS0712:*:5513:amarcum,amarcumtest,amarcumtest2,guilfoos,hhamblin,kcahill,xwang
OSCprojects is a command developed at OSC for use on OSC's systems and is used to view the logged-in account's project information.
Owens | Pitzer
---|---
X | X
OSCprojects does not take any arguments or options:
$ OSCprojects
OSC projects for user amarcumtest2:

Project    Status    Members
-------    ------    -------
PZS0712    ACTIVE    amarcumtest2,amarcumtest,guilfoos,amarcum,xwang
PZS0726    ACTIVE    amarcumtest2,xwangtest,amarcum
This command returns the current user's projects, whether those projects are active/restricted, and the current members of the projects.
OSCusage is a command developed at OSC for use on OSC's systems. It allows a user to see information on their project's usage, including different users and their jobs.
Owens | Pitzer
---|---
X | X
OSCusage takes the following options and parameters.
$ OSCusage --help
usage: OSCusage.py [-h] [-u USER]
                   [-s {opt,pitzer,glenn,bale,oak,oakley,owens,ruby}] [-A]
                   [-P PROJECT] [-q] [-H] [-r] [-n] [-v]
                   [start_date] [end_date]

positional arguments:
  start_date            start date (default: 2021-03-16)
  end_date              end date (default: 2021-03-17)

optional arguments:
  -h, --help            show this help message and exit
  -u USER, --user USER  username to run as. Be sure to include -P or -A.
                        (default: amarcum)
  -s {opt,pitzer,glenn,bale,oak,oakley,owens,ruby}, --system {opt,pitzer,glenn,bale,oak,oakley,owens,ruby}
  -A                    Show all
  -P PROJECT, --project PROJECT
                        project to query (default: PZS0712)
  -q                    show user data
  -H                    show hours
  -r                    show raw
  -n                    show job ID
  -v                    do not summarize
  -J, --json            Print data as JSON
  -C, --current-unbilled
                        show current unbilled usage
  -p {month,quarter,annual}, --period {month,quarter,annual}
                        Period used when showing unbilled usage (default: month)
  -N JOB_NAME, --job-name JOB_NAME
                        Filter jobs by job name, supports substring match and
                        regex (does not apply to JSON output)
Usage Examples:

Specify start time:
    OSCusage 2018-01-24

Specify start and end time:
    OSCusage 2018-01-24 2018-01-25

View current unbilled usage:
    OSCusage -C -p month
Running OSCusage with no options or parameters will provide the usage information in dollars for the current day.
$ OSCusage
----------------  ------------------------------------
                  Usage Statistics for project PZS0712
Time              2021-03-16 to 2021-03-17
PI                guilfoos@osc.edu
Remaining Budget  -1.15
----------------  ------------------------------------

User            Jobs    Dollars  Status
------------  ------  ---------  --------
amarcum            0        0.0  ACTIVE
amarcumtest        0        0.0  ACTIVE
amarcumtest2       0        0.0  ACTIVE
guilfoos           0        0.0  ACTIVE
hhamblin           0        0.0  ACTIVE
kcahill            0        0.0  ACTIVE
wouma              0        0.0  ACTIVE
xwang             12        0.0  ACTIVE
--                --         --
TOTAL             12        0.0
If you specify a timeframe you can get utilization information specifically for jobs that completed within that period.
$ OSCusage 2020-01-01 2020-07-01 -H
----------------  ------------------------------------
                  Usage Statistics for project PZS0712
Time              2020-01-01 to 2020-07-01
PI                Brian Guilfoos <guilfoos@osc.edu>
Remaining Budget  -1.15
----------------  ------------------------------------

User            Jobs    core-hours  Status
------------  ------  ------------  ----------
amarcum           86      260.3887  ACTIVE
amarcumtest        0           0.0  ACTIVE
amarcumtest2       0           0.0  RESTRICTED
guilfoos           9        29.187  ACTIVE
hhamblin           1          1.01  ACTIVE
kcahill            7       40.5812  ACTIVE
wouma             63      841.2503  ACTIVE
xwang            253     8148.2638  ACTIVE
--                --            --
TOTAL            419      9320.681
Specify -q to show only the current user's usage. This stacks with -u to specify which user you want to see.
$ OSCusage -u xwang -q 2020-01-01 2020-07-01 -H
----  -------------------------------
      Usage Statistics for user xwang
Time  2020-01-01 to 2020-07-01
----  -------------------------------

User     Jobs    core-hours  Status
------  -----  ------------  --------
xwang     253     8148.2638  -
--         --            --
TOTAL     253     8148.2638
By default, the tool shows your default (first) project. You can use -P to specify which charge code to report on.
$ OSCusage -P PZS0200 -H
----------------  ------------------------------------
                  Usage Statistics for project PZS0200
Time              2020-09-13 to 2020-09-14
PI                David Hudak <dhudak@osc.edu>
Remaining Budget  0
----------------  ------------------------------------

User        Jobs    core-hours  Status
--------  ------  ------------  ----------
adraghi        0           0.0  ARCHIVED
airani         0           0.0  ARCHIVED
alingg         0           0.0  ARCHIVED
You can show all of your charge codes/projects at once by using -A.
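For instance, combining it with the hours flag and a date range in the same pattern as the other examples on this page (output omitted):

$ OSCusage 2020-01-01 2020-07-01 -A -H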
By default, all charges are shown in the output. However, you can filter to show a particular system with -s.
$ OSCusage -s pitzer -H
----------------  ------------------------------------
                  Usage Statistics for project PZS0712
Time              2021-03-16 to 2021-03-17
PI                guilfoos@osc.edu
Remaining Budget  -1.15
----------------  ------------------------------------

User            Jobs    core-hours  Status
------------  ------  ------------  --------
amarcum            0           0.0  ACTIVE
amarcumtest        0           0.0  ACTIVE
amarcumtest2       0           0.0  ACTIVE
guilfoos           0           0.0  ACTIVE
hhamblin           0           0.0  ACTIVE
kcahill            0           0.0  ACTIVE
wouma              0           0.0  ACTIVE
xwang              0           0.0  ACTIVE
--                --            --
TOTAL              0           0.0
The report shows usage in dollars by default. You can elect to get usage in core-hours using -H, or in raw seconds using -r.
$ OSCusage 2020-01-01 2020-07-01 -r
----------------  ------------------------------------
                  Usage Statistics for project PZS0712
Time              2020-01-01 to 2020-07-01
PI                Brian Guilfoos <guilfoos@osc.edu>
Remaining Budget  -1.15
----------------  ------------------------------------

User            Jobs    raw_used    Status
------------  ------  ----------  ----------
amarcum           86    937397.0  ACTIVE
amarcumtest        0         0.0  ACTIVE
amarcumtest2       0         0.0  RESTRICTED
guilfoos           9    105073.0  ACTIVE
hhamblin           1      3636.0  ACTIVE
kcahill            7    146092.0  ACTIVE
wouma             63   3028500.0  ACTIVE
xwang            253  29333749.0  ACTIVE
--                --          --
TOTAL            419  33554447.0

Detailed Charges Breakdown
Specify -v to get detailed information about jobs.
You can add the -n option alongside -v to include the job ID in the report output. OSC Help will need the job ID to answer any questions about a particular job record.
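A hedged example of combining those flags with a date range, following the same pattern as the other examples on this page (output omitted):

$ OSCusage 2020-01-01 2020-07-01 -v -n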
Please contact OSC Help with questions.
The Ohio Supercomputer Center (OSC) exists to provide state-of-the-art computing services to universities and colleges; to provide supercomputer services to Ohio scientists and engineers; to stimulate unique uses of supercomputers in Ohio; to attract students, faculty, resources and industry; to catalyze inter-institutional supercomputer research and development projects; to serve as the model for other state-sponsored technology initiatives.
OSC serves a large number and variety of users, including students, faculty, staff members, and commercial clients throughout the state of Ohio. The ethical and legal standards that apply to the use of computing facilities are not unique to the computing field. Rather, they derive directly from standards of common sense and common decency that apply to the use of any public resource. Indeed, OSC depends upon a spirit of mutual respect and cooperative attitudes.
This statement on conditions of use is published in that spirit. The purpose of this statement is to promote the responsible, ethical, and secure use of OSC resources for the protection of all users.
As a condition of use of OSC facilities, the user agrees:
In addition, users are expected to report to OSC information that they may obtain concerning instances in which the above conditions have been or are being violated.
Violations of the following conditions are certainly unethical and are possibly a criminal offense: unauthorized use of another user's account; tampering with other users' files, tapes, or passwords; harassment of other users; unauthorized alteration of computer charges; and unauthorized copying or distribution of copyrighted or licensed software or data. Therefore, when OSC becomes aware of possible violations of these conditions, it will initiate an investigation. At the same time, in order to prevent further possible unauthorized activity, OSC may suspend the authorization of computing services to the individual or account in question. In accordance with established practices, confirmation of the unauthorized use of the facilities by an individual may result in disciplinary review, expulsion from his/her university, termination of employment, and/or legal action.
Users of computing resources should be aware that although OSC provides and preserves the security of files, account numbers, and passwords, security can be breached through actions or causes beyond reasonable control. Users are urged, therefore, to safeguard their data, to take full advantage of file security mechanisms built into the computing systems, and to change account passwords frequently.
Computing resources shall be used in a manner consistent with the instructional and/or research objectives of the community, in general, and consistent with the objectives of the specified project for which such use was authorized. All uses inconsistent with these objectives are considered to be inappropriate use and may jeopardize further authorization.
Beyond the allocation of computing resources, OSC normally cannot and does not judge the value or appropriateness of any user's computing. However, the use of computing resources for playing games for purely recreational purposes, the production of output that is unrelated to the objectives of the account, and, in general, the use of computers simply to use computing resources are examples of questionable use of these resources.
When possible inappropriate use of computing resources is encountered, OSC shall notify the principal investigator responsible. The principal investigator is expected either to take action or to indicate that such use should be considered appropriate.
Should possible inappropriate use continue after notification of the principal investigator, or should unresolvable differences of opinion persist, these shall be brought to the attention of OSC staff for recommendations on further action. Upon the recommendation of OSC staff, the Director may impose limitations on continued use of computing resources.
Users are expected to use computing resources in a responsible and efficient manner consistent with the goals of the account for which the resources were approved. OSC will provide guidance to users in their efforts to achieve efficient and productive use of these resources. Novice users may not be aware of efficient and effective techniques; such users may not know how to optimize program execution; nor may such optimization necessarily lead to improved cost benefits for these users. Those who use large amounts of computing resources in production runs should attempt to optimize their programs to avoid the case where large inefficient programs deny resources to other users.
Programming, especially in an interactive environment, involves people, computers, and systems. Efficient use of certain resources, such as computers, may lead to inefficient use of other resources, such as people. Indeed, the benefits attributed to good personal or interactive computing systems are that they speed total program development and thus lower attendant development costs even though they may require more total computer resources. Even with this understanding, however, users are expected to refrain from engaging in deliberately wasteful practices, for example, performing endless unnecessary computations.
OSC has a responsibility to provide service in the most efficient manner that best meets the needs of the total user community. At certain times the process of carrying out these responsibilities may require special actions or intervention by the staff. At all other times, OSC staff members have no special rights above and beyond those of other users. OSC shall make every effort to ensure that persons in positions of trust do not misuse computing resources or take advantage of their positions to access information not required in the performance of their duties.
OSC prefers not to act as a disciplinary agency or to engage in policing activities. However, in cases of unauthorized, inappropriate, or irresponsible behavior the Center does reserve the right to take action, commencing with an investigation of the possible abuse. In this connection, OSC, with all due regard for the rights of privacy and other rights of users, shall have the authority to examine files, passwords, accounting information, printouts, tapes, or other material that may aid the investigation. Examination of users' files must be authorized by the Director of OSC or his designee. Users, when requested, are expected to cooperate in such investigations. Failure to do so may be grounds for cancellation of access privileges.
Who can get an account?
Anyone can have an account with OSC, but you need access to a project to utilize our resources. If an eligible principal investigator has a current project, he/she can add the user through the client portal my.osc.edu. Authorized users do not have to be located in Ohio. See https://www.osc.edu/supercomputing/support/account.
Where should a new OSC user begin?
Once you are able to connect to our HPC systems, you should start familiarizing yourself with the software and services available from the OSC, including:
Do I have to pay for supercomputer use?
It depends on the type of client and your rate of consumption. Please click here for more information.
How many supercomputers does OSC have? Which one should I use?
OSC currently has two HPC clusters: the Pitzer Cluster, a 29,664-core Dell cluster with Intel Xeon processors, and the Owens Cluster, a 23,500+ core Dell cluster with Intel Xeon processors. New users have access to the Pitzer and Owens clusters. To learn more, click here.
How do I cite OSC in my publications?
Any publication of any material, whether copyrighted or not, based on or developed with OSC services, should cite the use of OSC, and the use of the specific services (where applicable). For more information about citing OSC, please visit www.osc.edu/citation.
How do I submit my publications and funding information to OSC?
You can add these to your profile in MyOSC: https://www.osc.edu/supercomputing/portals/client_portal/manage_profile_information
You can then associate them with OSC project(s).
Can I receive a letter of support from OSC when I apply for outside funding?
OSC has a standard letter of support that you can include (electronically or in hard copy) with a proposal for outside funding. This letter does not replace the budget process. To receive the letter of support, please send your request to oschelp@osc.edu. You should provide the following information: name and address of the person/organization to whom the letter should be addressed; name(s) of the principal investigator(s) and the institution(s); title of the proposal; number of years of proposed project; budget requested per year. Please allow at least two working days to process your request.
Hardware information about the systems is available at http://www.osc.edu/supercomputing/hardware.
How do I register for a workshop?
For a complete schedule of current training offerings, please visit the OSC Training Schedule. To register or for more information, please email oschelp@osc.edu.
Where can I find documentation?
For documentation specific to software applications, see Software. For other available hardware, see Supercomputers.
My question isn't answered here. Whom can I ask for help?
Contact the OSC Help Desk. Support is available 24x7x365, but more complicated questions will need to wait for regular business hours (Monday - Friday, 9am - 5pm) to be addressed. More information on the OSC supercomputing help desk can be found on our Support Services page.
Something seems to be wrong with the OSC systems. Should I contact the help desk?
Guidelines on reporting possible system problems will be coming soon.
Where can I find logos for my presentations, posters, etc.?
Please see our citation webpage.
What are projects and accounts?
An eligible principal investigator heads a project. Under a project, authorized users have accounts with credentials that permit users to gain access to the HPC systems. A principal investigator can have more than one project.
How do I get/renew an account?
For information concerning accounts (i.e., how to apply, who can apply, etc.), see Accounts.
I'm a faculty member. How do I get accounts for my students?
If an eligible principal investigator is new to OSC, he/she can create a new project. If an eligible principal investigator has a current project, he/she can add the user through the client portal, my.osc.edu. Authorized users do not have to be located in Ohio.
I'm continuing the research of a student who graduated. Can I use his/her account?
Please have your PI send an email to oschelp@osc.edu for further discussions.
I'm working closely with another student. Can we share an account?
No. Each person using the OSC systems must have his/her own account. Sharing files is possible, even with separate accounts.
How do I change my password?
You can change your password through the MyOSC portal. Log in at http://my.osc.edu, and click the "change password" button at the bottom left corner of the HPC User Profile box. Please note that your password has certain requirements; these are specified on the "change password" portal. Please wait 3-5 minutes before attempting to log in with the new password. For security purposes, please note that our password change policy requires a password change every 180 days.
If your password has expired, you can update it by following the "Forgot your password?" link on the my.osc.edu login page.
I want to use csh instead of bash. How do I change the default shell?
You can change your default shell through the MyOSC portal. Log in at my.osc.edu, and use the "Unix Shell" drop-down menu in the HPC User Profile box to change your shell. You will need to log off the HPC system and log back on before the change goes into effect. Please note that it will take a few minutes for the change to be applied.
How do I find my project budget balance?
To see usage and balance information from any system, refer to the OSCusage page.
NOTE: Accounting is updated once a day, so the account balance is for the previous day.
How do I get more resources?
To request additional use of our resources, the principal investigator will need to change the budget for their project. Please see the creating budgets and projects page.
How much will my project be charged for supercomputer usage?
If the project is associated with an Ohio academic institution, see the academic fee structure page for pricing.
If the project is NOT associated with an Ohio academic institution, contact OSC Sales for information on pricing.
See Job and storage charging for how OSC calculates charges.
What is my disk quota?
Each user has a quota of 500 gigabytes (GB) of storage and 1,000,000 files. You may also have access to a project directory with a separate quota. See Available File Systems for more information.
How can I determine the total disk space used by my account?
Your quota and disk usage are displayed every time you log in. You have limits on both the amount of space you use and the number of files you have. There are separate quotas for your home directory and any project directories you have access to.
Note: The quota information displayed at login is updated twice a day, so the information may not reflect the current usage.
You may display your home directory quota information with the command:
quota -s
How do I get more disk space?
Your home directory quota cannot be increased. You should consider deleting, transferring, and/or compressing your files to reduce your usage.
A PI may request project space to be shared by all users on a project. Estimate the amount of disk space that you will need and the duration that you will need it. Send requests to oschelp@osc.edu.
How can I find my largest directories?
To reveal the directories in your account that are taking up the most disk space, you can use the du, sort, and tail commands. For example, to display the ten largest directories, change to your home directory and then run the command:
du . | sort -n | tail -n 10
Why do I receive a "no space left" error when writing data to my home directory?
If you receive the error "No space left on device" when you try to write data to your home directory, it indicates the disk is full. First, check your home directory quota. Each user has a 500 GB storage quota, and the quota information is shown when you log in to our systems. If your disk quota is full, consider reducing your disk space usage. If your disk quota isn't full (usage less than 500 GB), it is very likely that your disk is filled up with 'snapshot' files, which are invisible to users and are used to track fine-grained changes to your files for recovering lost/deleted files. In this case, please contact OSC Help for further assistance. To avoid this situation in the future, consider running jobs that do a lot of disk I/O in the temporary filesystem ($TMPDIR or $PFSDIR) and copy the final output back at the end of the run. See Available File Systems for more information.
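As a minimal sketch of that pattern (the program, input, and output file names below are placeholders), the I/O-heavy portion of a job script might look like:
cd $TMPDIR                                      # do the heavy I/O in the per-job temporary directory
cp $SLURM_SUBMIT_DIR/input.dat .                # stage input data (placeholder file name)
./my_io_heavy_program input.dat > results.out   # placeholder executable
cp results.out $SLURM_SUBMIT_DIR/               # copy only the final output back at the end of the run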
How can I use tar and gzip to aggregate and compress my files?
The commands tar and gzip can be used together to produce compressed file archives representing entire directory structures. These allow convenient packaging of entire directory contents. For example, to package a directory structure rooted at src/, use:
tar -czvf src.tar.gz src/
This archive can then be unpackaged using
tar -xzvf src.tar.gz
where the resulting directory/file structure is identical to what it was initially.
The programs zip, bzip2, and compress can also be used to create compressed file archives. See the man pages on these programs for more details.
Tar is taking too long. Is there a way to compress quicker?
If using tar with the options zcvf is taking too long, you can instead use pigz in conjunction with tar. pigz does gzip compression while taking advantage of multiple cores.
tar cvf - paths-to-archive | pigz > archive.tgz
pigz defaults to using eight cores, but you can have it use more or fewer with the -p argument.
tar cvf - paths-to-archive | pigz -p 4 > archive.tgz
Due to the parallel nature of pigz, if you are using it on a login node you should limit it to using 2 cores. If you would like to use more cores, you need to submit either an interactive or batch job to the queue and do the compression from within the job.
Note: pigz does not significantly improve decompression time.
How do I change the email address OSC uses to contact me?
Please update your email on my.osc.edu, or send your new contact information to oschelp@osc.edu.
I got an automated email from OSC. Where can I get more information about it?
See the Knowledge Base.
What is Linux?
Linux is an open-source operating system that is similar to UNIX. It is widely used in High Performance Computing.
How can I get started using Linux?
See the Unix Basics tutorial for more information. There are also many tutorials available on the web.
What is SSH?
Secure Shell (SSH) is a program to log into another computer over a network, to execute commands in a remote machine, and to move files from one machine to another. It provides strong authentication and secure communications over insecure channels. SSH provides secure X connections and secure forwarding of arbitrary TCP connections.
How does SSH work?
SSH works by the exchange and verification of information, using public and private keys, to identify hosts and users. The ssh-keygen command creates a directory ~/.ssh and files that contain your authentication information. The public key is stored in ~/.ssh/id_rsa.pub and the private key is stored in ~/.ssh/id_rsa. Share only your public key. Never share your private key. To further protect your private key, you should enter a passphrase to encrypt the key when it is stored in the file system. This will prevent people from using it even if they gain access to your files.
One other important file is ~/.ssh/authorized_keys. Append your public keys to the authorized_keys file and keep the same copy of it on each system where you will make ssh connections.
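As a minimal sketch of setting this up (you will be prompted to choose a passphrase; adjust the file names if you use a key type other than RSA):
ssh-keygen -t rsa                                  # creates ~/.ssh/id_rsa and ~/.ssh/id_rsa.pub
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys    # append the public key on each system you connect to
chmod 600 ~/.ssh/authorized_keys                   # keep the file readable only by you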
In addition, on Owens the default SSH client config enables hashing of a user’s known_hosts file. So if SSH is used on Owens the remote system’s SSH key is added to ~/.ssh/known_hosts in a hashed format which can’t be unhashed. If the remote server’s SSH key changes, special steps must be taken to remove the SSH key entry:
ssh-keygen -R <hostname>
Can I connect without using an SSH client?
The OSC OnDemand portal allows you to connect to our systems using your web browser, without having to install any software. You get a login shell and also the ability to transfer files.
How can I upload or download files?
Most file transfers are done using sftp (SSH File Transfer Protocol) or scp (Secure CoPy). These utilities are usually provided on Linux/UNIX and Mac platforms. Windows users should read the next section, "Where can I find SSH and SFTP clients".
Where can I find SSH and SFTP clients?
There are many SSH and SFTP clients available, both commercial and free. See Getting Connected for some suggestions.
How do I run a graphical application in an SSH session?
Graphics are handled using the X11 protocol. You’ll need to run an X display server on your local system and also set your SSH client to forward (or "tunnel") X11 connections. On most UNIX and Linux systems, the X server will probably be running already. On a Mac or Windows system, there are several choices available, both commercial and free. See our guide to Getting Connected for some suggestions.
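For example, assuming an X server is already running on your local machine (the username below is a placeholder):
ssh -X username@owens.osc.edu    # -X requests X11 forwarding; some clients use -Y for trusted forwarding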
Why do I get "connection refused" when trying to connect to a cluster?
OSC temporarily blacklists some IP addresses when multiple failed logins occur. If you are connecting from behind a NAT gateway, as is commonly used for public or campus wireless networks, and you get a "connection refused" message, it is likely that someone else on the same network recently made multiple failed login attempts. Please contact OSC Help with your public IP address and the cluster you attempted to connect to, and we will remove your IP from the blacklist. You can learn your public IP by searching for "what is my IP address" in Google.
What is a batch request?
On all OSC systems, batch processing is managed by the Simple Linux Utility for Resource Management system (Slurm). Slurm batch requests (jobs) are shell scripts that contain the same set of commands that you enter interactively. These requests may also include options for the batch system that provide timing, memory, and processor information. For more information, see our guide to Batch Processing at OSC.
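As a minimal sketch of such a request (the project code, resource limits, and program name below are placeholders, not a recommended configuration):
#!/bin/bash
#SBATCH --job-name=example_job
#SBATCH --account=PAS1234              # placeholder project code
#SBATCH --time=00:30:00                # walltime limit
#SBATCH --nodes=1 --ntasks-per-node=1  # processor request

# The same commands you would enter interactively go below
cd $SLURM_SUBMIT_DIR
./my_program                           # placeholder executable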
How do I submit, check the status, and/or delete a batch job?
Slurm uses sbatch to submit, squeue to check the status, and scancel to delete a batch request. For more information, see our Batch-Related Command Summary.
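For example (the script name and job ID below are placeholders):
sbatch job_script.sh      # submit the batch script job_script.sh
squeue -u $USER           # check the status of your own jobs
scancel 123456            # delete the job with ID 123456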
Can I be notified by email when my batch job starts or ends?
Yes. See the --mail-type option in our Slurm documentation. If you are submitting a large number of jobs, this may not be a good idea.
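As a sketch, directives like the following in a batch script request notification when the job starts and ends (the email address is a placeholder):
#SBATCH --mail-type=BEGIN,END
#SBATCH --mail-user=you@example.edu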
Why won't my job run?
There are numerous reasons why a job might not run even though there appear to be processors and/or memory available. These include:
- Your account may be at or near the job count or processor count limit for an individual user.
- Your group/project may be at or near the job count or processor count limit for a group.
- The scheduler may be trying to free enough processors to run a large parallel job.
- Your job may need to run longer than the time left until the start of a scheduled downtime.
- You may have requested a scarce resource or node type, either inadvertently or by design.
See our Scheduling Policies and Limits for more information.
How can I retrieve files from unexpectedly terminated jobs?
A batch job that terminates before the script is completed can still copy files from $TMPDIR to the user's home directory via the use of signal handling. In the batch script, an additional sbatch option, --signal, should be added. See Signal handling in job scripts for details.
If a command in a batch script is killed for excessive memory usage (see Out-of-Memory (OOM) or Excessive Memory Usage for details) then the handler may not be able to fully execute its commands. However, normal shell scripting can handle this situation: the exit status of a command that may possibly cause an OOM can be checked and appropriate action taken. Here is a Bourne shell example:
bla_bla_big_memory_using_command_that_may_cause_an_OOM
if [ $? -ne 0 ]; then
    cd $SLURM_SUBMIT_DIR
    mkdir $SLURM_JOB_ID
    cp -R $TMPDIR/* $SLURM_JOB_ID
    exit
fi
Finally, if a node your job is running on crashes then the commands in the signal handler may not be executed. It may be possible to recover your files from batch-managed directories in this case. Contact OSC Help for assistance.
How can I delete all of my jobs on a cluster?
To delete all your jobs on one of the clusters, including those currently running, queued, and in hold, login to the cluster and run the command:
scancel -u <username>
How can I determine the number of cores in use by me or my group?
# current jobs queued/running and cpus requested
squeue --cluster=all --account=<proj-code> --Format=jobid,partition,name,timeLeft,timeLimit,numCPUS
# or for a user
squeue --cluster=all -u <username> --Format=jobid,partition,name,timeLeft,timeLimit,numCPUS
How do I request GPU nodes for visualization?
By default, we don't start an X server on GPU nodes because it impacts computational performance. Add vis to your GPU request so that the batch system configures the GPUs for visualization. For example, on Owens, the request should be:
--nodes=1 --ntasks-per-node=28 --gpus-per-node=1 --gres=vis
What languages are available?
Fortran, C, and C++ are available on all OSC systems. The commands used to invoke the compilers and/or loaders vary from system to system. For more information, see our Compilation Guide.
What compiler (vendor) do you recommend?
We have Intel, PGI, and GNU compilers available on all systems. Each compiler vendor supports some options that the others don't, so the choice depends on your individual needs. For more information, see our Compilation Guide.
Will software built for one system run on another system?
Most serial code built on one system will run on another system, although it may run more efficiently if it is built and run on the same system. Parallel (MPI) code typically must be built on the system where it will run.
What is the difference between installing software on one's local computer and on an OSC cluster?
One major difference is that OSC users cannot install software system wide using package managers. In general, users installing software in their home directories will follow the configure/build/test paradigm that is common on Unix-like operating systems. For more information, see our HOWTO: Locally Installing Software on an OSC cluster.
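As a rough sketch of that paradigm for a typical autotools-based package (the package name, version, and install location below are placeholders, and individual packages vary):
tar -xzf somepackage-1.0.tar.gz
cd somepackage-1.0
./configure --prefix=$HOME/local/somepackage/1.0   # install under your home directory instead of system-wide
make
make check                                         # if the package provides a test target
make install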
What is this build error: "... relocation truncated to fit ..."?
OSC users installing software on a cluster occasionally report this error. It is related to memory addressing and is usually fixed by cleaning the current build and rebuilding with the compiler option "-mcmodel=medium". For more details, see the man page for the compiler.
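For example, with the GNU compilers the rebuild might look like the following (the source and executable names are placeholders, and other compilers accept an equivalent flag):
make clean
gcc -mcmodel=medium -o myprog myprog.c     # placeholder names; add your usual options as well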
What is parallel processing?
Parallel processing is the simultaneous use of more than one computer (or processor) to solve a problem. There are many different kinds of parallel computers. They are distinguished by the kind of interconnection between processors or nodes (groups of processors) and between processors and memory.
What parallel processing environments are available?
On most systems, both shared-memory and distributed-memory parallel programming models can be used. Versions of OpenMP (for multithreading or shared-memory usage) and MPI (for message-passing or distributed-memory usage) are available. A summary of parallel environments will be coming soon.
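As a sketch of the two models (compiler wrappers, flags, and launch commands can vary with the modules you have loaded):
# Shared memory (OpenMP): compile with the OpenMP flag and set the thread count
gcc -fopenmp -o omp_prog omp_prog.c
export OMP_NUM_THREADS=4
./omp_prog

# Distributed memory (MPI): compile with the MPI wrapper and launch with srun inside a job
mpicc -o mpi_prog mpi_prog.c
srun -n 4 ./mpi_prog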
What is a core?
A core is a processor. When a single chip contains multiple processors, they are called cores.
I'm not seeing the performance I expected. How can I be sure my code is running in parallel?
We are currently working on a guide for this. Please contact OSC Help for assistance.
What software applications are available?
See the Software section for more information.
Do you have a newer version of (name your favorite software)?
Check the Software section to see what versions are installed. You can also check the installed modules using the module spider or module avail commands.
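For example (the package name is a placeholder; available modules and versions vary by cluster):
module spider python      # search all available versions of a package
module avail              # list modules that can be loaded in the current environment
module load python        # load the default version of a module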
How do I get authorized to use a particular software application?
Please contact OSC Help for assistance.
What math routines are available? Do you have ATLAS and LAPACK?
See the Software section for information on third-party math libraries (e.g., MKL, ACML, fftw, scalapack, etc). MKL and ACML are highly optimized libraries that include the BLAS and LAPACK plus some other math routines.
Do you have NumPy/SciPy?
The NumPy and SciPy modules are installed with the python software. See the Python software page.
OSC does not have a particular software package I would like to use. How can I request it?
Please refer to the Software Forms page. You will see a link to Request for Software Form. Download the form, complete the information, and attach the form to an e-mail to oschelp@osc.edu. The Statewide Users Group will consider the request.
You may install open source software yourself in your home directory. If you have your own license for commercial software, contact the OSC Help desk.
I have a software package that must be installed as root. What should I do?
Most packages have a (poorly documented) option to install under a normal user account. Contact the OSC Help desk if you need assistance. We generally do not install user software as root.
What are modules?
Modules are used to manage the environment variable settings associated with software packages in a shell-independent way. On OSC's systems, you will by default have modules in your environment for the batch system, MPI, compilers, and a few other pieces of software. For information on using the module system, see our guide to Batch Processing at OSC.
What are MFLOPS/GFLOPS/TFLOPS/PFLOPS?
MegaFLOPS/GigaFLOPS/TeraFLOPS/PetaFLOPS are millions/billions/trillions/quadrillions of FLoating-point Operations (calculations) Per Second.
How do I find out about my code's performance?
A number of performance analysis tools are available on OSC systems. Some are general to all systems and others are specific to a particular system. See our performance analysis guide for more info.
How can I optimize my code?
There are several ways to optimize code. Key areas to consider are CPU optimization, I/O optimization, memory optimization, and parallel optimization. See our optimization strategy guide for more info.
What does "CPU time limit exceeded" mean?
Programs run on the login nodes are subject to strict CPU time limits. To run an application that takes more time, you need to create a batch request. Your batch request should include an appropriate estimate for the amount of time that your application will need. See our guide to Batch Processing at OSC for more information.
My program or file transfer died for no reason after 20 minutes. What happened?
Programs run on the login nodes are subject to strict CPU time limits. Because file transfers use encryption, you may hit this limit when transferring a large file. To run longer programs, use the batch system. To transfer larger files, connect to sftp.osc.edu instead of to a login node.
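For example, to transfer a large file through the dedicated file-transfer host rather than a login node (the username and file name below are placeholders):
scp largefile.dat username@sftp.osc.edu:     # copy the file to your home directory at OSC
sftp username@sftp.osc.edu                   # or start an interactive sftp session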
Why did my program die with a segmentation fault, address error, or signal 11?
This is most commonly caused by trying to access an array beyond its bounds -- for example, trying to access element 15 of an array with only 10 elements. Unallocated arrays and invalid pointers are other causes. You may wish to debug your program using one of the available tools such as the TotalView Debugger.
I created a batch script in a text editor on a Windows or Mac system, but when I submit it on an OSC system, almost every line in the script gives an error. Why is that?
Windows and Mac have different end-of-line conventions for text files than UNIX and Linux systems do, and most UNIX shells (including the ones interpreting your batch script) don't like seeing the extra character that Windows appends to each line or the alternate character used by Mac. You can use the following commands on the Linux system to convert a text file from Windows or Mac format to UNIX format:
dos2unix myfile.txt
mac2unix myfile.txt
I copied my output file to a Windows system, but it doesn't display correctly. How can I fix it?
A text file created on Linux/UNIX will usually display correctly in Wordpad but not in Notepad. You can use the following command on the Linux system to convert a text file from UNIX format to Windows format:
unix2dos myfile.txt
What IP ranges do I need to allow in my firewall to use OSC services?
See our knowledge base article on the topic.
(alphabetical listing)
Authorized users include the principal investigator and secondary investigators who are part of the research team on a project. For classroom accounts, authorized users are the registered students and teaching assistants.
authorized users, adding new ones to existing project
To add a new authorized user to a project, the principal investigator can invite new users or add existing users through the OSC client portal.
To determine your project balance (budget), please utilize MyOSC or log on to any machine and use the following command: OSCusage
To maintain a positive balance (budget), make sure to submit new budgets; see this page for guidance.
A project that allows students to learn high-performance computing or to apply high-performance computing to applications in a particular course. The budget awarded is $500 and can be renewed if needed; credits cover all costs. Please see our classroom guide for more information.
A project contains one or more research activities, which may or may not be related. Each project has a number consisting of a three- or four-letter prefix and four numbers. Principal investigators may have more than one project, but they should be aware that the $1,000 annual credit can only be applied to one service agreement, which may cover multiple projects.
These are authorized users other than the principal investigator. The PI is responsible for keeping OSC updated on changes in authorized users.
The Statewide Users Group comprises representatives from Ohio's colleges and universities. The members serve as an advisory body to OSC.
If your research is supported by awards from funding agencies, the Center appreciates learning of this. Such data helps the Center determine its role in Ohio's research activities.
The Center mainly categorizes projects as classroom (fully subsidized) or Ohio academic ($1,000 annual grant per PI). There are other project types the Center may deem fit, such as commercial.
The unique login name of a user. Make changes to your password, shell, email, and project access on OSC's client portal, MyOSC (my.osc.edu).