Data storage

OSC has various storage systems to fulfill different HPC research needs. Information on each filesystem can be found in the data storage technical documentation.

Data storage overview and documentation

Review the overview of the filesystems and the storage documentation.

Protected data storage

Review information about storing data with strict security needs.

Data storage upgrades

OSC's data storage is continually updated and expanded. View some of the major changes:

2016 storage upgrade

2020 storage upgrade

Known issues of OSC filesystems

Visit the known issues page and filter by the filesystem category to view current filesystem issues.

Overview of File Systems

OSC has several different file systems where you can create files and directories. The characteristics of those systems and the policies associated with them determine their suitability for any particular purpose. This section describes the characteristics and policies that you should take into consideration in selecting a file system to use.

The various file systems are described in subsequent sections.

Visibility

Most of our file systems are shared. Directories and files on the shared file systems are accessible from all OSC HPC systems. By contrast, local storage is visible only on the node it is located on. Each compute node has a local disk with scratch file space.

Permanence

Some of our storage environments are intended for long-term storage; files are never deleted by the system or OSC staff. Some are intended as scratch space, with files deleted as soon as the associated job exits. Others fall somewhere in between, with expected data lifetimes of a few months to a couple of years.

Backup policies

Some of the file systems are backed up to tape; some are considered temporary storage and are not backed up. Backup schedules differ for different systems.

In no case do we make an absolute guarantee about our ability to recover data. Please read the official OSC data management policies for details. That said, we have never lost backed-up data and have rarely had an accidental loss of non-backed-up data.

Size/Quota

The permanent (backed-up) and scratch file systems all have quotas limiting the amount of file space and the number of files that each user or group can use. Your usage and quota information is displayed every time you log in to one of our HPC systems. You can also check your home directory quota using the quota command. We encourage you to pay attention to these numbers because your file operations, and probably your compute jobs, will fail if you exceed them. If you have extremely large files, you will also need to pay attention to the amount of local file space available on different compute nodes.
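
For example, a quick check from the command line (on most Linux systems the -s flag prints usage in human-readable units):

# display current usage and quota limits for your account
quota -s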

Performance

File systems have different performance characteristics including read/write speeds and behavior under heavy load. Performance matters a lot if you have I/O-intensive jobs. Choosing the right file system can have a significant impact on the speed and efficiency of your computations. You should never do heavy I/O in your home or project directories, for example.

Table overview

Each file system is configured differently to serve a different purpose:

Home Directory
  Path: /users/project/userID
  Environment variable: $HOME or ~
  Space purpose: Permanent storage
  Backed up: Daily
  Flushed: No
  Visibility: Login and compute nodes
  Quota/allocation: 500 GB of storage and 1,000,000 files
  Total size: 900 TB
  Bandwidth: 10 GB/s
  Type: NetApp WAFL service

Project
  Path: /fs/project, /fs/ess
  Environment variable: N/A
  Space purpose: Long-term storage
  Backed up: Daily
  Flushed: No
  Visibility: Login and compute nodes
  Quota/allocation: Typically 1-5 TB of storage and 1,000,000 files
  Total size: /fs/project: 3,400 TB; /fs/ess: varies
  Bandwidth: 40 to 50 GB/s
  Type: GPFS

Local Disk
  Path: /tmp
  Environment variable: $TMPDIR
  Space purpose: Temporary
  Backed up: No
  Flushed: End of job when $TMPDIR is used
  Visibility: Compute node
  Quota/allocation: Varies by node
  Total size: Varies by system
  Bandwidth: Varies by system
  Type: Varies by system

Scratch (global)
  Path: /fs/scratch, /fs/ess/scratch
  Environment variable: $PFSDIR
  Space purpose: Temporary
  Backed up: No
  Flushed: End of job when $PFSDIR is used
  Visibility: Login and compute nodes
  Quota/allocation: 100 TB of storage and 25,000,000 files
  Total size: /fs/scratch: 1,000 TB; /fs/ess/scratch: varies
  Bandwidth: 100 GB/s
  Type: GPFS

Tape
  Path: N/A
  Environment variable: N/A
  Space purpose: Archive
  Backed up: Yes
  Flushed: No
  Visibility: N/A
  Quota/allocation: N/A
  Total size: 5,500 TB
  Bandwidth: 3.5 GB/s
  Type: LTO tape
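
Because $TMPDIR and $PFSDIR are flushed when the job ends, a common pattern is to stage input onto the temporary space, compute there, and copy results back before the job exits. A minimal sketch of a batch script, assuming a Slurm scheduler and hypothetical file and program names:

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --time=1:00:00

# stage input onto fast node-local disk and compute there
cp ~/input.dat $TMPDIR/
cd $TMPDIR
my_program input.dat > output.dat    # my_program is a placeholder

# copy results back to permanent storage before the job exits and $TMPDIR is flushed
cp output.dat ~/results/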

2016 Storage Service Upgrades

On July 12th, 2016, OSC migrated its old GPFS and Lustre filesystems to new Project and Scratch services, respectively. We moved 1.22 PB of data, and the new capacities are 3.4 PB for Project and 1.1 PB for Scratch. If you store data on these services, there are a few important details to note.

Paths have changed

The Project service is now available at /fs/project, and the Scratch service is available at /fs/scratch. We have created symlinks on the Oakley and Ruby clusters to ensure that existing job scripts continue to function; however, the symlinks will not be available on future systems, such as Owens. No action is required on your part to continue using your existing job scripts on current clusters.

However, you may wish to start updating your paths accordingly, in preparation for Owens being available later this year.
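
To check whether a path you rely on is one of these compatibility symlinks, and where it resolves, something like the following works (the old Lustre path is shown as an example; substitute your own paths):

# -d lists the link itself rather than its contents; readlink -f prints the resolved target
ls -ld /fs/lustre
readlink -f /fs/lustre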

Data migration details

Project space allocations and Scratch space data were migrated automatically to the new services. For data on the Project service, ACLs, Xattrs, and Atimes were all preserved. However, Xattrs were not preserved for data on the Scratch service.

Additionally, Biomedical Informatics at The Ohio State University had some data moved from a temporary location to its permanent location on the Project service. We had prepared for this and had already provided symlinks so that the data appeared to be in its final location prior to the July 12th downtime, so the move should be mostly transparent to users. However, ACLs, Xattrs, and Atimes were not preserved for this data.

File system     Transfer method     ACLs preserved     Xattrs preserved     Atime preserved
/fs/project     AFM                 Yes                Yes                  Yes
/fs/lustre      rsync               Yes                No                   Yes
/users/bmi      rsync               No                 No                   No
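
As context for the table above: whether rsync preserves ACLs and extended attributes depends on the flags passed, since the archive flag alone carries neither. A hedged illustration, not the exact invocation used during the migration:

# -a preserves permissions and times; -A adds ACLs; -X adds extended attributes (xattrs)
rsync -aAX source/ destination/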

Full documentation

Full details and documentation of the new service capacities and capabilities are available at https://www.osc.edu/supercomputing/storage-environment-at-osc/


2020 Storage Service Upgrades

In March 2020, OSC expanded the existing project and scratch storage filesystems by 8.6 petabytes. Added to the existing storage capacity at OSC, this brings the center's total storage capacity to approximately 14 petabytes.

A petabyte is equivalent to 1,024 terabytes.
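
For example, the 8.6-petabyte expansion works out to 8.6 × 1,024 ≈ 8,806 terabytes.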

New file paths

The new project and scratch storage is available at /fs/ess/<project-code> for project space and /fs/ess/scratch/<project-code> for scratch space. Existing data remains available at the original paths /fs/project and /fs/scratch.

New project storage allocation requests

Any new project storage allocation requests will be granted on the new storage, as long as the project does not already have existing project space. Any new storage allocations will use the file path /fs/ess/<project-code>.

Some projects will have access to the new scratch space at /fs/ess/scratch/<project-code>. OSC will work with each group individually when access to /fs/ess/scratch/ is granted.

Migrating storage

Existing project and scratch storage may need to be migrated to the new storage space. If this happens, OSC can optionally set up a symlink or redirect so that compatibility for programs and job scripts is maintained for some time. However, redirects are not a permanent solution and will be removed eventually. Members of the project should make sure that, once the redirect is removed, its removal does not negatively affect their work at OSC.
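
One way to prepare is to find and update hard-coded old paths in your job scripts before a redirect is removed. A sketch, assuming scripts under a hypothetical ~/job-scripts directory and a hypothetical project code PEX1234:

# list scripts that still reference the old project path
grep -rl '/fs/project/PEX1234' ~/job-scripts
# rewrite the old path to the new location in place
sed -i 's|/fs/project/PEX1234|/fs/ess/PEX1234|g' ~/job-scripts/*.sh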


Protected Data Storage


OSC's Protected Data Storage (PDS) is designed to address the most common security control requirements encountered by researchers while also reducing the workload on individual PIs and research teams to satisfy these requirements.

Protected Data at OSC

The OSC cybersecurity program is based upon the National Institute of Standards and Technology (NIST) Special Publication (SP) 800-53, Revision 4 requirements for security, and reflects the additional requirements of established Information Technology (IT) security practices.

OSC currently supports the following protected data types.

  • Health Insurance Portability and Accountability Act (HIPAA)
  • Export Control data
    • International Traffic in Arms Regulations (ITAR)
    • Export Administration Regulations (EAR)
  • Personally Identifiable Information (PII) / Protected Health Information (PHI)
  • Research Health Information (RHI)
  • Proprietary Data

If you need support for a data type that is not listed, please contact OSC Help to discuss.

OSC only provides support for unclassified data processing, regardless of the specific category of that information. No support for data classified at secret or above is provided, and researchers should not, under any circumstance, transfer such data to OSC systems.

Getting started with the Protected Data Service at OSC

OSC's PDS was developed to meet the security control requirements of your research agreements and to eliminate the burden placed on PIs who would otherwise be required to maintain their own compliance infrastructure with certification and reporting requirements.

In order to begin a project at OSC with data protection requirements, please follow these steps:

Contact OSC

Send an email to oschelp@osc.edu and describe the project's data requirements.

Consultation

You will hear back from OSC to set up an initial consultation to discuss your project and your data. Based on your project and the data being used, we will request the necessary documentation (data use agreements, BAA, MOU, etc.).

Approval

Once OSC receives the necessary documentation, the request to store data on the PDS will be reviewed, and if appropriate, approved.

Get started


Please visit the getting started documentation for starting research at OSC and the other pages regarding protected data, including the important notes below.

Important protected data notes

Keep protected data in proper locations

Do not move or copy data outside the project space /fs/ess/<project-code> without PI approval. Protected data must be stored in predetermined locations. Moving protected data to locations outside of the original /fs/ess/<project-code> path is not permitted because other locations may not have the proper controls and requirements to safely store it. To reiterate, there are many other storage locations at OSC:

  • /users/<project-code>
  • /fs/project/<project-code>
  • /fs/scratch/<project-code>
  • /fs/ess/scratch/<project-code>

None of the above locations may be used to store protected data; only the /fs/ess/<project-code> directory can be used.

Project space access controls and permissions should not be altered

Do not adjust permissions of project space without PI approval.

The permissions on the project space where protected data is stored were set up to prevent unauthorized access to the data. Altering these permissions without approval could expose the data and constitute a violation.

Keep accounts secure

Do not share passwords, ever. Sharing passwords is not authorized.

A user who logs in with another person's account can perform actions on behalf of that person, including the unauthorized actions mentioned above.

Securely transferring files to protected data locations

Securely transferring files at OSC

Files containing protected health information (PHI) must be encrypted when they are stored (at rest) and when they are transferred between networked systems (in transit).

Transferring files securely to OSC involves understanding which commands or applications to use and which directory to transfer into.

Before transferring files, ensure that the proper permissions will apply once the transfer completes, for example by verifying the permissions and ACLs of the destination directory.
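
A quick way to verify, assuming the destination is the protected project directory and the standard Linux ACL tools are installed:

# show ownership and mode bits of the destination directory
ls -ld /fs/ess/<project-code>
# show any ACL entries on the directory
getfacl /fs/ess/<project-code>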

FileZilla

Install the FileZilla client software and follow the FileZilla Client Tutorial at https://wiki.filezilla-project.org/FileZilla_Client_Tutorial_(en) to transfer files.

Connect to the server at sftp://sftp.osc.edu.

Select the login type as interactive, since multi-factor authentication is required to log in for protected data projects.

  • Make sure to use the SFTP option.
  • The session starts in the user's home directory by default.
  • Navigate to /fs/ess/secure_dir before starting the file transfer.

Globus

There is a guide for using Globus on our Globus page.

Command-line transfers

Files and directories can also be transferred manually on the command line.

secure copy (scp)

# copy a local file (add -r for a directory) into the protected project space
scp src <username>@sftp.osc.edu:/fs/ess/secure_dir

sftp

# open an interactive session, then run sftp transfer commands (cd, put, get, etc.)
sftp <username>@sftp.osc.edu

rsync

# -a preserves permissions and timestamps; --progress reports transfer status
rsync -a --progress local-dir <username>@sftp.osc.edu:/fs/ess/secure_dir
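
After a transfer, a checksum-based dry run can confirm that the remote copy matches the local one; this sketch assumes rsync is available on both ends and uses the same local-dir as above:

# -n is a dry run, -c compares checksums, -i itemizes differences (little or no output means the copies match)
rsync -anci local-dir/ <username>@sftp.osc.edu:/fs/ess/secure_dir/local-dir/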


Providing access to protected data locations

PHI data transferred to OSC will be set with permissions that restrict access to project users only. Project users are determined by group membership. For example, project PEX1234 has a protected data location at /fs/ess/PEX1234, and only users in the group PEX1234 may access data in that directory.
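
To confirm that your account is in the required group before attempting access (using the PEX1234 example above):

# list your group memberships; PEX1234 must appear before /fs/ess/PEX1234 is accessible
id -Gn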

See the access control list how-to for more information on how this works.

Grant and remove user access to protected data

See our page on inviting, adding, and removing users.

Adding a user to a project in the OSC client portal adds the project group to their user account; likewise, removing the user from the project removes that group.

A user's first project cannot be the secure data project. If a user's first project were the secure data project, removing them from the project in the client portal would not take away their group for that project.