Data Storage

OSC has various storage systems to fulfill different HPC research needs. Information on each filesystem can be found in the data storage technical documentation.

Data storage overview and documentation

Review the overview of the filesystems, storage hardware, and the storage documentation.

Protected data service

Review information about storing data with strict security needs.

Data storage upgrades

OSC's data storage is continually updated and expanded. View some of the major changes:

2016 storage upgrade

2020 storage upgrade

2022 storage upgrade

Known issues of OSC filesystems

Visit known issues and filter by the filesystem category to view current known issues with filesystems.

Overview of File Systems

OSC has several different file systems where you can create files and directories. The characteristics of those systems and the policies associated with them determine their suitability for any particular purpose. This section describes the characteristics and policies that you should take into consideration in selecting a file system to use.

The various file systems are described in subsequent sections.

Visibility

Most of our file systems are shared. Directories and files on the shared file systems are accessible from all OSC HPC systems. By contrast, local storage is visible only on the node it is located on. Each compute node has a local disk with scratch file space.

Permanence

Some of our storage environments are intended for long-term storage; files are never deleted by the system or OSC staff. Some are intended as scratch space, with files deleted as soon as the associated job exits. Others fall somewhere in between, with expected data lifetimes of a few months to a couple of years.

Backup policies

Some of the file systems are backed up to tape; some are considered temporary storage and are not backed up. Backup schedules differ for different systems.

In no case do we make an absolute guarantee about our ability to recover data. Please read the official OSC data management policies for details. That said, we have never lost backed-up data and have rarely had an accidental loss of non-backed-up data.

Size/Quota

The permanent (backed-up) and scratch file systems all have quotas limiting the amount of file space and the number of files that each user or group can use. Your usage and quota information are displayed every time you log in to one of our HPC systems. You can also check your home directory quota using the quota command. We encourage you to pay attention to these numbers because your file operations, and probably your compute jobs, will fail if you exceed them. If you have extremely large files, you will have to pay attention to the amount of local file space available on different compute nodes.
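
As a quick check from a login node, the standard Linux quota utility summarizes your usage against these limits (a minimal example; the exact output format varies by filesystem):

quota -s    ## report block and file usage in human-readable units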

Performance

File systems have different performance characteristics including read/write speeds and behavior under heavy load. Performance matters a lot if you have I/O-intensive jobs. Choosing the right file system can have a significant impact on the speed and efficiency of your computations. You should never do heavy I/O in your home or project directories, for example.
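
As an illustration, here is a minimal sketch of a Slurm batch script that stages data to node-local disk ($TMPDIR, described in the table below) instead of doing heavy I/O in project space. The program and file names (my_program, input.dat, output.dat) and the project code placeholder are hypothetical; adapt them to your own job.

#!/bin/bash
#SBATCH --job-name=io_heavy_example
#SBATCH --nodes=1
#SBATCH --time=01:00:00

## copy input data from project space to fast node-local storage
cp /fs/ess/<project-code>/input.dat $TMPDIR
cd $TMPDIR

## run the I/O-intensive step against the local disk
/fs/ess/<project-code>/my_program input.dat > output.dat

## copy results back before the job ends, since $TMPDIR is flushed at the end of the job
cp output.dat /fs/ess/<project-code>/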

Table overview

Each file system is configured differently to serve a different purpose:

|                      | Home Directory | Project | Local Disk | Scratch (global) | Backup |
|----------------------|----------------|---------|------------|------------------|--------|
| Path                 | /users/project/userID | /fs/ess | /tmp | /fs/scratch | N/A |
| Environment Variable | $HOME or ~ | N/A | $TMPDIR | $PFSDIR | N/A |
| Space Purpose        | Permanent storage | Long-term storage | Temporary | Temporary | Backup; replicated in Cleveland |
| Backed Up?           | Daily | Daily | No | No | Yes |
| Flushed              | No | No | End of job when $TMPDIR is used | End of job when $PFSDIR is used | No |
| Visibility           | Login and compute nodes | Login and compute nodes | Compute node | Login and compute nodes | N/A |
| Quota/Allocation     | 500 GB of storage and 1,000,000 files | Typically 1-5 TB of storage and 100,000 files per TB | Varies depending on node | 100 TB of storage and 25,000,000 files | N/A |
| Total Size           | 1.9 PB | /fs/ess: 13.5 PB | Varies depending on system | /fs/scratch: 3.5 PB | |
| Bandwidth            | 40 GB/s | Reads: 60 GB/s; writes: 50 GB/s | Varies depending on system | Reads: 170 GB/s; writes: 70 GB/s | N/A |
| Type                 | NetApp WAFL service | GPFS | Varies depending on system | GPFS | |
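
To see the capacity and current usage of the shared filesystems from a login node, standard tools such as df work (a quick sketch; mount points and reported sizes may differ slightly from the table above):

df -h $HOME /fs/ess /fs/scratch    ## human-readable size, used, and available space for each filesystem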

Storage Hardware

The storage at OSC consists of servers, data storage subsystems, and networks providing a number of storage services to OSC HPC systems. The current configuration consists of:

  • A NetApp Network Attached Storage (NAS) for home directories (1.9 PB of storage, 40 GB/s bandwidth)
  • An IBM Elastic Storage System (ESS) providing project and scratch storage and supporting various protected data requirements (~16 PB of storage; bandwidth varies between Project and Scratch)
  • Local disk storage on each compute node
  • Two IBM tape robots for backups and archival storage, which as of the beginning of 2022:
    • Are capable of redundantly storing up to 23.5 PB of data, with copies kept in both Columbus and Cleveland area data centers
    • Have nearly 14 PB of tapes installed in the tape backup archive, with several additional PB of tapes on hand, ready to be installed as needed
    • Are anticipated to be scalable, via new generations of tape media and drives, to over 141 PB of capacity in the coming years

2016 Storage Service Upgrades

On July 12, 2016, OSC migrated its old GPFS and Lustre filesystems to the new Project and Scratch services, respectively. We moved 1.22 PB of data, and the new capacities are 3.4 PB for Project and 1.1 PB for Scratch. If you store data on these services, there are a few important details to note.

Paths have changed

The Project service is now available at /fs/project, and the Scratch service is available at /fs/scratch. We have created symlinks on the Oakley and Ruby clusters to ensure that existing job scripts continue to function; however, the symlinks will not be available on future systems, such as Owens. No action is required on your part to continue using your existing job scripts on current clusters.

However, you may wish to start updating your paths accordingly, in preparation for Owens being available later this year.

Data migration details

Project space allocations and Scratch space data were migrated automatically to the new services. For data on the Project service, ACLs, Xattrs, and Atimes were all preserved. However, Xattrs were not preserved for data on the Scratch service.

Additionally, Biomedical Informatics at The Ohio State University had some data moved from a temporary location to its permanent location on the Project service. We had prepared for this and already provided symlinks so that the data appeared to be in its final location prior to the July 12th downtime, so the move should be mostly transparent to users. However, ACLs, Xattrs, and Atimes were not preserved for this data.

| File system | Transfer method | ACLs preserved | Xattrs preserved | Atime preserved |
|-------------|-----------------|----------------|------------------|-----------------|
| /fs/project | AFM             | Yes            | Yes              | Yes             |
| /fs/lustre  | rsync           | Yes            | No               | Yes             |
| /users/bmi  | rsync           | No             | No               | No              |

Full documentation

Full details and documentation of the new service capacities and capabilities are available at https://www.osc.edu/supercomputing/storage-environment-at-osc/


2020 Storage Service Upgrades

In March 2020, OSC expanded the existing project and scratch storage filesystems by 8.6 petabytes. Added to OSC's existing storage capacity, this brings the total storage capacity at OSC to ~14 petabytes.

A petabyte is equivalent to 1,024 terabytes.

New file paths

The new project and scratch storage is available using the path /fs/ess/<project-code> for project space and /fs/ess/scratch/<project-code> for scratch space. Existing data can be reached using the existing paths /fs/project and /fs/scratch.

New project storage allocation requests

Any new project storage allocation requests will be granted on the new storage, as long as the project does not already have existing project space. Any new storage allocations will use the file path /fs/ess/<project-code>.

Some projects will have access to the new scratch space at /fs/ess/scratch/<project-code>. We will work with the individual group if access to /fs/ess/scratch/ is granted for that group. 

Migrating storage

Existing project and scratch data may need to be moved to the new storage space. If this happens, OSC can optionally set up a symlink or redirect so that compatibility for programs and job scripts is maintained for some time. However, redirects are not a permanent solution and will be removed eventually. Members of the project should make sure that removing the redirect will not negatively affect their work at OSC.


2022 Storage Service Upgrades

In October 2022, OSC retired the Data Direct Networks (DDN) GRIDScaler system deployed in 2016 and expanded the IBM Elastic Storage System (ESS) for both the Project and global Scratch services. This expanded the total capacity of Project and Scratch storage at OSC to ~16 petabytes, with better performance.

A petabyte is equivalent to 1,024 terabytes.

File paths

All project and scratch storage is available using the path /fs/ess/<project-code> for project space and /fs/scratch/<project-code> for scratch space.

Migrating storage

OSC migrated all current Project and Scratch data to the new services starting in September 2022 and ran the final synchronization of the data during the October 11, 2022 downtime. ACLs and extended attributes for the data were preserved through the migration.

During the December 13, 2022 downtime, OSC cleaned up the scratch directories for users who had scratch space on both DDN and ESS storage (/fs/scratch/<project-code> and /fs/ess/scratch/<project-code>). All directories under /fs/ess/scratch/ now point to /fs/scratch/, so they are essentially the same storage.

OSC has set up symlinks on the storage so that compatibility for programs and job scripts is maintained. Please start updating your existing scripts to replace /fs/project/<project-code> with /fs/ess/<project-code> for project storage, and /fs/ess/scratch/<project-code> with /fs/scratch/<project-code> for scratch storage.

We encourage you to use /fs/ess/<project-code> for project storage and /fs/scratch/<project-code> for scratch storage in all future job scripts.
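
As a hedged example of what such an update might look like, the command below rewrites both old paths in a job script. The project code PAS1234 and the script name myjob.sh are hypothetical; substitute your own project code and files (the -i.bak option keeps a backup copy of the original script):

sed -i.bak -e 's|/fs/project/PAS1234|/fs/ess/PAS1234|g' -e 's|/fs/ess/scratch/PAS1234|/fs/scratch/PAS1234|g' myjob.sh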

Directories from OnDemand Files App

For users who used to have project space on the DDN storage, you will see /fs/ess/<project-code> instead of /fs/project/<project-code>. Please use the directory /fs/ess/<project-code>, which is your current project space location and includes all of your previous project data.

For users who used to have scratch space on the ESS storage, you will see /fs/scratch/<project-code> instead of /fs/ess/scratch/<project-code>. Please use the directory /fs/scratch/<project-code>, which is your current scratch space location and includes all of your scratch data.


Protected Data Service

OSC's Protected Data Service (PDS) is designed to address the most common security control requirements encountered by researchers while also reducing the workload on individual PIs and research teams to satisfy these requirements.

Protected Data at OSC

The OSC cybersecurity program is based upon the National Institute of Standards and Technology (NIST) Special Publication (SP) 800-53, Revision 4 requirements for security, and reflects the additional requirements of established Information Technology (IT) security practices.

OSC currently supports the following protected data types.

  • Personal Health Information (PHI)
    • data covered by Health Insurance Portability and Accountability Act (HIPAA)
  • Research Health Information (RHI)
  • Export Control data
    • International Traffic in Arms Regulations (ITAR)
    • Export Administration Regulations (EAR)
  • Personally Identifiable Information (PII)
  • Proprietary Data

If you need support for a data type that is not listed, please contact OSC Help to discuss.

OSC only provides support for unclassified data processing, regardless of the specific category of that information. No support for data classified at secret or above is provided, and researchers should not, under any circumstance, transfer such data to OSC systems.

Getting started with the Protected Data Service at OSC

OSC's PDS was developed with the intent of meeting the security control requirements of your research agreements and eliminating the burden placed on PIs who would otherwise be required to maintain their own compliance infrastructure with certification and reporting requirements.

In order to begin a project at OSC with data protection requirements, please follow these steps:

Contact OSC

Send an email to oschelp@osc.edu and describe the project's data requirements.

Consultation

You will hear back from OSC to set up an initial consultation to discuss your project and your data. Based on your project and the data being used, we may request the necessary documentation (data use agreements, BAA, MOU, etc.).

Approval

Once OSC receives the necessary documentation, the request to store data on the PDS will be reviewed, and if appropriate, approved. 

All PDS projects require multi-factor authentication (MFA). MFA will be set by OSC when the project is created. 

Get started

OSC will help set up the project and the storage used to store the protected data. The following sections cover managing protected data and its access, and transferring files securely.

Manage the protected data and its access

Keep protected data in proper locations

Protected data must be stored in predetermined locations. The only locations at OSC to store protected data are /fs/ess/PDEXXXX and /fs/scratch/PDEXXXX directories.
(Only with prior approval from OSC may a protected data service project not have a project prefix of PDE).

There are other storage locations at OSC, but none of the following locations can be used to store protected data, because they do not have the proper controls and requirements to safely store it:

  • /users/<project-code>
  • /fs/ess/<non-PDS-project>
  • /fs/scratch/<non-PDS-project>
PDS is the acronym for Protected Data Service.

Project space access controls and permissions should not be altered

The directory permissions where protected data are stored are set up to prevent regular users from changing the permissions or access control entries on the top-level directories. Only members of the project are authorized to access the data; users are not permitted to attempt to share data with unauthorized users.

The protected data environment will be monitored for unauthorized changes to permissions and access control.

Grant and remove user access to protected data

Protected data directories are set with permissions that restrict access to project users only. Project users are determined by group membership. For example, project PDE1234 has a protected data location at /fs/ess/PDE1234, and only users in the group PDE1234 may access data in that directory.
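
A quick way to confirm this from the command line is to check your group membership and the directory's group ownership (using the example project PDE1234 from above; substitute your own project code):

id -nG | tr ' ' '\n' | grep PDE1234    ## your group list should include the project group
ls -ld /fs/ess/PDE1234                 ## the directory should be group-owned by PDE1234 with restrictive permissions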

Adding a user to a project in the OSC client portal adds the project group to their user account; likewise, removing the user from the project removes the group. See our page on inviting, adding, and removing users.

A user's first project cannot be the secure data project: if a user's first project were the secure data project, removing them from that project in the client portal would not take away their group for the project.

Keep accounts secure

Do not share accounts/passwords, ever. 

A user that logs in with another person's account is able to perform actions on behalf of that person, including unauthorized actions mentioned above.

Securely transferring files to protected data location

Securely transferring files at OSC

Files containing personal health information (PHI) must be encrypted when they are stored (at rest) and when they are transferred between networked systems (in transit).

Transferring files securely to OSC involves understanding which commands/applications to use and which directory to use.

Before transferring files, ensure that the proper permissions will be applied once the files arrive, for example by verifying the permissions and ACLs of the destination directory.
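
For example, from an OSC login node you might inspect the destination before transferring (getfacl is the standard Linux ACL tool; /fs/ess/secure_dir is the same placeholder used elsewhere on this page for your protected data directory):

ls -ld /fs/ess/secure_dir     ## check owner, group, and mode of the destination directory
getfacl /fs/ess/secure_dir    ## list any access control entries applied to it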

FileZilla

Install the FileZilla client software and use the FileZilla tutorial to transfer files.

Use the host sftp://sftp.osc.edu

Select the login type as interactive, as multi-factor authentication is required to log in for protected data projects.

  • Make sure to use the SFTP option.
  • The connection opens in the user's home directory by default.
  • Navigate to /fs/ess/secure_dir before starting the file transfer.

Globus

There is a guide for using Globus on our Globus page.

Protected Data Service projects must use the OSC high assurance endpoint or transfers may fail. Ensure protected data is being shared in accordance with its requirements.

Command-line transfers

Files and directories can also be transferred manually on the command line.

secure copy (scp)

scp src <username>@sftp.osc.edu:/fs/ess/secure_dir ## copy a local file (src) into the protected data location; add -r to copy a directory

sftp

sftp <username>@sftp.osc.edu ## then run sftp transfer commands (get, put, etc.)
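
A minimal interactive session might look like the following (results.tar.gz is a hypothetical file name): connect, change to the protected data directory, upload the file, and disconnect.

sftp <username>@sftp.osc.edu
sftp> cd /fs/ess/secure_dir
sftp> put results.tar.gz
sftp> bye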

rsync

rsync --progress -r local-dir <username>@sftp.osc.edu:/fs/ess/secure_dir ## recursively copy local-dir, showing transfer progress
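
If you also need to preserve timestamps and permissions on the transferred files, rsync's archive mode is a common choice (a sketch, not a requirement; confirm the resulting permissions still meet the destination directory's access restrictions):

rsync -a --progress local-dir <username>@sftp.osc.edu:/fs/ess/secure_dir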