Ohio Supercomputer Center

I/O Approaches for Data Intensive Computing

Description

As both computing power and storage capacity increase, scientists are able to generate and store simulation results at an unprecedented rate. These gains in capability and capacity can easily leave researchers producing data faster than they can analyze it. Home-grown data file formats, which may be tied to a particular platform or programming language, can make the problem worse. Furthermore, some applications' data access patterns may lend themselves to more sophisticated storage services such as relational databases, parallel file systems, or hierarchical storage management.

This workshop will cover techniques for storing simulation data in formats that are portable, language-independent, self-documenting, and easily annotated. Special emphasis will be placed on approaches that are highly scalable or that perform well for common access patterns.

Topics covered will include:

  • Low-level I/O APIs
    • POSIX I/O
    • Extensions
  • I/O Middleware
    • Hierarchical storage management
    • Grid middleware
    • MPI I/O
    • Relational databases and SQL
  • High-level I/O Frameworks
    • NetCDF
    • HDF5

Each major topic area will include at least one case study describing how one of the APIs discussed in that section could be applied to a particular science area.

Prerequisites

Attendees should be familiar with parallel programming using MPI in either C or Fortran.

Target Audience

Users with datasets ranging from tens of gigabytes to terabytes

Method of Delivery

Lecture and hands-on laboratory

Handouts

May 2006, Troy Baer (troy@osc.edu), PDF