I/O Approaches for Data Intensive Computing
Description
As both computing power and storage capacity increase, scientists are able
to generate and store simulation results at an unprecedented rate. These
increases in capability and capacity can easily lead to situations in which
researchers produce data faster than they can analyze it. The problem can be
exacerbated by home-grown data file formats, which may be tied to a
particular platform or programming language. Furthermore, some applications'
data access patterns may lend themselves to more sophisticated storage
services such as relational databases, parallel file systems, or
hierarchical storage management.
This workshop will cover techniques for storing simulation data in formats
that are portable, language independent, self-documenting, and easily
annotated. Special emphasis will be placed on approaches that are highly
scalable or exhibit high performance for common usage patterns.
Topics covered will include:
- Low-level I/O APIs
- I/O middleware
- Hierarchical storage management
- Grid middleware
- MPI I/O
- Relational databases and SQL
- High-level I/O frameworks
Each major topic area will include at least one case study describing how one of the APIs discussed in that section could be applied to a particular science area.
Prerequisites
Attendees should be familiar with parallel programming using MPI in either C or Fortran.
Target Audience
Users with datasets in the tens of gigabytes to terabyte range
Method of Delivery
Lecture and hands-on laboratory
Handouts
May 2006, Troy Baer (troy@osc.edu), PDF