Systems Research

PVFS in the WAN

Principal Investigators: Dennis Dalessandro, Ananth Devulapalli, Pete Wyckoff
Funding Source: Data Intensive Computing Environment (DICE) Program
Duration: 7/1/07 - 3/20/08

Description: Utilizing geographically separated resources via wide-area networks is a good way to take advantage of multiple computational engines and storage pools. There are numerous examples of file systems that have been adapted for use in the wide area, often relying on fast networks. A fundamental challenge of wide-area data access arises from the unavoidable communication latency. In order to alleviate the effects of limited bandwidth and high latency, we are investigating a framework for wide-area file systems that includes metadata mirroring and data caching, targeted for read-only data access at remote sites. In this common scenario, metadata access on remote sites incurs no network overhead, and frequently accessed files are cached but are kept consistent with respect to the source file system. The system exerts minimal load at the data site while keeping remote sites consistent.
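As a rough illustration of the intended remote read path, the sketch below models a remote site that answers metadata queries from a local mirror (no WAN round trip) and validates cached file data against a source-side version number before serving it. The names, the version-stamp scheme, and the constants are illustrative assumptions for this sketch only; they are not the PVFS2 interfaces.

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    struct cached_file {
        char     name[64];
        uint64_t version;     /* version stamped when the data was cached */
        int      data_valid;  /* nonzero if the cached bytes are usable */
    };

    /* Metadata lookup answered from the local mirror: no WAN round trip. */
    static uint64_t mirrored_version(const char *name)
    {
        (void)name;
        return 7;  /* stand-in for the version replicated from the source */
    }

    /* Placeholder for pulling fresh file data across the WAN. */
    static void refetch_from_source(struct cached_file *f, uint64_t ver)
    {
        printf("WAN fetch: %s at version %llu\n", f->name,
               (unsigned long long)ver);
        f->version = ver;
        f->data_valid = 1;
    }

    /* Serve a read, using the local cache whenever it is still consistent. */
    static void read_file(struct cached_file *f)
    {
        uint64_t ver = mirrored_version(f->name);  /* local, cheap */

        if (!f->data_valid || f->version != ver)
            refetch_from_source(f, ver);           /* remote, expensive */
        else
            printf("cache hit: %s\n", f->name);
    }

    int main(void)
    {
        struct cached_file f;

        strcpy(f.name, "input.dat");
        f.version = 6;       /* stale relative to the mirrored metadata */
        f.data_valid = 1;

        read_file(&f);       /* stale: triggers one WAN fetch */
        read_file(&f);       /* consistent now: served from the cache */
        return 0;
    }

This is why the design exerts minimal load at the data site: steady-state reads touch only the local mirror and cache, and the WAN is crossed only when a file has actually changed.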

This project was funded by the DICE program and has been divided into three phases.

Phase 1: Use the DICE environment to characterize file system performance over the wide-area network.

Phase 2: Implement a framework for loosely coupled wide-area file system access.

Phase 3: Gather results and publish the data.

Below we present our initial findings from Phase 1, which used the Parallel Virtual File System (PVFS2) in the wide area. The results were gathered on two geographically separated clusters participating in the DICE program.

As the results make clear, the factor that most limits performance is, as we expected, the quality of the link between the two sites, in particular its high latency. The minimum latency between the two sites, measured with 4-byte messages, was 38.6 ms.
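To make that cost concrete, the short calculation below treats the measured 38.6 ms as the per-operation round-trip cost and shows the ceiling it imposes on any synchronous, one-at-a-time operation, such as an uncached metadata request. The figures are a back-of-the-envelope sketch, not additional measurements.

    #include <stdio.h>

    int main(void)
    {
        const double rtt = 0.0386;   /* measured minimal latency, seconds */

        /* One synchronous operation per round trip, regardless of bandwidth. */
        printf("max serial ops/s: %.1f\n", 1.0 / rtt);       /* ~25.9 */
        printf("1000 serial ops:  %.1f s\n", 1000.0 * rtt);  /* ~38.6 */
        return 0;
    }

At roughly 26 serial operations per second, even a modest burst of metadata traffic becomes painful over the WAN, which is what motivates mirroring metadata at the remote site.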

The results gathered here confirm our initial hypothesis and will guide our goal of improving cache effectiveness and metadata performance for file systems in the WAN. We have also completed the other two phases of the project and are awaiting publication of our paper detailing the software developed in Phase 2 and the results collected in Phase 3.

More information and results will be posted to this web page on an ongoing basis as they become available.


The four graphs that follow are from Phase 1 and show the read, read-and-sync, write, and write-and-sync scenarios. One important observation is that the maximum single-stream throughput line lies very close to the n=1 curve, which tells us that we are getting nearly everything the link can deliver.
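A simple pipelining model makes the shape of such curves plausible: with one window of data in flight, throughput is capped near window/RTT, and adding concurrent requests raises that cap until the link itself saturates. In the sketch below, only the 38.6 ms latency comes from our measurements; the 1 MiB window and 1 Gb/s link capacity are assumed values chosen purely for illustration.

    #include <stdio.h>

    int main(void)
    {
        const double rtt    = 0.0386;      /* measured latency, seconds */
        const double window = 1 << 20;     /* assumed 1 MiB in flight per stream */
        const double link   = 125.0e6;     /* assumed 1 Gb/s link, bytes/s */

        for (int n = 1; n <= 8; n *= 2) {
            double bw = n * window / rtt;  /* n windows delivered per RTT */
            if (bw > link)
                bw = link;                 /* the link itself is the hard cap */
            printf("n=%d: %.1f MB/s\n", n, bw / 1.0e6);
        }
        return 0;
    }

Under these assumptions a single stream tops out near 27 MB/s, so when the n=1 curve already sits close to the maximum single-stream throughput line, the link, not the file system, is the binding constraint.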