Implementing the Lustre File System with Sun Storage: High Performance Storage for High Performance Computing

Much of the focus of high performance computing (HPC) has centered on CPU performance. However, as computing requirements grow, HPC clusters are demanding higher rates of aggregate data throughput. Today's clusters feature larger numbers of nodes with increased compute speeds. The higher clock rates and operations per clock cycle create increased demand for local data on each node. In addition, InfiniBand and other high-speed, low-latency interconnects increase the data throughput available to each node.

Traditional shared file systems such as NFS have not been able to scale to meet this growing demand for data throughput on HPC clusters. Scalable cluster file systems that can provide parallel data access to hundreds of nodes and petabytes of storage are needed to provide the high data throughput required by large HPC applications, including manufacturing, electronic design, and research.

This paper describes an implementation of the Sun Lustre file system as a scalable storage cluster using Sun Fire servers, high-speed/low-latency InfiniBand interconnects, and additional networking and storage devices. Furthermore, this paper explores the use of the Sun Lustre file system at a shared government and education research site, including configuration information and details on testing that was performed on-site to evaluate the performance of Sun's scalable storage solution.