blueprint

Sun N1 Grid Engine Software and the Tokyo Institute of Technology Super Computer Grid

One of the world's leading technical institutes, the Tokyo Institute of Technology (Tokyo Tech) created the fastest supercomputer in Asia, and one of the largest outside of the United States. Using Sun x64 servers and data servers deployed in a grid architecture, Tokyo Tech built a cost-effective, flexible supercomputer that meets the demands of compute- and data-intensive applications. Built in just 35 days, the grid, named TSUBAME, includes hundreds of systems incorporating thousands of processor cores and terabytes of memory. It delivers 47.38 trillion floating-point operations per second (TeraFLOPS) of sustained LINPACK benchmark performance and 1.1 petabytes of storage to users running common off-the-shelf applications. Based on the deployment architecture, the grid is expected to reach 100 TeraFLOPS in the future. The third in a series of Sun BluePrints articles on the TSUBAME grid, this document provides an overview of the grid's system architecture, as well as a detailed look at the configuration of the Sun N1 Grid Engine software that makes the grid accessible to users.

Implementing the Lustre File System with Sun Storage: High Performance Storage for High Performance Computing

Much of the focus of high performance computing (HPC) has centered on CPU performance. However, as computing requirements grow, HPC clusters are demanding higher rates of aggregate data throughput. Today's clusters feature larger numbers of nodes with increased compute speeds. The higher clock rates and operations per clock cycle create increased demand for local data on each node. In addition, InfiniBand and other high-speed, low-latency interconnects increase the data throughput available to each node.

Traditional shared file systems such as NFS have not been able to scale to meet this growing demand for data throughput on HPC clusters. Scalable cluster file systems that can provide parallel data access to hundreds of nodes and petabytes of storage are needed to provide the high data throughput required by large HPC applications, including manufacturing, electronic design, and research.

This paper describes an implementation of the Sun Lustre file system as a scalable storage cluster using Sun Fire servers, high-speed, low-latency InfiniBand interconnects, and additional networking and storage devices. It also explores the use of the Sun Lustre file system at a shared government and education research site, including configuration information and details of the on-site testing performed to evaluate the performance of Sun's scalable storage solution.
