PC Clusters for Computational Science - Theory and Practice

This session was held at PC2000, a meeting organized by the Division of Computational Physics of the American Physical Society. The meeting was held in conjunction with the March 2000 meeting of the APS in Minneapolis.

Networking Options for Beowulf Clusters (PowerPoint)
Thomas L. Sterling, Caltech/JPL
Beowulf clusters are parallel supercomputers built from commodity microprocessors. Their performance depends on good network connectivity. Currently, Fast Ethernet, Gigabit Ethernet, and Myrinet are the main options for network hardware. The bandwidth and latency achieved with these hardware options and their associated software will be reviewed. Near-term future options, such as SIO, will also be discussed.
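The bandwidth/latency tradeoff the talk reviews can be illustrated with the standard first-order model for point-to-point message time, T(n) = latency + n/bandwidth. A minimal sketch follows; the latency and bandwidth figures are rough, era-typical assumptions for illustration only, not the measured numbers from the talk.

```python
# First-order model of point-to-point message time: T(n) = latency + n/bandwidth.
# The latency/bandwidth values below are illustrative assumptions for ca. 2000
# hardware, not measurements presented in this talk.

NETWORKS = {
    # name: (latency in seconds, bandwidth in bytes/second) -- assumed values
    "Fast Ethernet":    (70e-6, 100e6 / 8),
    "Gigabit Ethernet": (60e-6, 1000e6 / 8),
    "Myrinet":          (15e-6, 1.28e9 / 8),
}

def transfer_time(nbytes, latency, bandwidth):
    """Estimated time to send one message of nbytes."""
    return latency + nbytes / bandwidth

for name, (lat, bw) in NETWORKS.items():
    small = transfer_time(64, lat, bw)        # latency-dominated regime
    large = transfer_time(1 << 20, lat, bw)   # bandwidth-dominated regime
    print(f"{name:17s} 64 B: {small*1e6:7.1f} us   1 MiB: {large*1e3:7.2f} ms")
```

The model makes the key point of such comparisons: small messages are dominated by latency, large ones by bandwidth, which is why latency-sensitive codes can benefit from a network like Myrinet even when headline bandwidths look comparable.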

What's the Best Node for Your Cluster? (Available in various formats)
Rick Stevens, Argonne National Laboratory
A well-designed cluster requires a node with the most appropriate balance of resources for the problems it will be solving. For many scientific problems, memory bandwidth or peripheral bandwidth can be a severe bottleneck, and spending extra money on a faster processor will not increase performance significantly. This talk will cover the options available for cluster nodes, including processors, memory speeds and standards, and peripheral buses. There will also be a discussion of when SMP nodes should be used, and how many processors can be accommodated per node.
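The claim that a faster processor may not help can be made concrete with a roofline-style balance check: a kernel is memory-bandwidth-bound when its arithmetic intensity (flops per byte moved) falls below the node's ratio of peak flops to memory bandwidth. A small sketch, with all hardware numbers hypothetical:

```python
# Roofline-style balance check. A kernel is memory-bandwidth-bound when its
# arithmetic intensity (flops/byte) is below peak_flops / memory_bandwidth.
# All hardware figures here are hypothetical, chosen only for illustration.

def attainable_gflops(intensity, peak_gflops, mem_bw_gbs):
    """Upper bound on sustained Gflops for a kernel of the given arithmetic
    intensity (flops/byte) on a node with the given peak and bandwidth."""
    return min(peak_gflops, intensity * mem_bw_gbs)

peak = 1.0     # hypothetical peak rate, Gflops
mem_bw = 0.5   # hypothetical sustained memory bandwidth, GB/s

# A stream-like kernel (e.g. daxpy): 2 flops per 24 bytes moved.
print(attainable_gflops(2 / 24, peak, mem_bw))   # bandwidth-bound: ~0.042 Gflops

# A cache-friendly dense kernel at 10 flops/byte hits the compute peak.
print(attainable_gflops(10.0, peak, mem_bw))     # compute-bound: 1.0 Gflops
```

For the bandwidth-bound kernel, doubling `peak` changes nothing; only faster memory helps, which is the talk's point about spending the node budget where the bottleneck actually is.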

Lattice QCD on Linux and NT Clusters (HTML)
Kostas Orginos, University of Arizona
Calculating hadronic properties in Quantum Chromodynamics is a non-perturbative task, and Lattice QCD is the only available non-perturbative technique capable of such computations. Although the algorithms and the machines have improved significantly in recent years, simulating QCD remains a very challenging and interesting problem: interesting because Lattice QCD is now in a position to make computations accurate enough to guide experiments toward the discovery of new physics, and challenging because of the massive computer power such computations require. Until now, such computing power was available only through very expensive supercomputers, sometimes built especially for lattice QCD. Recently, the increasing power of the microprocessors used in personal computers, the dropping prices of commodity hardware, and advances in network technology have made it possible to build relatively cheap clusters of workstations capable of large-scale Lattice QCD simulations. The MILC collaboration has experience in both building and using such machines. Two Linux clusters have been built, one by Steven Gottlieb in Indiana and one by Carleton DeTar in Utah. We have also been using the NT cluster built at NCSA. The portable, MPI-based MILC code runs efficiently, without any fine tuning, and shows good scalability on these machines. Furthermore, the cost of $10-$25 per delivered Mflop makes these clusters very attractive compared to the $190-$450 per delivered Mflop of commercial supercomputers.
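The per-delivered-Mflop figures quoted above translate directly into system prices. A quick sketch using the abstract's cost ranges; the 10 Gflops sustained target is a hypothetical example, not a number from the talk:

```python
# Cost comparison from the per-delivered-Mflop figures in the abstract
# ($10-$25 for clusters, $190-$450 for commercial supercomputers).
# The 10 Gflops sustained target is a hypothetical example.

def system_cost(sustained_mflops, dollars_per_mflop):
    """Price of a system delivering the given sustained rate."""
    return sustained_mflops * dollars_per_mflop

target = 10_000  # 10 Gflops sustained, in Mflops (hypothetical target)

cluster_low, cluster_high = system_cost(target, 10), system_cost(target, 25)
super_low, super_high = system_cost(target, 190), system_cost(target, 450)

print(f"cluster:       ${cluster_low:,} - ${cluster_high:,}")
print(f"supercomputer: ${super_low:,} - ${super_high:,}")
```

At these rates a cluster delivering the same sustained throughput costs roughly an order of magnitude less, which is the comparison the abstract is making.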

Scientific Applications on Workstation Clusters vs. Supercomputers (PowerPoint)
Dave Turner, Ames Laboratory, Iowa State University
The idea of building a 'supercomputer' by connecting many workstations or PCs with a fast network is clearly attractive for several reasons. Such a cluster can cost an order of magnitude less than a traditional multiprocessor machine while providing the same computational power, and it can be grown over time by simply adding more machines. It is often difficult to use the computational power of any multiprocessor system efficiently because of the limited interprocessor communication rate, and it is even more difficult for cluster computers, where the bandwidth is lower and the latency higher than for traditional MPP systems. This talk will compare the performance of many applications having different computational and communication characteristics on a wide variety of MPP systems and cluster computers. These include a Cray T3E, an Intel Paragon, SGI SMP systems, clusters of PCs connected by Fast Ethernet, an Alpha cluster connected by Gigabit Ethernet, and a cluster of dual-processor IBM Power3 systems connected by Gigabit Ethernet. The applications used for this analysis cover a broad range in how demanding they are of the communication system. They include classical and tight-binding molecular dynamics codes, an ab initio plane wave program, and a finite difference electromagnetic wave propagation code. The talk will conclude with a discussion of the work being done to overcome some of the current limitations of cluster computers.
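Why slower interconnects hurt clusters more can be seen in a minimal speedup model: per-step compute time shrinks as work is split across P processors, while a fixed per-step communication cost does not. The times below are illustrative assumptions, not measurements from the talk.

```python
# Minimal fixed-overhead speedup model. Compute time divides by P; a fixed
# per-step communication cost t_comm does not. All times are illustrative
# assumptions in arbitrary units, not data from this talk.

def speedup(P, t_comp, t_comm):
    """Speedup over one processor when serial work t_comp is split P ways
    and each step adds a fixed communication cost t_comm."""
    return t_comp / (t_comp / P + t_comm)

t_comp = 1.0  # serial compute time per step

# A tightly coupled MPP-class network vs. a commodity cluster network
# (hypothetical per-step communication costs):
for t_comm in (0.001, 0.01):
    eff = speedup(64, t_comp, t_comm) / 64
    print(f"t_comm={t_comm}: 64-processor efficiency = {eff:.0%}")
```

The same code at 64 processors drops sharply in parallel efficiency when the per-step communication cost grows by an order of magnitude, which is why communication-light applications fare much better on Ethernet-connected clusters than communication-heavy ones.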

Also available from Dave Turner is a short PowerPoint presentation on Gigabit Ethernet.

The Avalon Beowulf Cluster: A Dependable Tool for Scientific Simulation
Michael Warren, Los Alamos National Laboratory
Avalon is a 140 processor Alpha/Linux Beowulf cluster constructed entirely from commodity personal computer technology and freely available software. Computational Physics simulations performed on Avalon resulted in the award of a 1998 Gordon Bell price/performance prize for significant achievement in parallel processing. Avalon ranked as the 113th fastest computer in the world on the November 1998 TOP500 list, obtaining a result of 48.6 Gigaflops on the parallel Linpack benchmark.
The price of hardware and final-assembly labor for Avalon totaled $313,000 in the fall of 1998. Avalon currently provides over 15,000 node-hours of production computing time per week, split among about 10 production users. Obtaining an equivalent amount of computing through Los Alamos institutional sources would cost a minimum of $30,000 per week. The machine also supports code development for another 60 users. Significant simulations have been performed on Avalon in astrophysics, molecular dynamics, and nonlinear dynamics, as well as other areas. The largest single simulation performed on Avalon computed a total of over 10^16 floating point operations.
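The cost figures above imply a simple break-even calculation, sketched here using only the numbers quoted in the abstract ($313,000 in hardware versus a minimum of $30,000 per week for equivalent institutional cycles):

```python
# Break-even arithmetic from the figures in the abstract: $313,000 in hardware
# vs. a minimum of $30,000/week for equivalent institutional computing.

def breakeven_weeks(hardware_cost, weekly_equivalent_cost):
    """Weeks of production use before the cluster pays for itself."""
    return hardware_cost / weekly_equivalent_cost

weeks = breakeven_weeks(313_000, 30_000)
print(f"{weeks:.1f} weeks")  # roughly 10 weeks
```

On these figures the machine recoups its purchase price in under three months of production use, which supports the abstract's framing of Avalon as a dependable, cost-effective tool rather than a research artifact.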
We will describe some of the applications which have obtained good performance on Avalon, and their characteristics. Our goal has been to provide dependable cycles for computational physics, and not to perform research into clustered computing systems. One of the main lessons learned from the Avalon project is that the details of the hardware are not nearly as important as the attitudes and expectations of the users and managers of the hardware.

If you have comments or suggestions, email Steven Gottlieb at sg@indiana.edu