A Simple Performance Model
A simple performance model of the Kogut-Susskind Conjugate Gradient
algorithm gives this bandwidth requirement to overlap communication and
floating point operations:
MB = 48 MF / (132 L) = 0.364 MF / L,
where MB is the achieved bandwidth in Megabytes/s, MF is the achieved floating
point speed in Megaflops/s, and an L^4 portion of the grid is on each processor.
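As a quick check of the arithmetic, the formula can be evaluated directly. The following is a minimal sketch, not code from the original work: the function name is hypothetical, the speeds are the four plotted in the graph, and the L values (4, 8, 16) are chosen only for illustration.

```python
def required_bandwidth_mb_per_s(mflops: float, L: int) -> float:
    """Bandwidth (MB/s) the model requires to overlap communication
    with floating point work: MB = 48 MF / (132 L)."""
    if L <= 0:
        raise ValueError("L must be a positive integer")
    return 48.0 * mflops / (132.0 * L)


if __name__ == "__main__":
    # Speeds in MF taken from the plotted model lines; L values are illustrative.
    for mf in (50, 100, 200, 400):
        cells = ", ".join(
            f"L={L}: {required_bandwidth_mb_per_s(mf, L):6.2f} MB/s"
            for L in (4, 8, 16)
        )
        print(f"{mf:3d} MF -> {cells}")
```

Doubling L halves the required bandwidth at a fixed speed, which is why the model favors larger local volumes.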
A graph shows measured bandwidth for a ping-pong test for three types
of hardware and the performance model for several processor speeds.
Note that this is a log-log plot.
The messages vary in size from 800 bytes to 30 KB for problem sizes of
interest. The arrows near the bottom of the graph correspond to
different L values.
The green and blue curves come from measured performance on the Roadrunner
supercluster at the
Albuquerque High Performance
The Quadrics curve comes from the Teracluster at LLNL.
The measurements were made with the NetPIPE program from the
Ames Scalable Computing Laboratory.
The straight red lines come from the performance model presented above and
are plotted for matrix-times-vector speeds of 50, 100, 200 and 400 MF.
We need to run at a large enough value of L that the measured bandwidth
lies above the red line (for whatever speed our processor achieves at
the corresponding value of L).
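The criterion above can be sketched as a small search over candidate L values. This is an illustrative sketch only: `speed_at` and `bandwidth_at` are hypothetical callables that the reader would supply from their own processor timings and ping-pong measurements, and the constant values in the usage example are placeholders, not measured data.

```python
from typing import Callable, Iterable, Optional


def smallest_overlapping_L(
    candidates: Iterable[int],
    speed_at: Callable[[int], float],      # achieved MF for a given L (hypothetical input)
    bandwidth_at: Callable[[int], float],  # measured MB/s for a given L (hypothetical input)
) -> Optional[int]:
    """Return the smallest L whose measured bandwidth meets or exceeds
    the model requirement MB = 48 MF / (132 L), or None if none does."""
    for L in sorted(candidates):
        needed = 48.0 * speed_at(L) / (132.0 * L)
        if bandwidth_at(L) >= needed:
            return L
    return None


if __name__ == "__main__":
    # Placeholder inputs: a constant 100 MF and a constant 5 MB/s.
    # Real curves depend on L through cache effects and message size.
    result = smallest_overlapping_L(
        range(4, 17),
        speed_at=lambda L: 100.0,
        bandwidth_at=lambda L: 5.0,
    )
    print(result)
```

Passing the speed and bandwidth as functions of L keeps the sketch honest about the fact that both quantities change with the local volume, so the crossing point has to be found from measurements rather than from the formula alone.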
Pushing up the communication rate for small messages is important.
It is especially nice when we don't need more expensive hardware. There is
a huge price range among Quadrics, Myrinet and Fast Ethernet.