A Simple Performance Model

A simple performance model of the Kogut-Susskind Conjugate Gradient algorithm gives the bandwidth required to overlap communication with floating-point operations:

MB = 48 MF / (132 L) = 0.364 MF / L,
where MB is the achieved bandwidth in megabytes/s, MF is the achieved floating-point speed in megaflops/s, and an L^4 portion of the grid is on each node.
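
As a quick illustration, the formula above can be evaluated directly. This is only a sketch of the arithmetic: the function name `required_bandwidth` and the sample values (100 MF, L = 4) are chosen here for illustration and do not come from the measurements.

```python
def required_bandwidth(mf, L):
    """Bandwidth in megabytes/s needed to overlap communication with
    floating-point work, per the model MB = 48 MF / (132 L).

    mf: achieved floating-point speed in megaflops/s
    L:  linear size of the L^4 sub-grid held on each node
    """
    if L <= 0:
        raise ValueError("L must be positive")
    return 48.0 * mf / (132.0 * L)


if __name__ == "__main__":
    # Illustrative values only: a node running at 100 MF with a 4^4 sub-grid.
    print(f"{required_bandwidth(100.0, 4):.2f} MB/s")
```

Because the requirement scales as MF / L, doubling the processor speed doubles the bandwidth needed, while doubling L halves it.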

A graph shows measured bandwidth for a ping-pong test for three types of hardware and the performance model for several processor speeds.

  • Note that this is a log-log plot.
  • The messages vary in size from 800 bytes to 30 KB for problem sizes of interest. The arrows near the bottom of the graph correspond to different L values.
  • The green and blue curves come from measured performance on the Roadrunner supercluster at the Albuquerque High Performance Computer Center. The Quadrics curve comes from the Teracluster at LLNL. The measurements were made using the NetPIPE program from the Ames Scalable Computing Laboratory.
  • The straight red lines come from the performance model presented above and are plotted for matrix-times-vector speeds of 50, 100, 200 and 400 MF. We need to run at a large enough value of L that the measured bandwidth is above the red line (for whatever speed our processor achieves at that value of L).
  • Pushing up the communication rate for small messages is important.
  • It is especially nice when we don't need more expensive hardware. There is a huge price range among Quadrics, Myrinet and Fast Ethernet.
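
The comparison described in the bullets, measured bandwidth against the model line, can be sketched as a small lookup. The numbers in `measured` and `speed` below are HYPOTHETICAL placeholders, not values read from the graph, and the helper name `smallest_feasible_L` is likewise invented for this illustration; substitute ping-pong measurements and per-L processor speeds for the hardware in question.

```python
def required_bandwidth(mf, L):
    """Model line: MB = 48 MF / (132 L), in megabytes/s."""
    return 48.0 * mf / (132.0 * L)


def smallest_feasible_L(measured_mb_by_L, mf_by_L):
    """Return the smallest L whose measured bandwidth meets or exceeds
    the model requirement, or None if no listed L qualifies.

    measured_mb_by_L: {L: measured bandwidth in MB/s at the message size for L}
    mf_by_L:          {L: achieved megaflops/s at that L}
    """
    for L in sorted(measured_mb_by_L):
        if measured_mb_by_L[L] >= required_bandwidth(mf_by_L[L], L):
            return L
    return None


# HYPOTHETICAL placeholder data, for illustration only.
measured = {4: 5.0, 6: 8.0, 8: 12.0, 10: 15.0}       # MB/s
speed = {4: 100.0, 6: 100.0, 8: 100.0, 10: 100.0}    # MF

if __name__ == "__main__":
    print(smallest_feasible_L(measured, speed))
```

With these placeholder numbers the requirement at 100 MF is about 9.1 MB/s for L = 4 and about 6.1 MB/s for L = 6, so L = 6 is the first size at which the assumed network keeps up.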
