Single Node Performance

It is easy to waste a lot of money on poor system design. To illustrate this, we consider the variety of AMD Athlon processors available and their costs. The same considerations apply to Intel or Alpha processors. Component prices vary a great deal during their lifetime, so we give a date for the graphs that depend upon price.

Processor price is a rapidly increasing function of speed.

Dividing the price by the clock speed of the chip, we still see that the relative expense rises rapidly for the faster chips. In this case, there was an apparent sweet spot at 600 MHz. The faster chips have a higher (worse) price-performance ratio. Depending upon the costs of the other components, the entire system may have a higher (undesirable) or lower (desirable) price-performance ratio.
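The effect of the other components on the sweet spot can be sketched as follows. This is only an illustration: the chip prices and the cost of the other components are made-up placeholders, not the prices behind the graphs, and the function names are hypothetical. It also uses clock speed as a stand-in for performance.

```python
# Hypothetical sketch: every price below is a placeholder, NOT a real Athlon price.

def price_per_mhz(chip_price, clock_mhz, other_costs=0.0):
    """Dollars per MHz for the chip alone (other_costs=0) or for a whole node."""
    return (chip_price + other_costs) / clock_mhz

def best_clock(chips, other_costs=0.0):
    """Clock speed with the lowest (most desirable) price-performance ratio."""
    return min(chips, key=lambda c: price_per_mhz(c[1], c[0], other_costs))[0]

# (clock in MHz, made-up chip price in dollars)
chips = [(500, 100.0), (600, 115.0), (700, 200.0), (800, 400.0)]

for other in (0.0, 1000.0):
    print(f"other components ${other:.0f}: sweet spot at {best_clock(chips, other)} MHz")
```

With these placeholder numbers, the chip-only sweet spot is 600 MHz, but it moves to 700 MHz once 1000 dollars of other components are included in each node.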

For our QCD codes, access to memory is quite important. We demonstrate with the benchmarks below that performance does not increase in proportion to the speed of the chip. This is because the memory speed is the same for the 500 MHz and 600 MHz Athlons that we compare. Performance is given in megaflops per node (MF/node) for each problem size L.

Results for 500 MHz Athlon
--------------------------
L= 4  nodes= 1  230.80 +/- 5.10 MF/node
L= 6  nodes= 1  128.51 +/- 0.70 MF/node
L= 8  nodes= 1  97.20 +/- 0.30 MF/node
L= 10  nodes= 1  91.85 +/- 0.11 MF/node
L= 12  nodes= 1  89.91 +/- 0.06 MF/node
L= 14  nodes= 1  88.53 +/- 0.07 MF/node

Results for 600 MHz Athlon
--------------------------
L= 4  nodes= 1  275.71 +/- 6.50 MF/node
L= 6  nodes= 1  135.16 +/- 0.67 MF/node
L= 8  nodes= 1  102.16 +/- 0.09 MF/node
L= 10  nodes= 1  96.50 +/- 0.09 MF/node
L= 12  nodes= 1  94.51 +/- 0.02 MF/node
L= 14  nodes= 1  93.43 +/- 0.15 MF/node

Comparing these two tables, we see that for L = 4, for which the problem fits in cache, there is a 19.5% speedup on the faster processor, close to the 20% increase in clock speed. For all the larger problems, however, the speedup is only about 5%. We expect that for even faster processors, memory access will become an even greater issue and performance increases will be marginal.
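The speedups quoted above can be recomputed directly from the two tables. The second half of the sketch goes beyond the source: it assumes a simple two-term model in which the time per megaflop is a / f + b, where f is the clock speed and b is a fixed memory cost. Two measurements fix the two parameters exactly, so it carries no error estimate and is only a rough guide.

```python
# Measured MF/node from the two Athlon tables, keyed by problem size L.
athlon_500 = {4: 230.80, 6: 128.51, 8: 97.20, 10: 91.85, 12: 89.91, 14: 88.53}
athlon_600 = {4: 275.71, 6: 135.16, 8: 102.16, 10: 96.50, 12: 94.51, 14: 93.43}

def speedup_percent(slow, fast):
    """Percentage gain of `fast` over `slow` for each problem size."""
    return {L: 100.0 * (fast[L] / slow[L] - 1.0) for L in slow}

gains = speedup_percent(athlon_500, athlon_600)
for L, g in gains.items():
    print(f"L={L:2d}  speedup = {g:4.1f}%")

# Assumed model (not from the source): time per Mflop = a / f + b,
# where a / f scales with the clock f (in MHz) and b does not.
def fit_two_term(f1, mf1, f2, mf2):
    """Solve for a and b from two (clock, MF/node) measurements."""
    a = (1.0 / mf1 - 1.0 / mf2) / (1.0 / f1 - 1.0 / f2)
    b = 1.0 / mf1 - a / f1
    return a, b

a, b = fit_two_term(500.0, athlon_500[14], 600.0, athlon_600[14])
memory_fraction = b * athlon_500[14]  # share of run time that ignores the clock
ceiling = 1.0 / b                     # MF/node limit for an arbitrarily fast clock
print(f"fixed share at 500 MHz: {memory_fraction:.0%}, ceiling: {ceiling:.0f} MF/node")
```

Under this assumed model, roughly two thirds of the run time at L = 14 on the 500 MHz chip does not scale with the clock, and the performance would level off near 130 MF/node however fast the processor.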

Since memory access is so crucial, I purchased a Pentium III 533B chip, which uses PC133 memory. In theory, it should provide about 33% better performance on memory-bound problems than a similar chip with PC100 memory. I have tried three motherboards with different chipsets, and the results are disappointing. The Gigabyte GA6VXE+ uses a VIA chipset, the Supermicro PIIISED uses the Intel 810e chipset, and the Intel CC820 uses the Intel 820 chipset. The results are no better than those of a PII 350 chip on a BX motherboard.

::::::::::::::
Gigabyte_GA6VXE+
::::::::::::::
L= 4  nodes= 1  185.70 +/- 4.33 MF/node
L= 6  nodes= 1  106.01 +/- 0.15 MF/node
L= 8  nodes= 1  81.08 +/- 0.01 MF/node
L= 10  nodes= 1  75.78 +/- 0.01 MF/node
L= 12  nodes= 1  75.86 +/- 0.01 MF/node
L= 14  nodes= 1  73.37 +/- 0.00 MF/node
::::::::::::::
Intel_CC820
::::::::::::::
L= 4  nodes= 1  181.55 +/- 3.62 MF/node
L= 6  nodes= 1  97.57 +/- 0.09 MF/node
L= 8  nodes= 1  75.73 +/- 0.03 MF/node
L= 10  nodes= 1  71.61 +/- 0.07 MF/node
L= 12  nodes= 1  70.42 +/- 0.01 MF/node
L= 14  nodes= 1  70.15 +/- 0.00 MF/node
::::::::::::::
Supermicro_PIIISED
::::::::::::::
L= 4  nodes= 1  174.06 +/- 3.01 MF/node
L= 6  nodes= 1  93.91 +/- 0.18 MF/node
L= 8  nodes= 1  72.89 +/- 0.02 MF/node
L= 10  nodes= 1  70.12 +/- 0.02 MF/node
L= 12  nodes= 1  69.28 +/- 0.00 MF/node
L= 14  nodes= 1  68.88 +/- 0.05 MF/node

Currently, one cannot get dual-processor motherboards for the Athlon processor. However, the above results show that the Athlon is the better choice for this code if single-processor motherboards are used.
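The comparison behind this conclusion can be made explicit with the L = 14 entries of the tables above, where the problem is clearly memory bound. The sketch below is only a convenience for reading the tables; the labels and the helper name are illustrative.

```python
# MF/node at L = 14, copied from the benchmark tables above.
results_L14 = {
    "Athlon 500": 88.53,
    "Athlon 600": 93.43,
    "PIII 533B, Gigabyte GA6VXE+": 73.37,
    "PIII 533B, Intel CC820": 70.15,
    "PIII 533B, Supermicro PIIISED": 68.88,
}

def relative_to(baseline, results):
    """Performance of every system as a multiple of the baseline system."""
    return {name: mf / results[baseline] for name, mf in results.items()}

# Use the fastest Pentium III board as the baseline.
piii_names = [name for name in results_L14 if name.startswith("PIII")]
best_piii = max(piii_names, key=results_L14.get)
ratios = relative_to(best_piii, results_L14)

for name, r in sorted(ratios.items(), key=lambda item: -item[1]):
    print(f"{name:32s} {r:4.2f} x {best_piii}")
```

Even the 500 MHz Athlon comes out roughly 20% faster than the best of the three Pentium III boards at this problem size.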
