Single Node Performance
It is easy to waste a lot of money on poor system design. To illustrate
this, we consider the variety of AMD Athlon processors available and
their costs. The same considerations apply to Intel or Alpha processors.
Component prices vary a great deal during their lifetime, so we give
a date for the graphs that depend upon price.
Processor price is a rapidly increasing function of speed.
Even after dividing the price by the speed of the chip, the relative expense
still rises rapidly for the faster chips. In this case, there was an apparent
sweet spot at 600 MHz. The faster chips have a higher ratio of price to
performance. Depending upon the costs of the other components of the system,
the entire system may have a higher (undesirable) or lower (desirable)
price-performance ratio than one built around a slower chip.
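The chip-level and system-level comparison can be sketched as follows. This is only an illustration: the prices and the cost of the other components are hypothetical placeholders, not the figures behind the graphs, and clock speed stands in for performance.

```python
def price_per_mhz(price, mhz):
    """Price-performance ratio of the chip alone (lower is better)."""
    return price / mhz


def system_price_per_mhz(chip_price, other_costs, mhz):
    """Same ratio for a full node: chip plus all other components."""
    return (chip_price + other_costs) / mhz


# Hypothetical (speed in MHz, chip price in dollars) pairs, for illustration only.
chips = [(500, 100.0), (600, 120.0), (700, 250.0)]
other_costs = 500.0  # assumed cost of motherboard, memory, disk, case, etc.

for mhz, price in chips:
    chip_ratio = price_per_mhz(price, mhz)
    node_ratio = system_price_per_mhz(price, other_costs, mhz)
    print(mhz, round(chip_ratio, 3), round(node_ratio, 3))
```

With these made-up numbers the 700 MHz chip has the worst ratio on its own, yet the complete 700 MHz node still has a better ratio than the 500 MHz node, because the fixed cost of the other components is spread over more speed.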
For our QCD codes, access to memory is quite important. We demonstrate with
the benchmarks below that performance does not increase in proportion to the
speed of the chip. This is because the memory speed is the same for the
500 MHz and 600 MHz Athlons that we compare.
When comparing these two tables, we see that for L = 4, for which the
problem fits in cache, there is a 19.5% speedup on the faster processor.
But for all the larger problems, the speedup is only 5%. We expect that
for even faster processors, the memory access will become an even greater
issue and performance increases will be marginal.
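One rough way to read these numbers is a two-part model in which a fraction c of the run time on the 500 MHz chip scales with the clock and the rest is fixed by memory speed. This model is an assumption made here for illustration, not something measured above. The sketch below solves for c from the quoted speedups and extrapolates to a hypothetical faster chip with the same memory system.

```python
def clock_bound_fraction(speedup, clock_ratio):
    """Fraction of run time on the slower chip that scales with clock speed.

    Model: t_fast / t_slow = c / clock_ratio + (1 - c), where c is the
    clock-bound fraction and (1 - c) is time fixed by memory speed.
    """
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / clock_ratio)


def predicted_speedup(c, clock_ratio):
    """Speedup over the slower chip for a given clock ratio, same model."""
    return 1.0 / (c / clock_ratio + (1.0 - c))


clock_ratio = 600 / 500
c_cache = clock_bound_fraction(1.195, clock_ratio)  # L = 4, fits in cache
c_large = clock_bound_fraction(1.05, clock_ratio)   # larger problems

print(round(c_cache, 2), round(c_large, 2))

# Hypothetical 1000 MHz chip with the same memory system (not benchmarked).
print(round(predicted_speedup(c_large, 1000 / 500), 2))
```

Under this model almost all of the L = 4 run time scales with the clock, but only about 29% of the run time does for the larger problems. Doubling the clock would then give a speedup of roughly 1.17, and no clock speed could give more than 1/(1 - c), about 1.4.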
Since memory access is so crucial, I have purchased a Pentium III 533B
chip that uses PC133 memory. In theory, it should provide about 33%
better performance than a similar chip with PC100 memory.
I have tried three different motherboards using different support chips,
and the results are disappointing.
The Gigabyte GA6VXE+ motherboard uses a VIA chipset, the Supermicro PIIISED
uses the Intel 810e chipset, and the Intel CC820 motherboard uses the
Intel 820 chipset. The results are no better than those of a Pentium II
350 MHz chip on a BX motherboard.
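The 33% figure quoted above is the ratio of the memory bus clocks. A small sketch of the arithmetic, assuming the standard 64-bit SDRAM data bus and a code whose speed is limited entirely by memory bandwidth:

```python
BUS_WIDTH_BYTES = 8  # SDRAM DIMMs have a 64-bit data bus


def peak_bandwidth_mb_s(bus_mhz):
    """Theoretical peak transfer rate in MB/s for a given memory bus clock."""
    return bus_mhz * BUS_WIDTH_BYTES


pc100 = peak_bandwidth_mb_s(100)
pc133 = peak_bandwidth_mb_s(133)
gain = pc133 / pc100 - 1.0  # fractional improvement in peak bandwidth

print(pc100, pc133, round(gain * 100))
```

This is a ceiling on the improvement, not a prediction. The delivered bandwidth depends on the chipset and motherboard.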
Currently, one cannot get dual-processor motherboards for the Athlon processor;
however, the above results show that the Athlon would be a better choice for
this code if single-processor motherboards are used.