Lattice QCD on the IU SP

Steven Gottlieb, Fermilab and Indiana University
  http://physics.indiana.edu/~sg/


 

QCD is the theory of the strong interaction. We can write down the theory, but there is no known analytic solution.

A numerical approach uses a regular grid of lattice points in spacetime

Quarks are represented by 3-component complex vectors at each grid point

Gluons are represented by 3 × 3 complex matrices on each link between points
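Below is a minimal sketch of these data structures. It assumes a tiny 4^4 periodic lattice and plain Python lists rather than MILC's actual C structures. The names `site_index`, `forward_neighbor`, `mat_vec` and `parallel_transport` are hypothetical, and the identity matrix stands in for a genuine SU(3) link. The central operation is multiplying the link matrix U_mu(x) by the quark vector at the neighboring site x + mu. Products of this kind, summed over directions, make up the sparse matrix that is applied at every site.

```python
import random

L = 4      # hypothetical lattice extent in each direction (tiny, for illustration)
NDIM = 4   # directions x, y, z, t

def site_index(coords):
    """Linear index of a site; coordinates wrap periodically, x varies fastest."""
    idx = 0
    for c in reversed(coords):
        idx = idx * L + (c % L)
    return idx

def forward_neighbor(coords, mu):
    """Coordinates one step forward in direction mu (site_index handles the wrap)."""
    shifted = list(coords)
    shifted[mu] += 1
    return tuple(shifted)

def mat_vec(u, v):
    """3 x 3 complex link matrix times a 3-component complex quark vector."""
    return [sum(u[i][k] * v[k] for k in range(3)) for i in range(3)]

rng = random.Random(0)
nsites = L ** NDIM

# Quark field: one 3-component complex vector per site.
quark = [[complex(rng.random(), rng.random()) for _ in range(3)] for _ in range(nsites)]

# Gluon field: links[s][mu] is the 3 x 3 matrix on the link leaving site s in direction mu.
# The identity is a placeholder for a genuine SU(3) matrix.
identity = [[1 + 0j if i == j else 0j for j in range(3)] for i in range(3)]
links = [[identity for _ in range(NDIM)] for _ in range(nsites)]

def parallel_transport(coords, mu):
    """U_mu(x) * psi(x + mu): bring the neighbor's quark vector back to site x."""
    u = links[site_index(coords)][mu]
    return mat_vec(u, quark[site_index(forward_neighbor(coords, mu))])
```

With identity links, `parallel_transport((3, 0, 0, 0), 0)` simply returns the quark vector at site 0, because the step in x wraps around the periodic boundary.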

Much of the computational time is spent on sparse matrix inversion
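The matrix is sparse because each site couples only to a few nearby sites. For such systems, iterative Krylov solvers like conjugate gradient (CG) are the usual choice, since CG needs only the product of the matrix with a vector. The following is a minimal sketch. It uses a real-valued, one-dimensional periodic Laplacian plus a mass term as a stand-in for the lattice quark matrix. The operator and the function names are hypothetical and are not MILC code.

```python
def apply_a(x, mass):
    """Matrix-free stand-in operator: periodic 1-D Laplacian plus mass**2.
    Each row has three nonzeros, so the matrix itself is never stored.
    It is symmetric positive definite for mass > 0, as CG requires."""
    n = len(x)
    return [(2.0 + mass * mass) * x[i] - x[i - 1] - x[(i + 1) % n] for i in range(n)]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def conjugate_gradient(b, mass, tol=1e-10, max_iter=1000):
    """Solve A x = b by conjugate gradient, starting from x = 0.
    Stops when |r| <= tol * |b| or after max_iter iterations."""
    x = [0.0] * len(b)
    r = list(b)               # residual b - A x for the initial guess x = 0
    p = list(r)               # first search direction
    rr = dot(r, r)
    target = tol * tol * dot(b, b)
    for _ in range(max_iter):
        if rr <= target:
            break
        ap = apply_a(p, mass)
        alpha = rr / dot(p, ap)               # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rr_new = dot(r, r)
        beta = rr_new / rr                    # keeps search directions A-conjugate
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x

b = [1.0] + [0.0] * 15        # point source on a 16-site lattice
x = conjugate_gradient(b, mass=0.5)
```

A real lattice inverter works with complex 3-component vectors in four dimensions. Because CG requires a Hermitian positive definite matrix, a common approach is to apply it to the normal equations M†M x = M†b. The structure of the loop stays the same.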

Domain decomposition is used to parallelize the code
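The sketch below shows a simple block decomposition. It assumes each global extent divides evenly among the CPUs in that direction and that only nearest neighbors are coupled. `owner` and `surface_to_volume` are hypothetical helpers, and MILC's own layout routines may assign sites differently. The second function counts how many neighbor values per local site must come from another CPU. For an L^4 block split in all four directions, it gives 8 L^3 / L^4 = 8/L, so smaller blocks spend relatively more of their time communicating.

```python
from math import prod

def owner(coords, global_dims, proc_grid):
    """Return (rank, local_coords) for a global site under a block decomposition.
    Assumes each global extent is divisible by the CPU count in that direction."""
    local_dims = [g // p for g, p in zip(global_dims, proc_grid)]
    proc_coords = [c // l for c, l in zip(coords, local_dims)]
    local_coords = tuple(c % l for c, l in zip(coords, local_dims))
    rank = 0
    for pc, p in zip(reversed(proc_coords), reversed(proc_grid)):
        rank = rank * p + pc                 # first direction varies fastest
    return rank, local_coords

def surface_to_volume(local_dims, proc_grid):
    """Off-CPU nearest-neighbor values needed per local site.
    Each split direction (p > 1) has two faces of volume / local_dims[mu] sites;
    unsplit directions wrap around within the CPU and need no messages."""
    volume = prod(local_dims)
    boundary = sum(2 * volume // l for l, p in zip(local_dims, proc_grid) if p > 1)
    return boundary / volume

for L in (4, 8, 14):
    print(L, surface_to_volume((L,) * 4, (2, 2, 2, 2)))
```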

This spring the MILC code was modified to improve cache performance by switching from a "site major" to a "field major" data organization. In the site major approach, all physical variables at each grid point are stored together. In the field major approach, each variable gets its own array.
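The two layouts can be compared by computing where a given field at a given site lives in memory. This sketch assumes a hypothetical set of per-site fields: four gauge links and three quark vectors, with sizes counted in real numbers. The field names are illustrative, not MILC's. A loop that visits one field at every site reads consecutive sites `stride` reals apart. In the site major layout that stride is the whole per-site record, so most of each cache line holds data the loop does not use.

```python
# Hypothetical per-site fields and their sizes in real numbers.
FIELDS = {
    "links": 4 * 18,  # four 3 x 3 complex matrices = 4 * 9 * 2 reals
    "x": 6,           # solution vector: 3 complex components
    "r": 6,           # residual vector
    "p": 6,           # search direction
}
RECORD = sum(FIELDS.values())   # reals per site in the site major layout

def site_major_offset(site, field, nsites):
    """Array of structures: all fields of one site are adjacent.
    nsites is unused; it is kept so both layouts share one signature."""
    inner = 0
    for name, size in FIELDS.items():
        if name == field:
            return site * RECORD + inner
        inner += size
    raise KeyError(field)

def field_major_offset(site, field, nsites):
    """Structure of arrays: each field is one contiguous array over all sites."""
    base = 0
    for name, size in FIELDS.items():
        if name == field:
            return base + site * size
        base += nsites * size
    raise KeyError(field)

def stride(offset_fn, field, nsites):
    """Distance in reals between one field at two consecutive sites."""
    return offset_fn(1, field, nsites) - offset_fn(0, field, nsites)

for fn in (site_major_offset, field_major_offset):
    print(fn.__name__, "stride of 'r':", stride(fn, "r", 256))
```

Under these assumptions, a sweep over the residual `r` advances 90 reals per site in the site major layout but only 6 in the field major layout.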

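This gives substantial speedups for problems that do not fit in cache, exceeding a factor of 3 for the largest grids. The benchmarks presented here were run on grids of L^4 sites, i.e., L points in each of the four spacetime directions.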
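The data the inverter touches grows as L^4, so doubling L multiplies it by 16. The rough sketch below makes several assumptions that do not come from the talk: single precision storage, four gauge links per site, and four quark vectors per site. `footprint_bytes` is a hypothetical helper. Compare its result with the cache sizes of the machine at hand.

```python
def footprint_bytes(L, n_vectors=4, bytes_per_real=4):
    """Approximate bytes touched by the inverter on an L^4 grid.
    Assumes four 3 x 3 complex links (72 reals) per site plus n_vectors
    3-component complex vectors (6 reals each), stored in single precision
    by default; both assumptions are illustrative, not taken from MILC."""
    sites = L ** 4
    reals_per_site = 4 * 18 + n_vectors * 6
    return sites * reals_per_site * bytes_per_real

for L in (4, 6, 8, 10, 12, 14):
    print(f"L = {L:2d}: {footprint_bytes(L) / 2**20:6.2f} MiB")
```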

Speedup from using field major variables on an IBM SP (375 MHz Power 3)
 L   old code (MF)   new code (MF)   speedup (new/old)
 4             512             663                1.29
 6             458             705                1.54
 8             391             682                1.74
10             215             557                2.58
12             158             528                3.35
14             135             449                3.32
(MF = Megaflops)

For multinode benchmarks we use L^4 grid points per CPU, so the total problem size grows in proportion to the number of CPUs. On the IBM SP we have results on up to 256 CPUs with both the site major and field major inverters. These were obtained on the Indiana University SP using 4-way SMP nodes. The field major code is substantially faster for L > 8, but for reasons not yet understood, it underperforms the site major code for small L and larger numbers of CPUs.

Performance in Megaflops per CPU on the Indiana University IBM SP using the site major code
                   Number of CPUs
 L     1     2     4     8    16    32    64   128   256
 4   432   305   280   110    84    78    72    64    54
 6   438   382   369   252   208   196   183   171   151
 8   375   342   340   276   239   231   223   204   181
10   235   208   157   138   127   125   122   115   108
12   153   128    81    77    73    73    72    69    67
14   133   112    67    65    63    63    62    61    59

Performance in Megaflops per CPU on the Indiana University IBM SP using the field major code
                   Number of CPUs
 L     1     2     4     8    16    32    64   128   256
 4   588   353   319    98    68    62    49    52    41
 6   631   515   484   245   170   160   133   132   104
 8   624   548   529   316   230   218   210   176   140
10   579   503   471   292   224   212   203   184   159
12   478   386   266   192   159   154   148   139   127
14   420   293   174   148   131   128   124   117   109
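Because each CPU always holds an L^4 grid, these runs measure weak scaling. The per-CPU rate relative to the 1-CPU rate therefore serves directly as a parallel efficiency. The small sketch below derives aggregate rate and efficiency from the L = 14 rows of the two tables above. The numbers are copied from the tables, and the function names are hypothetical.

```python
# Per-CPU Megaflops for L = 14, copied from the two tables above.
CPUS = [1, 2, 4, 8, 16, 32, 64, 128, 256]
SITE_MAJOR_L14 = [133, 112, 67, 65, 63, 63, 62, 61, 59]
FIELD_MAJOR_L14 = [420, 293, 174, 148, 131, 128, 124, 117, 109]

def aggregate_gflops(per_cpu_mf, ncpus):
    """Total rate in Gigaflops: per-CPU Megaflops times the number of CPUs."""
    return per_cpu_mf * ncpus / 1000.0

def efficiency(per_cpu_mf, one_cpu_mf):
    """Weak-scaling efficiency: per-CPU rate relative to the 1-CPU rate."""
    return per_cpu_mf / one_cpu_mf

for n, s, f in zip(CPUS, SITE_MAJOR_L14, FIELD_MAJOR_L14):
    print(f"{n:4d} CPUs  site major {aggregate_gflops(s, n):6.2f} GF "
          f"({efficiency(s, SITE_MAJOR_L14[0]):.2f})  "
          f"field major {aggregate_gflops(f, n):6.2f} GF "
          f"({efficiency(f, FIELD_MAJOR_L14[0]):.2f})")
```

At 256 CPUs this gives about 27.9 Gigaflops in total for the field major code at L = 14, versus about 15.1 for the site major code.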

About 200,000 CPU hours on the IU SP have been used over the past 3-4 months to study B-meson decay constants.