[mvapich-discuss] Shared Memory Performance
Christopher Co
cco2 at cray.com
Mon Jun 15 18:03:11 EDT 2009
Hi,
I am doing performance analysis on a Cray CX1 machine. I have run the
Pallas MPI benchmark and have noticed a considerable performance
difference between MVAPICH2 and Intel MPI on all the tests when shared
memory is used. I have also run the benchmark for non-shared memory and
the two performed nearly the same (MVAPICH2 was slightly faster). Is
this slowdown on shared memory a known issue and/or are there fixes or
switches I can enable or disable to get more speed?
To give an idea of what I'm seeing, for the simple Ping Pong test for
two processes on the same chip, the numbers look like:
Processes: 2

#repetitions    #bytes   Intel MPI time (usec)   MVAPICH2 time (usec)
        1000         0                    0.35                   0.94
        1000         1                    0.44                   1.24
        1000         2                    0.45                   1.17
        1000         4                    0.45                   1.08
        1000         8                    0.45                   1.11
        1000        16                    0.44                   1.13
        1000        32                    0.45                   1.21
        1000        64                    0.47                   1.35
        1000       128                    0.48                   1.75
        1000       256                    0.51                   2.92
        1000       512                    0.57                   3.41
        1000      1024                    0.76                   3.85
        1000      2048                    0.98                   4.27
        1000      4096                    1.53                   5.14
        1000      8192                    2.59                   8.04
        1000     16384                    4.86                  14.34
        1000     32768                    7.17                  33.92
         640     65536                   11.65                  43.27
         320    131072                   20.97                  66.98
         160    262144                   39.64                 118.58
          80    524288                   84.91                 224.40
          40   1048576                  212.76                 461.80
          20   2097152                  458.55                1053.67
          10   4194304                 1738.30                2649.30
Hopefully the table came out clear. MVAPICH2 lags behind at every
message size, from roughly 1.5x slower at 4 MB to about 6x slower at
512 bytes. Any insight is much appreciated. Thanks!
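To put numbers on the gap, here is a minimal Python sketch that computes
the MVAPICH2 / Intel MPI time ratio for each message size. The timings
are copied from the table above; the names `rows` and `slowdown` are
purely illustrative and not part of either MPI library or the benchmark.

```python
# (message size in bytes, Intel MPI usec, MVAPICH2 usec), copied from the table
rows = [
    (0, 0.35, 0.94), (1, 0.44, 1.24), (2, 0.45, 1.17), (4, 0.45, 1.08),
    (8, 0.45, 1.11), (16, 0.44, 1.13), (32, 0.45, 1.21), (64, 0.47, 1.35),
    (128, 0.48, 1.75), (256, 0.51, 2.92), (512, 0.57, 3.41),
    (1024, 0.76, 3.85), (2048, 0.98, 4.27), (4096, 1.53, 5.14),
    (8192, 2.59, 8.04), (16384, 4.86, 14.34), (32768, 7.17, 33.92),
    (65536, 11.65, 43.27), (131072, 20.97, 66.98), (262144, 39.64, 118.58),
    (524288, 84.91, 224.40), (1048576, 212.76, 461.80),
    (2097152, 458.55, 1053.67), (4194304, 1738.30, 2649.30),
]


def slowdown(intel_usec, mvapich2_usec):
    """Return how many times longer MVAPICH2 took than Intel MPI."""
    return mvapich2_usec / intel_usec


# Map each message size to its slowdown factor.
ratios = {nbytes: slowdown(intel, mvapich2) for nbytes, intel, mvapich2 in rows}

# Message sizes with the largest and smallest gap.
worst = max(ratios, key=ratios.get)
best = min(ratios, key=ratios.get)

for nbytes, ratio in ratios.items():
    print(f"{nbytes:>8} bytes: MVAPICH2 is {ratio:.2f}x slower")
print(f"largest gap at {worst} bytes, smallest at {best} bytes")
```

The ratio is used rather than the absolute difference because the
absolute times grow with message size, which would hide the fact that
the relative gap is largest for the mid-sized messages.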
Chris Co