[mvapich-discuss] Shared Memory Performance
Christopher Co
cco2 at cray.com
Mon Jun 15 18:50:50 EDT 2009
I am using MVAPICH2 1.4 with the default configuration (since the CX1
uses Mellanox InfiniBand). I am fairly certain my CPU mapping was
on-node in both cases (incidentally, is there a way to have MVAPICH2
print out which nodes/cores each rank is running on? One way to check
this from inside the program is sketched after the table below). Here
are the Ping Pong numbers for the off-node case, which I should have
included in my earlier message:
2 processes

 #repetitions     #bytes   Intel MPI time (usec)   MVAPICH2 time (usec)
         1000          0                    4.16                   3.40
         1000          1                    4.67                   3.56
         1000          2                    4.21                   3.56
         1000          4                    4.23                   3.62
         1000          8                    4.33                   3.63
         1000         16                    4.33                   3.64
         1000         32                    4.38                   3.73
         1000         64                    4.44                   3.92
         1000        128                    5.61                   4.71
         1000        256                    5.92                   5.23
         1000        512                    6.52                   5.79
         1000       1024                    7.68                   7.06
         1000       2048                    9.97                   9.36
         1000       4096                   12.39                  11.97
         1000       8192                   17.86                  22.53
         1000      16384                   27.44                  28.27
         1000      32768                   40.32                  39.82
          640      65536                   63.61                  62.97
          320     131072                  109.69                 110.01
          160     262144                  204.71                 206.90
           80     524288                  400.72                 397.10
           40    1048576                  775.64                 776.45
           20    2097152                 1523.95                1535.65
           10    4194304                 3018.84                3054.89
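For what it's worth, a minimal way to check the placement from inside
the program (a sketch, not MVAPICH2-specific; sched_getcpu() assumes
Linux with glibc) looks like:

    #define _GNU_SOURCE
    #include <sched.h>      /* sched_getcpu(), glibc/Linux only */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, len;
        char host[MPI_MAX_PROCESSOR_NAME];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Get_processor_name(host, &len);
        /* each rank reports the host and the core it is currently on */
        printf("rank %d: host %s, core %d\n", rank, host, sched_getcpu());
        MPI_Finalize();
        return 0;
    }

If I read the MVAPICH2 user guide correctly, the pinning itself can be
steered with MV2_ENABLE_AFFINITY and MV2_CPU_MAPPING, though I have not
gone beyond the defaults here.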
Chris
Dhabaleswar Panda wrote:
> Can you tell us which version of MVAPICH2 you are using and which
> option(s) it was configured with? Are you using the correct CPU mapping
> in both cases?
>
> DK
>
> On Mon, 15 Jun 2009, Christopher Co wrote:
>
>
>> Hi,
>>
>> I am doing performance analysis on a Cray CX1 machine. I have run the
>> Pallas MPI benchmark and noticed a considerable performance difference
>> between MVAPICH2 and Intel MPI on all the tests when shared memory is
>> used. I have also run the benchmark for the non-shared-memory (off-node)
>> case, and there the two performed nearly the same (MVAPICH2 was slightly
>> faster). Is this shared-memory slowdown a known issue, and/or are there
>> fixes or switches I can enable or disable to get more speed?
>>
>> To give an idea of what I'm seeing, for the simple Ping Pong test with
>> two processes on the same chip, the numbers look like this (a sketch of
>> the pattern being timed follows the table):
>>
>> 2 processes
>>
>>  #repetitions     #bytes   Intel MPI time (usec)   MVAPICH2 time (usec)
>>          1000          0                    0.35                   0.94
>>          1000          1                    0.44                   1.24
>>          1000          2                    0.45                   1.17
>>          1000          4                    0.45                   1.08
>>          1000          8                    0.45                   1.11
>>          1000         16                    0.44                   1.13
>>          1000         32                    0.45                   1.21
>>          1000         64                    0.47                   1.35
>>          1000        128                    0.48                   1.75
>>          1000        256                    0.51                   2.92
>>          1000        512                    0.57                   3.41
>>          1000       1024                    0.76                   3.85
>>          1000       2048                    0.98                   4.27
>>          1000       4096                    1.53                   5.14
>>          1000       8192                    2.59                   8.04
>>          1000      16384                    4.86                  14.34
>>          1000      32768                    7.17                  33.92
>>           640      65536                   11.65                  43.27
>>           320     131072                   20.97                  66.98
>>           160     262144                   39.64                 118.58
>>            80     524288                   84.91                 224.40
>>            40    1048576                  212.76                 461.80
>>            20    2097152                  458.55                1053.67
>>            10    4194304                 1738.30                2649.30
>>
>> Hopefully the table came through clearly. MVAPICH2 consistently lags
>> behind by a considerable amount. Any insight is much appreciated. Thanks!
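>>
>> For concreteness, the kernel the PingPong test times is essentially the
>> following (a simplified sketch; the real benchmark reports half the
>> measured round-trip time, averaged over the repetition count, and the
>> nbytes/reps values here stand in for the table columns):
>>
>>     #include <stdio.h>
>>     #include <stdlib.h>
>>     #include <mpi.h>
>>
>>     int main(int argc, char **argv)
>>     {
>>         int rank, i, reps = 1000, nbytes = 1024;
>>         char *buf;
>>         double t0;
>>
>>         MPI_Init(&argc, &argv);
>>         MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>         buf = malloc(nbytes > 0 ? nbytes : 1);
>>
>>         t0 = MPI_Wtime();
>>         for (i = 0; i < reps; i++) {
>>             if (rank == 0) {        /* ping */
>>                 MPI_Send(buf, nbytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
>>                 MPI_Recv(buf, nbytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
>>                          MPI_STATUS_IGNORE);
>>             } else if (rank == 1) { /* pong */
>>                 MPI_Recv(buf, nbytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
>>                          MPI_STATUS_IGNORE);
>>                 MPI_Send(buf, nbytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
>>             }
>>         }
>>         /* half round-trip time per message, in microseconds */
>>         if (rank == 0)
>>             printf("%d bytes: %.2f usec\n", nbytes,
>>                    (MPI_Wtime() - t0) * 1e6 / (2.0 * reps));
>>
>>         free(buf);
>>         MPI_Finalize();
>>         return 0;
>>     }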
>>
>>
>> Chris Co