[mvapich-discuss] Shared Memory Performance
Christopher Co
cco2 at cray.com
Mon Jun 15 18:50:50 EDT 2009
I am using MVAPICH2 1.4 with the default configuration (since the CX1
uses Mellanox InfiniBand). I am fairly certain my CPU mapping was
on-node in both cases (incidentally, is there a way to have MVAPICH2
print out which nodes/cores each rank is running on? One way to check
this from inside the program is sketched after the table below). Here
are the Ping Pong numbers for the off-node case, which I should have
included in my earlier message:
2 processes

 #repetitions     #bytes   Intel MPI time (usec)   MVAPICH2 time (usec)
         1000          0                    4.16                   3.40
         1000          1                    4.67                   3.56
         1000          2                    4.21                   3.56
         1000          4                    4.23                   3.62
         1000          8                    4.33                   3.63
         1000         16                    4.33                   3.64
         1000         32                    4.38                   3.73
         1000         64                    4.44                   3.92
         1000        128                    5.61                   4.71
         1000        256                    5.92                   5.23
         1000        512                    6.52                   5.79
         1000       1024                    7.68                   7.06
         1000       2048                    9.97                   9.36
         1000       4096                   12.39                  11.97
         1000       8192                   17.86                  22.53
         1000      16384                   27.44                  28.27
         1000      32768                   40.32                  39.82
          640      65536                   63.61                  62.97
          320     131072                  109.69                 110.01
          160     262144                  204.71                 206.90
           80     524288                  400.72                 397.10
           40    1048576                  775.64                 776.45
           20    2097152                 1523.95                1535.65
           10    4194304                 3018.84                3054.89
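For what it's worth, a minimal way to check the placement from inside
the program (a sketch, not MVAPICH2-specific; sched_getcpu() assumes
Linux with glibc) looks like:

    #define _GNU_SOURCE
    #include <sched.h>      /* sched_getcpu(), glibc/Linux only */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, len;
        char host[MPI_MAX_PROCESSOR_NAME];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Get_processor_name(host, &len);
        /* each rank reports the host and the core it is currently on */
        printf("rank %d: host %s, core %d\n", rank, host, sched_getcpu());
        MPI_Finalize();
        return 0;
    }

If I read the MVAPICH2 user guide correctly, the pinning itself can be
steered with MV2_ENABLE_AFFINITY and MV2_CPU_MAPPING, though I have not
gone beyond the defaults here.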
Chris
Dhabaleswar Panda wrote:
> Can you tell us which version of MVAPICH2 you are using and which
> option(s) it was configured with? Are you using the correct CPU mapping
> in both cases?
>
> DK
>
> On Mon, 15 Jun 2009, Christopher Co wrote:
>
>
>> Hi,
>>
>> I am doing performance analysis on a Cray CX1 machine. I have run the
>> Pallas MPI benchmark and noticed a considerable performance difference
>> between MVAPICH2 and Intel MPI on all the tests when shared memory is
>> used. I have also run the benchmark for the non-shared-memory (off-node)
>> case, and there the two performed nearly the same (MVAPICH2 was slightly
>> faster). Is this shared-memory slowdown a known issue, and/or are there
>> fixes or switches I can enable or disable to get more speed?
>>
>> To give an idea of what I'm seeing, for the simple Ping Pong test with
>> two processes on the same chip, the numbers look like this (a sketch of
>> the pattern being timed follows the table):
>>
>> 2 processes
>>
>>  #repetitions     #bytes   Intel MPI time (usec)   MVAPICH2 time (usec)
>>          1000          0                    0.35                   0.94
>>          1000          1                    0.44                   1.24
>>          1000          2                    0.45                   1.17
>>          1000          4                    0.45                   1.08
>>          1000          8                    0.45                   1.11
>>          1000         16                    0.44                   1.13
>>          1000         32                    0.45                   1.21
>>          1000         64                    0.47                   1.35
>>          1000        128                    0.48                   1.75
>>          1000        256                    0.51                   2.92
>>          1000        512                    0.57                   3.41
>>          1000       1024                    0.76                   3.85
>>          1000       2048                    0.98                   4.27
>>          1000       4096                    1.53                   5.14
>>          1000       8192                    2.59                   8.04
>>          1000      16384                    4.86                  14.34
>>          1000      32768                    7.17                  33.92
>>           640      65536                   11.65                  43.27
>>           320     131072                   20.97                  66.98
>>           160     262144                   39.64                 118.58
>>            80     524288                   84.91                 224.40
>>            40    1048576                  212.76                 461.80
>>            20    2097152                  458.55                1053.67
>>            10    4194304                 1738.30                2649.30
>>
>> Hopefully the table came through clearly. MVAPICH2 consistently lags
>> behind by a considerable amount. Any insight is much appreciated. Thanks!
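>>
>> For concreteness, the kernel the PingPong test times is essentially the
>> following (a simplified sketch; the real benchmark reports half the
>> measured round-trip time, averaged over the repetition count, and the
>> nbytes/reps values here stand in for the table columns):
>>
>>     #include <stdio.h>
>>     #include <stdlib.h>
>>     #include <mpi.h>
>>
>>     int main(int argc, char **argv)
>>     {
>>         int rank, i, reps = 1000, nbytes = 1024;
>>         char *buf;
>>         double t0;
>>
>>         MPI_Init(&argc, &argv);
>>         MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>         buf = malloc(nbytes > 0 ? nbytes : 1);
>>
>>         t0 = MPI_Wtime();
>>         for (i = 0; i < reps; i++) {
>>             if (rank == 0) {        /* ping */
>>                 MPI_Send(buf, nbytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
>>                 MPI_Recv(buf, nbytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
>>                          MPI_STATUS_IGNORE);
>>             } else if (rank == 1) { /* pong */
>>                 MPI_Recv(buf, nbytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
>>                          MPI_STATUS_IGNORE);
>>                 MPI_Send(buf, nbytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
>>             }
>>         }
>>         /* half round-trip time per message, in microseconds */
>>         if (rank == 0)
>>             printf("%d bytes: %.2f usec\n", nbytes,
>>                    (MPI_Wtime() - t0) * 1e6 / (2.0 * reps));
>>
>>         free(buf);
>>         MPI_Finalize();
>>         return 0;
>>     }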
>>
>>
>> Chris Co