[mvapich-discuss] Shared Memory Performance

Christopher Co cco2 at cray.com
Mon Jun 15 18:03:11 EDT 2009


Hi,

I am doing performance analysis on a Cray CX1 machine.  I have run the
Pallas MPI benchmark and have noticed a considerable performance
difference between MVAPICH2 and Intel MPI on all the tests when shared
memory is used.  I have also run the benchmark for non-shared memory and
the two performed nearly the same (MVAPICH2 was slightly faster).  Is
this slowdown on shared memory a known issue and/or are there fixes or
switches I can enable or disable to get more speed?

To give an idea of what I'm seeing, for the simple Ping Pong test for
two processes on the same chip, the numbers look like this:

Processes  # repetitions    #bytes  Intel MPI time (usec)  MVAPICH2 time (usec)
        2           1000         0                   0.35                  0.94
        2           1000         1                   0.44                  1.24
        2           1000         2                   0.45                  1.17
        2           1000         4                   0.45                  1.08
        2           1000         8                   0.45                  1.11
        2           1000        16                   0.44                  1.13
        2           1000        32                   0.45                  1.21
        2           1000        64                   0.47                  1.35
        2           1000       128                   0.48                  1.75
        2           1000       256                   0.51                  2.92
        2           1000       512                   0.57                  3.41
        2           1000      1024                   0.76                  3.85
        2           1000      2048                   0.98                  4.27
        2           1000      4096                   1.53                  5.14
        2           1000      8192                   2.59                  8.04
        2           1000     16384                   4.86                 14.34
        2           1000     32768                   7.17                 33.92
        2            640     65536                  11.65                 43.27
        2            320    131072                  20.97                 66.98
        2            160    262144                  39.64                118.58
        2             80    524288                  84.91                224.40
        2             40   1048576                 212.76                461.80
        2             20   2097152                 458.55               1053.67
        2             10   4194304                1738.30               2649.30


Hopefully the table came out clear.  MVAPICH2 is slower at every message
size, by a factor of roughly 1.5 to 6.  Any insight is much appreciated.
Thanks!
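
To put a number on the gap, here is a small sketch that computes the
MVAPICH2 / Intel MPI time ratio for each message size in the Ping Pong
table.  The timings are copied from the table; the names ROWS and
slowdown are made up for this illustration and are not part of the
benchmark or of either MPI library.

```python
# (message size in bytes, Intel MPI usec, MVAPICH2 usec), copied from the table.
ROWS = [
    (0, 0.35, 0.94), (1, 0.44, 1.24), (2, 0.45, 1.17), (4, 0.45, 1.08),
    (8, 0.45, 1.11), (16, 0.44, 1.13), (32, 0.45, 1.21), (64, 0.47, 1.35),
    (128, 0.48, 1.75), (256, 0.51, 2.92), (512, 0.57, 3.41),
    (1024, 0.76, 3.85), (2048, 0.98, 4.27), (4096, 1.53, 5.14),
    (8192, 2.59, 8.04), (16384, 4.86, 14.34), (32768, 7.17, 33.92),
    (65536, 11.65, 43.27), (131072, 20.97, 66.98), (262144, 39.64, 118.58),
    (524288, 84.91, 224.40), (1048576, 212.76, 461.80),
    (2097152, 458.55, 1053.67), (4194304, 1738.30, 2649.30),
]


def slowdown(rows):
    """Return (bytes, MVAPICH2 time / Intel MPI time) for each row."""
    return [(size, mvapich / intel) for size, intel, mvapich in rows]


ratios = slowdown(ROWS)
worst = max(ratios, key=lambda r: r[1])  # largest gap
best = min(ratios, key=lambda r: r[1])   # smallest gap

for size, ratio in ratios:
    print(f"{size:>8d} bytes  {ratio:5.2f}x")
print(f"largest gap:  {worst[1]:.2f}x at {worst[0]} bytes")
print(f"smallest gap: {best[1]:.2f}x at {best[0]} bytes")
```

On these numbers the gap is widest at 512 bytes (about 6x) and narrowest
at 4 MB (about 1.5x).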


Chris Co

