[mvapich-discuss] problem in intra-node parallelism with odd MPI procs

Yusuke Tamura tamura at hpc-sol.co.jp
Thu Oct 24 05:49:04 EDT 2013


Hi,

One of our customers encountered a problem where, under certain 
conditions, his job NEVER finished and ran until it hit the batch limit. 
I'm reporting a workaround. 

The conditions are:

(1) build MVAPICH2 with the PGI compiler, and 
(2) run with an odd number of MPI processes per node.

And the workaround is:

(1) use or upgrade the PGI compiler to 13.1 or later (currently 13.9), or
(2) set CFLAGS to "-O1" or lower for the "configure" script.

I hope someone with expertise in the PGI compiler can fix this problem.

thanks, Yusuke

Some details follow. Sorry, this is a little long:

a) Other compilers (Intel and GCC) had no problems.

b) Using two nodes, if a total of 6 MPI procs are launched as (4+2), his 
program works fine, but (3+3) gets into trouble (launch lines for the two 
placements are sketched below).
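
For reference, the two placements can be reproduced with launch lines 
roughly like these (the hostfile, the node names, and ./a.out are 
illustrative; -f and -ppn are the usual Hydra launcher options):

$ cat hosts               # two nodes, one name per line (illustrative)
node01
node02

# (3+3): 3 procs per node -- the odd per-node count that gets stuck
$ mpirun -f hosts -ppn 3 -n 6 ./a.out

# (4+2): 4 procs on the first node, 2 on the second -- works fine
$ mpirun -f hosts -ppn 4 -n 6 ./a.out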

c) Since I found that an OMB run with an odd number of MPI procs on a single 
node can reproduce the phenomenon, I used it to search for a workaround.

d) The tested versions are:

mvapich2-1.9a (at the customer site)
mvapich2-1.9  (at our office)
mvapich2-2.0a (at our office)

PGI 12.5 (at our office; the "OpenACC" options needed to be OFF in configure)
PGI 13.1 (at the customer site)
PGI 13.9 (at our office)

e) Run the configure script. In the case of mvapich2 1.9, PGI 13.9, 
and CFLAGS=-O2:

./configure --prefix=${HOME}/opt/mvapich2-1.9-pgi-139-O2 \
 CC=pgcc CFLAGS=-O2 F77=pgf77 FFLAGS=-O3 FC=pgfortran FCFLAGS=-O3

f) Run one of the collective OMB tests, e.g. osu_allgather. If the number 
of MPI processes is larger, the probability of getting stuck seems to 
increase. Below is the case of '-n 3'; you might need to repeat it several 
times to see the phenomenon (a simple repeat loop is sketched after the 
output).

$ cd ${HOME}/opt/mvapich2-1.9a-pgi-139-O2/libexec/mvapich2/
$ time -p ${HOME}/opt/mvapich2-1.9a-pgi-139-O2/bin/mpirun -n 3 \
 ./osu_allgather
librdmacm: couldn't read ABI version.
librdmacm: assuming: 4
librdmacm: couldn't read ABI version.
librdmacm: assuming: 4
CMA: unable to get RDMA device list
CMA: unable to get RDMA device list
librdmacm: couldn't read ABI version.
librdmacm: assuming: 4
CMA: unable to get RDMA device list
# OSU MPI Allgather Latency Test
# Size         Avg Latency(us)
1                         1.01
2                         1.01
4                         1.00
8                         0.99
16                        0.97
32                        1.06
64                        1.09
128                       1.20
256                       1.26
512                       1.45
1024                      1.78
2048                      1.58
4096                      2.26
8192                      3.68
16384                     5.99
32768                    12.90
65536                    34.80
^C[mpiexec@poaro01] Sending Ctrl-C to processes as requested
[mpiexec@poaro01] Press Ctrl-C again to force abort
real 24.44
user 73.22
sys 0.05
$
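
To repeat the run automatically until it gets stuck, a loop along these 
lines can be used (assuming GNU coreutils 'timeout' is available; the 
120-second limit and 20 iterations are arbitrary):

$ for i in $(seq 1 20); do
    timeout 120 ${HOME}/opt/mvapich2-1.9a-pgi-139-O2/bin/mpirun -n 3 \
      ./osu_allgather || { echo "run $i got stuck or failed"; break; }
  done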

g) Conclusion:

I couldn't find any workaround for PGI 12.5. 

For PGI 13.9, and for 13.1 at our customer site, CFLAGS=-O1 or lower is 
the workaround regardless of the MVAPICH2 version (1.9, 1.9a, and 2.0a).
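
For reference, a configure line for the -O1 workaround would look like the 
one in e) with only CFLAGS changed (the -O1 install prefix name is just 
illustrative):

./configure --prefix=${HOME}/opt/mvapich2-1.9-pgi-139-O1 \
 CC=pgcc CFLAGS=-O1 F77=pgf77 FFLAGS=-O3 FC=pgfortran FCFLAGS=-O3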

-- 
Yusuke Tamura
Computational Science Consultant, Sales Department
HPC Solutions Co., Ltd.
http://www.hpc-sol.co.jp/

