[mvapich-discuss] problem in intra-node parallelism with odd MPI procs

Yusuke Tamura tamura at hpc-sol.co.jp
Thu Oct 24 08:06:29 EDT 2013


Jeff, 

thank you very much for your suggestion! 

I must admit I'm green with MPI and GDB, so it may take me some time
to get started with that. In the meantime, I made a quick check with
Fortran disabled, but the situation is the same.
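
For my own notes, I understand the GDB recipe to be roughly the
following (untested on my side; it assumes a working X display and
xterm available on the node):

  ${HOME}/opt/mvapich2-2.0a-pgi-139-test/bin/mpirun -n 3 \
    xterm -e gdb ./osu_allgather

i.e. type "run" in each of the three windows, hit Ctrl-C once the
test stalls, and then "bt" to see where each rank is stuck.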

best regards,
Yusuke 

[tamura at poaro01 mvapich2-2.0a]$ head 01-configure.sh
#!/bin/bash

# CFLAGS=-O0, CFLAGS=-O1 are OK

./configure --prefix=/home/tamura/opt/mvapich2-2.0a-pgi-139-test \
 CC=pgcc CFLAGS=-O F77=false FC=false \
 --disable-f77 \
 --disable-fc

exit
[tamura at poaro01 mvapich2-2.0a]$ (make clean;make distclean; \
./01-configure.sh >& 01-configure.sh-test-log.txt )&
 ----
[tamura at poaro01 mvapich2-2.0a]$ nohup make -j 16 &
[1] 25970
[tamura at poaro01 mvapich2-2.0a]$ nohup: ignoring input and appending 
output to `nohup.out'
[tamura at poaro01 mvapich2-2.0a]$ nohup make install &
[1] 28809
[tamura at poaro01 mvapich2-2.0a]$ nohup: ignoring input and appending 
output to `nohup.out'
[tamura at poaro01 mvapich2-2.0a]$ cd ~/opt/mvapich2-2.0a-pgi-139-test/
libexec/mvapich2/
[tamura at poaro01 mvapich2]$ time -p ${HOME}/opt/mvapich2-2.0a-pgi-139-
test/bin/mpirun -n 3 ./osu_allgather
CMA: unable to get RDMA device list
librdmacm: couldn't read ABI version.
librdmacm: assuming: 4
librdmacm: couldn't read ABI version.
librdmacm: assuming: 4
CMA: unable to get RDMA device list
librdmacm: couldn't read ABI version.
librdmacm: assuming: 4
CMA: unable to get RDMA device list
# OSU MPI Allgather Latency Test
# Size         Avg Latency(us)
1                         1.02
2                         1.04
4                         1.02
8                         0.99
16                        1.00
32                        1.05
64                        1.08
128                       1.19
256                       1.27
512                       1.43
1024                      1.76
2048                      1.58
4096                      2.29
8192                      3.75
16384                     6.03
32768                    12.91
65536                    35.11
131072                   56.84
262144                  114.53
524288                  219.16
^C[mpiexec at poaro01] Sending Ctrl-C to processes as requested
[mpiexec at poaro01] Press Ctrl-C again to force abort
real 24.21
user 72.51
sys 0.05
[tamura at poaro01 mvapich2]$


>If the bug appears with PGI but not Intel or GCC, the problem is the
>compiler and the appropriate support team to contact is PGI.
>
>You might try running the hanging tests in GDB (you can do this in
>parallel using xterm; the internet has details) and running "bt" after
>you Ctrl-C to see exactly where it stalled.
>
>You might also try after completely disabling Fortran (FC=false
>F77=false --disable-fc --disable-f77).  We observed some strange
>issues in MPICH with PGF a while back.
>
>Jeff
>
>On Thu, Oct 24, 2013 at 4:49 AM, Yusuke Tamura <tamura at hpc-sol.co.jp> wrote:
>> Hi,
>>
>> One of our customers encountered a problem where, under certain
>> conditions, his job never finished and ran into the batch time limit.
>> I'm reporting a workaround.
>>
>> The conditions are:
>>
>> (1) build MVAPICH2 with the PGI compiler, and
>> (2) run with an odd number of MPI processes per node.
>>
>> And the workaround is:
>>
>> (1) use or upgrade the PGI compiler to 13.1 or later (currently
>> 13.9), and
>> (2) pass CFLAGS="-O1" or lower to the "configure" script (see the
>> sketch just below).
>>
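>> For example, a configure line for the workaround build would look
>> like this (a sketch only; the install prefix is just our naming
>> convention, and everything else matches e) below except CFLAGS):
>>
>> ./configure --prefix=${HOME}/opt/mvapich2-1.9-pgi-139-O1 \
>>  CC=pgcc CFLAGS=-O1 F77=pgf77 FFLAGS=-O3 FC=pgfortran FCFLAGS=-O3
>>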
>> I hope someone expert on the PGI compiler can fix this problem.
>>
>> thanks, Yusuke
>>
>> Some details follow. Sorry it's a little long:
>>
>> a) Other compilers, Intel or GCC, had no problems.
>>
>> b) On two nodes, if a total of 6 MPI procs is launched as (4+2), his
>> program works fine, but (3+3) gets into trouble (see the hostfile
>> sketch just below).
>>
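>> For reference, one way to force the (4+2) versus (3+3) layouts with
>> the hydra mpiexec is a hostfile carrying per-host process counts
>> (host names and the binary below are placeholders, not our real
>> setup):
>>
>> $ cat hosts.3+3
>> node01:3
>> node02:3
>> $ mpirun -f hosts.3+3 -n 6 ./a.out
>>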
>> c) Since I found that an OMB run with an odd number of MPI procs on
>> a single node reproduces the phenomenon, I used that setup to search
>> for a workaround.
>>
>> d) The tested versions are:
>>
>> mvapich2-1.9a (at the customer site)
>> mvapich2-1.9  (at our office)
>> mvapich2-2.0a (at our office)
>>
>> PGI 12.5 (at our office; the "OpenACC" options had to be OFF in
>> configure)
>> PGI 13.1 (at the customer site)
>> PGI 13.9 (at our office)
>>
>> e) The configure invocation for the case of MVAPICH2 1.9, PGI 13.9,
>> and CFLAGS=-O2:
>>
>> ./configure --prefix=${HOME}/opt/mvapich2-1.9-pgi-139-O2 \
>>  CC=pgcc CFLAGS=-O2 F77=pgf77 FFLAGS=-O3 FC=pgfortran FCFLAGS=-O3
>>
>> f) Run one of the collective OMB tests, e.g. osu_allgather. The
>> larger the number of MPI processes, the more likely the hang seemed
>> to be. Below is the '-n 3' case; you may need to repeat the run
>> several times to see the phenomenon (a retry loop is sketched after
>> the transcript).
>>
>> $ cd ${HOME}/opt/mvapich2-1.9a-pgi-139-O2/libexec/mvapich2/
>> $ time -p ${HOME}/opt/mvapich2-1.9a-pgi-139-O2/bin/mpirun -n 3 \
>>  ./osu_allgather
>> librdmacm: couldn't read ABI version.
>> librdmacm: assuming: 4
>> librdmacm: couldn't read ABI version.
>> librdmacm: assuming: 4
>> CMA: unable to get RDMA device list
>> CMA: unable to get RDMA device list
>> librdmacm: couldn't read ABI version.
>> librdmacm: assuming: 4
>> CMA: unable to get RDMA device list
>> # OSU MPI Allgather Latency Test
>> # Size         Avg Latency(us)
>> 1                         1.01
>> 2                         1.01
>> 4                         1.00
>> 8                         0.99
>> 16                        0.97
>> 32                        1.06
>> 64                        1.09
>> 128                       1.20
>> 256                       1.26
>> 512                       1.45
>> 1024                      1.78
>> 2048                      1.58
>> 4096                      2.26
>> 8192                      3.68
>> 16384                     5.99
>> 32768                    12.90
>> 65536                    34.80
>> ^C[mpiexec at poaro01] Sending Ctrl-C to processes as requested
>> [mpiexec at poaro01] Press Ctrl-C again to force abort
>> real 24.44
>> user 73.22
>> sys 0.05
>> $
>>
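>> Since the hang is probabilistic, a loop along these lines saves
>> babysitting the terminal (an untested sketch; it relies on the
>> coreutils "timeout" command, and the 60-second limit is arbitrary):
>>
>> for i in $(seq 1 20); do
>>   timeout 60 ${HOME}/opt/mvapich2-1.9a-pgi-139-O2/bin/mpirun -n 3 \
>>     ./osu_allgather > run-$i.log 2>&1 \
>>     || echo "run $i did not finish cleanly (possible hang)"
>> done
>>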
>> g) Conclusion:
>>
>> I couldn't find any workaround for PGI 12.5.
>>
>> For PGI 13.9 (and 13.1 at our customer's site), CFLAGS=-O1 or lower
>> works around the problem regardless of the MVAPICH2 version (1.9,
>> 1.9a, and 2.0a).
>>
>> --
>> Yusuke Tamura
>> Computational Science Consultant, Sales Department
>> HPC Solutions, Inc.
>> http://www.hpc-sol.co.jp/
>> _______________________________________________
>> mvapich-discuss mailing list
>> mvapich-discuss at cse.ohio-state.edu
>> http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>
>
>
>-- 
>Jeff Hammond
>jeff.science at gmail.com

