[mvapich-discuss] problem in intra-node parallelism with odd MPI procs
Yusuke Tamura
tamura at hpc-sol.co.jp
Thu Oct 24 08:06:29 EDT 2013
Jeff,
thank you very much for your suggestion!
I admit that I'm green at using MPI and GDB, so it may take me
some time to get started. Anyway, I have made a quick check with
Fortran disabled, but the situation is the same.
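For when I get to the GDB step, my understanding of the xterm
recipe you mention (a sketch I have not tried yet; it assumes an
X display is reachable from the node) is:

$ mpirun -n 3 xterm -e gdb ./osu_allgather

i.e. type "run" at each (gdb) prompt, press Ctrl-C in the stalled
rank's window once it hangs, and then "bt" to see where it stopped.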
best regards,
Yusuke
[tamura at poaro01 mvapich2-2.0a]$ head 01-configure.sh
#!/bin/bash
# CFLAGS=-O0, CFLAGS=-O1 are OK
./configure --prefix=/home/tamura/opt/mvapich2-2.0a-pgi-139-test \
CC=pgcc CFLAGS=-O F77=false FC=false \
--disable-f77 \
--disable-fc
exit
[tamura at poaro01 mvapich2-2.0a]$ (make clean;make distclean; \
./01-configure.sh >& 01-configure.sh-test-log.txt )&
----
[tamura at poaro01 mvapich2-2.0a]$ nohup make -j 16 &
[1] 25970
[tamura at poaro01 mvapich2-2.0a]$ nohup: ignoring input and appending output to `nohup.out'
[tamura at poaro01 mvapich2-2.0a]$ nohup make install &
[1] 28809
[tamura at poaro01 mvapich2-2.0a]$ nohup: ignoring input and appending output to `nohup.out'
[tamura at poaro01 mvapich2-2.0a]$ cd ~/opt/mvapich2-2.0a-pgi-139-test/libexec/mvapich2/
[tamura at poaro01 mvapich2]$ time -p ${HOME}/opt/mvapich2-2.0a-pgi-139-test/bin/mpirun -n 3 ./osu_allgather
CMA: unable to get RDMA device list
librdmacm: couldn't read ABI version.
librdmacm: assuming: 4
librdmacm: couldn't read ABI version.
librdmacm: assuming: 4
CMA: unable to get RDMA device list
librdmacm: couldn't read ABI version.
librdmacm: assuming: 4
CMA: unable to get RDMA device list
# OSU MPI Allgather Latency Test
# Size Avg Latency(us)
1 1.02
2 1.04
4 1.02
8 0.99
16 1.00
32 1.05
64 1.08
128 1.19
256 1.27
512 1.43
1024 1.76
2048 1.58
4096 2.29
8192 3.75
16384 6.03
32768 12.91
65536 35.11
131072 56.84
262144 114.53
524288 219.16
^C[mpiexec at poaro01] Sending Ctrl-C to processes as requested
[mpiexec at poaro01] Press Ctrl-C again to force abort
real 24.21
user 72.51
sys 0.05
[tamura at poaro01 mvapich2]$
>If the bug appears with PGI but not Intel or GCC, the problem is the
>compiler and the appropriate support team to contact is PGI.
>
>You might try running the hanging tests in GDB (you can do this in
>parallel using xterm; the internet has details) and running "bt" after
>you Ctrl-C to see exactly where it stalled.
>
>You might also try after completely disabling Fortran (FC=false
>F77=false --disable-fc --disable-f77). We observed some strange
>issues in MPICH with PGF a while back.
>
>Jeff
>
>On Thu, Oct 24, 2013 at 4:49 AM, Yusuke Tamura <tamura at hpc-sol.co.jp> wrote:
>> Hi,
>>
>> One of our customers encountered a problem where, under certain
>> conditions, his job NEVER ended and ran until it hit the batch
>> time limit. I'm reporting a workaround here.
>>
>> The conditions are:
>>
>> (1) build MVAPICH2 with the PGI compiler, and
>> (2) run with an odd number of MPI processes per node.
>>
>> And the workaround is:
>>
>> (1) use or upgrade the PGI compiler to 13.1 or later (currently
>> 13.9), and
>> (2) set CFLAGS to "-O1" or lower for the "configure" script, as
>> in the sketch below.
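>>
>> For example, a configure line with the workaround applied (a
>> sketch only; the install prefix is from our own test setup):
>>
>> ./configure --prefix=${HOME}/opt/mvapich2-1.9-pgi-139-O1 \
>> CC=pgcc CFLAGS=-O1 F77=pgf77 FFLAGS=-O3 FC=pgfortran FCFLAGS=-O3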
>>
>> I hope someone with expertise in the PGI compiler can fix this
>> problem.
>>
>> thanks, Yusuke
>>
>> Some details follow. Sorry it's a little long:
>>
>> a) Other compilers, Intel or GCC, had no problems.
>>
>> b) Using two nodes, if a total of 6 MPI procs are launched as
>> (4+2), his program works fine, but (3+3) gets into trouble (see
>> the sketch just below).
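>>
>> For instance, with the Hydra mpirun the per-node split can be
>> controlled via -ppn (the hostnames and "./app" here are
>> placeholders):
>>
>> $ cat hosts
>> node01
>> node02
>> $ mpirun -f hosts -ppn 4 -n 6 ./app   # (4+2): works
>> $ mpirun -f hosts -ppn 3 -n 6 ./app   # (3+3): hangs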
>>
>> c) Since I found that an OMB run with an odd number of MPI procs
>> on a single node reproduces the phenomenon, I used that to search
>> for a workaround.
>>
>> d) Tested versions are;
>>
>> mvapich2-1.9a (at the customer site)
>> mvapich2-1.9 (at our office)
>> mvapich2-2.0a (at our office)
>>
>> PGI 12.5 (at our office; needed the "OpenACC" options to be OFF
>> in configure)
>> PGI 13.1 (at the customer site)
>> PGI 13.9 (at our office)
>>
>> e) Run the configure script; in this case MVAPICH2 is 1.9, PGI
>> ver. 13.9, and CFLAGS=-O2:
>>
>> ./configure --prefix=${HOME}/opt/mvapich2-1.9-pgi-139-O2 \
>> CC=pgcc CFLAGS=-O2 F77=pgf77 FFLAGS=-O3 FC=pgfortran FCFLAGS=-O3
>>
>> f) Run one of the collective OMB tests, e.g. osu_allgather. If
>> the number of MPI processes is large, the probability of getting
>> stuck appeared to increase. Below is the case of '-n 3'; you
>> might need to repeat it several times to see the phenomenon.
>>
>> $ cd ${HOME}/opt/mvapich2-1.9a-pgi-139-O2/libexec/mvapich2/
>> $ time -p ${HOME}/opt/mvapich2-1.9a-pgi-139-O2/bin/mpirun -n 3 \
>> ./osu_allgather
>> librdmacm: couldn't read ABI version.
>> librdmacm: assuming: 4
>> librdmacm: couldn't read ABI version.
>> librdmacm: assuming: 4
>> CMA: unable to get RDMA device list
>> CMA: unable to get RDMA device list
>> librdmacm: couldn't read ABI version.
>> librdmacm: assuming: 4
>> CMA: unable to get RDMA device list
>> # OSU MPI Allgather Latency Test
>> # Size Avg Latency(us)
>> 1 1.01
>> 2 1.01
>> 4 1.00
>> 8 0.99
>> 16 0.97
>> 32 1.06
>> 64 1.09
>> 128 1.20
>> 256 1.26
>> 512 1.45
>> 1024 1.78
>> 2048 1.58
>> 4096 2.26
>> 8192 3.68
>> 16384 5.99
>> 32768 12.90
>> 65536 34.80
>> ^C[mpiexec at poaro01] Sending Ctrl-C to processes as requested
>> [mpiexec at poaro01] Press Ctrl-C again to force abort
>> real 24.44
>> user 73.22
>> sys 0.05
>> $
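>>
>> To repeat the run automatically, a loop along these lines can be
>> used (a sketch; the 60-second limit is arbitrary and it relies on
>> coreutils' "timeout"):
>>
>> $ for i in $(seq 1 20); do \
>>   timeout 60 mpirun -n 3 ./osu_allgather > /dev/null \
>>   || echo "run $i hung"; done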
>>
>> g) Conclusion:
>>
>> I couldn't find any workaround for PGI 12.5.
>>
>> For PGI 13.9, and 13.1 at the customer site, CFLAGS=-O1 or lower
>> is the workaround regardless of the MVAPICH2 version (1.9, 1.9a,
>> and 2.0a).
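>>
>> (To double-check which CFLAGS an existing install was built with,
>> MVAPICH2's "mpiname" utility in the install's bin directory can
>> help, e.g.:
>>
>> $ ${HOME}/opt/mvapich2-1.9-pgi-139-O2/bin/mpiname -a
>>
>> which prints the version together with the configure-time
>> compiler flags.)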
>>
>> --
>> Yusuke Tamura
>> Computational Science Consultant, Sales Department
>> HPC Solutions Inc.
>> http://www.hpc-sol.co.jp/
>> _______________________________________________
>> mvapich-discuss mailing list
>> mvapich-discuss at cse.ohio-state.edu
>> http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>
>
>
>--
>Jeff Hammond
>jeff.science at gmail.com