[mvapich-discuss] [MVAPICH2] 0.9.5 reduce_scatter
Yann K.
yann.kalemkarian at bull.net
Thu Nov 23 09:19:34 EST 2006
Hello everybody,
Has anybody had problems with the IMB Reduce_scatter test using the
MVAPICH2 0.9.5 library? Below are the stacks I get when my 16 processes
(2 machines x 8) hang in Reduce_scatter. Everything works with 15
processes but hangs with 16. It also hangs with 16 processes spread over
4 machines.
My setup is 4 IA64 machines with 8 cores each, connected with Voltaire
4x DDR InfiniBand.
Thanks for the feedback
Yann
10 processes have completed the Reduce_scatter and are waiting in the barrier that follows it:
-------------------------------------------------------------------------------------
0x4000000000086531 in MPIDI_CH3I_SMP_read_progress??unw ()
#0 0x4000000000086531 in MPIDI_CH3I_SMP_read_progress??unw ()
#1 0x400000000007ee20 in MPIDI_CH3I_Progress??unw ()
#2 0x4000000000043a80 in MPIC_Sendrecv??unw ()
#3 0x400000000001d320 in PMPI_Barrier??unw ()
#4 0x4000000000004490 in main (argc=-57548248, argv=0x600ffffffc91e1dc)
at IMB.c:277
6 processes are stuck inside Reduce_scatter. Two different stacks show up, one blocked in MPIC_Recv and one in MPIC_Send:
----------------------------------------------
0x4000000000085a10 in MPIDI_CH3I_SMP_write_progress??unw ()
#0 0x4000000000085a10 in MPIDI_CH3I_SMP_write_progress??unw ()
#1 0x400000000007ee40 in MPIDI_CH3I_Progress??unw ()
#2 0x4000000000042e60 in MPIC_Recv??unw ()
#3 0x4000000000041600 in MPIR_Reduce_scatter??unw ()
#4 0x400000000003bee0 in PMPI_Reduce_scatter??unw ()
#5 0x4000000000012320 in IMB_reduce_scatter (c_info=0x6000000000013890,
size=-40771196, n_sample=1000, RUN_MODE=0x600ffffffd91e188,
time=0x600ffffffd91e220) at IMB_reduce_scatter.c:150
#6 0x4000000000004480 in main (argc=-40771032, argv=0x600ffffffd91e1dc)
at IMB.c:273
0x20000000000b2782 in pthread_spin_lock () from /lib/tls/libpthread.so.0
#0 0x20000000000b2782 in pthread_spin_lock () from /lib/tls/libpthread.so.0
#1 0x2000000000d832e0 in mthca_poll_cq (ibcq=0x600000000004f160, ne=1,
wc=0x600ffffffb2adf80) at src/cq.c:472
#2 0x40000000000a2b30 in MPIDI_CH3I_MRAILI_Cq_poll??unw ()
#3 0x40000000000806f0 in MPIDI_CH3I_read_progress??unw ()
#4 0x400000000007ed00 in MPIDI_CH3I_Progress??unw ()
#5 0x4000000000042910 in MPIC_Send??unw ()
#6 0x40000000000410c0 in MPIR_Reduce_scatter??unw ()
#7 0x400000000003bee0 in PMPI_Reduce_scatter??unw ()
#8 0x4000000000012370 in IMB_reduce_scatter (c_info=0x6000000000013890,
size=-81075836, n_sample=1000, RUN_MODE=0x600ffffffb2ae188,
time=0x600ffffffb2ae220) at IMB_reduce_scatter.c:150
#9 0x4000000000004480 in main (argc=-81075672, argv=0x600ffffffb2ae1dc)
at IMB.c:273
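For anyone who wants to reproduce the "10 in the barrier, 6 in Reduce_scatter" count from a set of gdb backtraces, here is a minimal sketch in Python. It is only an illustration: the helper names (frames, signature, summarize) are made up for this example, the sample traces are shortened copies of the stacks above, and it assumes one backtrace string per process has already been collected.

```python
import re
from collections import Counter

# Matches gdb frame lines such as "#3 0x4000... in PMPI_Barrier??unw ()".
# The "??unw" suffix is not part of \w, so it is dropped from the name.
FRAME_RE = re.compile(r"^#\d+\s+0x[0-9a-fA-F]+\s+in\s+([A-Za-z_]\w*)")


def frames(backtrace):
    """Return the function names found in one gdb backtrace, innermost first."""
    names = []
    for line in backtrace.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            names.append(match.group(1))
    return names


def signature(backtrace, prefix="PMPI_"):
    """Return the innermost frame whose name starts with prefix.

    Falls back to the innermost frame if no frame has that prefix.
    """
    names = frames(backtrace)
    for name in names:
        if name.startswith(prefix):
            return name
    return names[0] if names else "<no frames>"


def summarize(backtraces, prefix="PMPI_"):
    """Count how many backtraces share each signature."""
    return Counter(signature(bt, prefix) for bt in backtraces)


# Shortened sample traces taken from the stacks in this message.
BARRIER_BT = """\
#0 0x4000000000086531 in MPIDI_CH3I_SMP_read_progress??unw ()
#1 0x400000000007ee20 in MPIDI_CH3I_Progress??unw ()
#2 0x4000000000043a80 in MPIC_Sendrecv??unw ()
#3 0x400000000001d320 in PMPI_Barrier??unw ()
"""

RECV_BT = """\
#0 0x4000000000085a10 in MPIDI_CH3I_SMP_write_progress??unw ()
#1 0x400000000007ee40 in MPIDI_CH3I_Progress??unw ()
#2 0x4000000000042e60 in MPIC_Recv??unw ()
#3 0x4000000000041600 in MPIR_Reduce_scatter??unw ()
#4 0x400000000003bee0 in PMPI_Reduce_scatter??unw ()
"""

SEND_BT = """\
#0 0x20000000000b2782 in pthread_spin_lock () from /lib/tls/libpthread.so.0
#1 0x2000000000d832e0 in mthca_poll_cq (ibcq=0x600000000004f160, ne=1,
wc=0x600ffffffb2adf80) at src/cq.c:472
#5 0x4000000000042910 in MPIC_Send??unw ()
#6 0x40000000000410c0 in MPIR_Reduce_scatter??unw ()
#7 0x400000000003bee0 in PMPI_Reduce_scatter??unw ()
"""

if __name__ == "__main__":
    # Group by the MPI entry point, then by the internal MPIC_ call.
    for bt in (BARRIER_BT, RECV_BT, SEND_BT):
        print(signature(bt), signature(bt, prefix="MPIC_"))
```

Grouping with the default prefix separates the processes already in the barrier from those still in Reduce_scatter. Passing prefix="MPIC_" splits the stuck ones further into senders and receivers.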
--
Yann Kalemkarian
HPC Software Engineer
Open Software R&D
Bull, Architect of an Open World TM
Phone: +33 4 7629 7393
www.bull.com