[mvapich-discuss] time spent in mpi_init and MV2_ON_DEMAND_THRESHOLD

Gregory Bauer gbauer at ncsa.uiuc.edu
Thu May 3 11:45:51 EDT 2007


We are new to running MVAPICH2 and are seeing some scaling issues with 
application start-up on Intel dual quad-core nodes with an InfiniBand 
interconnect. Using the typical MPI "hello world", we see that when 
running 512 tasks (nodes=64:ppn=8) or more, the application makes 
very slow progress in mpi_init() unless MV2_ON_DEMAND_THRESHOLD is set 
to a count larger than the task count. So it appears that static 
connection start-up is faster than 'on demand', which seems 
counter-intuitive. At task counts greater than 512, the 'on demand' 
scheme is too slow to be practical.

We must have something incorrectly configured, but what?

Thanks.
-Greg

stack trace of the hello process
#0  0x0000002a9588a0df in __read_nocancel () from /lib64/tls/libpthread.so.0
#1  0x000000000044f513 in PMIU_readline ()
#2  0x0000000000443e51 in PMI_KVS_Get ()
#3  0x00000000004158b2 in MPIDI_CH3I_SMP_init ()
#4  0x00000000004456f9 in MPIDI_CH3_Init ()
#5  0x000000000042dab6 in MPID_Init ()
#6  0x000000000040f469 in MPIR_Init_thread ()
#7  0x000000000040f0bc in PMPI_Init ()
#8  0x0000000000403347 in main (argc=Variable "argc" is not available.
) at hello.c:17

strace snippet of the associated python process
select(16, [4 5 6 7 8 10 12 13 15], [], [], {5, 0}) = 1 (in [7], left 
{5, 0})
recvfrom(7, "00000094", 8, 0, NULL, NULL) = 8
recvfrom(7, "(dp1\nS\'cmd\'\np2\nS\'response_to_pmi"..., 94, 0, NULL, 
NULL) = 94
sendto(8, "00000094(dp1\nS\'cmd\'\np2\nS\'respons"..., 102, 0, NULL, 0) 
= 102
select(16, [4 5 6 7 8 10 12 13 15], [], [], {5, 0}) = 1 (in [7], left 
{5, 0})
recvfrom(7, "00000094", 8, 0, NULL, NULL) = 8
recvfrom(7, "(dp1\nS\'cmd\'\npProcess 28455 detached


Configuration information:

$ /usr/local/mvapich2-0.9.8p1-gcc/bin/mpich2version
Version:           MVAPICH2-0.9.8
Device:            osu_ch3:mrail
Configure Options: --prefix=/usr/local/mvapich2-0.9.8p1-gcc 
--with-device=osu_ch3:mrail --with-rdma=gen2 --with-pm=mpd 
--enable-romio --without-mpe

$ cat /usr/local/ofed/BUILD_ID
OFED-1.1
openib-1.1 (REV=9905)

$ dmesg | grep ib_
ib_core: no version for "struct_module" found: kernel tainted.
ib_mthca: Mellanox InfiniBand HCA driver v0.08 (February 14, 2006)
ib_mthca: Initializing 0000:0a:00.0

$ uname -a
Linux honest1 2.6.9-42.0.8.EL_lustre.1.4.9smp #1 SMP Fri Feb 16 01:23:52 
MST 2007 x86_64 x86_64 x86_64 GNU/Linux

$ cat /etc/*release
Red Hat Enterprise Linux AS release 4 (Nahant Update 4)



