[mvapich-discuss] time spent in mpi_init and MV2_ON_DEMAND_THRESHOLD

Gregory Bauer gbauer at ncsa.uiuc.edu
Thu Jun 28 11:13:07 EDT 2007


Qi-

Is the patch that you provided that fixed the start-up scaling issue in 
one of the more recent bundles that are provided at
http://mvapich.cse.ohio-state.edu/nightly/mvapich2/branches/0.9.8/ ?

You had us apply the patch to mvapich2-0.9.8p2.tar.gz, but could we apply 
it to mvapich2-0.9.8-2007-06-26.tar.gz, for example?
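
If the patch still applies cleanly, I assume we could just test it with 
something like the following (the patch file name is a placeholder for 
whatever you attached, and -p1 may need adjusting to match the paths in 
the patch):

$ tar xzf mvapich2-0.9.8-2007-06-26.tar.gz
$ cd mvapich2-0.9.8-2007-06-26
$ patch -p1 --dry-run < startup-scaling.patch   # check for rejects first
$ patch -p1 < startup-scaling.patch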

We would like to be at the most recent version of mvapich2 before we 
'freeze' the software stack.

-Greg

Gregory Bauer wrote:

> Qi-
>
> Thanks for the patch. I rebuilt mvapich2-0.9.8p2 with the patch applied.
>
> Testing up to 4096 tasks looks good. I think the issue is solved.
>
> -Greg
>
> Qi Gao wrote:
>
>> Hi Greg,
>>
>> We've been looking into this problem and found that the slow startup in
>> the 'on demand' case comes from heavy use of the PMI interface to
>> exchange startup information, whereas the 'all-to-all' case uses a more
>> advanced protocol that exchanges the startup information over InfiniBand
>> itself.
>>
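>> To make the bottleneck concrete, here is a minimal sketch (my own
>> illustration of the PMI-1 calls involved, not the actual MVAPICH2 code;
>> the function and key names are hypothetical) of an address exchange done
>> purely through PMI. Each PMI_KVS_Get is a socket round-trip through the
>> MPD daemons, so the cost grows linearly with the number of processes:
>>
>> #include <stdio.h>
>> #include "pmi.h"
>>
>> /* Hypothetical sketch: publish this rank's address, then fetch
>>  * every peer's address one PMI round-trip at a time. */
>> void exchange_addresses(const char *my_addr)
>> {
>>     int rank, size, i;
>>     char kvsname[256], key[64], value[256];
>>
>>     PMI_Get_rank(&rank);
>>     PMI_Get_size(&size);
>>     PMI_KVS_Get_my_name(kvsname, sizeof(kvsname));
>>
>>     /* One put per process... */
>>     snprintf(key, sizeof(key), "addr-%d", rank);
>>     PMI_KVS_Put(kvsname, key, my_addr);
>>     PMI_KVS_Commit(kvsname);
>>     PMI_Barrier();
>>
>>     /* ...but N gets per process: O(N) socket round-trips, which is
>>      * what dominates MPI_Init in the 'on demand' case. */
>>     for (i = 0; i < size; i++) {
>>         snprintf(key, sizeof(key), "addr-%d", i);
>>         PMI_KVS_Get(kvsname, key, value, sizeof(value));
>>         /* value now holds the connection info for rank i */
>>     }
>> }
>>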
>> I've ported this protocol to the 'on demand' case. The patch against
>> mvapich2-0.9.8p2 is in the attachment. Startup should now perform no
>> worse than in the 'all-to-all' case.
>>
>> Let me know if you have any more problems.
>>
>> Regards,
>> --Qi
>>
>> ----- Original Message ----- From: "Gregory Bauer" 
>> <gbauer at ncsa.uiuc.edu>
>> To: <mvapich-discuss at cse.ohio-state.edu>
>> Sent: Thursday, May 03, 2007 11:45 AM
>> Subject: [mvapich-discuss] time spent in mpi_init and
>> MV2_ON_DEMAND_THRESHOLD
>>
>>
>>> We are new at running MVAPICH2 and are seeing some scaling issues with
>>> application start-up on an Intel dual quad-core cluster with an
>>> InfiniBand interconnect. Using the typical MPI 'hello world' we see
>>> that when running 512 tasks (nodes=64:ppn=8) and greater, the
>>> application makes very slow progress in mpi_init() unless
>>> MV2_ON_DEMAND_THRESHOLD is set to a count larger than the task count.
>>> So it appears that static connection start-up is faster than 'on
>>> demand', which seems counter-intuitive. At task counts greater than
>>> 512, the 'on demand' scheme is too slow to be practical.
>>>
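>>> For reference, the workaround we are using looks like the following
>>> (assuming mpiexec under MPD propagates the variable; the exact value
>>> only has to exceed the task count):
>>>
>>> $ mpiexec -n 512 -env MV2_ON_DEMAND_THRESHOLD 1024 ./hello
>>>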
>>> We must have something incorrectly configured, but what?
>>>
>>> Thanks.
>>> -Greg
>>>
>>> stack trace of the hello process
>>> #0  0x0000002a9588a0df in __read_nocancel () from
>>> /lib64/tls/libpthread.so.0
>>> #1  0x000000000044f513 in PMIU_readline ()
>>> #2  0x0000000000443e51 in PMI_KVS_Get ()
>>> #3  0x00000000004158b2 in MPIDI_CH3I_SMP_init ()
>>> #4  0x00000000004456f9 in MPIDI_CH3_Init ()
>>> #5  0x000000000042dab6 in MPID_Init ()
>>> #6  0x000000000040f469 in MPIR_Init_thread ()
>>> #7  0x000000000040f0bc in PMPI_Init ()
>>> #8  0x0000000000403347 in main (argc=Variable "argc" is not available.) at hello.c:17
>>>
>>> strace snippet of the associated python process
>>> select(16, [4 5 6 7 8 10 12 13 15], [], [], {5, 0}) = 1 (in [7], left {5, 0})
>>> recvfrom(7, "00000094", 8, 0, NULL, NULL) = 8
>>> recvfrom(7, "(dp1\nS\'cmd\'\np2\nS\'response_to_pmi"..., 94, 0, NULL, NULL) = 94
>>> sendto(8, "00000094(dp1\nS\'cmd\'\np2\nS\'respons"..., 102, 0, NULL, 0) = 102
>>> select(16, [4 5 6 7 8 10 12 13 15], [], [], {5, 0}) = 1 (in [7], left {5, 0})
>>> recvfrom(7, "00000094", 8, 0, NULL, NULL) = 8
>>> recvfrom(7, "(dp1\nS\'cmd\'\npProcess 28455 detached
>>>
>>>
>>> Configuration information:
>>>
>>> $ /usr/local/mvapich2-0.9.8p1-gcc/bin/mpich2version
>>> Version:           MVAPICH2-0.9.8
>>> Device:            osu_ch3:mrail
>>> Configure Options: --prefix=/usr/local/mvapich2-0.9.8p1-gcc --with-device=osu_ch3:mrail --with-rdma=gen2 --with-pm=mpd --enable-romio --without-mpe
>>>
>>> $ cat /usr/local/ofed/BUILD_ID
>>> OFED-1.1
>>> openib-1.1 (REV=9905)
>>>
>>> $ dmesg | grep ib_
>>> ib_core: no version for "struct_module" found: kernel tainted.
>>> ib_mthca: Mellanox InfiniBand HCA driver v0.08 (February 14, 2006)
>>> ib_mthca: Initializing 0000:0a:00.0
>>>
>>> $ uname -a
>>> Linux honest1 2.6.9-42.0.8.EL_lustre.1.4.9smp #1 SMP Fri Feb 16 01:23:52 MST 2007 x86_64 x86_64 x86_64 GNU/Linux
>>>
>>> $ cat /etc/*release
>>> Red Hat Enterprise Linux AS release 4 (Nahant Update 4)
>>>
>>>
>>> _______________________________________________
>>> mvapich-discuss mailing list
>>> mvapich-discuss at cse.ohio-state.edu
>>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>>>
>
>

-- 
-Greg Bauer

ph: (217) 333-2754                                                              
email: gbauer at ncsa.uiuc.edu                                                     

Performance Engineering and Computational Methods Group
National Center for Supercomputing Applications
University of Illinois at Urbana-Champaign


