[mvapich-discuss] time spent in mpi_init and MV2_ON_DEMAND_THRESHOLD

Qi Gao gaoq at cse.ohio-state.edu
Thu Jun 28 11:44:05 EDT 2007


Greg,

That patch should also apply to mvapich2-0.9.8-2007-06-26.tar.gz or any later
mvapich2-0.9.8 daily tarball.
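
If it helps, applying it and rebuilding would look roughly like the lines
below. This is only a sketch: the patch file name and -p level are
placeholders, the unpacked directory name may differ for a daily tarball, and
the configure options simply mirror the ones in your mpich2version output.

$ tar xzf mvapich2-0.9.8-2007-06-26.tar.gz
$ cd mvapich2-0.9.8-2007-06-26
$ patch -p1 < ../mvapich2-ondemand-startup.patch   # or -p0, depending on the patch paths
$ ./configure --prefix=/usr/local/mvapich2-0.9.8-gcc \
      --with-device=osu_ch3:mrail --with-rdma=gen2 --with-pm=mpd \
      --enable-romio --without-mpe
$ make && make install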

Regards,
--Qi

----- Original Message ----- 
From: "Gregory Bauer" <gbauer at ncsa.uiuc.edu>
To: "Qi Gao" <gaoq at cse.ohio-state.edu>
Cc: <mvapich-discuss at cse.ohio-state.edu>
Sent: Thursday, June 28, 2007 10:13 AM
Subject: Re: [mvapich-discuss] time spent in mpi_init and
MV2_ON_DEMAND_THRESHOLD


> Qi-
>
> Is the patch you provided to fix the start-up scaling issue included in one
> of the more recent bundles provided at
> http://mvapich.cse.ohio-state.edu/nightly/mvapich2/branches/0.9.8/ ?
>
> You had us apply the patch to mvapich2-0.9.8p2.tar.gz, but could we apply
> it to mvapich2-0.9.8-2007-06-26.tar.gz, for example?
>
> We would like to be at the most recent version of mvapich2 before we
> 'freeze' the software stack.
>
> -Greg
>
> Gregory Bauer wrote:
>
>> Qi-
>>
>> Thanks for the patch. I rebuilt mvapich2-0.9.8p2 with the patch applied.
>>
>> Testing up to 4096 tasks looks good. I think the issue is solved.
>>
>> -Greg
>>
>> Qi Gao wrote:
>>
>>> Hi Greg,
>>>
>>> We've been looking into this problem. The slow startup in the 'on demand'
>>> case occurs because, in that case, the MPI program makes heavy use of the
>>> PMI interface to exchange startup information, whereas the 'all-to-all'
>>> case uses an advanced protocol that exchanges this information over
>>> InfiniBand.
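>>>
>>> Schematically, the 'on demand' startup was doing something like the sketch
>>> below during MPI_Init. This is illustrative C only, not the actual MVAPICH2
>>> source; the key names and the helper function are made up:
>>>
>>> /* Each rank publishes its own endpoint once, then pulls every peer's
>>>  * entry out of the PMI key-value space.  Each PMI_KVS_Get is a
>>>  * readline round trip through mpd, so with N ranks the job issues
>>>  * on the order of N*(N-1) such requests in total. */
>>> #include <stdio.h>
>>> #include <stdlib.h>
>>> #include <stdint.h>
>>> #include "pmi.h"
>>>
>>> #define VAL_LEN 64
>>>
>>> static int exchange_endpoints(const char *kvsname, int rank, int size,
>>>                               uint32_t my_qpn, uint32_t *peer_qpn)
>>> {
>>>     char key[32], val[VAL_LEN];
>>>     int i;
>>>
>>>     /* publish my own endpoint (e.g. QP number) */
>>>     snprintf(key, sizeof(key), "ep-%d", rank);
>>>     snprintf(val, sizeof(val), "%u", (unsigned)my_qpn);
>>>     PMI_KVS_Put(kvsname, key, val);
>>>     PMI_KVS_Commit(kvsname);
>>>     PMI_Barrier();
>>>
>>>     /* fetch every peer's entry: N-1 PMI round trips per rank */
>>>     for (i = 0; i < size; i++) {
>>>         if (i == rank)
>>>             continue;
>>>         snprintf(key, sizeof(key), "ep-%d", i);
>>>         PMI_KVS_Get(kvsname, key, val, VAL_LEN);
>>>         peer_qpn[i] = (uint32_t)strtoul(val, NULL, 10);
>>>     }
>>>     return 0;
>>> }
>>>
>>> That is consistent with the stack trace you sent, which is blocked in
>>> __read_nocancel / PMIU_readline underneath PMI_KVS_Get.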
>>>
>>> I've ported this protocol to the 'on demand' case. The patch against
>>> mvapich2-0.9.8p2 is attached. With it, the 'on demand' case should no
>>> longer perform worse than the 'all-to-all' case.
>>>
>>> Let me know if you have any more problems.
>>>
>>> Regards,
>>> --Qi
>>>
>>> ----- Original Message -----
>>> From: "Gregory Bauer" <gbauer at ncsa.uiuc.edu>
>>> To: <mvapich-discuss at cse.ohio-state.edu>
>>> Sent: Thursday, May 03, 2007 11:45 AM
>>> Subject: [mvapich-discuss] time spent in mpi_init and
>>> MV2_ON_DEMAND_THRESHOLD
>>>
>>>
>>>> We are new to running MVAPICH2 and are seeing some scaling issues with
>>>> application start-up on Intel dual quad-core nodes with an InfiniBand
>>>> interconnect. Using the typical MPI 'hello world', we see that when
>>>> running 512 tasks (nodes=64:ppn=8) or more, the application makes very
>>>> slow progress in mpi_init(), unless MV2_ON_DEMAND_THRESHOLD is set to a
>>>> count larger than the task count. So it appears that static connection
>>>> start-up is faster than 'on demand', which seems counter-intuitive. At
>>>> task counts greater than 512, the 'on demand' scheme is too slow to be
>>>> practical.
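>>>>
>>>> The test amounts to roughly the following (a sketch from memory; the
>>>> exact hello.c and launch lines may differ slightly, and the -env syntax
>>>> assumes mpd's mpiexec):
>>>>
>>>> /* hello.c - minimal MPI "hello world" used for the startup timing */
>>>> #include <stdio.h>
>>>> #include <mpi.h>
>>>>
>>>> int main(int argc, char **argv)
>>>> {
>>>>     int rank, size;
>>>>     MPI_Init(&argc, &argv);              /* this is where the time goes */
>>>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>>     MPI_Comm_size(MPI_COMM_WORLD, &size);
>>>>     printf("hello from rank %d of %d\n", rank, size);
>>>>     MPI_Finalize();
>>>>     return 0;
>>>> }
>>>>
>>>> $ mpiexec -n 512 ./hello
>>>> $ mpiexec -n 512 -env MV2_ON_DEMAND_THRESHOLD 1024 ./hello
>>>>
>>>> The first invocation crawls in mpi_init(); the second, with the threshold
>>>> above the task count so that static all-to-all connections are used,
>>>> starts up quickly.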
>>>>
>>>> We must have something incorrectly configured, but what?
>>>>
>>>> Thanks.
>>>> -Greg
>>>>
>>>> stack trace of the hello process
>>>> #0  0x0000002a9588a0df in __read_nocancel () from /lib64/tls/libpthread.so.0
>>>> #1  0x000000000044f513 in PMIU_readline ()
>>>> #2  0x0000000000443e51 in PMI_KVS_Get ()
>>>> #3  0x00000000004158b2 in MPIDI_CH3I_SMP_init ()
>>>> #4  0x00000000004456f9 in MPIDI_CH3_Init ()
>>>> #5  0x000000000042dab6 in MPID_Init ()
>>>> #6  0x000000000040f469 in MPIR_Init_thread ()
>>>> #7  0x000000000040f0bc in PMPI_Init ()
>>>> #8  0x0000000000403347 in main (argc=Variable "argc" is not available.) at hello.c:17
>>>>
>>>> strace snippet of the associated python process
>>>> select(16, [4 5 6 7 8 10 12 13 15], [], [], {5, 0}) = 1 (in [7], left {5, 0})
>>>> recvfrom(7, "00000094", 8, 0, NULL, NULL) = 8
>>>> recvfrom(7, "(dp1\nS\'cmd\'\np2\nS\'response_to_pmi"..., 94, 0, NULL, NULL) = 94
>>>> sendto(8, "00000094(dp1\nS\'cmd\'\np2\nS\'respons"..., 102, 0, NULL, 0) = 102
>>>> select(16, [4 5 6 7 8 10 12 13 15], [], [], {5, 0}) = 1 (in [7], left {5, 0})
>>>> recvfrom(7, "00000094", 8, 0, NULL, NULL) = 8
>>>> recvfrom(7, "(dp1\nS\'cmd\'\npProcess 28455 detached
>>>>
>>>>
>>>> Configuration information:
>>>>
>>>> $ /usr/local/mvapich2-0.9.8p1-gcc/bin/mpich2version
>>>> Version:           MVAPICH2-0.9.8
>>>> Device:            osu_ch3:mrail
>>>> Configure Options: --prefix=/usr/local/mvapich2-0.9.8p1-gcc --with-device=osu_ch3:mrail
>>>>                    --with-rdma=gen2 --with-pm=mpd --enable-romio --without-mpe
>>>>
>>>> $ cat /usr/local/ofed/BUILD_ID
>>>> OFED-1.1
>>>> openib-1.1 (REV=9905)
>>>>
>>>> $ dmesg | grep ib_
>>>> ib_core: no version for "struct_module" found: kernel tainted.
>>>> ib_mthca: Mellanox InfiniBand HCA driver v0.08 (February 14, 2006)
>>>> ib_mthca: Initializing 0000:0a:00.0
>>>>
>>>> $ uname -a
>>>> Linux honest1 2.6.9-42.0.8.EL_lustre.1.4.9smp #1 SMP Fri Feb 16
>>>> 01:23:52
>>>> MST 2007 x86_64 x86_64 x86_64 GNU/Linux
>>>>
>>>> $ cat /etc/*release
>>>> Red Hat Enterprise Linux AS release 4 (Nahant Update 4)
>>>>
>>>>
>>>> _______________________________________________
>>>> mvapich-discuss mailing list
>>>> mvapich-discuss at cse.ohio-state.edu
>>>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>>>>
>>
>>
>
> -- 
> -Greg Bauer
>
> ph: (217) 333-2754
> email: gbauer at ncsa.uiuc.edu
> Performance Engineering and Computational Methods Group
> National Center for Supercomputing Applications
> University of Illinois at Urbana-Champaign
>
>


