[mvapich-discuss] time spent in mpi_init and MV2_ON_DEMAND_THRESHOLD
Qi Gao
gaoq at cse.ohio-state.edu
Thu May 3 15:04:32 EDT 2007
Hi Greg,
Thanks for verifying this. In both cases, the program blocks in the PMI call
PMI_KVS_Get(). We will look into this problem further and get back to you.
Thanks!
--Qi
----- Original Message -----
From: "Gregory Bauer" <gbauer at ncsa.uiuc.edu>
To: "Qi Gao" <gaoq at cse.ohio-state.edu>
Cc: <mvapich-discuss at cse.ohio-state.edu>
Sent: Thursday, May 03, 2007 2:55 PM
Subject: Re: [mvapich-discuss] time spent in mpi_init and
MV2_ON_DEMAND_THRESHOLD
> Qi-
>
> We have rebuilt with the changes you provided.
>
> The time it took for mpiexec to run MPI 'Hello, world' on 1024 cores
> (nodes=128:ppn=8) was approximately 25 minutes. (mpdboot and mpdtrace
> each took only a minute or two.)
>
> To compare, with MV2_ON_DEMAND_THRESHOLD set to 2100, it took only 3
> minutes for mpiexec to complete MPI 'Hello, world'.
>
> The SMP channel is no longer in the picture. Using gdb to take a look at
> where an MPI process is, I see
>
> #0 0x0000002a9588a0df in __read_nocancel () from /lib64/tls/libpthread.so.0
> #1 0x0000000000444da3 in PMIU_readline ()
> #2 0x000000000043cb81 in PMI_KVS_Get ()
> #3 0x0000000000415d9f in MPIDI_CH3I_CM_Init ()
> #4 0x000000000043e410 in MPIDI_CH3_Init ()
> #5 0x0000000000428926 in MPID_Init ()
> #6 0x000000000040eca9 in MPIR_Init_thread ()
> #7 0x000000000040ea4d in PMPI_Init ()
> #8 0x00000000004030e7 in main (argc=Variable "argc" is not available.
>
> I also have strace output for that process, and gdb and strace output
> for the associated python process. (I simply attach to the processes once
> in a while to see what is happening.)
>
> -Greg
>
>
> Qi Gao wrote:
>
>> Hi Greg,
>>
>> Thanks for trying MVAPICH2 and letting us know about the problem. We are
>> glad to work with you to solve it.
>>
>> I assume this stack trace represents the place where most time is spent.
>> It appears to be in the SMP channel initialization function, where it
>> somehow gets stuck in the PMI calls to the process manager.
>>
>>> #3 0x00000000004158b2 in MPIDI_CH3I_SMP_init ()
>>> #4 0x00000000004456f9 in MPIDI_CH3_Init ()
>>
>>
>> To help us narrow down the problem, would you try disabling the SMP
>> channel to see whether that helps?
>>
>> To disable the SMP channel, you need to patch the default make script
>> "make.mvapich2.ofa". Here is the patch:
>>
>> ========
>> --- make.mvapich2.ofa 2007-03-08 22:47:23.000000000 -0500
>> +++ make.mvapich2.ofa.nosmp 2007-05-03 11:44:53.000000000 -0400
>> @@ -129,6 +129,8 @@
>> SHARED_LIBS=""
>> fi
>>
>> +export SMP_FLAG=""
>> +
>> export LD_LIBRARY_PATH=$OPEN_IB_LIB:$LD_LIBRARY_PATH
>> export LIBS=${LIBS:--L${OPEN_IB_LIB} ${BLCR_LIB} ${RDMA_CM_LIBS} -libverbs -libumad -lpthread}
>> export FFLAGS=${FFLAGS:--L${OPEN_IB_LIB}}
>> ========
>>
>>
>> Thanks!
>>
>> --Qi
>
>
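
The approach used in this thread (attaching gdb to a slow rank every so often
and reading the backtrace) can be summarized mechanically. The following is a
minimal sketch, not part of MVAPICH2 or gdb: the helper names
frame_functions, first_frame_with_prefix and blocking_call_counts are
hypothetical, and it assumes backtraces in the "#N 0xADDR in NAME (" form
quoted above, optionally with "> " mail-quote markers in front.

```python
import re
from collections import Counter

# Matches gdb frame lines such as
#   "#2 0x000000000043cb81 in PMI_KVS_Get ()"
# The address is optional, and leading "> " quote markers are tolerated.
FRAME_RE = re.compile(
    r"^[>\s]*#(\d+)\s+(?:0x[0-9a-fA-F]+\s+in\s+)?([A-Za-z_]\w*)\s*\("
)

# One sample, copied from the backtrace quoted in this thread.
SAMPLE = """\
> #0 0x0000002a9588a0df in __read_nocancel () from
> /lib64/tls/libpthread.so.0
> #1 0x0000000000444da3 in PMIU_readline ()
> #2 0x000000000043cb81 in PMI_KVS_Get ()
> #3 0x0000000000415d9f in MPIDI_CH3I_CM_Init ()
> #4 0x000000000043e410 in MPIDI_CH3_Init ()
> #5 0x0000000000428926 in MPID_Init ()
> #6 0x000000000040eca9 in MPIR_Init_thread ()
> #7 0x000000000040ea4d in PMPI_Init ()
"""


def frame_functions(backtrace_text):
    """Return the function names in a backtrace, innermost frame first."""
    names = []
    for line in backtrace_text.splitlines():
        match = FRAME_RE.match(line)
        if match:
            names.append(match.group(2))
    return names


def first_frame_with_prefix(names, prefix="PMI_"):
    """Return the innermost function whose name starts with prefix, or None."""
    for name in names:
        if name.startswith(prefix):
            return name
    return None


def blocking_call_counts(samples, prefix="PMI_"):
    """Count, over several sampled backtraces, which prefixed call each was in."""
    counts = Counter()
    for text in samples:
        call = first_frame_with_prefix(frame_functions(text), prefix)
        if call is not None:
            counts[call] += 1
    return counts


if __name__ == "__main__":
    print(blocking_call_counts([SAMPLE]))
```

Feeding it several samples taken a few minutes apart shows whether a rank sits
in the same PMI call the whole time or is making progress between samples.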