[mvapich-discuss] time spent in mpi_init and MV2_ON_DEMAND_THRESHOLD

Qi Gao gaoq at cse.ohio-state.edu
Thu May 3 15:04:32 EDT 2007


Hi Greg,

Thanks for verifying this. In both cases, the program blocks in the PMI call 
PMI_KVS_Get(). We will look into this problem further and get back to you.
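
For comparing backtraces sampled from many ranks, a small script can pull the function names out of gdb output and report the innermost PMI call. The following is only a minimal sketch: the helper names (parse_frames, innermost_pmi_call) and the "PMI_" prefix convention are assumptions made for illustration, and the sample text is an excerpt of the trace quoted below.

```python
import re
from collections import Counter
from typing import Iterable, List, Optional

# Matches gdb frame lines such as
#   "#2  0x000000000043cb81 in PMI_KVS_Get ()"
# The address part is optional because gdb omits it for some frames.
FRAME_RE = re.compile(r"#(\d+)\s+(?:0x[0-9a-fA-F]+\s+in\s+)?(\w+)\s*\(")

# Excerpt of the backtrace quoted later in this thread.
SAMPLE = """\
> #0  0x0000002a9588a0df in __read_nocancel () from
> /lib64/tls/libpthread.so.0
> #1  0x0000000000444da3 in PMIU_readline ()
> #2  0x000000000043cb81 in PMI_KVS_Get ()
> #3  0x0000000000415d9f in MPIDI_CH3I_CM_Init ()
> #4  0x000000000043e410 in MPIDI_CH3_Init ()
"""


def parse_frames(backtrace: str) -> List[str]:
    """Return function names from a gdb backtrace, innermost frame first."""
    frames = []
    for line in backtrace.splitlines():
        # Drop e-mail quote markers and surrounding whitespace.
        line = line.lstrip("> ").strip()
        match = FRAME_RE.match(line)
        if match:
            frames.append(match.group(2))
    return frames


def innermost_pmi_call(frames: Iterable[str], prefix: str = "PMI_") -> Optional[str]:
    """Return the innermost frame whose name starts with the given prefix."""
    for name in frames:
        if name.startswith(prefix):
            return name
    return None


def tally(backtraces: Iterable[str]) -> Counter:
    """Count which PMI call each sampled backtrace is blocked in."""
    return Counter(innermost_pmi_call(parse_frames(bt)) for bt in backtraces)


if __name__ == "__main__":
    print(parse_frames(SAMPLE))
    print(innermost_pmi_call(parse_frames(SAMPLE)))
```

Feeding tally() one backtrace per sampled process gives a quick count of how many ranks are waiting in the same PMI call.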

Thanks!
--Qi

----- Original Message ----- 
From: "Gregory Bauer" <gbauer at ncsa.uiuc.edu>
To: "Qi Gao" <gaoq at cse.ohio-state.edu>
Cc: <mvapich-discuss at cse.ohio-state.edu>
Sent: Thursday, May 03, 2007 2:55 PM
Subject: Re: [mvapich-discuss] time spent in mpi_init and MV2_ON_DEMAND_THRESHOLD


> Qi-
>
> We have rebuilt with the changes you provided.
>
> The time it took for mpiexec to run MPI 'Hello, world' at 1024 cores 
> (nodes=128:ppn=8) was approximately 25 minutes. (The mpdboot and mpdtrace 
> steps took only a minute or two each.)
>
> For comparison, with MV2_ON_DEMAND_THRESHOLD set to 2100, it took only 3 
> minutes for mpiexec to complete MPI 'Hello, world'.
>
> The SMP channel is no longer in the picture. Using gdb to take a look at 
> where an MPI process is, I see
>
> #0  0x0000002a9588a0df in __read_nocancel () from 
> /lib64/tls/libpthread.so.0
> #1  0x0000000000444da3 in PMIU_readline ()
> #2  0x000000000043cb81 in PMI_KVS_Get ()
> #3  0x0000000000415d9f in MPIDI_CH3I_CM_Init ()
> #4  0x000000000043e410 in MPIDI_CH3_Init ()
> #5  0x0000000000428926 in MPID_Init ()
> #6  0x000000000040eca9 in MPIR_Init_thread ()
> #7  0x000000000040ea4d in PMPI_Init ()
> #8  0x00000000004030e7 in main (argc=Variable "argc" is not available.
>
> I also have strace output for that process, and gdb and strace output 
> for the associated Python process. (I simply attach to the processes once 
> in a while to see what is happening.)
>
> -Greg
>
>
> Qi Gao wrote:
>
>> Hi Greg,
>>
>> Thanks for trying our MVAPICH2 and letting us know the problem. We are 
>> glad to work with you to solve the problem.
>>
>> I assume this stack trace represents the place where most time is spent. 
>> It seems to be in the SMP channel initialization function, where it 
>> somehow gets stuck in the PMI calls to the process manager.
>>
>>> #3  0x00000000004158b2 in MPIDI_CH3I_SMP_init ()
>>> #4  0x00000000004456f9 in MPIDI_CH3_Init ()
>>
>>
>> To help us narrow down the problem, would you try disabling the SMP 
>> channel to see whether that helps?
>>
>> To disable the SMP channel, you need to patch the default make script 
>> "make.mvapich2.ofa". Here is the patch:
>>
>> ========
>> --- make.mvapich2.ofa   2007-03-08 22:47:23.000000000 -0500
>> +++ make.mvapich2.ofa.nosmp     2007-05-03 11:44:53.000000000 -0400
>> @@ -129,6 +129,8 @@
>>     SHARED_LIBS=""
>> fi
>>
>> +export SMP_FLAG=""
>> +
>> export LD_LIBRARY_PATH=$OPEN_IB_LIB:$LD_LIBRARY_PATH
>> export LIBS=${LIBS:--L${OPEN_IB_LIB} ${BLCR_LIB} ${RDMA_CM_LIBS} -libverbs -libumad -lpthread}
>> export FFLAGS=${FFLAGS:--L${OPEN_IB_LIB}}
>> ========
>>
>>
>> Thanks!
>>
>> --Qi
>
> 


