[mvapich-discuss] time spent in mpi_init and MV2_ON_DEMAND_THRESHOLD

Gregory Bauer gbauer at ncsa.uiuc.edu
Thu May 3 14:55:38 EDT 2007


Qi-

We have rebuilt with the changes you provided.

The time it took for mpiexec to run MPI 'Hello, world' at 1024 cores 
(nodes=128:ppn=8) was approximately 25 minutes. (mpdboot and 
mpdtrace each took only a minute or two.)

For comparison, with MV2_ON_DEMAND_THRESHOLD set to 2100, it took only 3 
minutes for mpiexec to complete MPI 'Hello, world'.
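
A minimal sketch of how the two launches could be timed the same way. The 
helper name time_launch and the ELAPSED variable are hypothetical and are not 
part of MVAPICH2 or mpd; the sketch only wraps whatever command it is given.

```shell
#!/bin/sh
# Hypothetical helper (not from MVAPICH2 or mpd): run a command, record its
# wall-clock duration in whole seconds in ELAPSED, and report it on stderr.
time_launch() {
    start=$(date +%s)          # seconds since the epoch before the launch
    "$@"                       # run the command exactly as given
    status=$?
    end=$(date +%s)
    ELAPSED=$((end - start))
    echo "elapsed: ${ELAPSED}s (exit status ${status}): $*" >&2
    return "$status"           # preserve the command's exit status
}
```

It might be used as "time_launch mpiexec -n 1024 ./hello" for the default run 
and "time_launch mpiexec -n 1024 -env MV2_ON_DEMAND_THRESHOLD 2100 ./hello" 
for the second one. Here ./hello is a placeholder binary name, and passing the 
variable with -env is an assumption about how it was set.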

The SMP channel is no longer in the picture. Using gdb to look at 
where an MPI process is, I see

#0  0x0000002a9588a0df in __read_nocancel () from /lib64/tls/libpthread.so.0
#1  0x0000000000444da3 in PMIU_readline ()
#2  0x000000000043cb81 in PMI_KVS_Get ()
#3  0x0000000000415d9f in MPIDI_CH3I_CM_Init ()
#4  0x000000000043e410 in MPIDI_CH3_Init ()
#5  0x0000000000428926 in MPID_Init ()
#6  0x000000000040eca9 in MPIR_Init_thread ()
#7  0x000000000040ea4d in PMPI_Init ()
#8  0x00000000004030e7 in main (argc=Variable "argc" is not available.

I also have strace output for that process, and gdb and strace 
output for the associated Python process. (I simply attach to the 
processes once in a while to see what is happening.)
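
The periodic attaching described above could be scripted roughly as follows. 
This is only a sketch: sample_stacks and the BT_CMD override are hypothetical 
names, and the default command assumes gdb is installed and allowed to attach 
to the target process.

```shell
#!/bin/sh
# Hypothetical helper: print a backtrace of process $1 every $2 seconds,
# $3 times in total. BT_CMD is an illustrative override; by default gdb is
# run in batch mode, attaches to the pid, prints "bt", and detaches.
sample_stacks() {
    pid=$1
    interval=$2
    count=$3
    i=0
    while [ "$i" -lt "$count" ]; do
        echo "=== sample $((i + 1)) of pid $pid at $(date) ==="
        # Intentionally unquoted so the default expands into separate words.
        ${BT_CMD:-gdb -batch -ex bt -p} "$pid" 2>&1
        i=$((i + 1))
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}
```

For example, "sample_stacks 12345 60 10" would take ten samples one minute 
apart, where 12345 stands in for the pid of one MPI process.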

-Greg


Qi Gao wrote:

> Hi Greg,
>
> Thanks for trying our MVAPICH2 and letting us know the problem. We are 
> glad to work with you to solve the problem.
>
> I assume this stack trace represents the place where most time is 
> spent. It seems to be in the SMP channel initialization function, 
> and somehow it gets stuck in the PMI calls to the process manager.
>
>> #3  0x00000000004158b2 in MPIDI_CH3I_SMP_init ()
>> #4  0x00000000004456f9 in MPIDI_CH3_Init ()
>
>
> To help us narrow down the problem, would you try disabling the SMP 
> channel to see whether that helps?
>
> To disable the SMP channel, you need to patch the default make script 
> "make.mvapich2.ofa". Here is the patch:
>
> ========
> --- make.mvapich2.ofa   2007-03-08 22:47:23.000000000 -0500
> +++ make.mvapich2.ofa.nosmp     2007-05-03 11:44:53.000000000 -0400
> @@ -129,6 +129,8 @@
>     SHARED_LIBS=""
> fi
>
> +export SMP_FLAG=""
> +
> export LD_LIBRARY_PATH=$OPEN_IB_LIB:$LD_LIBRARY_PATH
> export LIBS=${LIBS:--L${OPEN_IB_LIB} ${BLCR_LIB} ${RDMA_CM_LIBS} -libverbs -libumad -lpthread}
> export FFLAGS=${FFLAGS:--L${OPEN_IB_LIB}}
> ========
>
>
> Thanks!
>
> --Qi



