[mvapich-discuss] Does mvapich2 really support dynamic process management?

马凯 makailove123 at 163.com
Mon Apr 20 03:16:34 EDT 2015


Hi, Hari!
    I have installed MVAPICH2-2.1GA, and the problem has changed.
    1. Output of mpiname -a:
MVAPICH2 2.1 Fri Apr 03 20:00:00 EDT 2015 ch3:mrail


Compilation
CC: gcc    -DNDEBUG -DNVALGRIND -O2
CXX: g++   -DNDEBUG -DNVALGRIND -O2
F77:  -L/lib -L/lib  
FC:   


Configuration
--enable-cuda --with-cuda=/usr/local/cuda --disable-fortran


    2. Run commands:
mpirun_rsh -np 4 -hostfile hf MV2_SUPPORT_DPM=1 MV2_SMP_USE_CMA=0 ./parent
    (Without MV2_SMP_USE_CMA=0, MPI_Init fails.)
    The content of hf is 192.168.2.1, which is the IP address of the IB port on the node.


    3. When I run the parent, MPI_Comm_spawn returns successfully and the child gets a non-null communicator for its parent.
    But when the child calls MPI_Send to send a message to its parent, the call never returns and the parent never receives the message.
    The same thing happens with port-based communication.
    MPI_Open_port, MPI_Comm_accept and MPI_Comm_connect all return MPI_SUCCESS, but messages still cannot be sent or received.


    It's so weird! Could you give me some help?
    Thanks!





At 2015-04-19 23:08:34, "Hari Subramoni" <subramoni.1 at osu.edu> wrote:

Hello,


MVAPICH2 does indeed support Dynamic Process Management (DPM) :-).


We fixed some issues with DPM in MVAPICH2-2.1GA. Can you please try that version and see if the issue persists? You can download it from the following location: http://mvapich.cse.ohio-state.edu/downloads/


Can you also try launching your application with mpiexec instead of mpirun_rsh? Please refer to the following section of the user guide for more information:


http://mvapich.cse.ohio-state.edu/static/media/mvapich/mvapich2-2.1-userguide.html#x1-320005.2.2



Regards,
Hari.


On Sun, Apr 19, 2015 at 10:54 AM, 马凯 <makailove123 at 163.com> wrote:

I tried the parent and child examples in mvapich2-2.1rc2/examples, but they didn't work.
These are my compile commands:
#mpicc -o parent parent.c
#mpicc -o child child.c


This is my run command:
#mpirun_rsh -np 4 -hostfile hf MV2_SUPPORT_DPM=1 ./parent


And the content of hf is: 192.168.2.1
This is the IP of IB port on my node.


After running the command, the parent gets stuck. I added some debug output and found that MPI_Comm_spawn never returns.
It seems that the child was started, but MPI_Init in the child never returns.


Why does this happen? Does MVAPICH2 really support dynamic process management or not?
Could someone help?
Thanks




