[mvapich-discuss] dynamic process connections (accept/connect or MPI_Comm_join) and Infiniband...

Jaidev Sridhar sridharj at cse.ohio-state.edu
Wed Apr 16 14:47:32 EDT 2008


Leon,

I believe you missed Dr. Panda's earlier response
(http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/2008-April/001568.html). I've attached his response. Do let us know if you have any other queries.

Thanks,
Jaidev
 
On Wed, 2008-04-16 at 10:06 +1000, leon zadorin wrote:
> Hello everyone,
> 
> It occurred to me that my original post of these questions (to
> mvapich-commit) would be better emailed to this list instead... so
> here it is...
> 
> I am relatively new to the whole MPI/Infiniband scene, so my apologies
> if some of my questions/thoughts are naive...
> 
> I am currently experiencing difficulties with dynamic process
> connections (MPI_Comm_join) between 2 hosts (each with an Infiniband
> card and an Ethernet card).
> 
> The setup is:
> 2 hosts, each with a Gigabit Ethernet card and an Infiniband card
> (PCI-e), running 32-bit Linux (AMD arch).
> Hosts are connected via an Infiniband switch (w.r.t. Infiniband cards) and
> via an Ethernet/IP network (w.r.t. Ethernet cards).
> mvapich2 has been built with "make.mvapich2.ofa".
> mpdboot has been executed and mpd daemons are running on both hosts.
> 
> I would like to know if it is currently possible to achieve the following:
> 
> 1) start 1 app on 1 host (without using mpirun);
> 2) then later, after some time, start another app on 2nd host (without
> using mpirun);
> 3) make the app in step 2 automatically connect to the app started in step 1
> 
> I was able to achieve the above when running with the mpich2 library,
> using the sock channel, and only when using the 'MPI_Comm_join' call
> (using MPI_Publish_name, etc. did not work when starting apps without
> mpirun, even with all mpds being active).
> 
> However, the MPI_Comm_join approach fails when attempting to use
> mvapich2 (mvapich2-1.0-2008-04-10) over Infiniband... I wonder if the
> following has something to do with it:
> http://lists.openfabrics.org/pipermail/commits/2006-January/004707.html
> "
> --------------------------------------------------------------------------------
> -                              Known Deficiencies
> --------------------------------------------------------------------------------
> -
> -- The sock channel is the only channel that implements dynamic process support
> -  (i.e., MPI_COMM_SPAWN, MPI_COMM_CONNECT, MPI_COMM_ACCEPT, etc.).  All other
> -  channels will experience failures for tests exercising dynamic process
> -  functionality.
> "
> and in http://lists.openfabrics.org/pipermail/commits/2006-May/007209.html
> we have:
> "
> -- MPI_COMM_JOIN has been implemented; although like the other dynamic process
> -  routines, it is only supported by the Sock channel.
> "
> 
> Given that the above quotes mention both MPI_Comm_join and
> MPI_Comm_connect... is there currently any way at all to achieve the
> above 3 steps when using Infiniband cards (perhaps with Ethernet
> cards on all of the hosts as well)?
> 
> I would imagine that, at least theoretically, it is plausible to use
> the sock channel to 'bootstrap' the Infiniband channel?
> http://www.mpi-forum.org/docs/mpi-20-html/node115.htm
> "
> MPI uses the socket to bootstrap creation of the intercommunicator,
> and for nothing else.
> "
> 
> Perhaps I need to build mvapich2 not via "make.mvapich2.ofa" but via
> something else, so that both the socket and Infiniband channels are
> supported?
> 
> Of course the same aforementioned link
> (http://www.mpi-forum.org/docs/mpi-20-html/node115.htm)
> says:
> "
>  Advice to users. An MPI implementation may require a specific
> communication medium for MPI communication, such as a shared memory
> segment or a special switch. In this case, it may not be possible for
> two processes to successfully join even if there is a socket
> connecting them and they are using the same MPI implementation. ( End
> of advice to users.)
> "
> 
> If this is the case here and there is no way to use MPI_Comm_join to
> achieve the originally described 3 steps (connecting apps started at
> different times and without the use of mpirun) - is that at all
> possible by other means (e.g. using MPI's open port, publish name,
> lookup name, accept/connect calls)? Are the limitations purely
> theoretical or more of a practical nature?
> 
> Ideally, for async. server design purposes, and given that
> MPI_Comm_accept is blocking and there is no 'test'/'poll' for it, it
> would be good to be able to use the sockets channel to coordinate
> Infiniband channel bootstrapping with MPI_Comm_join (even if
> MPI_Comm_join itself is blocking, at least one can 'poll' the TCP
> socket's fd before calling 'accept' and subsequently
> MPI_Comm_join)...
> 
> If mvapich2 is unable to provide dynamic process connectivity over
> Infiniband... are there any other libs that could do that?
> 
> Kind regards
> Leon.
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
> 
-------------- next part --------------
An embedded message was scrubbed...
From: Dhabaleswar Panda <panda at cse.ohio-state.edu>
Subject: [mvapich-discuss] [mvapich-commit] dynamic process connections
	(accept/connect or MPI_Comm_join) and Infiniband... (fwd)
Date: Tue, 15 Apr 2008 08:52:47 -0400 (EDT)
Size: 8794
Url: http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20080416/3daf718f/attachment.mht
