[mvapich-discuss] Problem with mvapich2 on a cluster connected with GigE and IB

Matthew Koop koop at cse.ohio-state.edu
Wed Jul 26 15:04:55 EDT 2006


Salvador,

When running mpdtrace, it will report the hostname of the machine, not the
hostname associated with the IP address MPD is listening on. On our
systems here, I need to run:

mpdboot -n 2 -f hosts --ifhn=d2ib

(where d2ib is a hostname that resolves to the IPoIB interface of the
machine I am running mpdboot on, and which is also in the hosts file). You
may not need this parameter. You can verify that things are running over
IPoIB by running the following command on n2 before and after mpdboot:

netstat -a | grep n1-ib
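
As a complementary check, it can help to confirm that the GigE name and
the IPoIB name really map to different addresses. The following is only a
minimal sketch, assuming the names are listed in a hosts-format file such
as /etc/hosts (a site that resolves names through DNS or NIS would need a
different lookup). lookup_host and check_pair are hypothetical helper
names, not part of MPD or MVAPICH2:

```shell
#!/bin/sh
# Print the first address listed for a hostname in a hosts-format file.
# Usage: lookup_host NAME [FILE]   (FILE defaults to /etc/hosts)
# Assumption: names are defined in a hosts-format file, not DNS/NIS.
lookup_host() {
    awk -v name="$1" '
        { sub(/#.*/, "") }               # strip trailing comments
        { for (i = 2; i <= NF; i++)      # fields 2..NF are names and aliases
              if ($i == name) { print $1; exit } }
    ' "${2:-/etc/hosts}"
}

# Compare the addresses of two names, e.g. the GigE and IPoIB names of
# one node. Prints "distinct", "same", or "unresolved".
# Usage: check_pair ETH_NAME IB_NAME [FILE]
check_pair() {
    eth=$(lookup_host "$1" "${3:-/etc/hosts}")
    ib=$(lookup_host "$2" "${3:-/etc/hosts}")
    if [ -z "$eth" ] || [ -z "$ib" ]; then
        echo "unresolved"
    elif [ "$eth" = "$ib" ]; then
        echo "same"
    else
        echo "distinct"
    fi
}
```

For example, check_pair n1 n1-ib prints "distinct" when the two names map
to different addresses in /etc/hosts, "same" when they map to the same
address, and "unresolved" when either name is missing from the file.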

It is very important to note that changing the interface is not likely to
change performance (or solve your startup problem). MVAPICH2 only uses the
specified interface to exchange a few startup parameters over IP. After
exchanging enough information to start up native IB connections, all
further communication will go over the native IB layer -- not the IPoIB
interface.

What is the problem you are experiencing on startup? That will allow us to
better debug the problem.

Thanks,

Matthew Koop
-
Network-Based Computing Lab
Ohio-State University



On Wed, 26 Jul 2006, Salvador Ramirez wrote:

> Hello,
>
>     I recently downloaded and installed mvapich2 on a
> cluster that has two connections among the nodes: gigabit
> ethernet and infiniband. Each node therefore has two IP addresses
> (one for each connection, of course) mapped to obvious names
> like n1 and n1-ib, n2 and n2-ib, et cetera.
>
>     For the compilation I selected VAPI and everything
> compiled without problems, and the installation went into
> /usr/local/mvapich2. Then I created the file hostfile
> like this:
>
> n1-ib
> n2-ib
> ...
>
>     and then ran mpdboot -n 8 -f hostfile. Everything was
> fine up to this point, but when I checked with mpdtrace -l I
> saw that the nodes are n1, n2, n3... with the IP addresses of
> the gigE network. So I wonder why mpd chooses these addresses
> when the hostfile explicitly lists the names that
> correspond to the IB addresses?
>
>     Of course this causes further problems, since when I try to
> run an MPI program with mpiexec I receive error messages from
> the vapi library because the addresses are not over IB.
>
> Any help is much appreciated. Thanks.
>
> ---sram
>


