[mvapich-discuss] Problem with mvapich2 on a cluster connected with GigE and IB

Salvador Ramirez sram at profc.udec.cl
Wed Jul 26 10:42:22 EDT 2006


Hello,

    I recently downloaded and installed mvapich2 on a 
cluster whose nodes are connected by two networks: gigabit 
ethernet and InfiniBand. Each node therefore has two IP 
addresses (one per network), mapped to obvious names 
like n1 and n1-ib, n2 and n2-ib, etc.

    For the build I selected VAPI and everything 
compiled without problems; the installation went into 
/usr/local/mvapich2. Then I created a file named hostfile 
like this:

n1-ib
n2-ib
...

    I then ran mpdboot -n 8 -f hostfile. Everything was 
fine up to that point, but when I checked with mpdtrace -l I 
saw that the nodes are listed as n1, n2, n3... with the IP 
addresses of the GigE network. So I wonder why mpd chooses 
these addresses when the hostfile explicitly lists the names 
that correspond to the IB addresses?
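
    For reference, here is a minimal sketch of how the 
resolution of each hostfile entry could be compared with what 
mpdtrace -l reports. It assumes one host name per line in 
hostfile and that getent is available on the nodes; 
list_hosts and show_resolution are hypothetical helper names, 
not part of mvapich2 or mpd.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of mvapich2 or mpd.
# Assumes a hostfile with one host name per line.

# List the host names in a hostfile, skipping blank lines and '#' comments.
list_hosts() {
    sed -e 's/#.*//' -e 's/[[:space:]]*$//' -e '/^[[:space:]]*$/d' "$1"
}

# Print "name -> address" for each entry. getent consults the system's
# configured name sources (typically /etc/hosts, then DNS).
show_resolution() {
    list_hosts "$1" | while read -r name; do
        addr=$(getent hosts "$name" | awk '{print $1; exit}')
        echo "$name -> ${addr:-unresolved}"
    done
}

# Example usage:
#   show_resolution hostfile   # addresses behind the names in the hostfile
#   hostname                   # the name this node reports for itself
```

    If the -ib names resolve to the IB addresses here while 
hostname prints the plain GigE name, that would at least match 
the mismatch I see in the mpdtrace -l output.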

    Of course this causes further problems: when I try to 
run an MPI program with mpiexec I get error messages from 
the VAPI library, since the addresses are not on the IB network.

Any help is greatly appreciated. Thanks.

---sram


