[mvapich-discuss] Problem with mvapich2 on a cluster connected with GigE and IB

Salvador Ramirez sram at profc.udec.cl
Wed Jul 26 16:23:17 EDT 2006


Matthew,

   Thanks for your answer. Here is the output when I run the 
"cpi" program example that comes with the distribution:

---------------------
~/mvapich2-0.9.3/examples> mpdtrace -l
newen_2574 (192.168.0.250)   <--- GigE IP addresses
n2_32945 (192.168.0.2)
n3_32932 (192.168.0.3)
n4_32923 (192.168.0.4)
n1_33014 (192.168.0.1)
n7_32909 (192.168.0.7)
n8_32909 (192.168.0.8)
n6_32911 (192.168.0.6)

~/mvapich2-0.9.3/examples> mpiexec -n 8 ./cpi
sched_setaffinity: Bad address
sched_setaffinity: Bad address
sched_setaffinity: Bad address
sched_setaffinity: Bad address
sched_setaffinity: Bad address
sched_setaffinity: Bad address
sched_setaffinity: Bad address
sched_setaffinity: Bad address
Process 0 of 8 is on newen
Process 4 of 8 is on n1
Process 1 of 8 is on n2
Process 3 of 8 is on n4
Process 2 of 8 is on n3
Process 6 of 8 is on n8
Process 5 of 8 is on n7
Process 7 of 8 is on n6
pi is approximately 3.1415926544231247, Error is 
0.0000000008333316
wall clock time = 0.111762
----------------------

   From what you said, I think I should just ignore those 
first messages and conclude that mvapich is actually working 
on all of the nodes, right? In any case, what do those error 
messages mean? I've googled them but found only discussions 
about kernel development.

Thanks again.
Best regards,

---sram

Matthew Koop wrote:
> Salvador,
> 
> When running mpdtrace, it will report the hostname of the machine, not the
> hostname associated with the IP address MPD is listening on. On our
> systems here, I need to run:
> 
> mpdboot -n 2 -f hosts --ifhn=d2ib
> 
> (where d2ib is a hostname that resolves to the IPoIB interface of the
> machine I am running mpdboot on, which is also in the hosts file.) You may
> not need this parameter. You can verify that things are running over IPoIB
> by using the following command on n2 before and after running mpdboot:
> 
> netstat -a | grep n1-ib
> 
> It is very important to note that changing the interface is not likely to
> change performance (or solve your startup problem). MVAPICH2 only uses the
> specified interface to exchange a few startup parameters over IP. After
> exchanging enough information to start up native IB connections, all
> further communication will go over the native IB layer -- not the IPoIB
> interface.
> 
> What is the problem you are experiencing on startup? That will allow us to
> better debug the problem.
> 
> Thanks,
> 
> Matthew Koop
> -
> Network-Based Computing Lab
> Ohio State University
> 
> 
> 
> On Wed, 26 Jul 2006, Salvador Ramirez wrote:
> 
> 
>>Hello,
>>
>>    I recently downloaded and installed mvapich2 on a
>>cluster that has two connections among the nodes: gigabit
>>ethernet and infiniband. Each node therefore has two IP
>>addresses (one for each connection, of course) with obvious
>>names like n1 and n1-ib, n2 and n2-ib, et cetera.
>>
>>    For the compilation I selected VAPI and everything
>>compiled without problems; the installation went to
>>/usr/local/mvapich2. Then I created the hostfile
>>like this:
>>
>>n1-ib
>>n2-ib
>>...
>>
>>    and then ran mpdboot -n 8 -f hostfile. Everything was
>>fine until here, but when I checked with mpdtrace -l I
>>saw that the nodes were n1, n2, n3... with the IP addresses
>>of the GigE network. So I wonder why mpd chose these
>>addresses when the names in the hostfile are explicitly
>>listed with their corresponding IB addresses?
>>
>>    Of course this causes further problems: when I try to
>>run an MPI program with mpiexec, I receive error messages
>>from the VAPI library since the addresses are not on IB.
>>
>>Any help is very appreciated. Thanks.
>>
>>---sram
>>
>>_______________________________________________
>>mvapich-discuss mailing list
>>mvapich-discuss at mail.cse.ohio-state.edu
>>http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>>


