[mvapich-discuss] No IB device found

Christopher Tanner christopher.tanner at gatech.edu
Mon Apr 14 08:56:54 EDT 2008


Matt -

All good suggestions. I normally use a machinefile that keeps processes
off the master node unless the rest of the cluster is busy. However, the
master node IS in the mpd ring. If I need to remove it from the mpd ring
altogether, let me know.
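
For illustration, a machinefile of that kind could be generated with a
short script. This is a minimal sketch, not the setup actually used on
this cluster: the function name build_machinefile, the host names, and
the two-slots-per-node count are all hypothetical, and the output only
mirrors the one-host-per-line format of the 'h' file quoted below.

```python
def build_machinefile(nodes, procs_per_node, exclude=()):
    """Return machinefile text: each kept node repeated once per process slot."""
    lines = []
    for node in nodes:
        if node in exclude:
            # Skip hosts (e.g. the master) that should not run MPI processes.
            continue
        lines.extend([node] * procs_per_node)
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    # Hypothetical host names; replace with the real cluster hosts.
    hosts = ["master", "m1", "m2"]
    print(build_machinefile(hosts, 2, exclude={"master"}), end="")
```

Writing the returned text to a file and passing that file to
'mpiexec -machinefile' keeps processes off the excluded host, although
that host can still be a member of the mpd ring.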

Also, I can't find an 'ibstat' command anywhere in /usr/local/bin,
/usr/bin, or /etc/infiniband. The prefix for the OFED install was /usr.
I didn't install the 'ibutils' package because it failed with an error
during installation. Is the ibstat command in that package?
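
Without 'ibstat', one rough way to see what the error is complaining
about is to look at sysfs directly. The sketch below assumes the usual
Linux layout, in which libibverbs reads
/sys/class/infiniband_verbs/abi_version and HCAs appear as entries under
/sys/class/infiniband. The function name ib_status and its sysfs_root
parameter are hypothetical; this is not an OFED tool.

```python
import os


def ib_status(sysfs_root="/sys/class"):
    """Report whether the uverbs ABI file exists and which IB devices sysfs lists."""
    abi_path = os.path.join(sysfs_root, "infiniband_verbs", "abi_version")
    dev_dir = os.path.join(sysfs_root, "infiniband")
    # An absent directory is treated as "no devices" rather than an error.
    devices = sorted(os.listdir(dev_dir)) if os.path.isdir(dev_dir) else []
    return {"uverbs_abi_present": os.path.isfile(abi_path), "devices": devices}


if __name__ == "__main__":
    print(ib_status())
```

If uverbs_abi_present comes back False on a node, that would be
consistent with the "couldn't read uverbs ABI version" message and
suggests the InfiniBand kernel modules are not loaded there. An empty
device list on a node that does have a card points the same way.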

-------------------------------------------
Chris Tanner
Space Systems Design Lab
Georgia Institute of Technology
christopher.tanner at gatech.edu
-------------------------------------------



On Apr 13, 2008, at 11:41 PM, Matthew Koop wrote:
> Chris,
>
> Does the master node also have an IB card on it or not? Note that by
> default the node that you start the mpd ring from (probably the master)
> is also included in the ring, even if it doesn't have an IB card. You
> can either specifically configure it to not put any processes on the
> first node, use a machine file, or just start the mpd ring from a
> compute node.
>
> To use a machine file you'd start the ring normally and then create a
> host file:
>
> e.g. in 'h':
> m1
> m1
> m2
> m2
>
> mpiexec -machinefile ./h ./exec
>
> Otherwise, on each node run the 'ibstat' command and verify that it
> shows an HCA is installed.
>
> Let us know if these suggestions help,
>
> Matt
>
> On Sun, 13 Apr 2008, Christopher Tanner wrote:
>
>> All -
>>
>> I have installed OFED and mvapich2 on the master and all 16 nodes, then
>> rebooted the master followed by all of the nodes. After starting an mpd
>> ring, I try to run the OSU benchmarks and get:
>>
>> libibverbs: Fatal: couldn't read uverbs ABI version.
>> Fatal error in MPI_Init:
>> Other MPI error, error stack:
>> MPIR_Init_thread(259)...........: Initialization failed
>> MPID_Init(102)..................: channel initialization failed
>> MPIDI_CH3_Init(178).............:
>> MPIDI_CH3I_RMDA_init(115).......: rdma_get_control_parameters
>> rdma_get_control_parameters(432):
>> rdma_open_hca(367)..............: No IB device found
>> rank 3 in job 1  master.cl.ae.gatech.edu_41302   caused collective
>> abort of all ranks
>>   exit status of rank 3: return code 1
>>
>> I've posted this error before, but the only solution I received was
>> "make sure you are loading your Infiniband kernel modules" or
>> something similar. Aside from performing the install and rebooting the
>> machines, how do I initialize the Infiniband stuff? I'm new to Linux
>> so I'm pretty clueless as to the commands / procedures to do this...
>>
>> Thanks for your help guys.
>>
>


