[mvapich-discuss] mvapich2 and RDMA CM Address error

Jonathan Perkins perkinjo at cse.ohio-state.edu
Wed Nov 4 20:52:21 EST 2009


On Wed, Nov 04, 2009 at 04:09:47PM -0500, Bryan Putnam wrote:
> We've been seeing with both mvapich2-1.4rc2 and mvapich2-1.4 the following 
> error when attempting to use a large number of processors, for example 
> 256 processors; 8 processors on each of 32 nodes.
> 
> coates-a029 1005% mpiexec -np 256 ./helloc
> Hello from process 0!
> Hello from process 1!
> Hello from process 2!
> Hello from process 3!
> Hello from process 4!
> Hello from process 5!
> Hello from process 6!
> Hello from process 7!
> Hello from process 8!
> Hello from process 9!
> Hello from process 10!
> Hello from process 11!
> Hello from process 12!
> [247] Abort: RDMA CM Address error: rdma cma event 1, error -110
>  at line 341 in file rdma_cm.c
> 
> 
> I just wanted to check and see if this was a known problem with a solution.

This doesn't look like a known issue.  Is there a specific node that
isn't functioning correctly?  I sometimes see this error when opensm
isn't running or when the /etc/mv2.conf file doesn't point to the IP
address of the interface you're trying to use.  Maybe you can try
running on subsets of your nodes to isolate where the problem is.

> 
> Thanks!
> Bryan
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss

-- 
Jonathan Perkins
http://www.cse.ohio-state.edu/~perkinjo