[mvapich-discuss] mvapich2 and RDMA CM Address error

Dhabaleswar Panda panda at cse.ohio-state.edu
Wed Nov 4 16:39:54 EST 2009


Hi Bryan,

I believe you are still using the MPD/mpiexec environment. We are moving
away from that environment because it is not scalable. Please start using
the new mpirun_rsh-based framework, which provides fast and scalable job
startup. Let us know if you see any issues when using the mpirun_rsh-based
framework.
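
For example, with a hostfile listing one line per process (each of the 32
node names repeated eight times for your 8-way nodes), the equivalent
launch would look roughly like this; adjust the hostfile name and the path
to mpirun_rsh for your installation:

    mpirun_rsh -ssh -np 256 -hostfile hosts ./helloc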

Thanks,

DK
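
P.S. To decode the message itself: in the librdmacm headers, cm event 1 is
RDMA_CM_EVENT_ADDR_ERROR, and status -110 is -ETIMEDOUT, i.e. the
address-resolution step timed out. A minimal sketch of the calls involved
(standard librdmacm API, not the exact code in rdma_cm.c):

    #include <stdio.h>
    #include <rdma/rdma_cma.h>

    /* Resolve a destination address. On failure the CM delivers
     * RDMA_CM_EVENT_ADDR_ERROR (event 1), whose status field carries a
     * negative errno, e.g. -110 == -ETIMEDOUT. */
    static int resolve_dst(struct rdma_event_channel *ch,
                           struct rdma_cm_id *id, struct sockaddr *dst)
    {
        struct rdma_cm_event *ev;

        if (rdma_resolve_addr(id, NULL, dst, 2000 /* timeout, ms */))
            return -1;
        if (rdma_get_cm_event(ch, &ev))
            return -1;
        if (ev->event == RDMA_CM_EVENT_ADDR_ERROR)
            fprintf(stderr, "rdma cma event %d, error %d\n",
                    ev->event, ev->status);
        rdma_ack_cm_event(ev);
        return 0;
    }

With 256 processes all resolving addresses during job startup, a timeout
like this becomes much more likely under load.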


On Wed, 4 Nov 2009, Bryan Putnam wrote:

> With both mvapich2-1.4rc2 and mvapich2-1.4, we've been seeing the following
> error when attempting to use a large number of processes, for example
> 256 processes (8 processes on each of 32 nodes).
>
> coates-a029 1005% mpiexec -np 256 ./helloc
> Hello from process 0!
> Hello from process 1!
> Hello from process 2!
> Hello from process 3!
> Hello from process 4!
> Hello from process 5!
> Hello from process 6!
> Hello from process 7!
> Hello from process 8!
> Hello from process 9!
> Hello from process 10!
> Hello from process 11!
> Hello from process 12!
> [247] Abort: RDMA CM Address error: rdma cma event 1, error -110
>  at line 341 in file rdma_cm.c
>
>
> I just wanted to check and see if this was a known problem with a solution.
>
> Thanks!
> Bryan