[mvapich-discuss] mvapich2 and RDMA CM Address error

Bryan Putnam bfp at purdue.edu
Wed Nov 4 17:33:18 EST 2009


On Wed, 4 Nov 2009, Dhabaleswar Panda wrote:

> Hi Bryan,
> 
> I believe you are still using the MPD/mpiexec environment. We are moving
> away from that environment because it is not scalable. Please start using
> the new mpirun_rsh-based framework, which provides fast and scalable job
> startup. Let us know if you see any issues when using the mpirun_rsh-based
> framework.

Hi DK,

Actually, I'm not using MPD/mpiexec, but rather mpiexec.hydra from the 
MPICH2 distribution. mpirun_rsh (from MVAPICH2) does not work on our 
Chelsio cluster. I reported the mpirun_rsh problem earlier, but I'll 
go ahead and send it in again.

Thanks,
Bryan

> 
> Thanks,
> 
> DK
> 
> 
> On Wed, 4 Nov 2009, Bryan Putnam wrote:
> 
> > With both mvapich2-1.4rc2 and mvapich2-1.4 we've been seeing the following
> > error when attempting to use a large number of processes, for example
> > 256 processes (8 processes on each of 32 nodes).
> >
> > coates-a029 1005% mpiexec -np 256 ./helloc
> > Hello from process 0!
> > Hello from process 1!
> > Hello from process 2!
> > Hello from process 3!
> > Hello from process 4!
> > Hello from process 5!
> > Hello from process 6!
> > Hello from process 7!
> > Hello from process 8!
> > Hello from process 9!
> > Hello from process 10!
> > Hello from process 11!
> > Hello from process 12!
> > [247] Abort: RDMA CM Address error: rdma cma event 1, error -110
> >  at line 341 in file rdma_cm.c
> >
> >
> > I just wanted to check whether this is a known problem with a solution.
> >
> > Thanks!
> > Bryan
> > _______________________________________________
> > mvapich-discuss mailing list
> > mvapich-discuss at cse.ohio-state.edu
> > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
> >
> 
> 

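For what it's worth, in the abort message above "rdma cma event 1" is
RDMA_CM_EVENT_ADDR_ERROR in librdmacm's event enumeration, and error -110 is
-ETIMEDOUT, i.e. RDMA CM address resolution timed out for rank 247's peer.
Below is a minimal, illustrative librdmacm sketch (not the actual MVAPICH2
rdma_cm.c code; the hostname argument and the 2000 ms timeout are arbitrary)
showing the calls where such an event/status pair would be reported:

/* Illustrative librdmacm address-resolution sketch (not MVAPICH2's rdma_cm.c).
 * Shows where RDMA_CM_EVENT_ADDR_ERROR with status -110 (-ETIMEDOUT)
 * would be observed. */
#include <stdio.h>
#include <netdb.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <rdma/rdma_cma.h>

int main(int argc, char **argv)
{
    struct rdma_event_channel *ch;
    struct rdma_cm_id *id;
    struct rdma_cm_event *ev;
    struct addrinfo *res;
    int rc;

    if (argc < 2) {
        fprintf(stderr, "usage: %s <peer-hostname>\n", argv[0]);
        return 1;
    }
    rc = getaddrinfo(argv[1], NULL, NULL, &res);
    if (rc) {
        fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(rc));
        return 1;
    }

    ch = rdma_create_event_channel();
    if (!ch || rdma_create_id(ch, &id, NULL, RDMA_PS_TCP)) {
        perror("rdma_create_id");
        return 1;
    }

    /* Ask the CM to resolve the peer IP to an RDMA device; 2000 ms is an
     * arbitrary timeout for this sketch. */
    if (rdma_resolve_addr(id, NULL, res->ai_addr, 2000)) {
        perror("rdma_resolve_addr");
        return 1;
    }

    /* The outcome arrives asynchronously on the event channel. */
    if (rdma_get_cm_event(ch, &ev)) {
        perror("rdma_get_cm_event");
        return 1;
    }
    if (ev->event == RDMA_CM_EVENT_ADDR_ERROR)
        /* ev->status == -110 (-ETIMEDOUT) matches the abort in the report. */
        fprintf(stderr, "RDMA CM address error: event %d, status %d\n",
                ev->event, ev->status);
    else
        printf("address resolved (event %d)\n", ev->event);

    rdma_ack_cm_event(ev);
    rdma_destroy_id(id);
    rdma_destroy_event_channel(ch);
    freeaddrinfo(res);
    return 0;
}

The timeout argument passed to rdma_resolve_addr() is what governs how long
the CM waits before reporting an address error of this kind.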