[mvapich-discuss] can't set up mpd ring between two nodes (fwd)

jetspeed ibatis2 at 163.com
Thu Dec 27 07:17:01 EST 2007


Hi, huanwei:
	Thanks for your reply.
	1.  No other mpd is running on the same node; I run mpdallexit before I start mpd.
	2.  There is a .mpd.conf in the home directory, and the two nodes share $HOME over NFS. The content of .mpd.conf is:     secretword=1111
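
A quick sanity check of the file can be sketched in Python as below. It assumes the usual MPICH2 requirement that .mpd.conf be accessible only by its owner (chmod 600) and that it contain a secretword= line (newer MPICH2 documentation also uses MPD_SECRETWORD=). The all-digits warning is only a hypothesis: the TypeError in the traceback quoted below ("expected string, int found") would be consistent with a numeric secretword being handled as an integer, but this has not been confirmed against mpdlib.py. The function name check_mpd_conf is made up for this sketch.

```python
import os
import stat


def check_mpd_conf(path):
    """Return a list of human-readable problems found in an mpd config file."""
    if not os.path.isfile(path):
        return ["%s does not exist" % path]

    problems = []

    # mpd expects the file to be private to its owner (chmod 600).
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        problems.append("permissions are %o; expected 600" % mode)

    # Find the secretword, skipping blank lines and comments.
    secret = None
    with open(path) as conf:
        for line in conf:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            key, sep, value = line.partition("=")
            if sep and key.strip() in ("secretword", "MPD_SECRETWORD"):
                secret = value.strip()

    if secret is None:
        problems.append("no secretword= line found")
    elif not secret:
        problems.append("secretword is empty")
    elif secret.isdigit():
        # Assumption (unconfirmed): an all-digit value may be treated as an int.
        problems.append("secretword is all digits; try one containing letters")
    return problems


if __name__ == "__main__":
    for problem in check_mpd_conf(os.path.expanduser("~/.mpd.conf")):
        print(problem)
```

Since $HOME is shared over NFS, running this on one node covers both. If the all-digits warning appears, changing the secretword to something like a mix of letters and digits is a cheap experiment before digging further.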

I want to use InfiniBand, so I installed OFED 1.2.5. I am not sure InfiniBand is configured correctly, but the mpdcheck program runs successfully between the two nodes.
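
Note that the mpd ring itself communicates over ordinary TCP sockets, so the ring problem is probably separate from the InfiniBand setup. A bare TCP reachability probe can be sketched in Python as follows. This is not a replacement for mpdcheck; it only tests whether a connection to a given host and port can be opened. The function name can_connect is hypothetical, and the port would be the one the listening mpd reports.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        # create_connection resolves the host name and connects with a timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For example, can_connect("node02", port) run on node01 would show whether the second mpd can reach the first one at all.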

I will check the README_MPICH2 you mentioned.  
Thanks again, 


On Wed, 26 Dec 2007 11:22:19 -0500 (EST)
wei huang <huanwei at cse.ohio-state.edu> wrote:

> Hi,
> 
> Thanks for using mvapich2.
> 
> > 	I installed the mvapich2 that comes with OFED 1.2.5.
> 
> So are you using the default installation that comes with the OFED package?
> 
> > 	1. When I run mpdboot on one machine, I get:
> > 	  mpdboot_inode02 (handle_mpd_output 359): failed to ping mpd on inode02; recvd output={}
> 
> Several things can cause this failure, but there are a few
> things to check first:
> 
> 1) Do you have other mpds running on the same set of nodes (under the same
> user name)?
> 
> 2) Do you have .mpd.conf in your home directory?
> 
> I also want to mention that we have already released mvapich2-1.0.1. You
> can try that by downloading the software package from our website:
> 
> http://mvapich.cse.ohio-state.edu/
> 
> There is a file called README_MPICH2 in the package. You can also read
> that for more details about setting up mpd rings.
> 
> Please let us know if this works.
> 
> Thanks.
> 
> -- Wei
> 
> > 	2.  When I try to set up the mpd ring by hand, following the mpich2 user guide:
> > 			mpd &                       on node02
> > 			mpd -h node02 -p port       on node01
> > 	I got:
> > on node01:  (the latter mpd)
> > inode01_33435 (connect_lhs 621): invalid challenge from inode02 32969: {}
> > inode01_33435 (enter_ring 566): lhs connect failed
> > inode01_33435 (run 233): failed to enter ring
> >
> > on node02:  (the first mpd )
> >
> > inode02_32969: mpd_uncaught_except_tb handling:
> >   exceptions.TypeError: sequence item 0: expected string, int found
> >     /usr/mpi/gcc/mvapich2-0.9.8-15/bin/mpdlib.py  733  handle_ring_listener_connection
> >         newsock.correctChallengeResponse = \
> >     /usr/mpi/gcc/mvapich2-0.9.8-15/bin/mpdlib.py  488  handle_active_streams
> >         handler(stream,*args)
> >     /usr/mpi/gcc/mvapich2-0.9.8-15/bin/mpd  266  runmainloop
> >         rv = self.streamHandler.handle_active_streams(timeout=8.0)
> >     /usr/mpi/gcc/mvapich2-0.9.8-15/bin/mpd  240  run
> >         self.runmainloop()
> >     /usr/mpi/gcc/mvapich2-0.9.8-15/bin/mpd  1344  ?
> >         mpd.run()
> >
> >
> >
> > Has anyone encountered this problem?
> > Thanks in advance.
> >
> >
> > _______________________________________________
> > mvapich-discuss mailing list
> > mvapich-discuss at cse.ohio-state.edu
> > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
> >
> 


