[mvapich-discuss] can't set up mpd ring between two nodes (fwd)

Matthew Koop koop at cse.ohio-state.edu
Thu Dec 27 11:01:56 EST 2007


Can you try changing your secret word in .mpd.conf so that it starts with a
letter instead of a digit? (e.g. 'c1111' instead of '1111').
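For what it's worth, the TypeError in the traceback below ("sequence item 0:
expected string, int found") is exactly what Python's str.join raises when one
of the pieces is not a string. A minimal sketch of the suspected failure mode
(an assumption about what mpd does internally, not taken from the mpd source):

```python
# Sketch (assumption, not the actual mpd source): mpd assembles its
# challenge/response handshake strings with str.join, which requires
# every piece to be a string.

def build_response(pieces):
    # Join the handshake pieces into one message.
    return "".join(pieces)

# A secret word kept as a string joins fine:
build_response(["challenge-", "c1111"])

# But if a digit-only secret word has been turned into an int somewhere
# along the way, join raises the same TypeError seen in the traceback
# ("sequence item 0: expected ..., int found"):
try:
    build_response([1111, "-challenge"])
except TypeError as err:
    print("TypeError:", err)
```

A secret word that starts with a letter can never be mistaken for a number,
which is why the workaround above should sidestep the crash.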

Thanks,

Matt

On Thu, 27 Dec 2007, jetspeed wrote:

> Hi, huanwei:
> 	Thanks for your reply.
> 	1. There is no other mpd running on the same node; I run mpdallexit before I start mpd.
> 	2. There is a .mpd.conf in the home directory; the two nodes share $HOME via NFS. The content of .mpd.conf is: secretword=1111
>
> I want to use InfiniBand, so I installed OFED 1.2.5. I am not sure InfiniBand is configured correctly, but mpdcheck between the two nodes works fine.
>
> I will check the README_MPICH2 you mentioned.
> Thanks again,
>
>
> On Wed, 26 Dec 2007 11:22:19 -0500 (EST)
> wei huang <huanwei at cse.ohio-state.edu> wrote:
>
> > Hi,
> >
> > Thanks for using mvapich2.
> >
> > > 	I installed mvapich2 , which is with the OFED 1.2.5.
> >
> > So do you use the default installation coming with the OFED package?
> >
> > > 	1. when I use mpdboot on a machine, I got :
> > > 	  mpdboot_inode02 (handle_mpd_output 359): failed to ping mpd on inode02; recvd output={}
> >
> > There are multiple reasons that can cause this failure, but there are a few
> > things to check first:
> >
> > 1) Do you have another mpd running on the same set of nodes (under the same
> > user name)?
> >
> > 2) Do you have .mpd.conf in your home directory?
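A quick way to check item 2) and create the file if it is missing (a sketch;
"c1111" is a placeholder secret word, and mpd requires the file to be readable
only by its owner):

```shell
# Sketch: create a minimal .mpd.conf ("c1111" is a placeholder
# secret word; pick your own, starting with a letter).
echo "secretword=c1111" > "$HOME/.mpd.conf"

# mpd refuses to start unless the file is owner-readable only.
chmod 600 "$HOME/.mpd.conf"

# Verify presence, permissions, and content.
ls -l "$HOME/.mpd.conf"
cat "$HOME/.mpd.conf"
```

Since the two nodes share $HOME over NFS, creating the file once makes it
visible to both.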
> >
> > I also want to mention that we have already released mvapich2-1.0.1. You
> > can try that by downloading the software package from our website:
> >
> > http://mvapich.cse.ohio-state.edu/
> >
> > There is a file called README_MPICH2 in the package. You can also read
> > that for more details on setting up mpd rings.
> >
> > Please let us know if this works.
> >
> > Thanks.
> >
> > -- Wei
> >
> > > 	2.  When I try to set up an mpd ring, as described in the MPICH2 user guide:
> > > 			mpd &                       on node02
> > > 			mpd -h node02 -p port       on node01
> > > 	I got:
> > > on node01:  (the latter mpd)
> > > inode01_33435 (connect_lhs 621): invalid challenge from inode02 32969: {}
> > > inode01_33435 (enter_ring 566): lhs connect failed
> > > inode01_33435 (run 233): failed to enter ring
> > >
> > > on node02:  (the first mpd )
> > >
> > > inode02_32969: mpd_uncaught_except_tb handling:
> > >   exceptions.TypeError: sequence item 0: expected string, int found
> > >     /usr/mpi/gcc/mvapich2-0.9.8-15/bin/mpdlib.py  733  handle_ring_listener_connection
> > >         newsock.correctChallengeResponse = \
> > >     /usr/mpi/gcc/mvapich2-0.9.8-15/bin/mpdlib.py  488  handle_active_streams
> > >         handler(stream,*args)
> > >     /usr/mpi/gcc/mvapich2-0.9.8-15/bin/mpd  266  runmainloop
> > >         rv = self.streamHandler.handle_active_streams(timeout=8.0)
> > >     /usr/mpi/gcc/mvapich2-0.9.8-15/bin/mpd  240  run
> > >         self.runmainloop()
> > >     /usr/mpi/gcc/mvapich2-0.9.8-15/bin/mpd  1344  ?
> > >         mpd.run()
> > >
> > >
> > >
> > > Has anyone encountered this problem?
> > > Thanks in advance.
> > >
> > >
> > > _______________________________________________
> > > mvapich-discuss mailing list
> > > mvapich-discuss at cse.ohio-state.edu
> > > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
> > >
> >
>
