[mvapich-discuss] can't set up mpd ring between two nodes (fwd)
jetspeed
ibatis2 at 163.com
Thu Dec 27 07:17:01 EST 2007
Hi, huanwei:
Thanks for your reply.
1. No other mpd is running on the same node; I run mpdallexit before I start mpd.
2. There is a .mpd.conf in the home directory; the two nodes share $HOME over NFS. The content of .mpd.conf is secretword=1111
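For anyone repeating these checks, here is a minimal Python sketch that verifies the two things mpd is generally understood to require of this file: that only the owner can access it (mode 600), and that it contains a secretword line. The function name check_mpd_conf is hypothetical, and accepting MPD_SECRETWORD as an alternative key is an assumption based on older MPICH2 documentation. This is a sanity check under those assumptions, not a description of what mpd itself does.

```python
import os
import stat


def check_mpd_conf(path):
    """Return a list of human-readable problems found in an mpd.conf-style file."""
    problems = []
    if not os.path.isfile(path):
        return ["%s does not exist" % path]

    # Flag any permission bits granted to group or others (expected mode: 600).
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        problems.append("permissions are %o; expected 600" % mode)

    # Look for a secretword=<value> line.
    # Assumption: MPD_SECRETWORD=<value> is accepted as an older spelling.
    secret = None
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            key, _, value = line.partition("=")
            if key.strip() in ("secretword", "MPD_SECRETWORD"):
                secret = value.strip()
    if secret is None:
        problems.append("no secretword= line found")
    elif secret == "":
        problems.append("secretword is empty")
    return problems


if __name__ == "__main__":
    conf = os.path.join(os.path.expanduser("~"), ".mpd.conf")
    for problem in check_mpd_conf(conf) or ["no problems found"]:
        print(problem)
```

Because $HOME is shared over NFS here, running the script on either node inspects the same file.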
I want to use InfiniBand, so I installed OFED 1.2.5. I am not sure InfiniBand is configured correctly, but the mpdcheck program works correctly between the two nodes.
I will check the README_MPICH2 you mentioned.
Thanks again,
On Wed, 26 Dec 2007 11:22:19 -0500 (EST)
wei huang <huanwei at cse.ohio-state.edu> wrote:
> Hi,
>
> Thanks for using mvapich2.
>
> > I installed mvapich2, which comes with OFED 1.2.5.
>
> So do you use the default installation coming with the OFED package?
>
> > 1. When I use mpdboot on a machine, I get:
> > mpdboot_inode02 (handle_mpd_output 359): failed to ping mpd on inode02; recvd output={}
>
> Several things can cause this failure, but there are a few
> things to check first:
>
> 1) Do you have another mpd running on the same set of nodes (under the same
> user name)?
>
> 2) Do you have .mpd.conf in your home directory?
>
> I also want to mention that we have already released mvapich2-1.0.1. You
> can try that by downloading the software package from our website:
>
> http://mvapich.cse.ohio-state.edu/
>
> There is a file called README_MPICH2 in the package. You can also read
> that for more details on setting up mpd rings.
>
> Please let us know if this works.
>
> Thanks.
>
> -- Wei
>
> > 2. When I try to set up the mpd ring by hand, following the mpich2 user guide:
> >    mpd &                     (on node02)
> >    mpd -h node02 -p port     (on node01)
> > I got:
> > on node01: (the latter mpd)
> > inode01_33435 (connect_lhs 621): invalid challenge from inode02 32969: {}
> > inode01_33435 (enter_ring 566): lhs connect failed
> > inode01_33435 (run 233): failed to enter ring
> >
> > on node02: (the first mpd )
> >
> > inode02_32969: mpd_uncaught_except_tb handling:
> > exceptions.TypeError: sequence item 0: expected string, int found
> > /usr/mpi/gcc/mvapich2-0.9.8-15/bin/mpdlib.py 733 handle_ring_listener_connection
> > newsock.correctChallengeResponse = \
> > /usr/mpi/gcc/mvapich2-0.9.8-15/bin/mpdlib.py 488 handle_active_streams handler(stream,*args)
> > /usr/mpi/gcc/mvapich2-0.9.8-15/bin/mpd 266 runmainloop
> > rv = self.streamHandler.handle_active_streams(timeout=8.0)
> > /usr/mpi/gcc/mvapich2-0.9.8-15/bin/mpd 240 run
> > self.runmainloop()
> > /usr/mpi/gcc/mvapich2-0.9.8-15/bin/mpd 1344 ?
> > mpd.run()
> >
> >
> >
> > Has anyone encountered this problem?
> > Thanks in advance.
> >
> >
> > _______________________________________________
> > mvapich-discuss mailing list
> > mvapich-discuss at cse.ohio-state.edu
> > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
> >
>
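
The TypeError in the traceback quoted above has the form of the message Python's str.join raises when one of the items it is given is not a string. The short sketch below reproduces only that behaviour in isolation. The helper join_parts and the values 1111 and "abc" are purely illustrative, and nothing in this thread confirms which value inside mpdlib.py was an integer. The exact wording also varies by Python version: the traceback shows the Python 2 form, while Python 3 says "expected str instance".

```python
def join_parts(parts):
    """Join parts with str.join and return (result, error_message)."""
    try:
        return "".join(parts), None
    except TypeError as exc:
        return None, str(exc)


# A list whose first item is an int makes str.join fail at item 0.
result, error = join_parts([1111, "abc"])
print(error)

# Converting every item to str first avoids the error.
result, error = join_parts([str(p) for p in [1111, "abc"]])
print(result)
```

If a numeric value really is being read as an integer somewhere, converting it to a string before joining (or using a non-numeric value) would be the kind of change that avoids this class of error.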