[mvapich-discuss] MPD related error
yogeshwar sonawane
yogyas at gmail.com
Sat Jul 5 04:23:47 EDT 2008
Hi all,
I am trying to run 64 processes using MVAPICH2-1.0.1-uDAPL on 8 nodes.
Every node has 8 cores/cpus.
Out of the 64, one or more processes sometimes get killed or exit. On the
node that ends up with fewer than 8 running processes, the following
message appears in /var/log/messages:
Jul 4 13:23:05 pn02 mpdman: pn02_mpdman_12: mpd_uncaught_except_tb handling: exceptions.AttributeError: 'int' object has no attribute 'send_dict_msg'
  /home/htdg/pn_mpi/mpi-bin_send-recv_pnet3/bin/mpdman.py 652 handle_lhs_input
      self.ring.rhsSock.send_dict_msg(msg)
  /home/htdg/pn_mpi/mpi-bin_send-recv_pnet3/bin/mpdlib.py 743 handle_active_streams
      handler(stream,*args)
  /home/htdg/pn_mpi/mpi-bin_send-recv_pnet3/bin/mpdman.py 481 run
      rv = self.streamHandler.handle_active_streams(timeout=5.0)
  /home/htdg/pn_mpi/mpi-bin_send-recv_pnet3/bin/mpd.py 1408 launch_mpdman_via_fork
      mpdman.run()
  /home/htdg/pn_mpi/mpi-bin_send-recv_pnet3/bin/mpd.py 1325 run_one_cli
      (manPid,toManSock) = self.launch_mpdman_via_fork(msg,man_env)
  /home/htdg/pn_mpi/mpi-bin_send-recv_pnet3/bin/mpd.py 1199 do_mpdrun
      self.run_one_cli(lorank,msg)
  /home/htdg/pn_mpi/mpi-bin_send-recv_pnet3/bin/mpd.py 854 handle_lhs_input
      self.do_mpdrun(msg)
  /home/htdg
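The exception says that self.ring.rhsSock, which the code at mpdman.py line 652 expects to be a message socket with a send_dict_msg() method, held a plain integer at that moment. The Python sketch below reproduces that class of failure. MsgSock, Ring, close_rhs(), forward() and forward_guarded() are all hypothetical stand-ins, not MPD's code, and the assumption that the field is reset to 0 after the right-hand connection goes away has not been checked against the MPD source.

```python
class MsgSock:
    """Hypothetical stand-in for an MPD message socket with send_dict_msg()."""

    def __init__(self):
        self.sent = []

    def send_dict_msg(self, msg):
        # Record the message instead of writing it to a real socket.
        self.sent.append(msg)


class Ring:
    """Hypothetical holder of the right-hand-side ring connection."""

    def __init__(self):
        self.rhsSock = MsgSock()

    def close_rhs(self):
        # ASSUMPTION: when the right-hand neighbour disconnects, the field
        # is reset to an integer placeholder instead of a socket object.
        self.rhsSock = 0


def forward(ring, msg):
    # Mirrors the failing line: raises AttributeError if rhsSock is an int.
    ring.rhsSock.send_dict_msg(msg)


def forward_guarded(ring, msg):
    # Variant that checks for a usable socket before sending.
    sock = ring.rhsSock
    if not hasattr(sock, "send_dict_msg"):
        return False
    sock.send_dict_msg(msg)
    return True
```

The guarded variant only avoids the crash. It does not explain why the right-hand connection was missing when the message arrived from the left-hand side.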
Can anybody give me more information about this?
Is this some kind of setup or configuration issue on the nodes?
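One way to tell a node-specific setup problem from a general one is to tally these errors per host across all 8 nodes. The following Python sketch does that. The function name count_mpd_errors and the regular expression are hypothetical. The pattern is inferred from the single log line above, and it assumes that syslog stores each message on one physical line, unlike the wrapped copy in this post.

```python
import re
from collections import Counter

# ASSUMPTION: line format inferred from the one mpdman syslog entry above.
PATTERN = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]+\s+(?P<host>\S+)\s+mpdman:\s+\S+:\s+"
    r"mpd_uncaught_except_tb\s+handling:\s+(?P<exc>[\w.]+):"
)


def count_mpd_errors(lines):
    """Count uncaught mpdman exceptions per (host, exception type)."""
    counts = Counter()
    for line in lines:
        m = PATTERN.match(line)
        if m:
            counts[(m.group("host"), m.group("exc"))] += 1
    return counts
```

On each node, passing an open handle to /var/log/messages (which usually requires root) to count_mpd_errors() gives a per-host summary. If only some hosts report errors, a node-specific cause is more likely.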
Thanks,
Yogeshwar